Daniel Rodríguez

Software Defect Prediction in Software Engineering

In this talk, we present how data mining techniques are used to predict or rank error-prone modules in software engineering. Classifying or ranking software components according to their probability of being defective helps in the testing and maintenance phases of a project, for example when allocating resources, prioritising modules to be tested, or performing regression testing. We will review the publicly available datasets, the machine learning techniques, and some of the machine learning problems that we face during the process. On the one hand, from the software engineering point of view, we need to deal with data quality and with which software engineering metrics can be used. On the other hand, from the data mining point of view, we may need to deal with feature selection, issues such as noise, missing values and imbalanced data, and the evaluation measures of the machine learning algorithms and their comparison.