United States Environmental Protection Agency
Health & Environmental Research Online (HERO)
Citation
Tags
HERO ID
7257873
Reference Type
Journal Article
Title
Comments on "Researcher Bias: The Use of Machine Learning in Software Defect Prediction"
Author(s)
Tantithamthavorn, C; McIntosh, S; Hassan, AE; Matsumoto, K
Year
2016
Is Peer Reviewed?
Yes
Journal
IEEE Transactions on Software Engineering
ISSN
0098-5589
Publisher
IEEE Computer Society
Location
Los Alamitos
Page Numbers
1092-1094
DOI
10.1109/TSE.2016.2553030
Web of Science Id
WOS:000388866100006
Abstract
Shepperd et al. find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a defect prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.
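The confounding described in points (a) and (b) — a categorical explanatory variable (research group) being strongly associated with another (dataset family) — can be illustrated with a small, self-contained sketch. This is a hypothetical toy example, not the authors' actual analysis or data: it computes Cramér's V, a standard measure of association between two categorical variables, on made-up data in which each research group always reuses the same dataset family, so the two variables are perfectly confounded.

```python
from collections import Counter
from math import sqrt

def cramers_v(x, y):
    """Cramér's V: association strength (0..1) between two categorical variables,
    derived from the chi-square statistic of their contingency table."""
    n = len(x)
    xs, ys = sorted(set(x)), sorted(set(y))
    joint = Counter(zip(x, y))
    cx, cy = Counter(x), Counter(y)
    chi2 = 0.0
    for a in xs:
        for b in ys:
            expected = cx[a] * cy[b] / n
            observed = joint[(a, b)]
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(xs), len(ys)) - 1)))

# Toy data (hypothetical): each research group always uses the same
# dataset family, so "group" and "dataset" are perfectly confounded.
group   = ["G1", "G1", "G2", "G2", "G3", "G3"]
dataset = ["NASA", "NASA", "Eclipse", "Eclipse", "Apache", "Apache"]
print(round(cramers_v(group, dataset), 2))  # 1.0 — perfect association
```

When V is close to 1, any model that attributes performance differences to the research group could equally attribute them to the dataset family, which is why the abstract argues the association must be mitigated before the research group's own impact can be assessed.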