


February 25, 2019, 16:12


Title: An Approach for Validating the Quality of Datasets for Machine Learning





There are basically two ways to improve the accuracy of machine learning: building relevant machine learning models, and providing high-quality datasets for training the models. Significant effort has gone into designing powerful machine learning models, and many open-source datasets have been created for machine learning research. However, research on how the quality of a dataset affects the accuracy of a machine learning system has received little attention. In this paper, we present an experimental study showing how the quality of datasets impacts the accuracy of machine learning models. We discovered a common problem in datasets that could greatly impact the accuracy of machine learning. This problem could also exist in many other machine learning systems, especially those developed using crowd-sourced datasets, and it is difficult to detect using traditional validation approaches. We propose a novel technique based on metamorphic testing for validating a machine learning system together with its training and testing data. The key to metamorphic testing is to create tests that adequately exercise the system, and we propose an approach for creating such tests. The effectiveness of the proposed approach is demonstrated through a case study of automated classification of biological cell images.
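For readers unfamiliar with metamorphic testing, the following is a minimal sketch of the general idea, not the method presented in the talk. It assumes one illustrative metamorphic relation (a horizontal flip of an image should not change its predicted class) and uses two hypothetical toy classifiers in place of a trained model. All function names, labels, and pixel values are invented for illustration.

```python
from typing import Callable, List

# A tiny grayscale image, stored as rows of pixel intensities (0-255).
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Follow-up input generator: mirror each row of the image."""
    return [list(reversed(row)) for row in img]


def mean_intensity_classifier(img: Image) -> str:
    """Hypothetical stand-in model: labels an image by its mean intensity."""
    pixels = [p for row in img for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


def left_half_classifier(img: Image) -> str:
    """Deliberately flawed stand-in model: looks only at the left half."""
    half = [p for row in img for p in row[: len(row) // 2]]
    return "bright" if sum(half) / len(half) >= 128 else "dark"


def violation_rate(
    classify: Callable[[Image], str],
    transform: Callable[[Image], Image],
    images: List[Image],
) -> float:
    """Fraction of images whose label changes under the transform.

    No ground-truth labels are needed: the check compares the model's
    output on the source input with its output on the follow-up input.
    """
    violations = sum(
        1 for img in images if classify(img) != classify(transform(img))
    )
    return violations / len(images)


# Invented sample images (assumption: 2x2 pixels, for readability only).
images: List[Image] = [
    [[200, 10], [220, 20]],
    [[10, 10], [20, 30]],
    [[250, 240], [230, 255]],
    [[0, 255], [5, 250]],
]

if __name__ == "__main__":
    print(violation_rate(mean_intensity_classifier, flip_horizontal, images))
    print(violation_rate(left_half_classifier, flip_horizontal, images))
```

The first classifier satisfies the relation on every sample, while the flawed one changes its answer on some of them. A nonzero violation rate flags a problem in the model or its data without requiring a labeled test oracle.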


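Junhua Ding (丁俊华): Professor in the Department of Information Science at the University of North Texas, USA. He graduated from China University of Geosciences in 1994, received a master's degree in computer science from Nanjing University in 1997, and received master's and doctoral degrees in computer science from Florida International University in 2000 and 2004, respectively. He previously worked at Beckman Coulter and Johnson & Johnson. Since 2007 he has carried out research and teaching in the Department of Computer Science at East Carolina University and at the University of North Texas. He has published more than 70 papers, written 2 monographs, and led 6 U.S. National Science Foundation projects, and he has extensive experience in data and information quality analysis, software security, computational law, and information retrieval. He is currently a reviewer for journals including IEEE Transactions on Software Engineering and IEEE TSMC-A, an editor of Information and Software Technology, and a reviewer for the U.S. National Science Foundation. All faculty and students across the university who are interested in machine learning are welcome to attend.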
