CLAIM: An Enhanced Machine Learning Technique for Discrepancy Report Analysis

Authors:	Myron Hecht, Phanitta Chomsinsap, J.C. Chen
Year of publication:	2020
Subject:	0209 industrial biotechnology; Training set; Computer science; business.industry; Small number; 02 engineering and technology; Machine learning; computer.software_genre; Imbalanced data; Scheduling (computing); Software development process; Subject-matter expert; 020901 industrial engineering & automation; 0202 electrical engineering electronic engineering information engineering; Domain knowledge; Labeled data; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer
Source:	2020 Annual Reliability and Maintainability Symposium (RAMS).
DOI:	10.1109/rams48030.2020.9153691
Description:	CLAIM is a tool for analyzing and classifying discrepancy reports that allows domain expert knowledge to be incorporated into a semi-supervised machine learning (ML) process (semi-supervised learning trains an ML algorithm on a small amount of manually labeled data together with a much larger amount of unlabeled data). By using this domain knowledge, CLAIM achieves higher classification accuracy than conventional ML approaches. The advantages are particularly apparent with small, imbalanced data sets, which are quite common among discrepancy reports (in an imbalanced data set, documents are unevenly distributed across categories). The CLAIM method is robust against human bias and can tolerate misclassification of up to 20% of the training set. The increased accuracy of the CLAIM methodology makes ML a viable tool for decision making in safety, reliability, and the software development process. The modest human labor requirement enables use of the method in circumstances where resource, scheduling, and cost constraints previously made free-text discrepancy report analysis infeasible.
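The description defines semi-supervised learning only in passing. As a rough illustration of that general idea, and not of CLAIM itself (the record does not describe CLAIM's algorithm or how it uses expert knowledge), the following is a minimal self-training sketch in Python. It assumes a small set of expert-labeled reports and a pool of unlabeled ones, and it repeatedly adopts the classifier's confident predictions as new labels. The nearest-centroid classifier, the cosine threshold of 0.5, the example reports, and the category names are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(labeled):
    """Sum the term vectors of each category (cosine ignores the scale)."""
    cents = {}
    for text, label in labeled:
        cents.setdefault(label, Counter()).update(vectorize(text))
    return cents


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=5):
    """Generic self-training loop (hypothetical, not CLAIM's method).

    Each round, unlabeled reports whose best centroid similarity reaches
    `threshold` are added to the labeled set with the predicted label;
    the rest stay in the pool, e.g. for manual expert review.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for text in pool:
            v = vectorize(text)
            scores = {lab: cosine(v, c) for lab, c in cents.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                confident.append((text, best))
            else:
                rest.append(text)
        if not confident:  # nothing new was learned; stop early
            break
        labeled.extend(confident)
        pool = rest
    return labeled, pool


# Hypothetical seed labels and unlabeled discrepancy reports.
seed = [
    ("memory leak in telemetry buffer", "software"),
    ("connector corrosion on harness", "hardware"),
]
unlabeled = [
    "buffer overflow in telemetry task",
    "corrosion found on connector pins",
    "operator skipped checklist step",
]
labeled, pool = self_train(seed, unlabeled)
```

With these inputs, the first two reports are labeled "software" and "hardware" respectively, while the third shares no terms with either category and stays in the pool. The confidence threshold is the design lever here. Raising it admits fewer pseudo-labels but reduces the risk that early mistakes are reinforced in later rounds.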
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::bf01e8a085654934119ef29f122ded34 https://doi.org/10.1109/rams48030.2020.9153691