CLAIM: An Enhanced Machine Learning Technique for Discrepancy Report Analysis
Author: | Myron Hecht, Phanitta Chomsinsap, J.C. Chen |
---|---|
Year of publication: | 2020 |
Subject: |
Training set; Computer science; Small number; Machine learning; Imbalanced data; Scheduling (computing); Software development process; Subject-matter expert; Domain knowledge; Labeled data; Artificial intelligence; Engineering and technology; Industrial biotechnology; Industrial engineering & automation; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing |
Source: | 2020 Annual Reliability and Maintainability Symposium (RAMS). |
DOI: | 10.1109/rams48030.2020.9153691 |
Description: | CLAIM is a tool for analyzing and classifying discrepancy reports that incorporates domain expert knowledge into a semi-supervised machine learning (ML) process. (Semi-supervised learning trains a machine learning algorithm on a small amount of manually labeled data and a much larger amount of unlabeled data.) By using this domain knowledge, CLAIM achieves higher classification accuracy than conventional ML approaches. The advantages are particularly apparent with the small, imbalanced data sets that are common in discrepancy reporting. (An imbalanced data set has an unequal distribution of documents across categories.) The CLAIM method is robust against human bias and can tolerate misclassification of up to 20% of the training set. The increased accuracy of the CLAIM methodology makes ML a viable tool for safety, reliability, and software development process decision making. The modest human labor requirement enables use of the method under circumstances that previously made free-text discrepancy report analysis infeasible due to resource, scheduling, and cost constraints. |
Database: | OpenAIRE |
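The description above defines semi-supervised learning only in passing, so a small illustration may help. The following is a minimal sketch of generic self-training, one common semi-supervised scheme, and not the CLAIM method itself, whose details the record does not give. The toy naive Bayes classifier, the function names, the confidence threshold of 0.8, and the example discrepancy reports and category names are all hypothetical, chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenized text."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability} for one document."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with confidently pseudo-labeled documents."""
    docs = [text for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(docs, labels)
        remaining = []
        for text in pool:
            proba = predict_proba(model, text)
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                docs.append(text)
                labels.append(label)
            else:
                remaining.append(text)
        if len(remaining) == len(pool):  # nothing new was labeled; stop
            break
        pool = remaining
    return train_nb(docs, labels), labels


# Hypothetical discrepancy reports: few labeled, the rest unlabeled.
labeled = [
    ("software crash on startup", "software"),
    ("null pointer exception in telemetry code", "software"),
    ("connector pin bent during integration", "hardware"),
]
unlabeled = [
    "telemetry software crash after update",
    "bent pin found on harness connector",
    "exception raised in startup code",
]

model, final_labels = self_train(labeled, unlabeled)
print(Counter(final_labels))
print(predict_proba(model, "bent connector pin"))
```

Printing `Counter(final_labels)` is a quick way to see how unevenly the categories are represented, which is the imbalance the description refers to. In practice the threshold governs how much pseudo-label noise enters the training set, so it would need tuning on real data.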