CLAIM: An Enhanced Machine Learning Technique for Discrepancy Report Analysis

Authors: Myron Hecht, Phanitta Chomsinsap, J.C. Chen
Year of publication: 2020
Subject:
Source: 2020 Annual Reliability and Maintainability Symposium (RAMS).
DOI: 10.1109/rams48030.2020.9153691
Description: CLAIM is a tool for analyzing and classifying discrepancy reports that incorporates domain expert knowledge into a semi-supervised machine learning (ML) process. Semi-supervised learning trains a machine learning algorithm on a small amount of manually labeled data together with a much larger amount of unlabeled data. By using this domain knowledge, CLAIM achieves higher classification accuracy than conventional ML approaches. The advantages are particularly apparent with small, imbalanced data sets, which are common among discrepancy reports. In an imbalanced data set, documents are unequally distributed across categories. The CLAIM method is robust against human bias and can tolerate misclassification of up to 20% of the training set. Its increased accuracy makes ML a viable tool for decision making in safety, reliability, and the software development process. Because the method requires only modest human labor, it can be used in circumstances where free-text discrepancy report analysis was previously infeasible because of resource, schedule, and cost constraints.
Database: OpenAIRE
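The description above relies on semi-supervised learning, in which a few labeled reports seed a classifier that is then extended with unlabeled reports. The sketch below illustrates one generic form of that idea, self-training with a multinomial Naive Bayes text classifier. It is an assumption-laden illustration, not CLAIM's method: the abstract does not say how CLAIM injects domain knowledge. The names `NaiveBayes` and `self_train`, the 0.8 confidence threshold, and the toy "hardware"/"software" reports are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase whitespace split only.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words with Laplace (+1) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per category (the prior)
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        log_scores = {}
        for c, n in self.class_counts.items():
            s = math.log(n / self.n_docs)
            for t in tokenize(doc):
                s += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            log_scores[c] = s
        # Normalize log scores into probabilities (subtract max for stability).
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Generic self-training: repeatedly add confident predictions as pseudo-labels."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    pseudo = []
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for doc in pool:
            probs = model.predict_proba(doc)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((doc, label))
            else:
                remaining.append(doc)
        if not confident:
            break  # nothing new is confident enough; stop iterating
        for doc, label in confident:
            docs.append(doc)
            labels.append(label)
        pseudo.extend(confident)
        pool = remaining
        model = NaiveBayes().fit(docs, labels)  # retrain on labeled + pseudo-labeled
    return model, pseudo


labeled = [
    ("pump valve leak", "hardware"),
    ("valve seal leak", "hardware"),
    ("software crash on boot", "software"),
    ("null pointer crash", "software"),
]
unlabeled = ["valve leak detected", "crash on boot again"]
model, pseudo = self_train(labeled, unlabeled)
print(pseudo)
```

With small, imbalanced data sets, plain self-training of this kind tends to pseudo-label toward the majority category, which is one reason the abstract's emphasis on adding expert knowledge matters. This sketch makes no attempt to model that.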