Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding

Author:	Nathaniel Huber-Fliflet, Haozhen Zhao, Katie Jensen, Christian J. Mahoney, Shi Ye, Robert Neary
Publication year:	2018
Subject:	FOS: Computer and information sciences; Predictive coding; Computer Science - Artificial Intelligence; business.industry; Process (engineering); Computer science; 05 social sciences; Terabyte; Machine learning; computer.software_genre; Computer Science - Information Retrieval; 0506 political science; Set (abstract data type); Artificial Intelligence (cs.AI); Component (UML); 0502 economics and business; 050602 political science & public administration; Artificial intelligence; 050207 economics; business; computer; Information Retrieval (cs.IR); Selection (genetic algorithm)
Source:	IEEE BigData
DOI:	10.1109/bigdata.2018.8622075
Description:	Training documents have a significant impact on the performance of predictive models in the legal domain. Yet, there is limited research that explores the effectiveness of the training document selection strategy, in particular the strategy used to select the seed set: the set of documents an attorney reviews first to establish an initial model. Since there is limited research on this important component of predictive coding, the authors of this paper set out to identify strategies that consistently perform well. Our research demonstrated that the seed set selection strategy can have a significant impact on the precision of a predictive model. Providing attorneys with the results of this study will allow them to initiate the most effective predictive modeling process to comb through the terabytes of data typically present in modern litigation. This study used documents from four actual legal cases to evaluate eight different seed set selection strategies. Attorneys can use the results contained within this paper to enhance their approach to predictive coding. Comment: 2018 IEEE International Conference on Big Data
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::eb81cc3b1991ea2f2f1e913e6351ddcb https://doi.org/10.1109/bigdata.2018.8622075