Gleaner: Creating ensembles of first-order clauses to improve recall-precision curves

Authors:	Mark Goadrich, Jude W. Shavlik, Louis Oliphant
Publication year:	2006
Subject:	Theoretical computer science; Computer science; Dimension (graph theory); Machine learning; Field (computer science); Domain (software engineering); First-order logic; Information extraction; Inductive logic programming; Search algorithm; Artificial intelligence; Precision and recall; Software
Source:	Machine Learning. 64:231-261
ISSN:	1573-0565 0885-6125
DOI:	10.1007/s10994-006-8958-3
Description:	Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP particularly suited for large, highly skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem as a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable test-set results in a fraction of the training time.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::4cf45edc4030f20d4e7c7138ce17b37e https://doi.org/10.1007/s10994-006-8958-3
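
Note:	The "at least L of these K clauses" rule named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: clauses are stood in for by plain Python predicates rather than first-order clauses, the toy examples are invented, and the names `at_least_l_of_k`, `precision_recall`, and `sweep_thresholds` are hypothetical. Sweeping L from 1 to K to obtain several recall-precision points is also an assumption about how such a threshold might be used.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a "clause" is modelled as a boolean predicate over an example.
Example = Dict[str, int]
Clause = Callable[[Example], bool]


def at_least_l_of_k(clauses: Sequence[Clause], example: Example, l: int) -> bool:
    """Classify as positive when at least l of the K clauses cover the example."""
    covered = sum(1 for clause in clauses if clause(example))
    return covered >= l


def precision_recall(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from predictions and true labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep_thresholds(
    clauses: Sequence[Clause], examples: Sequence[Example], labels: Sequence[bool]
) -> List[Tuple[int, float, float]]:
    """Return (L, precision, recall) for every threshold L = 1..K."""
    points = []
    for l in range(1, len(clauses) + 1):
        preds = [at_least_l_of_k(clauses, ex, l) for ex in examples]
        precision, recall = precision_recall(preds, labels)
        points.append((l, precision, recall))
    return points


# Invented toy data: three feature-test "clauses" and six labelled examples.
clauses: List[Clause] = [
    lambda ex: ex["a"] == 1,
    lambda ex: ex["b"] == 1,
    lambda ex: ex["c"] == 1,
]
examples: List[Example] = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
]
labels = [True, True, False, False, True, False]

curve = sweep_thresholds(clauses, examples, labels)
for l, precision, recall in curve:
    print(f"L={l}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting a low L favors recall and a high L favors precision, which is the trade-off a recall-precision curve records.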