Showing 1 - 10 of 77 for search: '"Frank Wm. Tompa"'
Published in:
ACM Transactions on Computing for Healthcare. 3:1-28
The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose
Published in:
Lecture Notes in Computer Science ISBN: 9783030852504
CLEF
Mathematical Information Retrieval (MathIR) focuses on using mathematical formulas and terminology to search and retrieve documents that include mathematical content. To index mathematical documents, we convert each formula into a token list that is
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::69bdedd44bed37a9c55a2b2bd029a64f
https://doi.org/10.1007/978-3-030-85251-1_16
Authors:
Frank Wm. Tompa, Besat Kassaie
Published in:
DocEng
When information extraction programs (extractors) are applied to documents, they create relations that store facts found in the documents. In this work, we formalize and address the problem of keeping such extracted relations consistent with source d
Published in:
Journal of Information Science. 45:443-459
Fast and easy access to a wide range of documents in various languages, in conjunction with the wide availability of translation and editing tools, has led to the need to develop effective tools for detecting cross-lingual plagiarism. Given a suspici
Authors:
Besat Kassaie, Frank Wm. Tompa
Published in:
DocEng
Information extraction programs (extractors) can be applied to documents to isolate structured versions of some content, that is, to create tabular records corresponding to facts found in the documents. If the data in an extracted table needs to be u
Author:
Frank Wm. Tompa
Published in:
DocEng
Scholarship in the humanities often requires the ability to search curated electronic corpora and to display search results in a variety of formats. Challenges that need to be addressed include transforming the texts into a suitable form, typically X
Published in:
DocEng
Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for Tangent-3 on the math and text retrieval task at NTCIR-12, for example, have room for improvement, even though
Authors:
Andrew Kane, Frank Wm. Tompa
Published in:
SIGIR
We examine search engine performance for rank-safe query execution using the WAND and state-of-the-art BMW algorithms. Supported by extensive experiments, we suggest two approaches to improve query performance: initial list thresholds should be used
Authors:
Grzegorz Drzadzewski, Frank Wm. Tompa
Published in:
Knowledge and Information Systems. 47:697-732
The New York Times Annotated Corpus, the ACM Digital Library, and PubMed are three prototypical examples of document collections in which each document is tagged with keywords or phrases. Such collections can be viewed as high-dimensional document cu
Authors:
Frank Wm. Tompa, Andrew Kane
Published in:
DocEng
A disk-based search system distributes a large index across multiple disks on one or more machines, where documents are typically assigned to disks at random in order to achieve load balancing. However, random distribution degrades clustering, which