Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

Authors: Aren Jansen, Rashmi Sankepally, Jiaul H. Paik, Douglas W. Oard, Jerome White
Year of publication: 2015
Subject:
Source: HLT-NAACL
DOI: 10.3115/v1/n15-1061
Description: Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
Database: OpenAIRE