DART: Open-Domain Structured Data Record to Text Generation
Author: | Xiangru Tang, Ankit Gupta, Rui Zhang, Nadia Irwanto, Nazneen Fatema Rajani, Amrit Rau, Abhinand Sivaprasad, Richard Socher, Chiachun Hsieh, Linyong Nan, Neha Verma, Aadit Vyas, Xi Victoria Lin, Yangxiaokang Liu, Yasin Tarabar, Jessica Pan, Dragomir R. Radev, Tao Yu, Faiaz Rahman, Caiming Xiong, Yi Chern Tan, Mutethia Mutuma, Pranav Krishna, Ahmad Zaidi |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Dart; Parsing; Information retrieval; Computer Science - Computation and Language; Computer science; 02 engineering and technology; Predicate (mathematical logic); 010501 environmental sciences; Ontology (information science); computer.software_genre; 01 natural sciences; Annotation; Tree (data structure); TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS; 0202 electrical engineering, electronic engineering, information engineering; Table (database); 020201 artificial intelligence & image processing; computer; Computation and Language (cs.CL); Sentence; 0105 earth and related environmental sciences; computer.programming_language |
Source: | NAACL-HLT |
Description: | We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-text annotation can be costly, especially for tables, which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure for extracting semantic triples from tables that encodes their structure by exploiting the semantic dependencies among the table headers and the table title. Our dataset construction framework effectively merges heterogeneous sources from open-domain semantic parsing and dialogue-act-based meaning representation tasks by using techniques such as tree ontology annotation, conversion of question-answer pairs to declarative sentences, and predicate unification, all with minimal post-editing. We present a systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart. (NAACL 2021) |
Database: | OpenAIRE |
External link: |
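The description mentions extracting semantic triples from a table by exploiting the dependencies among its headers and its title. The Python sketch below illustrates that idea under stated assumptions only. The function `row_to_triples`, the `parent_of` header tree (each selected header mapped to its parent header, with `None` marking the root), and the `[TABLECONTEXT]`/`[TITLE]` placeholder tokens are hypothetical and are not taken from the DART code release.

```python
from typing import Dict, List, Optional, Tuple

# A (subject, predicate, object) semantic triple.
Triple = Tuple[str, str, str]


def row_to_triples(
    title: str,
    row: Dict[str, str],
    parent_of: Dict[str, Optional[str]],
) -> List[Triple]:
    """Turn one table row into triples using a hypothetical header tree.

    parent_of maps each selected header to its parent header; a root
    header maps to None. Edges of the tree become triples whose subject
    is the parent column's cell and whose predicate is the child header.
    """
    # Assumed convention: attach the table title to a placeholder subject.
    triples: List[Triple] = [("[TABLECONTEXT]", "[TITLE]", title)]
    for header, parent in parent_of.items():
        # Root columns hang off the placeholder; children hang off the parent's cell.
        subject = "[TABLECONTEXT]" if parent is None else row[parent]
        triples.append((subject, header, row[header]))
    return triples


if __name__ == "__main__":
    # Toy, made-up table row for illustration only.
    triples = row_to_triples(
        "Hypothetical league standings",
        {"Team": "Team A", "City": "Springfield", "Wins": "10"},
        {"Team": None, "City": "Team", "Wins": "Team"},
    )
    for t in triples:
        print(t)
```

Encoding the header dependencies as a tree means that each non-root column contributes exactly one triple, with the entity in its parent column as the subject; a flat row would otherwise leave unclear which cell a value such as "10" describes.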