Popis: |
Factoid questions are questions that require short fact-based answers. Automatic generation (AQG) of factoid questions from a given text can contribute to educational activities, interactive question answering systems, search engines, and other applications. The goal of our research is to generate factoid source-question-answer triplets based on a specific domain. We propose a four-component pipeline, which obtains as input a training corpus of domain-specific documents, along with a set of declarative sentences from the same domain, and generates as output a set of factoid questions that refer to the source sentences but are slightly different from them, so that a question-answering system or a person can be asked a question that requires a deeper understanding and knowledge than a simple word-matching. Contrary to existing domain-specific AQG systems that utilize the template-based approach to question generation, we propose to transform each source sentence into a set of questions by applying a series of domain-independent rules (a syntactic-based approach). Our pipeline was evaluated in the domain of cyber security using a series of experiments on each component of the pipeline separately and on the end-to-end system. The proposed approach generated a higher percentage of acceptable questions than a prior state-of-the-art AQG system. |