Event Veridicality in Chinese

Author: Yu-Yun Chang, 張瑜芸
Year of publication: 2019
Document type: thesis
Description: 107
The central goal of this dissertation is to build a Chinese corpus annotated with readers’ veridicality judgments of news events (Chinese PragBank) and to identify specific linguistic features that machine learning models can use to predict veridicality automatically. Readers’ veridicality judgments concern whether readers view an event described in a sentence as having happened or not. For instance, in "The FBI alleged in court documents that Zazi had admitted having a handwritten recipe for explosives on his computer", do people believe that Zazi had a handwritten recipe for explosives? On the other hand, what do people infer if the sentence is "According to the FBI agents, there is relatively little evidence that Zazi had a handwritten recipe for explosives"? Automatically classifying the veridicality of events is important for sifting through the ever-growing amount of information appearing online. However, most information extraction systems nowadays work roughly at the clause level and would extract "Zazi had a handwritten recipe for explosives" from both sentences given above. This dissertation aims at a better understanding and characterization of the context in which events are embedded, and of how that context leads to human judgments of event veridicality. Currently, there is a veridicality dataset for English (English PragBank) but not for Chinese. Once built, the Chinese corpus can be used to explore specific linguistic features in Chinese texts and to incorporate those features into two machine learning models, a Maximum Entropy (MaxEnt) model and a Convolutional Neural Network (CNN) model. The goal is to explore how linguistic cues derived from theories can assist models in learning pragmatically, and whether English and Chinese readers differ when making veridicality judgments of news events. The results show that English and Chinese readers behave differently with respect to some linguistic features.
For example, if the speaker of an event is an authority (e.g., "The White House" or "The Judge"), Chinese speakers in Taiwan have lower confidence that the event happened than English speakers do. Other features (e.g., modality markers, tense and aspect, and statistical numbers) show distinctions as well. When the features are applied in model training, the evaluation results show that deep learning models trained on data with linguistic features achieve higher performance.
Database: Networked Digital Library of Theses & Dissertations