Compression-Based Algorithms for Deception Detection
Author: | Travis L. Bauer, Christina L. Ting, Andrew Fisher |
---|---|
Year of publication: | 2017 |
Subject: |
021110 strategic, defence & security studies; business.industry; Computer science; media_common.quotation_subject; 0211 other engineering and technologies; Conditional probability; 02 engineering and technology; Prediction by partial matching; Deception; Arithmetic coding; Analytics; 020204 information systems; Normalized compression distance; 0202 electrical engineering, electronic engineering, information engineering; Stylometry; business; Algorithm; Classifier (UML); media_common |
Source: | Lecture Notes in Computer Science; ISBN: 9783319672168; SocInfo (1) |
Description: | In this work we extend compression-based algorithms for deception detection in text. In contrast to approaches that rely on theories of deception to identify feature sets, compression automatically identifies the most significant features. We consider two datasets that allow us to explore deception in opinion (content) and deception in identity (stylometry). Our first approach is unsupervised clustering based on a normalized compression distance (NCD) between documents. Our second approach uses Prediction by Partial Matching (PPM) to train a classifier with conditional probabilities from labeled documents, followed by arithmetic coding (AC) to classify an unknown document according to which label gives the best compression. We find that the classifier depends significantly on the relative volume of training data used to build the conditional probability distributions of the different labels. We demonstrate methods that overcome this dependence on data size when analytics, not information transfer, is the goal. Our results indicate that deceptive text contains structure statistically distinct from truthful text, and that this structure can be automatically detected using compression-based algorithms. |
Database: | OpenAIRE |
External link: |
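
The two approaches named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes zlib (DEFLATE) as a stand-in compressor, whereas the paper uses PPM with arithmetic coding, and the function names `compressed_size`, `ncd`, and `classify` are hypothetical. The `ncd` function uses the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The `classify` function approximates "which label gives the best compression" by measuring how many extra compressed bytes the unknown document costs when appended to each label's training text.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compression level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, approximated with zlib.

    Values near 0 suggest the inputs share structure; values near 1
    suggest they share little. zlib is an assumption made here, not the
    compressor used in the paper.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(unknown: bytes, training: dict) -> str:
    """Return the label whose training text encodes `unknown` most cheaply.

    `training` maps a label to the concatenated training text for that
    label. The cost of a label is the number of extra compressed bytes
    needed when `unknown` is appended to that label's training text.
    """
    def extra_cost(label: str) -> int:
        corpus = training[label]
        return compressed_size(corpus + unknown) - compressed_size(corpus)

    return min(training, key=extra_cost)


if __name__ == "__main__":
    fox = b"the quick brown fox jumps over the lazy dog " * 20
    lorem = b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print("NCD(fox, fox)   =", ncd(fox, fox))
    print("NCD(fox, lorem) =", ncd(fox, lorem))
    print("label:", classify(b"the quick brown fox", {"fox": fox, "lorem": lorem}))
```

Because the cost in `classify` depends on each label's training text, unequal amounts of training data per label can bias the result, which is the data-size dependence the description mentions. This sketch does not attempt to correct for it.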