Fire Emergency Detection from Twitter Using Supervised Principal
Author: | Mohammed Ahsan Raza Noori, Ritika Mehra |
---|---|
Year of publication: | 2020 |
Subject: |
Clustering high-dimensional data; 050210 logistics & transportation; business.industry; Computer science; Dimensionality reduction; 05 social sciences; Decision tree; Pattern recognition; 02 engineering and technology; Support vector machine; Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; 0502 economics and business; Principal component analysis; 0202 electrical engineering, electronic engineering, information engineering; Vector space model; Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence; business |
Source: | ICIIS |
Description: | Principal Component Analysis (PCA) is primarily a dimensionality reduction technique used in unsupervised machine learning; its use in supervised machine learning is still being explored. In supervised event detection from social media, PCA has not been well explored as a way to avoid the curse of dimensionality produced by the Vector Space Model (VSM). In this work, we propose a supervised event detection system that detects the occurrence of fire emergencies from Twitter streaming data in near real-time, using supervised PCA as the dimensionality reduction technique. Our aim is to find the minimum number of Principal Components (PCs) needed to achieve the highest classification performance. We used three machine learning algorithms for classification: Logistic Regression (LR), Support Vector Machine (SVM) and Decision Tree (DT). The performance of these algorithms was compared in conjunction with their corresponding numbers of PCs. Our experimental study shows that LR outperforms the other two algorithms, achieving the highest accuracy of 91% using 710 PCs out of 1,000 dimensions. Based on these results, LR is used as the classifier in the actual system. To process high-dimensional data both in batch and in near real-time, we used the Apache Spark framework. |
Database: | OpenAIRE |
External link: |
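
The description states the goal of finding the minimum number of PCs that still reaches the highest classification performance. The following is a minimal sketch of that selection step only, not the authors' implementation. The names `select_min_components`, `evaluate` and `toy_evaluate` are hypothetical, and the scorer is a synthetic stand-in rather than data from the paper; in practice `evaluate(k)` would project the VSM features onto the first `k` PCs, train a classifier such as LR, and return held-out accuracy.

```python
from typing import Callable, Iterable, Tuple


def select_min_components(
    candidates: Iterable[int],
    evaluate: Callable[[int], float],
) -> Tuple[int, float]:
    """Return (k, score) for the smallest k that attains the best score.

    `evaluate` is a hypothetical callback supplied by the caller; it maps a
    number of principal components to a classification score.
    """
    best_k, best_score = None, float("-inf")
    # Visit candidates in ascending order so smaller k is seen first.
    for k in sorted(candidates):
        score = evaluate(k)
        # Strict '>' keeps the smaller k when two candidates tie.
        if score > best_score:
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("candidates must not be empty")
    return best_k, best_score


# Synthetic stand-in scorer (assumption, not results from the paper):
# the score rises until k = 30 and stays flat afterwards.
def toy_evaluate(k: int) -> float:
    return min(k, 30) / 30.0


best_k, best_score = select_min_components(range(10, 101, 10), toy_evaluate)
print(best_k, best_score)
```

Sorting the candidates and comparing with a strict `>` means ties are resolved in favour of fewer components, which matches the stated aim of minimising the number of PCs at the best score.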