Separation of pulsar signals from noise with supervised machine learning algorithms
Authors: | Shantanu Desai, Suryarao Bethapudi |
---|---|
Publication year: | 2017 |
Subject: |
FOS: Physical sciences
02 engineering and technology; Machine learning; computer.software_genre; 01 natural sciences; Electromagnetic interference; Pulsar; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; AdaBoost; 010303 astronomy & astrophysics; Instrumentation and Methods for Astrophysics (astro-ph.IM); Feature ranking; High Energy Astrophysical Phenomena (astro-ph.HE); Artificial neural network; business.industry; Astronomy and Astrophysics; Perceptron; Computer Science Applications; Space and Planetary Science; 020201 artificial intelligence & image processing; Gradient boosting; Artificial intelligence; business; Astrophysics - Instrumentation and Methods for Astrophysics; Astrophysics - High Energy Astrophysical Phenomena; Algorithm; Classifier (UML); computer |
DOI: | 10.48550/arxiv.1704.04659 |
Description: | We evaluate the performance of four different machine learning (ML) algorithms, an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), AdaBoost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, which was used for the reprocessing of HTRU-S survey data (arXiv:1406.3627). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using AdaBoost, GBC, and XGBoost, and also from the minimum Redundancy Maximum Relevance approach, to report an algorithm-agnostic feature ranking. From these methods, we find the signal-to-noise ratio of the folded profile to be the best feature. We find that all the ML algorithms report false positive rates (FPRs) about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value. Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computing |
Database: | OpenAIRE |
External link: |
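
Note on the quality scores named in the description: the record does not define them, so the following is only a minimal sketch under the usual textbook definitions, where recall is TP / (TP + FN), the false positive rate (FPR) is FP / (FP + TN), and G-mean is taken to be the geometric mean of recall and specificity (1 - FPR). The function name `quality_scores` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def quality_scores(tp, fp, tn, fn):
    """Return recall, FPR, and G-mean from binary confusion-matrix counts.

    Assumes the standard definitions; the paper may use a different
    convention for G-mean.
    """
    recall = tp / (tp + fn)          # fraction of true pulsars recovered
    fpr = fp / (fp + tn)             # fraction of non-pulsars flagged as pulsars
    specificity = 1.0 - fpr          # true negative rate
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "fpr": fpr, "g_mean": g_mean}


# Made-up counts for a heavily imbalanced candidate set (illustration only).
scores = quality_scores(tp=90, fp=50, tn=9950, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced dataset like the one described, accuracy alone can look high even when few pulsars are recovered, which is presumably why G-mean and FPR at fixed recall are reported alongside it.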