Separation of pulsar signals from noise with supervised machine learning algorithms

Authors: Shantanu Desai, Suryarao Bethapudi
Year of publication: 2017
Subject:
FOS: Physical sciences
02 engineering and technology
Machine learning
computer.software_genre
01 natural sciences
Electromagnetic interference
Pulsar
0103 physical sciences
0202 electrical engineering
electronic engineering
information engineering

AdaBoost
010303 astronomy & astrophysics
Instrumentation and Methods for Astrophysics (astro-ph.IM)
Feature ranking
High Energy Astrophysical Phenomena (astro-ph.HE)
Artificial neural network
business.industry
Astronomy and Astrophysics
Perceptron
Computer Science Applications
Space and Planetary Science
020201 artificial intelligence & image processing
Gradient boosting
Artificial intelligence
business
Astrophysics - Instrumentation and Methods for Astrophysics
Astrophysics - High Energy Astrophysical Phenomena
Algorithm
Classifier (UML)
computer
DOI: 10.48550/arxiv.1704.04659
Description: We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), AdaBoost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, which was used for the reprocessing of HTRU-S survey data (arXiv:1406.3627). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using AdaBoost, GBC, and XGBoost, and also use the minimum Redundancy Maximum Relevance approach to report an algorithm-agnostic feature ranking. From these methods, we find the signal-to-noise ratio of the folded profile to be the best feature. We find that all the ML algorithms report false positive rates (FPRs) about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value.
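The quality scores named in the description (accuracy, recall, FPR, G-mean) can be illustrated with a minimal sketch. It assumes the commonly used definition of G-mean as the square root of recall times specificity; the record itself does not spell out the formula. The function name `quality_scores` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math

def quality_scores(tp, fp, tn, fn):
    """Compute binary-classification scores from confusion-matrix counts.

    Assumes the positive class is "pulsar" and that both classes are
    present (tp + fn > 0 and tn + fp > 0). G-mean is taken here as
    sqrt(recall * specificity), which is an assumed, standard definition.
    """
    recall = tp / (tp + fn)            # fraction of pulsars recovered
    fpr = fp / (fp + tn)               # fraction of non-pulsars flagged as pulsars
    specificity = tn / (tn + fp)       # equals 1 - fpr
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "recall": recall, "fpr": fpr, "g_mean": g_mean}

# Hypothetical imbalanced set: 10 pulsars among 1000 candidates.
# A classifier that rejects every candidate still reaches 99% accuracy,
# but its recall and G-mean are both zero.
reject_all = quality_scores(tp=0, fp=0, tn=990, fn=10)

# A classifier that recovers 9 of the 10 pulsars with 10 false alarms.
useful = quality_scores(tp=9, fp=10, tn=980, fn=1)

print(reject_all)
print(useful)
```

On a heavily imbalanced dataset, accuracy alone cannot distinguish these two classifiers well, which is why G-mean and the FPR at fixed recall are the more informative scores to compare.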
Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computing
Database: OpenAIRE