Data reduction for serial crystallography using a robust peak finder
Authors: | Anton Barty, Romain Letrun, Marco Kloos, Dominik Oberthuer, Dana Komadina, W. Brehm, Luca Gelisio, Adrian P. Mancuso, Brian Abbey, M. Galchenkova, Alireza Sadri, Henry N. Chapman, Grant Mills, Oleksandr Yefanov, Henry Kirkwood, Raphael de Wijn, Connie Darmanin, Marjan Hadian-Jazi, Mohammad Vakili |
---|---|
Year of publication: | 2021 |
Subject: | 0303 health sciences; Data processing; Discretization; Computer science; Detector; Robust statistics; 02 engineering and technology; 021001 nanoscience & nanotechnology; Research Papers; General Biochemistry, Genetics and Molecular Biology; 3. Good health; Background noise; 03 medical and health sciences; Crystallography; Bragg peak finding; ddc:540; Outlier; Probability distribution; serial crystallography; 0210 nano-technology; 030304 developmental biology; Data reduction |
Source: | Journal of Applied Crystallography 54(5), 1360-1378 (2021). doi:10.1107/S1600576721007317 |
ISSN: | 1600-5767 |
Description: | This article addresses the challenges of hit finding and data reduction in serial crystallography (SX). It presents an effective and reliable Bragg-peak-finding method, the robust peak finder (RPF), which is based on the principle of robust statistics and is intended for SX data analysis. Statistically robust methods are generally less sensitive to departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, they enable the discretization of data into one group of inliers (i.e. the background noise) and another group of outliers (i.e. Bragg peaks). The algorithm has two key advantages, which are demonstrated through testing on multiple SX data sets. First, it is relatively insensitive to the exact values of its input parameters and hence requires minimal optimization. This is critical for running the algorithm unsupervised, allowing automated selection or ‘vetoing’ of SX diffraction data. Second, the processing of individual diffraction patterns is easily parallelized: data from multiple detector modules can be analysed simultaneously, which makes the algorithm well suited to real-time data processing. These characteristics make RPF particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time. |
Database: | OpenAIRE |
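
The inlier/outlier split mentioned in the description can be illustrated with a generic robust estimator. The sketch below is not the RPF algorithm from the paper; it is a minimal example that assumes a roughly Gaussian background and uses the median and the median absolute deviation (MAD) to set a threshold. The function name `robust_outlier_mask`, the parameter `n_sigma`, and the synthetic patch are all hypothetical and chosen only for illustration.

```python
import numpy as np


def robust_outlier_mask(pixels, n_sigma=6.0):
    """Flag pixels that lie far above a robust estimate of the background.

    Illustrative only: a median/MAD threshold, not the published RPF method.
    """
    pixels = np.asarray(pixels, dtype=float)
    # The median is a location estimate that a few bright peaks barely shift.
    median = np.median(pixels)
    # MAD scaled by 1.4826 approximates the standard deviation of a
    # Gaussian background while ignoring a minority of outliers.
    sigma = 1.4826 * np.median(np.abs(pixels - median))
    if sigma == 0:
        # Degenerate background (no spread): flag nothing.
        return np.zeros(pixels.shape, dtype=bool)
    return pixels > median + n_sigma * sigma


# Synthetic detector patch: Gaussian background plus two bright "peaks".
rng = np.random.default_rng(0)
patch = rng.normal(100.0, 5.0, size=(32, 32))
patch[10, 12] += 500.0
patch[20, 5] += 300.0

mask = robust_outlier_mask(patch)
peak_positions = np.argwhere(mask)  # (row, column) indices of flagged pixels
```

Because the median and MAD are barely affected by a small number of bright pixels, the threshold tracks the background rather than the peaks, which is the general property the description attributes to robust methods. Each patch is processed independently, so a function like this could be mapped over detector modules in parallel.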