A non-parametric cutout index for robust evaluation of identified proteins.

Authors: Serang O (Department of Neurobiology, Harvard Medical School, Boston, MA, USA), Paulo J, Steen H, Steen JA
Language: English
Source: Molecular & cellular proteomics : MCP [Mol Cell Proteomics] 2013 Mar; Vol. 12 (3), pp. 807-12. Date of Electronic Publication: 2013 Jan 04.
DOI: 10.1074/mcp.O112.022863
Abstract: This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. For each protein set, the distribution of remaining peptide-spectrum match scores is compared to an empirical distribution of absent peptide-spectrum match scores, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to perform this comparison accurately. Thus, for a given protein set, the method computes the likelihood that the identified proteins are correct. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can evaluate any protein identification strategy (and is not limited to comparing different FDR thresholds), we subsequently use it to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g., RNA sequencing) and generalizes to multivariate problems.
Database: MEDLINE
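
Illustration: The threshold-selection step described in the abstract can be sketched roughly as follows. This is a minimal sketch under loud assumptions, not the paper's method: the Bayesian non-parametric likelihood is replaced here by a plain two-sample Kolmogorov-Smirnov distance (smaller distance standing in for higher likelihood), and the names `ks_statistic`, `pick_threshold`, `psms`, `protein_sets`, and `absent_scores` are hypothetical. "Remaining" scores are assumed to be the scores of peptide-spectrum matches whose protein is not in the identified set.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples `a` and `b`."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def pick_threshold(psms, protein_sets, absent_scores):
    """Return (threshold, distance) for the protein set whose remaining
    PSM scores look most like the absent PSM scores.

    psms          -- list of (protein_id, score) pairs (hypothetical layout)
    protein_sets  -- dict mapping an FDR threshold to the set of protein
                     ids identified at that threshold
    absent_scores -- empirical scores of PSMs from absent proteins

    Assumption: KS distance is only a stand-in for the likelihood
    described in the abstract.
    """
    best = None
    for threshold, proteins in protein_sets.items():
        # Scores of PSMs not explained by the identified protein set.
        remaining = [score for protein, score in psms if protein not in proteins]
        if not remaining:
            continue  # nothing left to compare at this threshold
        distance = ks_statistic(remaining, absent_scores)
        if best is None or distance < best[1]:
            best = (threshold, distance)
    return best
```

A too-strict threshold leaves high-scoring matches from present proteins among the remaining scores, while a too-permissive one removes matches that resemble the absent distribution; either case increases the distance, so the selected threshold is the one that balances the two.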