OTHR-07. A new framework for missing value tolerant data integration

Authors: Hannah Voß, Simon Schlumbohm, Marcus Wurlitzer, Matthias Dottermusch, Philipp Neumann, Philip Barwikowski, Hartmut Schlüter, Christoph Krisp, Julia Neumann
Publication year: 2022
Subject:
Source: Neuro-Oncology. 24:i148-i148
ISSN: 1523-5866, 1522-8517
DOI: 10.1093/neuonc/noac079.546
Description: Dataset integration is common practice for overcoming limitations such as low statistical power in omics datasets. This is particularly important when analyzing rare tumor entities. However, combining datasets introduces biases, so-called 'batch effects', which arise from differences in quantification techniques, laboratory equipment, or the tissue types used. A common problem is missing quantification of features, such as gene transcripts or proteins, within a dataset. These missing values can occur at random in a given dataset and are also introduced when multiple datasets are combined. Currently, batch effect reduction strategies that go beyond common normalization either do not exist or cannot handle absent data points and therefore rely on error-prone data imputation. We introduce a framework that enables batch effect adjustment for combined datasets while avoiding data loss by handling missing values appropriately, without imputation. The underlying idea is a matrix dissection approach that adjusts the information shared across the integrated dataset while guaranteeing sufficient data presence. The strategy is implemented in the R environment and linked with popular software stacks built on top of R. Successful data adjustment is demonstrated for proteomic data generated by different quantification approaches and LC-MS/MS instrumentation setups.
Database: OpenAIRE
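
Illustration: the matrix dissection idea in the description can be sketched roughly as follows. This is not the authors' R implementation. It is a minimal Python sketch under stated assumptions: features are rows, samples are columns, a batch counts as "sufficiently present" for a feature when it holds at least `min_present` observed values, and simple per-batch mean-centering stands in for whatever adjustment method the framework actually applies. All function names and the threshold are hypothetical.

```python
import math
from collections import defaultdict

NAN = float("nan")


def present_batches(row, batches, min_present=2):
    """Batches in which this feature has at least `min_present` observed values."""
    counts = defaultdict(int)
    for value, batch in zip(row, batches):
        if not math.isnan(value):
            counts[batch] += 1
    return tuple(sorted(b for b, n in counts.items() if n >= min_present))


def dissect(matrix, batches, min_present=2):
    """Group feature (row) indices into sub-matrices by batch-presence pattern."""
    groups = defaultdict(list)
    for i, row in enumerate(matrix):
        groups[present_batches(row, batches, min_present)].append(i)
    return dict(groups)


def center_batches(row, batches, usable):
    """Stand-in adjustment (assumption): shift each usable batch to the grand mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, batch in zip(row, batches):
        if batch in usable and not math.isnan(value):
            sums[batch] += value
            counts[batch] += 1
    grand_mean = sum(sums.values()) / sum(counts.values())
    out = []
    for value, batch in zip(row, batches):
        if batch in usable and not math.isnan(value):
            out.append(value - sums[batch] / counts[batch] + grand_mean)
        else:
            out.append(value)  # missing or insufficiently covered: left untouched
    return out


def adjust(matrix, batches, min_present=2):
    """Adjust each sub-matrix separately; nothing is imputed."""
    adjusted = [list(row) for row in matrix]
    for pattern, rows in dissect(matrix, batches, min_present).items():
        if len(pattern) < 2:
            continue  # fewer than two usable batches: nothing to align
        for i in rows:
            adjusted[i] = center_batches(matrix[i], batches, pattern)
    return adjusted


batches = ["A", "A", "B", "B"]
matrix = [
    [1.0, 3.0, 11.0, 13.0],  # observed in both batches
    [2.0, 4.0, NAN, NAN],    # observed in batch A only
    [5.0, NAN, 7.0, 9.0],    # batch A has a single value, so only B is usable
]
adjusted = adjust(matrix, batches)
print(adjusted[0])  # [6.0, 8.0, 6.0, 8.0]
```

In this toy input only the first feature has enough data in both batches, so only that row is shifted; the other two rows keep their original values and their missing entries. A real adjustment method would typically operate on each sub-matrix as a whole rather than row by row.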