knnAUC: an open-source R package for detecting nonlinear dependence between one continuous variable and one binary variable
Author: | Weichen Zhou, Jiucun Wang, Yi Wang, Zhenghong Yuan, Li Jin, Xiaoyu Liu, Meng Hao, Yanyun Ma, Yin Yao Shugart, Jie Liu, Momiao Xiong, Yi Li |
---|---|
Year of publication: | 2018 |
Subject: | 0301 basic medicine; AUC; Binary number; Nonlinear dependence; Sample (statistics); Association analysis; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Statistical power; 03 medical and health sciences; Structural Biology; Classifier (linguistics); Cluster Analysis; Humans; lcsh:QH301-705.5; Molecular Biology; Mathematics; Variables; One binary dependent variable; Receiver operating characteristic; Sequence Analysis, RNA; Applied Mathematics; R package; Computational Biology; One continuous variable; Estimator; Open source; Computer Science Applications; Nonlinear system; 030104 developmental biology; lcsh:Biology (General); lcsh:R858-859.7; Algorithm; Software |
Source: | BMC Bioinformatics, Vol 19, Iss 1, Pp 1-12 (2018) |
ISSN: | 1471-2105 |
DOI: | 10.1186/s12859-018-2427-4 |
Description: | Background: Testing the dependence of two variables is one of the fundamental tasks in statistics. In this work, we developed an open-source R package (knnAUC) for detecting nonlinear dependence between one continuous variable X and one binary dependent variable Y (0 or 1). Results: We addressed this problem with knnAUC (the k-nearest neighbors AUC test; the R package is available at https://sourceforge.net/projects/knnauc/). In the knnAUC software framework, we first resampled the dataset to obtain training and testing datasets according to the sample ratio (from 0 to 1), and then constructed a k-nearest neighbors classifier to obtain yhat, the estimated probability that y = 1, for each observation in the testing dataset, whose true labels are denoted testy. Finally, we calculated the AUC (area under the receiver operating characteristic curve) estimator and tested whether it is greater than 0.5. To evaluate the advantages of knnAUC over seven other popular methods, we performed extensive simulations to explore the relationships among the eight methods and compared their false positive rates and statistical power using both simulated and real datasets (chronic hepatitis B datasets and kidney cancer RNA-seq datasets). Conclusions: We concluded that knnAUC is an efficient R package for testing nonlinear dependence between one continuous variable and one binary dependent variable, especially in computational biology. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2427-4) contains supplementary material, which is available to authorized users. |
Database: | OpenAIRE |
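The procedure outlined in the abstract (split by a sample ratio, fit a k-nearest neighbors classifier, score the held-out observations, compute the AUC, and test it against 0.5) can be illustrated with a minimal Python sketch. This is not the knnAUC package or its API: the function names (`knn_prob`, `auc`, `knn_auc`, `permutation_p_value`), the default values of `k` and `ratio`, and the use of a label-permutation test to judge whether the AUC exceeds 0.5 are all assumptions made for illustration, since the abstract does not state how the test is carried out.

```python
import random


def knn_prob(train_x, train_y, x0, k):
    """Estimate P(y = 1 | x0) as the share of positives among the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def auc(scores, labels):
    """AUC in its Mann-Whitney form: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("the test set must contain both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_auc(x, y, ratio=0.5, k=5, seed=0):
    """Split the data by `ratio`, score the held-out part with kNN, and return its AUC."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    n_train = int(ratio * len(x))
    train, test = idx[:n_train], idx[n_train:]
    train_x = [x[i] for i in train]
    train_y = [y[i] for i in train]
    yhat = [knn_prob(train_x, train_y, x[i], k) for i in test]  # estimated P(y = 1)
    testy = [y[i] for i in test]                                # true held-out labels
    return auc(yhat, testy)


def permutation_p_value(x, y, n_perm=200, seed=0, **kwargs):
    """One-sided p-value for AUC > 0.5 by shuffling labels (an assumed choice of test)."""
    rng = random.Random(seed)
    observed = knn_auc(x, y, seed=seed, **kwargs)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if knn_auc(x, y_perm, seed=seed, **kwargs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy nonlinear dependence: y = 1 in both tails of x, so a linear trend is near zero.
    x = [i / 100 for i in range(-100, 100)]
    y = [1 if abs(v) > 0.5 else 0 for v in x]
    observed, p = permutation_p_value(x, y)
    print(f"AUC = {observed:.3f}, permutation p = {p:.4f}")
```

The toy data make Y depend on X only through |X|, which is the kind of nonlinear relationship a purely linear association measure would miss, while a local method such as kNN can still separate the classes on held-out data.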