Popis: |
Ordinal response variables abound in scientific and quantitative analyses, whose outcomes comprise a few categorical values that admit a natural ordering, so that their values are often represented by non-negative integers, for instance, pain score (0-10) or disease severity (0-4) in medical research. Ordinal variables differ from rational variables in that its values delineate qualitative rather than quantitative differences. In this thesis, we develop new statistical methods for variable selection in a high-dimensional cumulative link regression model with an ordinal response. Our study is partly motivated by the needs for exploring the association structure between disease phenotype and high-dimensional medical covariates. The cumulative link regression model specifies that the ordinal response of interest results from an order-preserving quantization of some latent continuous variable that bears a linear regression relationship with a set of covariates. Commonly used error distributions in the latent regression include the normal distribution, the logistic distribution, the Cauchy distribution and the standard Gumbel distribution (minimum). The cumulative link model with normal (logit, Gumbel) errors is also known as the ordered probit (logit, complementary log-log) model. While the likelihood function has a closed-form solution for the aforementioned error distributions, its strong nonlinearity renders direct optimization of the likelihood to sometimes fail. To mitigate this problem and to facilitate extension to penalized likelihood estimation, we proposed specific minorization-maximization (MM) algorithms for maximum likelihood estimation of a cumulative link model for each of the preceding 4 error distributions. Penalized ordinal regression models play a role when variable selection needs to be performed. In some applications, covariates may often be grouped according to some meaningful way but some groups may be mixed in that they contain both relevant and irrelevant variables, i.e., whose coefficients are non-zero and zero, respectively. Thus, it is pertinent to develop a consistent method for simultaneously selecting relevant groups and the relevant variables within each selected group, which constitutes the so-called bi-level selection problem. We have proposed to use a penalized maximum likelihood approach with a composite bridge penalty to solve the bi-level selection problem in a cumulative link model. An MM algorithm was developed for implementing the proposed method, which is specific to each of the 4 error distributions. The proposed approach is shown to enjoy a number of desirable theoretical properties including bi-level selection consistency and oracle properties, under suitable regularity conditions. Simulations demonstrate that the proposed method enjoys good empirical performance. We illustrated the proposed methods with several real medical applications. |