Description: |
The logistic regression model is widely used in many studies to investigate the relationship between a binary response variable $Y$ and a set of potential predictors $\mathbf X$. The binary response may represent, for example, the occurrence of some outcome of interest ($Y=1$ if the outcome occurred and $Y=0$ otherwise). When the dependent variable $Y$ represents a rare event, the logistic regression model has notable drawbacks. To overcome these drawbacks, we propose the Generalized Extreme Value (GEV) regression model. In particular, we suggest the quantile function of the GEV distribution as the link function, so that attention is focused on the tail of the response curve for values close to one. A sample of observations is said to contain a cure fraction when a proportion of the study subjects (the so-called cured individuals, as opposed to the susceptibles) cannot experience the outcome of interest. A problem that then arises is that it is usually unknown which subjects are cured and which are susceptible, unless the outcome of interest has been observed. In these settings, a logistic regression analysis of the relationship between $\mathbf X$ and $Y$ among the susceptibles is no longer straightforward. We develop a maximum likelihood estimation procedure for this problem, based on the joint modeling of the binary response of interest and the cure status. We investigate the identifiability of the resulting model. We then conduct a simulation study to investigate its finite-sample behavior and apply the method to real data. |