Popis: |
Mixture models can be used to approximate irregular densities or to model heterogeneity. When a density estimate is needed, we can approximate any distribution on the real line using an infinite number of normals (Ferguson (1983)). On the other hand, when a mixture model is used to model heterogeneity, there is a proper interpretation for each element of the model. If the distributional assumptions about the components are met and the number of underlying clusters within the data is known, then, in a Bayesian setting, classification analysis and component-specific inference in general require methods that undo the label switching and recover the interpretation of the components. If latent allocations are included in the design of the Markov chain Monte Carlo (MCMC) strategy, and the sampler has converged, then the labels assigned to each component may change from iteration to iteration. However, observations being allocated together must remain similar, and we use this fundamental fact to derive an easy and efficient solution to the label switching problem. We compare our strategy with other relabeling algorithms on univariate and multivariate data examples and demonstrate improvements over alternative strategies. When there is no further information about the shape of the components and the number of clusters within the data, a common choice is the normal distribution as the "benchmark" component distribution. However, if a cluster is skewed or heavy tailed, then the normal distribution will be inefficient and many normal components may be needed to model a single cluster. In this thesis, we present an attempt to solve this problem. We define a cluster to be a group of data which can be modeled by a unimodal density function. Hence, our intention is to replace the normal with a family of univariate distribution functions for which the only constraint is unimodality. 
With this aim, we devise a new family of nonparametric unimodal distributions, which has large support over the space of univariate unimodal distributions. The difficult aspect of the Bayesian model is to construct a suitable MCMC algorithm to sample from the correct posterior distribution. The key will be the introduction of strategic latent variables and the use of the product space (Godsill (2001)) view of reversible jump (Green (1995)) methodology. We illustrate our methodology and compare it against the classic mixture of normals using simulated and real data sets. To solve the label switching problem we use the new relabeling algorithm. |