Showing 1 - 10 of 153 for search: '"Zamar Ruben H"'
We propose a new class of robust and Fisher-consistent estimators for mixture models. These estimators can be used to construct robust model-based clustering procedures. We study in detail the case of multivariate normal mixtures and propose a proced
External link:
http://arxiv.org/abs/2102.06851
Author:
Raymaekers, Jakob; Zamar, Ruben H.
We study a framework of regularized $K$-means methods based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared through simulation and theoretical analysis. Based on the results, we
External link:
http://arxiv.org/abs/2010.00950
Published in:
BMC Bioinformatics, Vol 7, Iss 1, p 521 (2006)
Background: Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide – adenine (A), thymine (T), cytosine (C) or guanine (G) – is altered. Arguably, SNPs account for more than 90% of human gen
External link:
https://doaj.org/article/5c9183bfe2df4cebbed0d3ea1f31b404
Author:
Raymaekers, Jakob; Zamar, Ruben H.
We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures such as the standard deviation and the range, our proposed scale avoids dampening the beneficial effect of in
External link:
http://arxiv.org/abs/1912.10492
K-means is a popular non-parametric clustering procedure introduced by Steinhaus (1956) and further developed by MacQueen (1967). It is known, however, that K-means does not perform well in the presence of outliers. Cuesta-Albertos et al. (1997) intro
External link:
http://arxiv.org/abs/1906.08198
We present a stepwise approach to estimate high dimensional Gaussian graphical models. We exploit the relation between the partial correlation coefficients and the distribution of the prediction errors, and parametrize the model in terms of the Pears
External link:
http://arxiv.org/abs/1808.06016
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different cla
External link:
http://arxiv.org/abs/1707.00727
Two proteins are homologous if they have a common evolutionary origin, and the binary classification problem is to identify proteins in a candidate set that are homologous to a particular native protein. The feature (explanatory) variables available
External link:
http://arxiv.org/abs/1706.06971
We consider the problem of multivariate location and scatter matrix estimation when the data contain cellwise and casewise outliers. Agostinelli et al. (2015) propose a two-step approach to deal with this problem: first, apply a univariate filter to
External link:
http://arxiv.org/abs/1609.00402
Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The proposed method,
External link:
http://arxiv.org/abs/1509.02564