Description: |
The behavior of many Bayesian models used in machine learning critically depends on the choice of prior distributions, controlled by hyperparameters that are typically selected by Bayesian optimization or cross-validation. This requires repeated, costly posterior inference. We provide an alternative for selecting good priors without carrying out posterior inference, building on the prior predictive distribution that marginalizes out the model parameters. We estimate virtual statistics for data generated by the prior predictive distribution and then optimize over the hyperparameters to learn those for which these virtual statistics match target values provided by the user or estimated from (a subset of) the observed data. We apply the principle to probabilistic matrix factorization, for which good solutions for prior selection have been missing. We show that for Poisson factorization models we can analytically determine the hyperparameters, including the number of factors, that best replicate the target statistics, and we study empirically the sensitivity of the approach to model mismatch. We also present a model-independent procedure that determines the hyperparameters for general models by stochastic optimization, and we demonstrate this extension in the context of hierarchical matrix factorization models.
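
  As an illustration of the matching idea (not the paper's implementation), the sketch below Monte Carlo estimates a virtual statistic under the prior predictive of a Gamma-Poisson factorization model and searches the hyperparameters so that the statistic matches a target value. The Gamma(a, b) priors, the mean-entry statistic, the target value, and the grid search standing in for the paper's stochastic optimization are all illustrative assumptions.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  def simulate_stat(a, b, K, n_rows=50, n_cols=40, n_sims=20):
      """Monte Carlo estimate of a virtual statistic (here: the mean entry)
      under the prior predictive of a Gamma-Poisson factorization model."""
      stats = []
      for _ in range(n_sims):
          U = rng.gamma(a, 1.0 / b, size=(n_rows, K))   # row factors
          V = rng.gamma(a, 1.0 / b, size=(n_cols, K))   # column factors
          Y = rng.poisson(U @ V.T)                      # virtual data
          stats.append(Y.mean())
      return float(np.mean(stats))

  # Hypothetical target statistic, e.g. the mean of (a subset of) observed data.
  target_mean = 3.0

  # Pick the hyperparameters whose virtual statistic best matches the target;
  # the model-independent procedure in the paper would use stochastic
  # optimization here instead of a grid.
  best = min(
      ((a, b, K) for a in (0.3, 0.5, 1.0, 2.0)
                 for b in (0.5, 1.0, 2.0)
                 for K in (2, 5, 10, 20)),
      key=lambda hp: (simulate_stat(*hp) - target_mean) ** 2,
  )
  print("selected (a, b, K):", best)

  # For this model the prior predictive mean has the closed form
  # E[Y_ij] = K * (a / b)^2, the kind of analytical expression that lets
  # Poisson factorization hyperparameters be determined without simulation.
  a, b, K = best
  print("analytic prior predictive mean:", K * (a / b) ** 2)
  ```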