A novel method for modelling interaction between categorical variables

Authors: Rob Eisinga, Rense Nieuwenhuis, R.P. Konig, Ben Pelzer, Manfred te Grotenhuis, Alexander W. Schmidt-Catran
Publication year: 2017
Subject:
Source: International Journal of Public Health, 62, 3, pp. 427-431
ISSN: 1661-8556
Description: Sweeney and Ulveling (1972) introduced weighted effect coding, where the estimates for categories of nominal and ordinal variables are deviations from the arithmetic mean, typically that of a sample. This somewhat neglected parameterization is preferred over the well-known effect coding (ANOVA) if the data are unbalanced (i.e., when categories hold different numbers of observations) and was recently revived in this journal (te Grotenhuis et al. 2016). In this paper, we show that weighted effect coding can also be applied to regression models with interaction effects. The weighted effect coded interactions represent the additional effects over and above the main effects obtained from the model without these interactions. This is a useful alternative to effect coding when the data are unbalanced, as in most observational data. In this contribution, we describe this novel parameterization and provide syntax, data, and examples in SPSS, R, and Stata on http://www.ru.nl/sociology/mt/wec/downloads. For didactic reasons we apply OLS regression models, but weighted effect coded interactions can be used in any generalized linear model. Throughout this text we use the word ‘interaction’, while other researchers prefer ‘moderation’.

Interactions between categorical variables

Dummy coded interaction

When directional interaction hypotheses are tested and categorical (i.e., ordinal or nominal scaled) predictor variables are involved, dummy coding is often appropriate. In this parameterization the main effects relate to a particular subset of respondents, and for the remaining subsets the dummy coded interaction effects reflect deviations from these main effects. To create dummy coded interaction variables one has to multiply the original, 0/1 coded, dummy variables (Hardy 1993).
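To make this interpretation concrete, here is a minimal sketch for a saturated model with two dichotomous predictors. The cell means and the names `A`, `B`, `cell_mean`, and `predict` are hypothetical and not taken from the paper; because the model is saturated, the dummy coded coefficients can be read directly off the cell means without fitting a regression.

```python
# Hypothetical cell means of an outcome for two 0/1 coded predictors A and B.
# Keys are (A, B); the values are invented for illustration only.
cell_mean = {(0, 0): 24.0, (1, 0): 23.0, (0, 1): 26.0, (1, 1): 26.5}

# Saturated dummy coded model: y = b0 + bA*A + bB*B + bAB*(A*B)
b0 = cell_mean[(0, 0)]                       # mean of the reference cell
bA = cell_mean[(1, 0)] - cell_mean[(0, 0)]   # effect of A only where B == 0
bB = cell_mean[(0, 1)] - cell_mean[(0, 0)]   # effect of B only where A == 0
# Interaction: how much the effect of A where B == 1 deviates from bA.
bAB = (cell_mean[(1, 1)] - cell_mean[(0, 1)]) - bA


def predict(a, b):
    """Predicted mean; the interaction variable is the product a * b."""
    return b0 + bA * a + bB * b + bAB * (a * b)


for cell, mean in cell_mean.items():
    print(cell, mean, predict(*cell))
```

In this sketch the "main effect" `bA` describes only the subset with `B == 0`, and `bAB` is the deviation from it in the subset with `B == 1`, which is the reading of dummy coded interactions described above.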
As an empirical example we will investigate to what extent the mean BMI differs across three age categories in a group of respondents with one or more children and in a childless group (Umberson et al. 2011). We use data on self-reported height and weight from three random samples (n = 3314) drawn from the Dutch population (aged 18–70) in 2000, 2005, and 2011 (Eisinga et al. 2002, 2012a, b). We created the dummy coded variables Childless_dc (code 1 for respondents with no children and code 0 for respondents with one or more children), Middle_dc (code 1 for the middle-aged and 0 for both young and older respondents), and Older_dc (code 1 for older and 0 for both young and middle-aged respondents). The dummy coded interaction variables Childless_dc × Middle_dc and Childless_dc × Older_dc are products of these dummy coded variables (see Table 1 and our website for details). First, we estimated the main effects without interaction (see Table 4, Model 1), and second, we added the two interaction variables (Table 4, Model 2). Note that the reference categories (a) respondents with children, (b) young respondents, and (c) childless young respondents are omitted from the two models, which means that their estimates are set to zero.

Table 1 Coding scheme for the dummy coded main and interaction effects for the childless, middle-aged and older-aged (reference/omitted categories are with children, young, and childless × young)
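The coding scheme that Table 1 refers to follows directly from the definitions in the text, and can be sketched as below. This is an illustrative reconstruction, not the authors' syntax (which is provided for SPSS, R, and Stata on their website); the function name `dummy_codes`, the underscore spelling of the variable names, and the category labels are assumptions made for this example.

```python
from itertools import product


def dummy_codes(childless, age):
    """Return the five dummy coded variables for one respondent.

    childless: True if the respondent has no children.
    age: one of "young", "middle", "older" (labels assumed here).
    """
    childless_dc = 1 if childless else 0
    middle_dc = 1 if age == "middle" else 0
    older_dc = 1 if age == "older" else 0
    return {
        "Childless_dc": childless_dc,
        "Middle_dc": middle_dc,
        "Older_dc": older_dc,
        # Interaction variables are products of the 0/1 dummies.
        "Childless_dc x Middle_dc": childless_dc * middle_dc,
        "Childless_dc x Older_dc": childless_dc * older_dc,
    }


# One row per combination of parental status and age category.
scheme = {
    (childless, age): dummy_codes(childless, age)
    for childless, age in product([False, True], ["young", "middle", "older"])
}

for (childless, age), codes in scheme.items():
    group = "childless" if childless else "with children"
    print(f"{group:13s} {age:6s}", list(codes.values()))
```

Respondents with children in the young category receive 0 on all five variables, so in the model with interactions the intercept refers to that group.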
Database: OpenAIRE