Description: |
This is a scale-construction study. Its aim is to develop a self-report measurement tool that combines collections of items into a composite score intended to measure three theoretical aspects of public sector creativity, allowing differentiation between public servants in these regards. These three aspects are based on a tentative theoretical model guided by exploratory research on public sector creativity, which hypothesizes its pragmatic nature as characterized by incrementalism, reactivism and realism (Houtgraaf 2022; Houtgraaf et al. in review; Kruyen and Van Genugten 2017; Rangarajan 2008; Houtgraaf et al. 2021). These exploratory findings, in short, indicate that public servants tend to generate incremental ideas reactively, and that these ideas are mainly evaluated in terms of usefulness and feasibility. Initial scale items and dimensions are based on the findings from these studies. Moreover, the scale includes a separate element that measures the application of different types of creative practices, as extant research indicates that public servants apply a wide range of practices to come up with novel and useful ideas (Houtgraaf et al. 2021; Houtgraaf 2022). The present study's aims are thereby threefold. Firstly, it aims to develop a measurement tool that adequately measures the degree of public servants' individual creativity. Secondly, it aims to develop a measurement tool that measures the diversity and types of practices applied by public servants. Thirdly, it aims to develop a measurement tool that measures the position of public servants' creativity on three dimensions: incrementalism / radicalism (magnitude), reactivism / proactivism (trigger) and realism / idealism (perspective). Creativity can vary on these dimensions, indicating its degree of overarching pragmatism.
The developed scale will allow for adequate assessment of public servants' individual creativity, capturing meaningful differences between public servants and indicating where room for improvement may lie. N.B.: we do not intend to include the practices element as a dimension of the scale, but rather as a useful separate aspect of measurement that provides insight into the types of practices applied, rather than the variation between respondents. We hypothesize that the three pragmatic dimensions form clusters that can be somewhat separated, but will correlate, as they are caused (reflective model assumption) by the same overarching second-order construct:

Hypothesized construct (second-order latent variable): pragmatism
Hypothesized dimension (latent variable) 1: incrementalism / radicalism (magnitude)
Hypothesized dimension (latent variable) 2: reactivism / proactivism (trigger)
Hypothesized dimension (latent variable) 3: realism / idealism (perspective)

Design Plan

Study type
Scale construction – creation of a multidimensional empirical self-report measurement tool that combines collections of items into a composite score intended to measure three theoretical aspects of public sector creativity, allowing differentiation between public servants in these regards (DeVellis 2012), through multi-wave data collection in which sets of items measuring latent variables are formulated, analyzed and adjusted in an iterative, sequential process based on various statistical analyses and theoretical considerations as outlined in our analyses section. The studies involve human subjects, but they have not been assigned to treatment groups. Is there any additional blinding in this study? N/A

Study design
The present study is a multi-wave scale-construction study. The study focuses on the construction of a multidimensional five-point Likert scale measuring three aspects of creativity.
These aspects are: 1) measurement of the overall degree of creativity, 2) measurement of the position of public servants' creativity on three dimensions of creativity indicating degree of pragmatism (Houtgraaf 2022; Houtgraaf 2021; Houtgraaf et al. 2021; Houtgraaf et al. in review) and 3) measurement of the diversity and types of practices applied (Houtgraaf 2022; Houtgraaf et al. 2021). Data are collected through three waves of data collection using different panels. Questionnaires are used in which sets of items tapping latent variables (dimensions, constructs) are formulated and gradually adjusted based on various statistical analyses as outlined in our analyses section. The study will follow the generate, refine, test-and-retest (validation) logic as outlined below (see the analyses section for more in-depth information).

A) January 2022: Generating the scale in four phases, namely: 1) identification of terminology and language regarding public sector creativity in over 500 diary entries containing data from public servants on their creativity, 2) formulation of initial items per aspect and hypothesized dimension of the construct based on the diary data and input from practitioners, 3) expert review of the initial items by a convenience sample of +/- 90 practitioners and 4 creativity researchers and 4) pilot testing of the items on a convenience sample of +/- 10 knowledge workers. These steps are taken to ensure content validity, meaning having the scale consist of the full universe of relevant items in relation to the dimensions, sub-dimensions and scale, so that the items measure the content they are intended to measure (DeVellis 2012).

B) May 2022: Refining the scale using exploratory factor analysis on data from the Binnenlands Bestuur survey featuring +/- 1,250 Dutch public sector knowledge workers subscribed to the Binnenlands Bestuur (Dutch platform for public sector employees) newsletter.
This step is taken to analyze the data, assess the initial validity and reliability of the sets of items and see what can be theoretically interpreted from their structure.

C) September 2022: Testing the scale using confirmatory factor analysis on data from the Flitspanel survey featuring a representative sample of +/- 1,500 Dutch public sector employees who applied to the Flitspanel (an initiative by the Dutch Ministry of the Interior, the main Dutch internet panel for the (semi-)public sector with over 10,000 voluntary panel members; see Flitspanel 2021; Hulzebosch et al. 2017). This step is taken to analyze the data, identify a model based on the structure of the data and determine to what degree it fits the exploratory theory on public sector creativity.

D) November 2022: Re-testing the scale using confirmatory factor analysis on data from the LISS panel survey featuring a sample of +/- 750 public and private sector employees (LISS Panel 2022). This step is taken to analyze the data, test whether the previously identified model can again be identified within another sample and check whether the scale is able to differentiate between public and private sector employees based on differences in means and increased spread.

The present study will be carried out together with, and take part in the data collection procedure of, the larger research project 'The Creative Public Servant' (Dutch Organization for Scientific Research (NWO), Open Competition, grant #27000931). The study will validate the measurement tool that encapsulates measurements of the abovementioned aspects and dimensions of creativity using EFA, CFA and additional analyses in R, in order to construct a validated, multidimensional scale for measuring the three abovementioned aspects of public sector creativity. Respondents will be requested to self-report on the presented items.
Items are presented in random sequence, as a grouped sequence of rather similar items pertaining to the same dimension might result in response patterns which would harm validity. Participants will be asked to rate the items on a five-point Likert scale. The questions are posed as "To what degree did you …", "In my work I come up with …" and "Work-related ideas should be …", followed by the statements with scale response options [1 = not at all], [2 = rarely], [3 = neutral], [4 = quite a bit], [5 = very much].

Sampling Plan

Existing data
Registration prior to creation of data. Explanation of existing data: N/A

Data collection procedures
The research population will include respondents from the target population of—predominantly public sector—working individuals in the Netherlands. The recruitment of participants will be outsourced to Binnenlands Bestuur, Flitspanel and the LISS panel. Binnenlands Bestuur mainly reaches public sector knowledge workers using self-selection convenience sampling. Flitspanel guarantees a representative sample of public sector employees from the range of different types of public sector organizations (for example local governments, ministries and public executive agencies) using self-selection convenience sampling. The LISS panel guarantees a stratified sample of private and public sector (knowledge worker) employees using self-selection convenience sampling. After completing a number of profiling questions covering basic socio-demographic information, participants continue to the rest of the survey. Only participants who fill in the socio-demographic information will be allowed to continue with the survey. Participants recruited via Flitspanel and the LISS panel will be reimbursed for their participation in the study. They are paid by the panel companies, with payment dependent on finishing the questionnaire. As this is a survey-based scale construction, no personnel will interact with the study subjects apart from the convenience-sampled practitioners for the expert review.
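The randomized item presentation described above can be sketched as follows. This is a minimal Python illustration with hypothetical item stems (the real items follow from the generation phase, and the survey platform itself will handle randomization):

```python
import random

# Hypothetical item stems for illustration only; the actual items
# come from the diary-based generation phase described above.
ITEMS = [
    "In my work I come up with small improvements to existing procedures.",
    "In my work I come up with radically new ways of working.",
    "Work-related ideas should above all be feasible.",
    "Work-related ideas should above all be visionary.",
]

# Five-point Likert response options used throughout the questionnaire.
RESPONSE_OPTIONS = {1: "not at all", 2: "rarely", 3: "neutral",
                    4: "quite a bit", 5: "very much"}

def presentation_order(items, respondent_seed):
    """Return a per-respondent random item order, so that similar items
    tapping the same dimension are not presented as a group."""
    order = list(items)
    random.Random(respondent_seed).shuffle(order)
    return order
```

Seeding per respondent keeps each respondent's order reproducible while still varying order across respondents.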
We have received ethical approval from the ethical committee of the Institute for Management Research of Radboud University. Data management plans were approved by both Radboud University (host organization) and NWO (funder).

Sample size
The Binnenlands Bestuur sample consists of +/- 1,250 respondents from their pool of 6,000 public servants. The Flitspanel sample consists of +/- 1,500 respondents from their pool of +/- 10,000 public servants, guaranteeing a representative sample of employees from across the public sector. The LISS panel sample consists of +/- 750 respondents from their pool of 5,000 households.

Variables
The study will focus on the identification and construction of a multidimensional scale on creativity featuring the following variables. Firstly, it aims to develop a measurement tool that measures the overall degree of public servants' creativity. Secondly, it aims to develop a measurement tool that measures the diversity and types of practices applied by public servants. Thirdly, it aims to develop a measurement tool that encompasses measures of three dimensions of creativity: incrementalism / radicalism (magnitude), reactivism / proactivism (trigger) and realism / idealism (perspective). Creativity can vary on these dimensions, indicating its degree of overarching pragmatism.

Analysis Plan

Statistical models
We will conduct our analyses using statistical packages in R. The analyses will encompass the following steps:

1) Refining the scale using exploratory factor analysis on the Binnenlands Bestuur data
A: Accuracy: We check the accuracy of the data. We check whether the values are on the same 1–5 Likert score range. We check whether the items contain reverse-formulated statements and reverse those scores.
B: Missing values: We check whether there is missing data in the dataset. We check the percentage of missing data for both respondents and items.
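The per-respondent and per-item missing-data screen in step B can be sketched as follows. This is a Python/numpy illustration (the analyses themselves will run in R), and the 50% exclusion thresholds shown here are assumptions for illustration, not preregistered values:

```python
import numpy as np

def missing_data_report(data, row_cut=0.5, col_cut=0.5):
    """Percentage of missing values per respondent (row) and per item
    (column), plus the indices exceeding an exclusion threshold.
    The 0.5 thresholds are illustrative assumptions."""
    miss = np.isnan(np.asarray(data, dtype=float))
    row_pct = miss.mean(axis=1)   # share of items missing per respondent
    col_pct = miss.mean(axis=0)   # share of respondents missing per item
    return {
        "row_pct": row_pct,
        "col_pct": col_pct,
        "drop_rows": np.where(row_pct > row_cut)[0],
        "drop_cols": np.where(col_pct > col_cut)[0],
    }
```

Rows and columns flagged by the report would then be inspected for the underlying reason before exclusion, as described above.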
We exclude rows and columns when high percentages of the data are missing and identify the underlying reason for future questionnaires. We do not, however, expect significant amounts of data to be missing, as the responses in the questionnaire are forced.
C: Outliers: We check whether there are outliers. We assess whether these outliers distort the data used for our scale construction or provide insightful differentiation between respondents. If deemed necessary, outliers are excluded when they exceed the Mahalanobis cut-off point, based on a chi-square calculation with degrees of freedom equal to the number of columns, or when they exceed the 3×SE threshold.
D: Requirements and assumptions: We check whether the data meet the necessary requirements and assumptions in terms of additivity, linearity, normality, homogeneity, homoscedasticity and sample size for conducting an exploratory factor analysis.
E: Rotation: We expect to apply oblique rotation (direct oblimin) because of our assumption that the factors (dimensions) will be correlated, as they are part of the same pragmatic construct.
F: Factor analyses: We run exploratory factor analyses to explain the variation among relatively many manifest variables using relatively few new variables (factors as dimensions), condensing information by determining how many factors underlie the data, defining the meaning of these factors and identifying well- and poorly performing items. We make a scree plot of the eigenvalues and check for the point of inflection. Moreover, we run a parallel analysis to check whether the analysis suggests the same number of factors based on the actual and resampled data (DeVellis 2012). Finally, we check the eigenvalues against the old and new Kaiser criteria—respectively 1 and 0.7—but only use this as background information, as the use of the Kaiser criterion is under debate: Monte Carlo investigations indicate that parallel analysis is more accurate (Zwick & Velicer 1986; Kaiser 1960).
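The parallel analysis in step F (Horn's procedure) can be sketched as follows. This is a Python/numpy illustration; the study's own analyses will use R, where for example the psych package provides fa.parallel:

```python
import numpy as np

def parallel_analysis(data, n_iter=50, seed=0):
    """Horn's parallel analysis: retain the factors whose observed
    eigenvalue exceeds the mean eigenvalue of random data with the
    same number of rows and columns."""
    rng = np.random.default_rng(seed)
    X = np.asarray(data, dtype=float)
    n, k = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_mean = np.zeros(k)
    for _ in range(n_iter):
        R = rng.standard_normal((n, k))
        random_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    random_mean /= n_iter
    # number of eigenvalues of the real data exceeding the random baseline
    return int((observed > random_mean).sum())
```

On simulated data with two clear latent factors, the procedure recovers two factors, whereas the plain Kaiser criterion can over-extract.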
We check the loading table for simple structure. We check for non-loading items, requiring loadings higher than 0.3. We check for split-loading items that load 0.3 or higher on multiple factors. We exclude items based on loading-related and theoretical considerations.
G: Reliability: We check the reliability of our identified factors using Cronbach's alpha, for which scores of 0.7+ are generally deemed an acceptable lower bound, while scores above 0.90 suggest the scale should be shortened (DeVellis 2012; Nunnally 1978). So, if alpha scores exceed 0.90, we will look to eliminate items based on weak internal consistency/inter-item correlations and the theoretical spectrum coverage of the items across subdimensions, in order to shorten the scale with the merit of reducing respondent burden. Moreover, we inspect alpha-if-item-deleted statistics to see whether items should be excluded, against the backdrop of the items' theoretical relevance. Furthermore, as Cronbach's alpha generally underestimates reliability, we also check the greatest lower bound in case of a low Cronbach's alpha.
H: Nomological network: We assess a nomological network, thereby identifying the relations of the identified factors (dimensions) and checking whether they behave as theory would hypothesize. For convergent criterion validity, we check the correlation of the items and dimensions with the most commonly used scale for measuring overall creativity (Tierney and Farmer 2004), although this scale is not to be perceived as a gold standard per se, as it was not constructed via a thorough scale-construction study and is a short scale (DeVellis 2012). For convergent construct validity—the theoretical relation of our variables and dimensions to other variables (DeVellis 2012)—we check the correlation with the construct of intrinsic motivation, which is theoretically significantly positively related to creativity (Amabile 1996; Anderson et al. 2014).
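The reliability screening in step G—Cronbach's alpha plus alpha-if-item-deleted—can be sketched as follows. This is a Python/numpy illustration of the standard formulas; in R this corresponds to, for example, psych::alpha:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1).sum()
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item dropped in turn; an item whose
    removal raises alpha is a candidate for exclusion."""
    X = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])]
```

For perfectly consistent items alpha equals 1; values above 0.90 in real data would trigger the shortening considerations described above.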
A caveat here is that our scale is by definition more nuanced because of its dimensions, which only measure specific aspects of creativity, so this might explain a possible lack of convergent validity. Furthermore, for discriminant construct validity, or the absence of correlations between measures of unrelated constructs (DeVellis 2012), we check whether the dimensions and scales correlate with measures that are not expected to correlate: in this case gender, multiple scales measuring perspectives on the degree of adequacy of cooperation with external parties and a scale on the degree of possibility for personal profiling during the corona crisis. This will allow for assessment of whether the scale behaves in the way one would theoretically expect: correlating with variables that are theoretically related and not correlating with variables that are theoretically unrelated.
I: Theoretic sensemaking: We check whether the identified factors make sense in relation to extant theory, the hypothesized dimensions and the construct.

2) Testing the scale using confirmatory factor analysis on the Flitspanel data
A: Accuracy: We check the accuracy of the data. We check whether the values are on the same 1–5 Likert score range. We check whether the items contain reverse-formulated statements and reverse those scores.
B: Missing values: We check whether there is missing data in the dataset. We check the percentage of missing data for both respondents and items. We exclude rows and columns when high percentages of the data are missing and identify the underlying reason for future questionnaires. If deemed necessary, we will use MICE (multivariate imputation by chained equations) on low-percentage missing data. We do not, however, expect significant amounts of data to be missing, as the responses in the questionnaire are forced.
C: Outliers: We check whether there are outliers. We assess whether these outliers distort the data used for our scale construction or provide insightful spread.
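The Mahalanobis screen used in the outlier steps of each wave can be sketched as follows. This is a Python/numpy illustration; the chi-square critical value (df = number of items) would be looked up separately, so the cut-off is passed in as a parameter:

```python
import numpy as np

def mahalanobis_screen(data, cutoff):
    """Squared Mahalanobis distance of every respondent from the item
    centroid; rows above `cutoff` (a chi-square critical value with
    df = number of items, obtained from a table) are flagged."""
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # d2_i = (x_i - mean)' S^{-1} (x_i - mean), computed row-wise
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return d2, d2 > cutoff
```

For three items, for example, a chi-square cut-off of 16.27 corresponds to p = .001; flagged respondents would then be inspected rather than dropped automatically, as described above.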
If deemed necessary, outliers are excluded when they exceed the Mahalanobis cut-off point, based on a chi-square calculation with degrees of freedom equal to the number of columns, or when they exceed the 3×SE threshold.
D: Requirements and assumptions: We check whether the data meet the necessary requirements and assumptions in terms of additivity, linearity, normality, homogeneity, homoscedasticity and sample size for conducting a confirmatory factor analysis.
E: Reflective model: We test a reflective model based on our previous EFA, assuming that the latent variables (dimensions and second-order construct) cause the measured answers in the form of the manifest variables. We set the scale by setting one of the pattern coefficients to 1 (marker variable) or the latent variable's variance to 1. We check the loading table for simple structure. We check for non-loading items, requiring loadings higher than 0.3. We check for split-loading items that load 0.3 or higher on multiple factors and remove these items. Variances should be positive and smaller than 1, and standard errors should be within an acceptable range. We then check the model fit indices TLI and CFI, for which values above 0.90 are deemed acceptable and above 0.95 are deemed excellent. We also check the residual-based fit indices RMSEA and SRMR, for which values between 0.06 and 0.08 are deemed acceptable and values lower than 0.06 are deemed excellent. We check modification indices for alternative addable paths and thus alternative models.
F: Nomological network: For convergent criterion validity, we check the correlation of the items and dimensions with the most commonly used scale for measuring overall creativity (Tierney and Farmer 2004), although this scale is not to be perceived as a gold standard per se, as it was not constructed via a thorough scale-construction study and is a short scale (DeVellis 2012).
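The fit indices in step E can be recovered from the fitted and baseline (independence) model chi-square statistics. A minimal Python sketch of the standard formulas follows; in R these values are reported directly by, for example, lavaan:

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """CFI, TLI and RMSEA from the fitted model (chi2_m, df_m) and the
    baseline/independence model (chi2_b, df_b) with sample size n."""
    d_m = max(chi2_m - df_m, 0.0)   # model misfit beyond its df
    d_b = max(chi2_b - df_b, 0.0)   # baseline misfit beyond its df
    cfi = 1.0 - d_m / max(d_b, d_m, 1e-12)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    return cfi, tli, rmsea
```

With illustrative values chi2_m = 120 on 100 df, baseline chi2_b = 1000 on 120 df and n = 500, the formulas give CFI ≈ 0.977, TLI ≈ 0.973 and RMSEA ≈ 0.020, which would count as excellent fit by the thresholds above.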
For convergent construct validity—the theoretical relation of our variables and dimensions to other variables (DeVellis 2012)—we check the correlation with the construct of intrinsic motivation, which is theoretically significantly positively related to creativity (Amabile 1996; Anderson et al. 2014). A caveat here is that our scale is by definition more nuanced because of its dimensions, which only measure specific aspects of creativity, so this might explain a possible lack of convergent validity. Furthermore, for discriminant construct validity, or the absence of correlations between measures of unrelated constructs (DeVellis 2012), we check whether the dimensions and scales correlate with measures that are not expected to correlate, in this case gender.
G: Reliability: We check the reliability of our identified factors using Cronbach's alpha, for which scores of 0.7+ are generally deemed an acceptable lower bound, while scores above 0.90 suggest the scale should be shortened (DeVellis 2012; Nunnally 1978). So, if alpha scores exceed 0.90, we will look to eliminate items based on weak internal consistency/inter-item correlations and the theoretical spectrum coverage of the items across subdimensions, in order to shorten the scale with the merit of reducing respondent burden. Moreover, we inspect alpha-if-item-deleted statistics to see whether items should be excluded, against the backdrop of the items' theoretical relevance. Furthermore, as Cronbach's alpha generally underestimates reliability, we also check the greatest lower bound in case of a low Cronbach's alpha.

3) Re-testing the scale using confirmatory factor analysis on the LISS panel data
A: Accuracy: We check the accuracy of the data. We check whether the values are on the same 1–5 Likert score range. We check whether the items contain reverse-formulated statements and reverse those scores.
B: Missing values: We check whether there is missing data in the dataset.
We check the percentage of missing data for both respondents and items. We exclude rows and columns when high percentages of the data are missing and identify the underlying reason for future questionnaires. If deemed necessary, we will use MICE on low-percentage missing data. We do not, however, expect significant amounts of data to be missing, as the responses in the questionnaire are forced.
C: Outliers: We check whether there are outliers. We assess whether these outliers distort the data used for our scale construction or provide insightful differentiation between respondents. If deemed necessary, outliers are excluded when they exceed the Mahalanobis cut-off point, based on a chi-square calculation with degrees of freedom equal to the number of columns, or when they exceed the 3×SE threshold.
D: Requirements and assumptions: We check whether the data meet the necessary requirements and assumptions in terms of additivity, linearity, normality, homogeneity, homoscedasticity and sample size for conducting a confirmatory factor analysis.
E: Reflective model: We run a reflective model based on our previous EFA, assuming that the latent variables (dimensions and second-order construct) cause the measured answers in the form of the manifest variables. We set the scale by setting one of the pattern coefficients to 1 (marker variable) or the latent variable's variance to 1. We check the loading table for simple structure. We check for non-loading items, requiring loadings higher than 0.3. We check for split-loading items that load 0.3 or higher on multiple factors and remove these items. Variances should be positive and smaller than 1, and standard errors should be within an acceptable range. We aim for four indicators per latent variable, or three if the error variances do not covary, or two if the error variances do not covary and the loadings are set equal. We then check the model fit indices TLI and CFI, for which values above 0.90 are deemed acceptable and above 0.95 are deemed excellent.
We also check the residual-based fit indices RMSEA and SRMR, for which values between 0.06 and 0.08 are deemed acceptable and values lower than 0.06 are deemed excellent. We check modification indices for alternative addable paths and thus alternative models.
F: Nomological network: For convergent criterion validity, we check the correlation of the items and dimensions with the most commonly used scale for measuring overall creativity (Tierney and Farmer 2004), although this scale is not to be perceived as a gold standard per se, as it was not constructed via a thorough scale-construction study and is a short scale (DeVellis 2012). For convergent construct validity—the theoretical relation of our variables and dimensions to other variables (DeVellis 2012)—we check the correlation with the construct of intrinsic motivation, which is theoretically significantly positively related to creativity (Amabile 1996; Anderson et al. 2014). A caveat here is that our scale is by definition more nuanced because of its dimensions, which only measure specific aspects of creativity, so this might explain a possible lack of convergent validity. Furthermore, for discriminant construct validity, or the absence of correlations between measures of unrelated constructs (DeVellis 2012), we check whether the dimensions and scales correlate with measures that are not expected to correlate, in this case gender.
G: Reliability: We check the reliability of our identified factors using Cronbach's alpha, for which scores of 0.7+ are generally deemed an acceptable lower bound, while scores above 0.90 suggest the scale should be shortened (DeVellis 2012; Nunnally 1978). So, if alpha scores exceed 0.90, we will look to eliminate items based on weak internal consistency/inter-item correlations and the theoretical spectrum coverage of the items across subdimensions, in order to shorten the scale with the merit of reducing respondent burden.
Moreover, we inspect alpha-if-item-deleted statistics to see whether items should be excluded, against the backdrop of the items' theoretical relevance. Furthermore, as Cronbach's alpha generally underestimates reliability, we also check the greatest lower bound in case of a low Cronbach's alpha.

Transformations
We will assign integer codes to nominal outcome variables. We will recode reverse items. Other transformations that prove to be required will be explicitly named within the methods section of the article.

Data exclusion
As the study concerns scale construction, items or components that are shown to be significantly inconsistent or poor in measuring the dimensions and/or construct will be excluded from the scale based on the abovementioned analyses. These exclusions will be comprehensively reported within the results section of the eventual article. Furthermore, based on what the data look like, we decide whether it is necessary to exclude certain data. Exclusion of the data will be based on the steps of the analyses as outlined above. Participants who do not fill out the demographic questions will not be able to participate in the rest of the survey and will also be excluded from the data analysis.

Missing data
We check whether there is missing data in the dataset. We check the percentage of missing data for both respondents and items. We exclude rows and columns when high percentages of the data are missing and identify the underlying reason for future questionnaires. If deemed necessary, we will use MICE on low-percentage missing data. We do not, however, expect significant amounts of data to be missing, as the responses in the questionnaire are forced.

References
Amabile, T.M. (1996). Creativity in Context. Boulder: Westview Press.
Anderson, N., & King, N. (1993). Innovation in organizations. In C. Cooper & I. Robertson (Eds.), International Review of Industrial and Organizational Psychology: 86-104. London: Wiley.
Cattell, R.B. (1966). "The Scree Test for the Number of Factors", Multivariate Behavioral Research, 1(2): 245-276.
Cronbach, L.J. (1951). "Coefficient alpha and the internal structure of tests", Psychometrika, 16(3): 297-334.
DeVellis, R.F. (2012). Scale Development: Theory and Applications. London: Sage.
Flitspanel (2021). 'Gebruikmaken van het Flitspanel?'. https://flitspanel.nl/opdrachtgevers/. Accessed on December 28th, 2021.
Hayton, J.C., D.G. Allen and V. Scarpello (2004). "Factor Retention Decisions in Exploratory Factor Analysis: A tutorial on parallel analysis", Organizational Research Methods, 7(2): 191-205.
Houtgraaf, G. (2022). "Public Sector Creativity: Triggers, Practices and Ideas for Public Sector Innovations", Public Management Review. Early cite: https://www.tandfonline.com/doi/pdf/10.1080/14719037.2022.2037015
Houtgraaf, G., P.M. Kruyen, and S. Van Thiel (2021). "Public Sector Creativity as the Origin of Public Sector Innovation: A taxonomy and future research agenda", Public Administration. Early cite: https://onlinelibrary.wiley.com/doi/epdf/10.1111/padm.12778
Houtgraaf, G., P.M. Kruyen, and S. Van Thiel (in review). "Public Sector Creativity: Salient Stimulators and Inhibitors", in review at Public Management Review.
Kaiser, H.F. (1960). "The Application of Electronic Computers to Factor Analysis", Educational and Psychological Measurement, 20(1): 141-151.
Kruyen, P.M., and M. Van Genugten (2017). "Creativity in local government: Definition and determinants", Public Administration, 95(3): 825-841. https://doi.org/10.1111/padm.12332
LISS Panel (2022). 'LISS Panel'. https://www.website.lisspanel.nl. Accessed on March 1st, 2022.
Nunnally, J.C. (1978). Psychometric Theory. New York: McGraw-Hill.
Rangarajan, N. (2008). "Evidence of Different Types of Creativity in Government: A Multimethod Assessment", Public Performance and Management Review, 32(1): 132-163. https://doi.org/10.2753/pmr1530-9576320106
Tierney, P. and Farmer, S.M. (2004). "The Pygmalion Process and Employee Creativity", Journal of Management, 30(3): 413-432.
Zwick, W.R. and Velicer, W.F. (1986). "Comparison of five rules for determining the number of components to retain", Psychological Bulletin, 99(3): 432-442.