Assessing Statistical Significance

The concept of significance is central to generating consumer insights. As market researchers, we are tasked with providing decision makers with data that is both useful and meaningful. How we measure statistical significance depends on the types of questions we ask and the structure of the data those questions produce.

Question types that generate categorical data lend themselves to the chi-square statistic. Chi-square tests compare the counts we observe with the counts we would expect if the variables were unrelated. The greater the difference between observed and expected, the more likely it is that the difference is statistically significant. In survey research we often compare two or more categorical variables in a table of row variables versus column variables, for example purchase frequency by gender or age category. The table below illustrates the relationship between training likelihood and role in the decision making process.

Training Likelihood * Decision Crosstabulation

Training Likelihood   Measure                          Final   Influencer   Not involved    Total
Likely                Count                              141          180             40      361
                      % within Training Likelihood     39.1%        49.9%          11.1%   100.0%
                      % within Decision                71.9%        52.9%          21.1%    49.7%
Unsure                Count                               34          115             63      212
                      % within Training Likelihood     16.0%        54.2%          29.7%   100.0%
                      % within Decision                17.3%        33.8%          33.2%    29.2%
Not Likely            Count                               21           45             87      153
                      % within Training Likelihood     13.7%        29.4%          56.9%   100.0%
                      % within Decision                10.7%        13.2%          45.8%    21.1%
Total                 Count                              196          340            190      726
                      % within Training Likelihood     27.0%        46.8%          26.2%   100.0%
                      % within Decision               100.0%       100.0%         100.0%   100.0%

(Columns Final, Influencer, and Not involved are the categories of Decision.)

The chi-square test assesses whether there is an association between the two variables, in this case training likelihood and decision making role. Each cell has three components: cell count, row percentage, and column percentage. If there were no relationship between the variables, each cell’s percentages would be close to the corresponding percentage in the Total row or column. In this example, 71.9% of the final decision makers reported they were likely to attend training. This is 22 points higher than what would be expected (49.7%).
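The comparison of a cell's column percentage with the overall row percentage can be sketched in a few lines of Python. This is only an illustration, not the software used to produce the table. The variable and function names are made up for the example, and the Not Likely counts are assumed to be 21, 45, and 87, the whole numbers implied by that row's percentages.

```python
# Observed counts (rows: training likelihood, columns: decision role).
# Assumption: the Not Likely row is 21, 45, 87, as implied by its percentages.
ROWS = ["Likely", "Unsure", "Not Likely"]
COLS = ["Final", "Influencer", "Not involved"]
observed = [
    [141, 180, 40],
    [34, 115, 63],
    [21, 45, 87],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def column_pct(i, j):
    """Percent of the respondents in column j who fall in row i."""
    return 100.0 * observed[i][j] / col_totals[j]


def overall_row_pct(i):
    """Percent of all respondents who fall in row i (the 'expected' share)."""
    return 100.0 * row_totals[i] / n


# Compare each decision role's "Likely" share with the overall "Likely" share.
for j, col in enumerate(COLS):
    gap = column_pct(0, j) - overall_row_pct(0)
    print(f"{col}: {column_pct(0, j):.1f}% likely, "
          f"{gap:+.1f} points vs. total")
```

A large positive or negative gap in any column is the kind of departure from the total that the chi-square statistic accumulates across all cells.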

At the other end, 21.1% of those who are not involved in the training decision process reported they were likely to attend training. This is about 29 points below the norm of 49.7%. The chi-square for this table is 139.5 with a reported significance of .000, meaning a p-value below .001. Note that the magnitude of the chi-square statistic is affected by sample size: for the same pattern of percentages, the more respondents, the larger the chi-square. Thus in surveys with several thousand respondents we would expect larger chi-square values. In these cases I recommend looking at the significance value rather than the size of the chi-square itself, and treating values below .05 as significant. In this example, there is a strong association between the two variables, with training likelihood being greater for those who are closer to the decision making process.
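As a rough check on these figures, the sketch below computes the Pearson chi-square by summing (observed - expected)^2 / expected over all cells, where each expected count is row total times column total divided by the number of respondents. The function names are hypothetical, and the Not Likely counts are assumed to be 21, 45, and 87, as implied by the row percentages. The p-value helper uses a closed form that holds only when the degrees of freedom are even, which is the case for a 3 x 3 table (4 degrees of freedom).

```python
import math

# Assumption: the Not Likely row is 21, 45, 87, as implied by its percentages.
observed = [
    [141, 180, 40],
    [34, 115, 63],
    [21, 45, 87],
]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_even_df(stat, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


stat, df = chi_square(observed)
p = p_value_even_df(stat, df)
print(f"chi-square = {stat:.1f}, df = {df}, p below .001: {p < 0.001}")

# Same percentages, twice the respondents: the statistic doubles as well.
doubled = [[2 * count for count in row] for row in observed]
stat_doubled, _ = chi_square(doubled)
print(f"chi-square with doubled sample = {stat_doubled:.1f}")
```

With these counts the statistic comes out close to 139.5, and doubling every count doubles it without changing a single percentage, which is why the significance value is the better guide in large surveys. In practice a library routine such as SciPy's `chi2_contingency` would normally replace the hand-written functions.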

Categorical variables are a key component in research, be it for a course evaluation survey or for measuring member satisfaction in a credit union survey. We can look at each variable individually, but deeper insight comes when we examine the inter-relationships between variables.