One-way analysis of variance (ANOVA), a method developed by R.A. Fisher, tests whether the means of three or more independent (unrelated) groups differ significantly. It is a helpful tool for explaining observations collected across such groups.
Running multiple two-sample t-tests instead raises the chance of a false positive (Type I error) with every additional comparison. ANOVA, in its simplest form, provides a single statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. For this reason, ANOVA is useful for comparing three or more group means for statistical significance.
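To illustrate why repeated t-tests are a problem, the sketch below estimates the chance of at least one false positive when every pair of groups is compared at the same alpha level. It assumes the pairwise tests are independent, which is only an approximation, and the function name `familywise_error_rate` is made up for this illustration.

```python
from math import comb

def familywise_error_rate(num_groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive across all pairwise tests.

    Assumes the pairwise tests are independent (a simplification).
    """
    num_comparisons = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** num_comparisons

for k in (3, 4, 5):
    print(k, "groups:", round(familywise_error_rate(k), 3))
```

Under this approximation, three groups already give a rate of roughly 0.14 rather than the nominal 0.05, and the rate keeps growing with the number of groups.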
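The test works with the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_k$, where µ = group mean, k = number of groups and H = hypothesis. The alternative hypothesis is that at least one group mean differs from the others.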
We start by assuming that the null hypothesis is true. If the one-way ANOVA returns a significant result, we reject the null hypothesis in favor of the alternative hypothesis that at least 2 group means differ significantly. However, the one-way ANOVA does not pinpoint which specific groups differ from each other; it only indicates that the means of at least 2 groups are significantly different. To determine which specific groups differ from each other, we need to conduct a post-hoc test.
For example: Let’s consider the scores for university sports teams for the following years:
Once we have the data and perform an ANOVA single-factor test, we get the following results:
|Source of Variation||SS||df||MS||F||P-value||F crit|
One way to interpret these results: if the F value is greater than F crit, we conclude that there is a significant difference among the means of the 3 groups. Equivalently, the P-value is then smaller than the chosen alpha level.
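The columns of the results table (SS, df, MS, F) can be reproduced by hand. The following is a minimal sketch using made-up scores for three teams, not the data from the example above; the function name `one_way_anova` is hypothetical, and the critical value is the tabulated F value for alpha = 0.05 with (2, 12) degrees of freedom.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group SS: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: squared distance of each value from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the two mean squares (MS = SS / df)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

# Hypothetical scores for three teams (illustration only)
team_a = [4, 5, 6, 5, 5]
team_b = [7, 8, 9, 8, 8]
team_c = [5, 6, 7, 6, 6]

ss_b, ss_w, df_b, df_w, f_value = one_way_anova([team_a, team_b, team_c])
F_CRIT = 3.89  # tabulated critical value for alpha = 0.05 and (2, 12) degrees of freedom
print(f"SS between={ss_b:.3f} df={df_b}, SS within={ss_w:.3f} df={df_w}, F={f_value:.3f}")
print("significant" if f_value > F_CRIT else "not significant")
```

For these made-up scores F is far above the critical value, so the null hypothesis of equal means would be rejected.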
The one-way ANOVA relies on the following assumptions:
- The dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
- The independent variable should consist of two or more categorical, independent groups.
- Observations should be independent, which means that there is no relationship between the observations in each group or between the groups themselves.
- There should be no significant outliers.
- The dependent variable should be approximately normally distributed for each category of the independent variable.
- There needs to be homogeneity of variances.
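Two of these assumptions lend themselves to a quick, informal screen before running the test. The sketch below flags outliers with the 1.5 × IQR rule and compares the largest and smallest sample variances. Both are rough heuristics chosen here for illustration (formal tests such as Levene's test exist for the variance assumption), and the data and helper names are hypothetical.

```python
from statistics import quantiles, variance

def iqr_outliers(sample):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(sample, n=4)  # quartiles via the default 'exclusive' method
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in sample if x < low or x > high]

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across the groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical measurements for three groups (illustration only)
groups = {
    "A": [4, 5, 6, 5, 5, 4, 6, 5],
    "B": [7, 8, 9, 8, 8, 7, 9, 30],  # 30 is a deliberately extreme value
    "C": [5, 6, 7, 6, 6, 5, 7, 6],
}

for name, sample in groups.items():
    print(name, "outliers:", iqr_outliers(sample))
print("largest / smallest variance:", round(variance_ratio(list(groups.values())), 1))
```

A flagged value or a very large variance ratio is a prompt to look more closely at the data, not a verdict on its own.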
Kruskal Wallis Test
The Kruskal-Wallis test is a non-parametric version of the independent-measures (one-way) ANOVA that can be performed on ordinal (ranked) data.
The data for this example is displayed in the following table:
|Earnings per year (in thousand euros)||Triin||Darja||Maggi|
This table lists the earnings per year of 3 groups: Triin, Darja and Maggi. The question we are asking is:
Is there a difference between groups 1, 2 and 3 at alpha = 0.05?
To answer this, we perform a Kruskal-Wallis test, which consists of the following steps:
- Step 1: Define the null and alternative hypotheses. H0 = There is no difference between the groups. H1 = There is a difference between the groups.
- Step 2: State alpha. The alpha level is usually set at 0.05, which is the value used here.
- Step 3: Calculate the degrees of freedom. df = k - 1, where k is the number of groups; in this case df = 3 - 1 = 2.
- Step 4: State the decision rule. The test statistic H approximately follows a chi-square distribution with k - 1 degrees of freedom, so we look up the critical value in the chi-square table. For an alpha level of 0.05 and 2 degrees of freedom, the critical value is 5.99. Therefore, if χ² is greater than 5.99 we reject the null hypothesis; otherwise we do not reject it.
- Step 5: Calculate the test statistic. To do so, we first rank all the scores together, from lowest to highest, regardless of group.
|Group||Original Score||Rank|
|Maggi||39||1|
|Darja||40||2|
|Darja||42||3|
|Maggi||42||3|
|Maggi||43||5|
|Triin||45||6|
|Maggi||45||6|
|Triin||46||8|
|Darja||46||8|
|Triin||47||10|
|Triin||48||11|
|Triin||48||11|
|Triin||48||11|
|Darja||48||11|
|Triin||49||15|
|Maggi||49||15|
|Triin||50||17|
|Darja||50||17|
|Darja||51||19|
|Darja||52||20|
|Maggi||52||20|
|Darja||54||22|
|Maggi||55||23|
|Maggi||56||24|

|Ranks||Triin||Darja||Maggi|
| ||6||2||1|
| ||8||3||3|
| ||10||8||5|
| ||11||11||6|
| ||11||17||15|
| ||11||19||20|
| ||15||20||23|
| ||17||22||24|
|TOTAL (T)||89||102||97|
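With the entire data set organized into ranks, we use the following formula:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$

Here N is the total number of observations, $T_i$ is the total of the ranks of group i, $n_i$ is the number of observations in group i, and k is the number of groups. We obtain the following results: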
|Statistic||Value|
|H||4.550921659|
|D||0.995550612|
|Adjusted H||4.571|
|d.f.||2|
|P value||0.102|
Since the adjusted H (H divided by the tie correction D) of 4.571 is less than the critical value of 5.99, we do not reject the null hypothesis.
- Step 6: State the results and draw a conclusion. There is no significant difference among the earnings of Darja, Triin and Maggi: H = 4.57 (df = 2, n = 24), p > 0.05.
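The steps above can be sketched in a few lines of code. The example below takes the scores as they appear in the rank table and applies the H formula together with the usual tie correction, 1 - Σ(t³ - t) / (N³ - N). It is an illustration under stated assumptions rather than a reproduction of the calculation in the text: tied scores receive the average of the ranks they span (the usual convention), whereas the rank table gives tied scores the lowest rank, so the printed values should not be expected to match the figures quoted in the steps. The helper names are hypothetical, and the closed-form critical value and p-value hold only for 2 degrees of freedom.

```python
from math import exp, log

# Scores as listed in the rank table (thousand euros per year)
scores = {
    "Triin": [45, 46, 47, 48, 48, 48, 49, 50],
    "Darja": [40, 42, 46, 48, 50, 51, 52, 54],
    "Maggi": [39, 42, 43, 45, 49, 52, 55, 56],
}

def average_ranks(values):
    """Map each distinct value to its rank; tied values get the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}

def kruskal_wallis(groups):
    """Return (H, tie correction D, adjusted H) for a dict of group name -> scores."""
    pooled = [x for g in groups.values() for x in g]
    n_total = len(pooled)
    rank_of = average_ranks(pooled)
    # Sum of T_i^2 / n_i, where T_i is the rank total of group i
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups.values())
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tie group
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    d = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h, d, h / d

alpha = 0.05
h, d, h_adjusted = kruskal_wallis(scores)
# For df = 2 only, the chi-square tail probability has the closed form exp(-x / 2)
critical_value = -2 * log(alpha)  # about 5.99
p_value = exp(-h_adjusted / 2)
print(f"H={h:.3f} D={d:.4f} adjusted H={h_adjusted:.3f} p={p_value:.3f}")
print("reject H0" if h_adjusted > critical_value else "do not reject H0")
```

With more than three groups the degrees of freedom change, and the critical value and p-value would have to come from a chi-square table or a statistics library instead of the closed form used here.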