
Assignment: Summary of ANOVA & Kruskal-Wallis Test

ANOVA

One-way analysis of variance (ANOVA), a method developed by R.A. Fisher, is used to determine whether there are statistically significant differences between the means of three or more independent (unrelated) groups.

Running multiple two-sample t-tests instead would increase the chance of a Type I error (a false positive). ANOVA, in its simplest form, provides a single statistical test of whether the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. For this reason, ANOVA is useful for testing three or more group means for statistical significance.
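A rough calculation shows why multiple t-tests are a problem. With three groups there are three pairwise t-tests. Suppose each is run at α = 0.05 and the tests are treated as independent. This is a simplifying assumption, since pairwise comparisons share data. The chance of at least one false positive is then:

$$1 - (1 - \alpha)^m = 1 - (1 - 0.05)^3 = 1 - 0.857375 \approx 0.143$$

That is almost three times the intended 5%.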


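The test works with the following hypotheses, where µᵢ is the mean of group i and k is the number of groups:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
$$H_1: \text{at least two group means are different}$$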

We start by assuming that the null hypothesis is true. If the one-way ANOVA finds that the means of at least two groups are significantly different, we reject the null hypothesis in favor of the alternative hypothesis. However, the one-way ANOVA does not pinpoint which specific groups differ from each other. It only tells us that the means of at least two groups are significantly different. To determine which specific groups differ, we need to conduct a post-hoc test.

For example, consider the scores of three university sports teams (UT, TTU and TUT) from 2012 to 2019:

       Sports teams
Year   UT     TTU     TUT
2012   8      9       8
2013   8      9       7
2014   9      9       8
2015   5      9       9
2016   7      9       7
2017   8      9       8
2018   9      9       5
2019   4      10      7
Mean   7.25   9.125   7.375

Performing a single-factor (one-way) ANOVA on these data gives the following results:

SUMMARY
Groups   Count   Sum   Average   Variance
UT       8       58    7.25      3.357
TTU      8       73    9.125     0.125
TUT      8       59    7.375     1.4107

ANOVA
Source of Variation   SS        df   MS       F        P-value   F crit
Between Groups        17.5833   2    8.7916   5.3905   0.01289   3.4668
Within Groups         34.25     21   1.6309
Total                 51.8333   23
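The entries in the ANOVA table are linked by the standard one-way ANOVA formulas. Here $n_i$, $\bar{x}_i$ and $s_i^2$ are the count, average and variance of group i from the SUMMARY table. $\bar{x} = 190/24 \approx 7.917$ is the grand mean, k = 3 and N = 24:

$$SS_{between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = 17.5833, \qquad SS_{within} = \sum_{i=1}^{k} (n_i - 1) s_i^2 = 7(3.357 + 0.125 + 1.4107) \approx 34.25$$

$$MS_{between} = \frac{SS_{between}}{k - 1} = \frac{17.5833}{2} \approx 8.792, \qquad MS_{within} = \frac{SS_{within}}{N - k} = \frac{34.25}{21} \approx 1.631, \qquad F = \frac{MS_{between}}{MS_{within}} \approx 5.39$$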

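One way to interpret this is to compare the F value with F crit. Since F = 5.3905 is greater than F crit = 3.4668 (equivalently, p = 0.01289 < 0.05), we reject the null hypothesis. We conclude that the mean scores of at least two of the three teams differ significantly.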
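As a cross-check, the sketch below recomputes the ANOVA table from the raw team scores in plain Python. The helper names (`one_way_anova`, `f_sf_df1_2`, `f_crit_df1_2`) are hypothetical. The p-value and critical value use a closed form of the F distribution that holds only when the between-groups df is 2, i.e. exactly three groups. For other designs, a library routine such as `scipy.stats.f_oneway` would be the usual choice.

```python
# Scores of the three sports teams, 2012-2019, from the table above.
teams = {
    "UT": [8, 8, 9, 5, 7, 8, 9, 4],
    "TTU": [9, 9, 9, 9, 9, 9, 9, 10],
    "TUT": [8, 7, 8, 9, 7, 8, 5, 7],
}


def mean(xs):
    return sum(xs) / len(xs)


def one_way_anova(samples):
    """Hypothetical helper: return SS_between, SS_within, df_between, df_within and F."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in samples)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


def f_sf_df1_2(f_value, d2):
    """P(F > f_value) for F(2, d2). Closed form valid ONLY when the numerator df is 2."""
    return (1 + 2 * f_value / d2) ** (-d2 / 2)


def f_crit_df1_2(alpha, d2):
    """Upper-alpha critical value of F(2, d2), obtained by inverting f_sf_df1_2."""
    return (alpha ** (-2 / d2) - 1) * d2 / 2


ssb, ssw, dfb, dfw, f = one_way_anova(list(teams.values()))
assert dfb == 2, "the closed-form p-value below assumes exactly three groups"
p = f_sf_df1_2(f, dfw)
crit = f_crit_df1_2(0.05, dfw)

print(f"SS between = {ssb:.4f}, SS within = {ssw:.4f}, SS total = {ssb + ssw:.4f}")
print(f"F({dfb}, {dfw}) = {f:.4f}, p = {p:.4f}, F crit = {crit:.4f}")
print("reject H0" if f > crit else "do not reject H0")
```

The printed values should agree with the table above (F ≈ 5.39, p ≈ 0.0129, F crit ≈ 3.47).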

Assumptions:

  1. The dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
  2. The independent variable should consist of two or more categorical, independent groups.
  3. Observations should be independent, which means that there is no relationship between the observations in each group or between the groups themselves.
  4. There should be no significant outliers.
  5. The dependent variable should be approximately normally distributed for each category of the independent variable.
  6. There needs to be homogeneity of variances.

Kruskal-Wallis Test

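The Kruskal-Wallis test is a rank-based, non-parametric alternative to the independent-measures (one-way) ANOVA. Because it works on ranks rather than raw values, it can be performed on ordinal (ranked) data, or when the ANOVA's normality assumption is doubtful.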

For instance, consider the yearly earnings in the table below. The test does not use the raw values directly; it first converts them to ranks.

Earnings per year (in thousand euros)
Year   Triin    Darja    Maggi
2012   45       50       52
2013   46       51       56
2014   48       52       39
2015   49       46       42
2016   50       40       45
2017   48       54       49
2018   47       42       55
2019   48       48       43
Mean   47.625   47.875   47.625

The table shows the earnings per year of three groups: Triin, Darja and Maggi. The question we are asking is:

Is there a difference between groups 1, 2 and 3 at α = 0.05?

To answer this, we perform a Kruskal-Wallis test, which consists of the following steps:

  • Step 1: Define the null and alternative hypotheses. H₀: there is no difference between the groups. H₁: there is a difference between the groups.
  • Step 2: State alpha. The alpha level is usually set at 0.05, which is the value used here.
  • Step 3: Calculate the degrees of freedom: df = k − 1, where k is the number of groups. Here df = 3 − 1 = 2.
  • Step 4: State the decision rule. For reasonably large samples, H approximately follows a chi-square (χ²) distribution with k − 1 degrees of freedom. We therefore look up the critical value in a chi-square table for α = 0.05 and 2 degrees of freedom, which gives 5.99. If H is greater than 5.99, we reject the null hypothesis; otherwise we do not reject it.
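  • Step 5: Calculate the test statistic. We pool all 24 values, sort them in ascending order and assign ranks from 1 to 24. Tied values receive the average of the ranks they occupy. For example, the two values of 42 occupy positions 3 and 4, so each gets rank 3.5.
    Group   Original score   Rank
    Maggi   39               1
    Darja   40               2
    Darja   42               3.5
    Maggi   42               3.5
    Maggi   43               5
    Triin   45               6.5
    Maggi   45               6.5
    Triin   46               8.5
    Darja   46               8.5
    Triin   47               10
    Triin   48               12.5
    Triin   48               12.5
    Triin   48               12.5
    Darja   48               12.5
    Triin   49               15.5
    Maggi   49               15.5
    Triin   50               17.5
    Darja   50               17.5
    Darja   51               19
    Darja   52               20.5
    Maggi   52               20.5
    Darja   54               22
    Maggi   55               23
    Maggi   56               24

    Ranks       Triin   Darja   Maggi
                6.5     2       1
                8.5     3.5     3.5
                10      8.5     5
                12.5    12.5    6.5
                12.5    17.5    15.5
                12.5    19      20.5
                15.5    20.5    23
                17.5    22      24
    Total (T)   95.5    105.5   99

    As a check, the rank totals sum to N(N + 1)/2 = 24 × 25/2 = 300. We then apply the Kruskal-Wallis formula. Here N is the total number of observations (24), nᵢ is the number of observations in group i (8) and Tᵢ is the total of the ranks in group i:

    $$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$

    Because there are tied values, H is divided by a tie correction factor D. In D, t is the number of observations in each group of tied values. Here there are six pairs of ties and one group of four, so Σ(t³ − t) = 6 × 6 + 60 = 96:

    $$D = 1 - \frac{\sum (t^3 - t)}{N^3 - N}$$

    H = 12/(24 × 25) × (95.5² + 105.5² + 99²)/8 − 3 × 25 = 0.02 × 3756.4375 − 75 = 0.12875
    D = 1 − 96/13800 = 0.99304
    adjusted H = H/D = 0.130
    d.f. = 2
    P value = 0.937

    Since the adjusted H (0.130) is less than 5.99, we do not reject the null hypothesis.

  • Step 6: State the results and draw a conclusion. There is no significant difference among the earnings of Triin, Darja and Maggi: H = 0.13 (2, N = 24), p > 0.05.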
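The sketch below works through steps 3 to 5 in plain Python for the earnings table. It pools the 24 values, gives tied values the average of the ranks they span, and computes H, the tie correction D and the adjusted H. The helper names (`average_ranks`, `kruskal_wallis`) are hypothetical.

It relies on the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(−x/2). The p-value and the 5.99 critical value are therefore computed in closed form, a shortcut that holds only for k = 3 groups. `scipy.stats.kruskal` is a library alternative that also applies the tie correction.

```python
from math import exp, log

# Earnings per year (thousand euros), 2012-2019, from the table above.
earnings = {
    "Triin": [45, 46, 48, 49, 50, 48, 47, 48],
    "Darja": [50, 51, 52, 46, 40, 54, 42, 48],
    "Maggi": [52, 56, 39, 42, 45, 49, 55, 43],
}


def average_ranks(values):
    """Hypothetical helper: map each distinct value to its rank in the pooled data.

    Tied values share the mean of the 1-based positions they occupy.
    """
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # ordered[i:j] are ties occupying ranks i+1 .. j; give each the average.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of, ordered


def kruskal_wallis(samples):
    """Hypothetical helper: return rank totals T_i, H, tie correction D and H / D."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    rank_of, ordered = average_ranks(pooled)
    totals = [sum(rank_of[x] for x in s) for s in samples]
    h = 12 / (n * (n + 1)) * sum(t ** 2 / len(s) for t, s in zip(totals, samples)) - 3 * (n + 1)
    # D = 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each group of tied values.
    tie_sizes = [ordered.count(v) for v in set(ordered)]
    d = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return totals, h, d, h / d


totals, h, d, h_adj = kruskal_wallis(list(earnings.values()))
df = len(earnings) - 1  # k - 1 = 2

# Closed forms valid ONLY for df = 2: the chi-square survival function is exp(-x / 2).
p = exp(-h_adj / 2)
critical = -2 * log(0.05)  # about 5.99

print("rank totals:", dict(zip(earnings, totals)))
print(f"H = {h:.5f}, D = {d:.5f}, adjusted H = {h_adj:.4f}, df = {df}, p = {p:.3f}")
print("reject H0" if h_adj > critical else "do not reject H0")
```

A quick sanity check for any ranking: the rank totals must add up to N(N + 1)/2, which is 300 for N = 24. If they do not, tied values have not been given average ranks.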