A model-free alternative to ANOVA.
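The idea is to replace each observation by its rank among all \(N\) observations pooled across groups (tied observations receive the average of the ranks they span), then check whether some groups have systematically higher or lower ranks than others. To implement this idea quantitatively, let \(R_{ij}\) be the rank of observation \(j\) in group \(i\), let \(R_{i\cdot}\) be the sum of the ranks in group \(i\), and compute \[ H = \frac{1}{S^2}\left[\sum_{i} \frac{R_{i\cdot}^2}{n_{i}} - \frac{N(N+1)^2}{4}\right], \qquad S^2 = \frac{1}{N - 1}\left[\sum_{i, j} R_{ij}^2 - \frac{N(N+1)^2}{4}\right]. \] Large values of \(H\) are evidence that the groups differ. Under the null hypothesis that all groups share the same distribution, \(H\) is approximately \(\chi^2\)-distributed with (number of groups minus 1) degrees of freedom.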
Where did this test statistic come from? It’s possible to show that the statistic is equivalent to \[ \frac{\sum_{i} n_{i}\left(\bar{R}_{i} - \bar{R}\right)^2}{\frac{1}{N - 1}\sum_{i, j} \left(R_{ij} - \bar{R}\right)^2} \] which compares the average rank in group \(i\) to the average rank overall, and standardizes by the overall variance of the ranks. The first formula is the one presented in the book, though, and it’s easier to calculate by hand.
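As a numerical check of this equivalence, the sketch below computes the statistic two ways on a small made-up dataset that contains ties: once from the rank sums \(R_{i\cdot}\), using the standard rank-sum form \(H = \frac{1}{S^2}\left[\sum_i R_{i\cdot}^2/n_i - N(N+1)^2/4\right]\) with \(S^2 = \frac{1}{N-1}\left[\sum_{i,j} R_{ij}^2 - N(N+1)^2/4\right]\) (assumed here to match the book's formula), and once from the average-rank form above. It then compares both with the statistic reported by R's `kruskal.test`, which uses average ranks for ties plus a tie correction. The data values and variable names are hypothetical.

```r
# Hypothetical toy data with ties, chosen only to illustrate the algebra
y <- c(3, 5, 5, 7, 8, 8, 9, 4, 6, 6)
g <- factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"))

N <- length(y)
R <- rank(y)                      # pooled ranks; tied values get their average rank
n_i <- tapply(R, g, length)       # group sizes
Rbar_i <- tapply(R, g, mean)      # average rank in each group
Rbar <- mean(R)                   # overall average rank, equal to (N + 1) / 2

# Form 1: rank sums R_i. with the N(N+1)^2/4 correction term
correction <- N * (N + 1)^2 / 4
S2 <- (sum(R^2) - correction) / (N - 1)
H1 <- (sum(tapply(R, g, sum)^2 / n_i) - correction) / S2

# Form 2: spread of group average ranks over the overall variance of the ranks
H2 <- sum(n_i * (Rbar_i - Rbar)^2) / (sum((R - Rbar)^2) / (N - 1))

# Built-in statistic (tie-corrected Kruskal-Wallis chi-squared)
H_builtin <- unname(kruskal.test(y, g)$statistic)

print(c(rank_sum_form = H1, average_rank_form = H2, kruskal_test = H_builtin))
```

Because \(S^2\) is computed from the actual (average) ranks, both forms already account for ties and agree with the tie-corrected value from `kruskal.test`.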
Why not always use nonparametric ANOVA? If the data really are normal, this approach has less power than standard ANOVA. If you doubt the normality assumption, a safe approach is to run both tests. If the two approximately agree, default to standard ANOVA.
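A minimal sketch of the "run both" advice, assuming made-up data for three groups: fit the one-way ANOVA with `lm` and `anova`, run `kruskal.test` on the same data frame, and compare the two \(p\)-values. The data frame `dat` and its columns are hypothetical.

```r
# Hypothetical toy data (made-up values): three groups of five observations
dat <- data.frame(
  y = c(10.2, 11.1, 9.8, 10.5, 10.9,
        11.8, 12.4, 11.5, 12.9, 12.0,
        10.7, 11.3, 10.1, 11.9, 11.0),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

# Standard one-way ANOVA F test; the first row of the table is the group effect
fit <- lm(y ~ group, data = dat)
p_anova <- anova(fit)[["Pr(>F)"]][1]

# Kruskal-Wallis rank-based test on the same data
p_kw <- kruskal.test(y ~ group, data = dat)$p.value

print(c(anova = p_anova, kruskal_wallis = p_kw))
```

What counts as "approximately agree" is a judgment call; one simple check is whether both \(p\)-values lead to the same conclusion at the chosen significance level.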
In R, the Kruskal-Wallis test is implemented by the `kruskal.test` function. It accepts a formula and a data frame, in the same form as `lm` in the earlier ANOVA examples. Below, we apply the test to the etch rate data. The \(p\)-value indicates that the ranks differ significantly across power levels, which is consistent with our previous findings.

```r
library(readr)
etch_rate <- read_csv("https://uwmadison.box.com/shared/static/vw3ldbgvgn7rupt4tz3ditl1mpupw44h.csv")
etch_rate$power <- as.factor(etch_rate$power) # treat each power setting as a distinct group
kruskal.test(rate ~ power, data = etch_rate)
```
```
	Kruskal-Wallis rank sum test

data:  rate by power
Kruskal-Wallis chi-squared = 16.907, df = 3, p-value = 0.0007386
```
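The object returned by `kruskal.test` is a standard `htest` list, so the pieces of the printout above can be extracted directly. The sketch below uses hypothetical data rather than the etch rate file. It also shows two input forms that should be equivalent to the formula interface (a numeric vector plus a grouping factor, or a list of per-group vectors) and checks that the reported \(p\)-value is the upper tail of a \(\chi^2\) distribution with (number of groups minus 1) degrees of freedom.

```r
# Hypothetical toy data (made-up values), in long format like etch_rate
dat <- data.frame(
  response = c(4.1, 3.8, 4.5, 5.2, 5.9, 5.5, 6.3, 6.8, 6.1),
  level = factor(rep(c("low", "mid", "high"), each = 3))
)

# Formula interface, as used with lm
res_formula <- kruskal.test(response ~ level, data = dat)

# Default interface: a numeric vector plus a grouping factor
res_vector <- kruskal.test(dat$response, dat$level)

# List interface: one numeric vector per group
res_list <- kruskal.test(split(dat$response, dat$level))

# Pieces of the printed output
H <- unname(res_formula$statistic)      # Kruskal-Wallis chi-squared
df_kw <- unname(res_formula$parameter)  # degrees of freedom = groups - 1

# The p-value is the upper tail of the chi-squared distribution
p_manual <- pchisq(H, df = df_kw, lower.tail = FALSE)

print(c(statistic = H, df = df_kw, p_manual = p_manual, p_reported = res_formula$p.value))
```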