— |
help:pearsonsr [2011/02/17 16:06] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | [[http://www.sofastatistics.com/userguide.php | Contents]] | ||
+ | [[:help:stats_tests | Statistical Tests Available]] | ||
+ | |||
+ | ====== Correlation - Pearson's R Test ====== | ||
+ | |||
+ | The Pearson's R Correlation Test (also called the [[http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient | Pearson product-moment correlation coefficient]]) tells you how strong the linear correlation is for paired numeric data, e.g. height and weight. Never run this test without viewing a scatterplot and visually examining the basic shape of the relationship. The test could indicate a low linear correlation even though the data have a very strong and clear non-linear pattern, e.g. a U shape. Also look for outliers. The relationship might be diffuse except for a value or two sitting in the top right-hand corner of the plot, e.g. if you were looking at gun ownership and national income. This test is not very resistant to outliers: a single extreme point can raise or lower R substantially. | ||
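Both warnings can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, not SOFA's own code; `pearson_r` is a hypothetical helper written for this illustration, and the data are made up. It computes R for a perfect U shape and for a diffuse cloud with and without one extreme point.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect U shape: y is completely determined by x, yet the
# linear correlation is zero because the pattern is symmetric.
u_x = list(range(-5, 6))
u_y = [x ** 2 for x in u_x]
r_u = pearson_r(u_x, u_y)

# A diffuse cloud (made-up values) with no real linear trend ...
cloud_x = [1, 2, 3, 4, 5, 6]
cloud_y = [3, 1, 4, 2, 4, 1]
r_cloud = pearson_r(cloud_x, cloud_y)

# ... and the same cloud plus one point far out in the top right-hand corner.
r_outlier = pearson_r(cloud_x + [20], cloud_y + [20])

print(f"U shape:            R = {r_u:.3f}")
print(f"cloud:              R = {r_cloud:.3f}")
print(f"cloud plus outlier: R = {r_outlier:.3f}")
```

In this sketch the single added point is enough to turn a weak, slightly negative R into a strongly positive one, which is why the scatterplot should be checked first.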
+ | |||
+ | In the patterns below, (A) is a very strong linear relationship, (B) is less strong, and (C) is weaker still. (D) is barely there, (E) is a very strong non-linear relationship, and (F) is an otherwise weak relationship with an influential outlier. Note that even though (C) might have a reasonably high R, there is a considerable spread of y values for any given x-axis value. You couldn't really treat x as a proxy for y, e.g. if you were comparing the results of a cheap and quick measurement tool against those of an expensive and time-consuming one. The two tools produce results with a reasonably high level of correlation, but the relationship is still quite loose. Many more examples of patterns can be found here: [[http://en.wikipedia.org/wiki/File:Correlation_examples.png | Correlation examples.png]]. | ||
+ | |||
+ | {{:help:scatterplot_patterns.gif|}} | ||
+ | |||
+ | Always look at the scatterplot before interpreting the results. Completely different datasets can produce nearly identical summary statistics (see [[http://en.wikipedia.org/wiki/Anscombe%27s_quartet | Anscombe's quartet]]). | ||
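As an illustration, the sketch below computes R for the four datasets of Anscombe's quartet. The data values are the published quartet; `pearson_r` is a hypothetical helper written for this example and is not part of SOFA.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Datasets I to III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# R is roughly 0.816 for every dataset, although the scatterplots differ:
# a noisy line, a curve, a line with one outlier, and a vertical stack
# of points with one extreme value.
r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(f"dataset {name}: R = {r:.3f}")
```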
+ | |||
+ | Two key things to note about the test are the p value and R. The p value tells you whether you can reject the null hypothesis, namely that there is no linear relationship. R ranges from -1 to 1: its sign gives the direction of the relationship and its size indicates the strength. To see how much of the variation in one variable is explained by its linear relationship with the other, look at R squared. If R is 0.7, R squared is 0.49, or less than half. Once again, always look at the scatterplot when interpreting the findings. | ||
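The sketch below works through both numbers on made-up height and weight data. The source does not say how SOFA computes its p value, so this example substitutes a simple permutation test: shuffle one variable many times and count how often the shuffled |R| is at least as large as the observed one. The function names, the data, and the choice of 2000 shuffles are all assumptions made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Approximate two-sided p value for 'no linear relationship'.

    This is a permutation approximation chosen for this sketch; it is
    not necessarily the method a statistics package uses.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical paired data: height in cm and weight in kg.
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
weights = [52, 58, 57, 63, 70, 68, 77, 80, 84, 91]

r = pearson_r(heights, weights)
r_squared = r ** 2
p_value = permutation_p_value(heights, weights)

print(f"R = {r:.3f}, R squared = {r_squared:.3f}, p (approx.) = {p_value:.4f}")

# The arithmetic from the text: R = 0.7 gives R squared = 0.49.
example_r_squared = round(0.7 ** 2, 2)
```

A small p value here only says that a linear association this strong is unlikely under random pairing; R squared is what says how much of the variation the linear relationship accounts for.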
+ | |||
+ | The Pearson's R Correlation Test is only for numerical data that is adequately normally distributed. If your data is ordinal or not adequately normal, the appropriate alternative is the [[:help:spearmansr | Spearman's R Correlation Test]]. | ||
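One way to see the difference between the two tests: Spearman's coefficient is Pearson's R computed on the ranks of the data rather than on the raw values, so it responds to any monotonic relationship and not only to a linear one. The following is a minimal sketch of that idea; `average_ranks` and `spearman_r` are hypothetical helpers written for this example, not SOFA functions.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's R applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A strictly increasing but strongly curved relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
tied_ranks = average_ranks([10, 20, 20, 30])

print(f"Pearson R  = {r_pearson:.3f}")
print(f"Spearman R = {r_spearman:.3f}")
```

Because the ranks of x and y agree exactly, the rank-based coefficient is 1 while Pearson's R is noticeably lower for the same data.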
+ | |||
+ | [[:home | Wiki]] |