Most statistical procedures in CoStat (including Statistics : Correlation, Statistics : Descriptive, and parts of Statistics : Frequency Analysis and Statistics : Miscellaneous) assume that the data is normally distributed. Sometimes there are other assumptions; for example, standard ANOVAs assume that the variances of the subgroups are homogeneous. These assumptions allow the tests to make powerful inferences about the data.
For some data files, the assumptions are not valid. Several other tests ("nonparametric" tests) have been devised which do not make assumptions about the distribution of the data. Most of these tests rank the data and then do statistical tests with the ranked values. These tests are generally not as powerful (that is, not as good at rejecting a false null hypothesis) as the traditional tests, but they are very useful when you can't use the traditional tests.
Unfortunately, there aren't replacement nonparametric tests for all of the traditional tests. CoStat has these options (on the Statistics : Nonparametric menu):
CoStat's manual has:
Statistics : Nonparametric : Rank Correlation
Correlation is a measure of the association of two variables (X1 and X2), neither of which is treated as dependent on the other. This procedure is analogous to the Pearson product moment correlation coefficient, which measures linear association, but it works with the ranks of the values in each column, so it makes no assumptions about the distribution of the values.
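The ranking step can be illustrated with a short Python sketch. It assumes that tied values share the mean of the ranks they occupy (a common convention; the text above does not say how CoStat treats ties), and the function name average_ranks is hypothetical, not part of CoStat.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


print(average_ranks([5.7, 4.7, 5.7, 6.4]))  # [2.5, 1.0, 2.5, 4.0]
```

The two values of 5.7 occupy positions 2 and 3 in sorted order, so each receives the rank 2.5.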
Related Procedures
Read the general description of Statistics : Nonparametric (page 333).
Statistics : Correlation (page 275) calculates the Pearson product moment correlation coefficient.
References
See Sokal and Rohlf (1981 and 1995): Box 15.6 (1981) or Box 15.7 (1995), "Kendall's Coefficient of Rank Correlation, tau", and Section 15.8 (1981 or 1995), "Nonparametric tests for association" (for Spearman's Coefficient of Rank Correlation).
Data Format
The data file must have two or more columns. The procedure tests the correlation of every pair of columns in the data file. Missing values (NaN's, page 70) are allowed; a row is rejected only when it has a missing value in either of the two columns currently being tested.
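The pairwise treatment of missing values can be sketched in Python as follows. This is only an illustration of the rule described above, not CoStat's code; the column names, the data values, and the helper complete_pairs are all hypothetical.

```python
import math
from itertools import combinations


def complete_pairs(col_a, col_b):
    """Keep only the rows where neither of the two columns is NaN."""
    return [(a, b) for a, b in zip(col_a, col_b)
            if not (math.isnan(a) or math.isnan(b))]


# Hypothetical three-column data file with missing values (NaN).
columns = {
    "A": [1.0, 2.0, float("nan"), 4.0],
    "B": [2.0, float("nan"), 3.0, 5.0],
    "C": [9.0, 8.0, 7.0, 6.0],
}

# Every pair of columns is examined; n can differ from pair to pair.
for name_1, name_2 in combinations(columns, 2):
    rows = complete_pairs(columns[name_1], columns[name_2])
    print(name_1, name_2, "n =", len(rows))
```

Because rows are rejected per pair rather than for the whole file, a missing value in column A does not reduce the number of rows used for the B and C comparison.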
Options
Details
For both the Kendall and Spearman correlation tests, the test statistics are similar to the product moment correlation coefficient, r, and range from -1 to 1.
If n>40, the significance of Kendall's tau can be tested by calculating a test statistic, ts, which the procedure compares to tabulated values of Student's t distribution:
ts = tau / sqrt(2*(2*n+5)/(9*n*(n-1)))
where n is the number of data pairs.
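As a worked illustration of the large-sample test for Kendall's tau, the Python sketch below evaluates ts for a hypothetical tau of 0.30 with n = 50 (values invented for illustration, not taken from the manual). It assumes the df=infinity form of Student's t mentioned in the sample run output, which is the standard normal distribution; the function name kendall_ts is hypothetical.

```python
import math
from statistics import NormalDist


def kendall_ts(tau, n):
    """Large-sample test statistic for Kendall's tau (intended for n > 40)."""
    if n <= 40:
        raise ValueError("n <= 40: use tabulated critical values of tau instead")
    return tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


# Hypothetical inputs, chosen only to exercise the formula.
ts = kendall_ts(0.30, 50)

# Assumption: df = infinity, so Student's t reduces to the standard normal.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(ts)))

print("ts =", round(ts, 3))
print("P < 0.05:", p_two_tailed < 0.05)
```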
If n>10, the significance of Spearman's r can be tested by calculating a test statistic, ts, which the procedure compares to tabulated values of Student's t distribution:
ts = r / sqrt( (1-r^2) / (n-2) )
If n<=10, Spearman's r must be compared to tabular values which are not included with CoStat, but can be found in Sokal and Rohlf (1995).
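For the n>10 case, the Spearman test statistic can be worked through with the r and n reported in the sample run (r = 0.64910714286, n = 15). The Python sketch below is an illustration only; the function name spearman_ts is hypothetical, and comparing ts against Student's t with n-2 degrees of freedom is an assumption (the usual choice for this statistic), not something stated in the text above.

```python
import math


def spearman_ts(r, n):
    """Test statistic for Spearman's r (intended for n > 10)."""
    if n <= 10:
        raise ValueError("n <= 10: use tabulated critical values instead")
    return r / math.sqrt((1 - r * r) / (n - 2))


# Values taken from the sample run in this section.
ts = spearman_ts(0.64910714286, 15)

# Assumption: df = n - 2 = 13. The two-tailed 5% critical value of
# Student's t for 13 degrees of freedom is 2.160.
T_CRIT_05_DF13 = 2.160

print("ts =", round(ts, 2))
print("significant at 0.05:", abs(ts) > T_CRIT_05_DF13)
```

Here ts is a little above 3, well beyond the 5% critical value, which agrees with the significant result reported for Spearman's r in the sample run.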
The Sample Run
Data for the sample run is from Sokal and Rohlf (Box 15.6, 1981; or Box 15.7, 1995): "Computation of rank correlation coefficient between the total length (Y1) of 15 aphid stem mothers and the mean thorax length (Y2) of their parthenogenetic offspring."
PRINT DATA
2000-08-04 14:11:40
Using: c:\cohort6\box156.dt
First Column: 1) Y1
Last Column: 2) Y2
First Row: 1
Last Row: 15
Y1 Y2
--------- ---------
8.7 5.95
8.5 5.65
9.4 6
10 5.7
6.3 4.7
7.8 5.53
11.9 6.4
6.5 4.18
6.6 6.15
10.6 5.93
10.2 5.7
7.2 5.68
8.6 6.13
11.1 6.3
11.6 6.03
For the sample run, use File : Open to open the file called box156.dt in the cohort directory and specify:
RANK CORRELATION (Kendall and Spearman Tests)
2000-08-04 14:13:05
Using: c:\cohort6\box156.dt
Y1 Column: 1) Y1
Y2 Column: 2) Y2
Keep If:

The test statistics, Kendall's tau and Spearman's r, are similar to the product moment correlation coefficient, r, ranging from -1 to 1. If the sample size is large enough (n>40 for tau and n>10 for r), additional test statistics can be calculated and compared to Student's t distribution (two-tailed, df=infinity). Otherwise, see specially tabulated critical values of tau in Table S in 'Statistical Tables' (F.J. Rohlf and R.R. Sokal, 1995). If P<=0.05, tau or r is significantly different from 0 and the values in the two columns probably are correlated.

Y1 column: 1) Y1
Y2 column                 n   Kendall tau         P    Spearman r         P
------------------- ------- ------------- --------- ------------- ---------
2) Y2                    15 0.49761335153   (n<=40) 0.64910714286     .0088 **
P is the probability of obtaining a correlation at least as strong as the one observed if the variates were in fact uncorrelated. The low P value (<=0.05) for this data set indicates that the two variates probably are correlated.
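The two coefficients in the sample run can be recomputed from the printed data with a short Python sketch. It rests on three assumptions that the text does not state: tied values receive average ranks, Spearman's r is computed as 1 - 6*sum(d^2)/(n*(n^2-1)) from the rank differences d, and Kendall's tau is corrected for ties (the form often called tau-b). All function names are hypothetical. Under these assumptions the results agree with the sample run output to the printed precision.

```python
import math

# Sample-run data (Sokal and Rohlf, Box 15.6, 1981; Box 15.7, 1995).
y1 = [8.7, 8.5, 9.4, 10, 6.3, 7.8, 11.9, 6.5, 6.6, 10.6, 10.2, 7.2, 8.6, 11.1, 11.6]
y2 = [5.95, 5.65, 6, 5.7, 4.7, 5.53, 6.4, 4.18, 6.15, 5.93, 5.7, 5.68, 6.13, 6.3, 6.03]


def average_ranks(values):
    """Rank = (count of smaller values) + (count of equal values + 1) / 2."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]


def spearman_r(a, b):
    """Spearman's r from squared rank differences (assumed formula)."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(average_ranks(a), average_ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau_b(a, b):
    """Kendall's tau with a correction for tied pairs (assumed variant)."""
    n = len(a)
    score = ties_a = ties_b = 0
    for i in range(n):
        for j in range(i + 1, n):
            da, db = a[i] - a[j], b[i] - b[j]
            ties_a += da == 0
            ties_b += db == 0
            prod = da * db
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            score += (prod > 0) - (prod < 0)
    n_pairs = n * (n - 1) // 2
    return score / math.sqrt((n_pairs - ties_a) * (n_pairs - ties_b))


print("Kendall tau:", round(kendall_tau_b(y1, y2), 11))
print("Spearman r: ", round(spearman_r(y1, y2), 11))
```

Y2 contains one tied pair (the two values of 5.7), which is why the tie handling matters for matching the printed coefficients.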