Objectives

Dealing with unpaired nonparametric data

Dealing with small samples of nominal data
In covering these objectives the following terms will be introduced:

χ^{2} test

Fisher's exact test

Yates's correction
In the previous article the basic principles behind comparing two groups were discussed. The t test was also shown to play an important part in this process when dealing with parametric data. As it is a versatile test, it can be used to compare independent groups and paired groups as well as give an estimate of a population mean when a sample is known. In contrast, when dealing with nonparametric data, several tests have to be used. Fortunately, though, the same principles apply.
As these nonparametric tests do not assume the distribution of the data is normal, they can be used on a much wider spectrum of results. The cost of this is they lack power and are mainly tests of significance. Consequently they will tell if a difference exists but not how big it is (table 1).
There are many nonparametric tests and choosing the most appropriate one can be difficult. Studying published papers can add to the confusion because tests are selected for a variety of reasons, including personal preferences. Furthermore, calculations are commonly carried out with the aid of computer software. As a result there is a risk that the unwary can produce a figure with a p value that is in fact meaningless because the wrong test has been used.
For those unfamiliar with statistics the way forward is to talk to someone who knows about the subject. To make these meetings more useful, the next two articles will concentrate on the common nonparametric tests used for two group comparisons (table 2).
χ^{2} test
OVERVIEW
In the medical literature there are many examples of studies that have used nominal data. As described in article 1, this is where the data are divided into categories that do not have any inherent order.^{1} It is possible to see how the proportions of observations in different categories from a sample compare (“fit”) with those found in a population. Consequently this is known as a “goodness of fit” test. It is also possible to compare the distribution between samples as well as to test the association between variables. In all cases the systematic approach described already in this series is used (box 1).^{2} However, the z and t tests are replaced with the χ^{2} test.
Goodness of fit
To demonstrate this consider the following example. Dr Egbert Everard, SpR in Emergency Medicine, is asked to determine if the type of complaints received by the Emergency Department of Deathstar General is similar to the national picture (table 3). To answer this he decides to use a χ^{2} “goodness of fit” test.
Box 1 System for statistical comparison of two groups

State the null hypothesis and the alternative hypothesis of the study

Select the level of significance

Establish the critical values

Select the groups

Choose and calculate the test statistic

Compare the calculated test statistic with the critical values

Express the chances of obtaining results at least this extreme if the null hypothesis is true
Key point 1
The χ^{2} test is a nonparametric test of the null hypothesis. It can be used on unpaired nominal data to determine:

A goodness of fit between a sample and a population

Comparison between two groups

Association between two variables
1 STATE THE NULL HYPOTHESIS AND THE ALTERNATIVE HYPOTHESIS OF THE STUDY
Having considered the problem, Egbert writes down the null hypothesis as:
“There is no significant difference between the type of complaints made by patients attending the Emergency Department of Deathstar General and those attending emergency departments nationally”.
The alternative hypothesis is the logical opposite of this—that is:
“There is a significant difference between the complaints made by patients attending the Emergency Department of Deathstar General and those attending emergency departments nationally”.
2 SELECT THE LEVEL OF SIGNIFICANCE
If the null hypothesis is correct, the distribution of complaints found in the Emergency Department of Deathstar General should be the same as that found nationally. Therefore the difference between them would be zero. However, there is bound to be some small variation simply due to chance. The question is how big a difference are we going to allow before we reject the idea that the two are all part of the same population?
We can answer this because the difference between a sample and population of nominal data has a χ^{2} distribution (fig 1). Egbert picks a significance level of 0.05 because, by convention, the outer 5% of the area under the curve is considered to be sufficiently away from the population mean as to represent values that cannot be attributed to chance variation. He now needs to determine the critical value for the χ^{2} statistic that demarcates this point (χ^{2}_{crit}).
3 ESTABLISH THE CRITICAL VALUES
As with the t distribution, the χ^{2} distribution changes shape depending upon the size of the sample. For mathematical reasons this is measured as the degrees of freedom rather than the actual size. When comparing a sample with a population, the degrees of freedom is one less than the number of categories in the sample. Therefore in this case, Egbert works out the degrees of freedom to be 4−1 = 3.
Using the table of χ^{2} statistics, a value of 7.82 or greater for χ^{2}_{crit} demarcates a right tail that has, at most, 5% of the area under the curve (fig 2). In other words, 7.82 separates the left 95% area of acceptance of the null hypothesis from the 5% area of rejection.
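For readers without a printed table, the critical value can also be found numerically. The following is a minimal sketch using only the Python standard library; the function names `chi2_sf` and `chi2_crit` are illustrative, and the bisection search is just one convenient way of inverting the tail area, not a method described in this series.

```python
import math

def chi2_sf(x, df):
    """Right-tail area of the chi-squared distribution beyond x (x >= 0, df >= 1)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed form as a finite sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus half-integer gamma terms.
    tail = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return tail

def chi2_crit(alpha, df, upper=1000.0):
    """Find the value whose right tail has area alpha, by bisection."""
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2.0
        if chi2_sf(mid, df) > alpha:
            lower = mid  # tail still too large, move right
        else:
            upper = mid
    return (lower + upper) / 2.0

print(round(chi2_crit(0.05, 3), 3))  # 7.815, which printed tables round to 7.82
```

The same function gives about 5.991 for 2 degrees of freedom, in keeping with the tabulated 5.99.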
A sample of patients from Deathstar's Emergency Department can now be selected and the experimental χ^{2} statistic calculated (χ^{2}_{calc}).
4 SELECT THE SAMPLE
Using records for the previous year, Egbert finds out the types of complaints made (table 4).
5 CHOOSE AND CALCULATE THE TEST STATISTIC
If the null hypothesis applies there is no difference between the national picture and Deathstar General. Consequently the expected number of complaints in each category (E) should be the same as those actually observed (O). The value for E in each category can be determined from the percentages provided for the population's categories. For example, in the national picture 15% of complaints are attributable to a misdiagnosis being made. Therefore if the null hypothesis is applied, 15% of the 60 complaints made at the Emergency Department of Deathstar General should also be of this type. The expected number would therefore be:
0.15 × 60 = 9
Carrying out the same process for each of the other categories, Egbert draws up a table of expected numbers of complaints if the null hypothesis applied (table 5).
To be able to use the χ^{2} test, the data need to have the following properties:

Only frequency data are used in the categories

Events are independent within a sample group. This means that paired data cannot be used and individual subjects only appear once in the table.

No expected value in the table is less than 1 and 80% have values over 5. The reason for this is small expected values can have a disproportionate effect on the overall χ^{2} test statistic, irrespective of the values in other cells

There is a logical basis for the group classification.
As these assumptions are valid in this study, Egbert can proceed with determining χ^{2}_{calc}.
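The rule about expected values lends itself to an automatic check. This is a small sketch only: the function name `expected_counts_ok` is illustrative, and counting an expected value of exactly 5 as acceptable is an assumption, because the wording above ("over 5") leaves that boundary open.

```python
def expected_counts_ok(expected):
    """Check the expected-value rule: none below 1, at least 80% at 5 or more.

    Treating exactly 5 as acceptable is an assumption of this sketch.
    """
    if not expected:
        return False
    if min(expected) < 1:
        return False
    large_enough = sum(1 for e in expected if e >= 5)
    return large_enough / len(expected) >= 0.8

# Hypothetical expected counts for four categories
print(expected_counts_ok([9, 24, 15, 12]))
```

The other three properties (frequency data, independence, and a logical classification) depend on the study design and cannot be checked from the numbers alone.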
An indicator of the validity of the null hypothesis would be the total differences between the observed (O) and expected (E) values in each category. However, if we simply added up each of these values, the overall result would be zero. This is because the observed and expected totals are equal, so the positive and negative differences cancel out. To overcome this they are squared first. The χ^{2} statistic for each category is then derived by dividing this figure by E. Therefore (O−E)^{2}/E represents the test statistic for each category. The overall test statistic (χ^{2}_{calc}) is the sum of these separate category test statistics. This can be represented by the equation:
χ^{2}_{calc} = Σ (O−E)^{2}/E
Egbert therefore determines the χ^{2} value for each category and derives the overall χ^{2} statistic (table 6).
Key point 2
The order of the categories has no effect on the value of χ^{2}. Only the size of the differences between the categories is important.
When carrying out this calculation by hand there are two checks to ensure a simple mistake has not been made:

The sum of the expected frequencies for all the categories should equal the total observed frequencies for all the categories (that is, 60 in this case).

In each column, the sum of all observed frequencies minus the sum of all expected frequencies must equal zero.
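The calculation and the two checks can be sketched in a few lines of Python. The observed counts and the national proportions below are hypothetical (only the 15% figure for misdiagnosis and the total of 60 come from the text), so the result is not meant to reproduce table 6.

```python
def chi2_goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-squared statistic) for a goodness of fit test."""
    total = sum(observed)
    expected = [p * total for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic

# Hypothetical data: four complaint categories, 60 complaints in total
observed = [20, 10, 18, 12]
national = [0.15, 0.40, 0.25, 0.20]  # only the 0.15 is taken from the text

expected, chi2_calc = chi2_goodness_of_fit(observed, national)

# Check 1: expected and observed totals agree
assert abs(sum(expected) - sum(observed)) < 1e-9
# Check 2: the differences O - E sum to zero
assert abs(sum(o - e for o, e in zip(observed, expected))) < 1e-9

print(round(chi2_calc, 2))
```

With these made-up figures the statistic exceeds the critical value of 7.82 for 3 degrees of freedom, so the null hypothesis would be rejected for this hypothetical sample as well.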
6 COMPARE THE CALCULATED TEST STATISTIC WITH THE CRITICAL VALUE
The calculated value of 43.43 lies beyond the critical value of 7.82. It therefore falls outside the area of accepting the null hypothesis.
7 EXPRESS THE CHANCES THAT THE NULL HYPOTHESIS IS IN KEEPING WITH THE DATA
As described previously, the p value is the probability of getting a difference equal to or greater than that found in the study (that is, 43.43) if the null hypothesis was correct.^{2}
In contrast with the t and z statistics, only the right side of the χ^{2} distribution is used. This is because only large values of χ^{2} can reject the null hypothesis.
Key point 3
In other words χ^{2} tests are always one sided.^{3}
From the χ^{2} tables, it can be seen that the size of the tail from 43.43 to the right tip is less than 0.001. In other words, there is less than a 0.1% chance that a difference with a magnitude of 43.43, or larger, could have resulted if the null hypothesis was valid. Therefore the p value is <0.001.
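The size of this tail can also be worked out directly rather than read from tables. For 3 degrees of freedom the right-tail area has a closed form, erfc(√(x/2)) + √(2x/π)·e^{−x/2}; the sketch below applies it, with the function name `chi2_tail_df3` chosen purely for illustration. The formula is specific to 3 degrees of freedom and should not be reused for other values.

```python
import math

def chi2_tail_df3(x):
    """Right-tail area beyond x for a chi-squared distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

p_value = chi2_tail_df3(43.43)
print(p_value < 0.001)  # True
```

As a sanity check, the same function returns approximately 0.05 at the critical value of 7.82.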
When presenting the results it is important that they are interpreted in the light of the data. Consequently Egbert concludes that there is a significant difference between the types of complaints made by Deathstar's Emergency Department attendees and those found nationally, χ^{2} = 43.43, df 3, p <0.001. The difference is most marked in the complaints made regarding misdiagnoses (greater than expected) and waiting times (less than expected).
Box 2 Summary for calculating the p value using the χ^{2} test

Record the observed category frequencies from the data (O)

Calculate the expected values (E) for all category frequencies if the null hypothesis was true by E = row total × column total/grand total

For each cell calculate [O−E]^{2}/E

Add these values to obtain the test statistic χ^{2} where χ^{2} = Σ [O−E]^{2}/E

Using tables of the χ^{2} distribution, determine the p value for the null hypothesis using the test statistic value and appropriate degree of freedom
Key point 4
When presenting the results of a χ^{2} analysis, the χ^{2} value, degree of freedom and p value should all be given along with an interpretation in the light of the data.
Comparing the distribution between two groups
A frequent application of the χ^{2} test in published work is comparing the distribution of proportions in two groups. To see how this works, consider the following example. Endora Lonely, an Emergency Physician at St Heartsinc, is concerned that Egbert is not getting out enough. She therefore invites him to a meal at her flat. During the evening he talks about his interesting study regarding the emergency department's complaints. He wonders if they are similar to those at St Heartsinc. Amazed by this request she cancels the planned weekend away and, reluctantly, agrees to help with the proposed study. As they will be comparing two unpaired groups of nominal data, they decide to use the χ^{2} test.
1 STATE THE NULL HYPOTHESIS AND THE ALTERNATIVE HYPOTHESIS OF THE STUDY
Having considered the problem, they write down the null hypothesis as:
“There is no significant difference between the type of complaints made by patients attending the Emergency Departments of Deathstar General and St Heartsinc”.
The alternative hypothesis is the logical opposite of this—that is:
“There is a significant difference between the complaints made by patients attending the Emergency Departments of Deathstar General and St Heartsinc”.
2 SELECT THE LEVEL OF SIGNIFICANCE
Following convention they pick a significance level of 0.05. They now need to determine the critical value for the χ^{2} statistic that demarcates this point (χ^{2}_{crit}). As described in the previous example, if the null hypothesis is correct, the distribution of complaints found in the two departments should be the same. However, there is bound to be some difference between them due to random variation but this has a χ^{2} distribution. This distribution can therefore be used to find the value of χ^{2}_{crit}.
3 ESTABLISH THE CRITICAL VALUES
With this type of comparison, the degrees of freedom is (r−1) × (c−1) where r and c are the number of rows and columns in the contingency table (table 7).
In this study, they calculate the degrees of freedom to be (2−1) × (4−1) = 3. The χ^{2} statistics tables indicate that this gives a χ^{2}_{crit} value of 7.82 or greater at the 5% level. In other words, 7.82 separates the left 95% area of acceptance of the null hypothesis from the right 5% area of rejection.
4 SELECT THE SAMPLE
Endora now checks the records from St Heartsinc to find out the type of complaints made by the emergency department patients (table 8).
5 CHOOSE AND CALCULATE THE TEST STATISTIC
If the null hypothesis applies there should not be a difference between the two departments. Consequently the expected number of complaints in each category (E) should be the same as those actually observed (O). The best estimate we have for the expected values comes from combining the values for two groups in each category so that an overall proportion can be calculated.
For example, the overall proportion of complaints attributable to a misdiagnosis is 45/100 (45%). Therefore if the null hypothesis applied, 45% of patients attending the Emergency Department of Deathstar General would also make this type of complaint. The expected number would therefore be:
Similarly the expected number of misdiagnosis complaints in St Heartsinc is:
Carrying out the same process for each of the other categories, they draw up a table of expected numbers of complaints if the null hypothesis applied (table 9). Again these data fulfil the assumptions made in using the χ^{2} test.
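The expected values, the degrees of freedom and the test statistic for a contingency table can be sketched as follows. The counts are hypothetical and were chosen only so that 45 of the 100 complaints are misdiagnoses, as in the text; the row totals of 60 and 40 are assumptions, and the function names are illustrative.

```python
def expected_table(observed):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi2_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_table(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def degrees_of_freedom(observed):
    """(rows - 1) x (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Hypothetical 2 x 4 table: rows are the two departments, columns the complaint types
table = [
    [30, 10, 12, 8],  # first department, 60 complaints (assumed)
    [15, 12, 8, 5],   # second department, 40 complaints (assumed)
]

print(expected_table(table)[0][0])   # expected misdiagnosis count, first department
print(degrees_of_freedom(table))     # 3
print(round(chi2_statistic(table), 2))
```

With this made-up table the statistic is below 7.82, so the null hypothesis would not be rejected.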
The overall χ^{2} statistic is 3.84.
6 COMPARE THE CALCULATED TEST STATISTIC WITH THE CRITICAL VALUE
This calculated value is less than the critical value of 7.82. It therefore falls within the area of accepting the null hypothesis.
7 EXPRESS THE CHANCES THAT THE NULL HYPOTHESIS IS IN KEEPING WITH THE DATA
From the χ^{2} tables, it can be seen that the size of the tail from 3.84 to the right tip is greater than 0.05. Consequently Endora and Egbert conclude that there is no significant difference between the types of complaints made by patients attending the two emergency departments, χ^{2} = 3.84, df 3, p >0.05.
Association between variables
In the previous example we have seen how the χ^{2} test can be used to determine if the observed numbers in each category of the table differ from those expected if the null hypothesis was valid. The same process can be used to identify an association between the column and row variables. This is particularly useful when dealing with nominal data.
Egbert's next study demonstrates this. Having seen a number of life threatening complications resulting from central line insertion, Egbert is a keen supporter of the femoral approach. To try to convince his colleagues at Deathstar General he carries out a retrospective study to see if there is any association between type of approach and life threatening complications.
1 STATE THE NULL HYPOTHESIS AND THE ALTERNATIVE HYPOTHESIS OF THE STUDY
The null hypothesis for this study is:
“The incidence of life threatening complications following central line insertion is independent of the approach”.
Again the alternative hypothesis is the logical opposite of this—that is:
“The incidence of life threatening complications following central line insertion is dependent on the approach”.
2 SELECT THE LEVEL OF SIGNIFICANCE
A significance level of 0.05 is chosen. If the null hypothesis was valid, the difference between the approaches would have a χ^{2} distribution. This distribution can therefore be used to find the value of χ^{2}_{crit} that corresponds to a significance level of 0.05.
3 ESTABLISH THE CRITICAL VALUES
With this type of comparison, the degrees of freedom are (r−1) × (c−1) where r and c are, respectively, the number of rows and columns in the contingency table (table 10).
In this study, the degrees of freedom are (2−1) × (3−1) = 2. The χ^{2} statistics tables indicate that this gives a χ^{2}_{crit} value of 5.99 or greater at the 5% level.
4 SELECT THE SAMPLE
From the emergency department records, Egbert determines the incidence of life threatening complications resulting from central line insertion in the past year (table 11).
5 CHOOSE AND CALCULATE THE TEST STATISTIC
The overall test statistic is calculated in the same way as described in the previous examples. Egbert therefore calculates the expected values for each category if the null hypothesis applied (table 12). Again these data fulfil the assumptions made in using the χ^{2} test.
The overall χ^{2} statistic is therefore 2.12.
6 COMPARE THE CALCULATED TEST STATISTIC WITH THE CRITICAL VALUE
This calculated value is less than the critical value of 5.99. It therefore falls within the area of accepting the null hypothesis.
7 EXPRESS THE CHANCES THAT THE NULL HYPOTHESIS IS IN KEEPING WITH THE DATA
Egbert concludes that there is no association between life threatening complications and the type of central line approach used, χ^{2} = 2.12, df 2, p >0.05.
Dealing with small samples
The χ^{2} test statistic complies with the χ^{2} distribution provided the expected values are large enough. When dealing with small samples we can no longer make this assumption. A way of dealing with this problem is to merge categories so that the expected number in each is greater than 5. Obviously the choice of categories to combine needs to be logical.
Another way of tackling this problem is to use Fisher's exact test.
FISHER'S EXACT TEST
The study by Rogers et al demonstrates the use of this test. They investigated neurogenic pulmonary oedema in patients with low and high intracranial pressure (ICP).^{4} Part of the study involved determining if the patients also had normal or abnormal Pao_{2}/Fio_{2} ratios. The results were tabulated (table 13).
The data were not normally distributed and there were frequencies of less than 5 in two of the cells. Analysis of the data was therefore carried out using Fisher's exact test. This produced a p value <0.001. Consequently there is less than a 1 in 1000 chance that a difference this big, or greater, between these groups would arise if the null hypothesis were true. The null hypothesis was therefore rejected and it was concluded that patients with high ICP values are more likely to have abnormal Pao_{2}/Fio_{2} ratios compared with patients with low ICP.
For those who are interested, Bland describes the method of calculating this test statistic.^{5} This is rarely necessary because it is typically worked out using a computer program. More commonly though we will come across this test when reading published articles. In these cases it is important to bear in mind the following points:

It determines the probability of obtaining results at least as extreme as those observed if the null hypothesis is correct (that is, if both groups have the same proportion of outcome/conditions).

It is used when the number in one or more of the categories of the contingency table is less than 5.

It assumes the data are unpaired and only frequency data (not proportions) are used.
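For a 2×2 table the calculation that a computer program carries out can be sketched briefly. The version below sums the hypergeometric probabilities of every table, with the same row and column totals, that is no more likely than the one observed; this is one common two sided definition and is an assumption here, as other packages may define the two sided p value differently. The table is hypothetical and is not the data from table 13.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two sided Fisher's exact test p value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    smallest, largest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are counted
    return sum(prob(x) for x in range(smallest, largest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical 2 x 2 table with small cell counts
p_value = fisher_exact_two_sided([[8, 2], [1, 9]])
print(round(p_value, 4))
```

Only raw counts are passed in, in keeping with the requirement that frequency data, not proportions, are used.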
Box 3 Statistical trivia
Ronald Fisher was born in London in 1890. He received a BA in astronomy from Cambridge in 1912 but in 1919 went to work at the Rothamsted Agricultural Experiment Station where he worked as a biologist and made many contributions to both statistics and genetics. He is quoted as saying, “To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of”.
YATES'S CORRECTION FOR CONTINUITY
The distribution of χ^{2} is continuous, yet nominal data are not. This can give rise to the false rejection of the null hypothesis in 2×2 tables unless adjustments are made. The correction entails the following alteration:
[|Observed frequency − expected frequency| − 0.5]^{2}/expected frequency
Here |Observed frequency − expected frequency| is the size of the difference, ignoring its sign. This has the effect of reducing the χ^{2} value and, consequently, increasing the p value.
Though there is no precise rule when the Yates's correction should be applied, Altman recommends it is always used when dealing with 2×2 tables.^{6} In that way it has its biggest effect where the risk of bias is highest.
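The effect of the correction can be seen by calculating χ^{2} for the same 2×2 table with and without it. The sketch below uses a hypothetical table and an illustrative function name; the guard that stops the corrected difference falling below zero is an added safeguard of this sketch and is not part of the formula above.

```python
def chi2_2x2(table, yates=False):
    """Chi-squared statistic for a 2 x 2 table, optionally with Yates's correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = abs(observed - expected)
            if yates:
                # Subtract 0.5, never letting the difference go below zero
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    return statistic

# Hypothetical 2 x 2 table
table = [[12, 8], [5, 15]]
uncorrected = chi2_2x2(table)
corrected = chi2_2x2(table, yates=True)
print(round(uncorrected, 2), round(corrected, 2))  # 5.01 3.68
```

With 1 degree of freedom the 5% critical value is 3.84, so in this made-up example the uncorrected statistic would be judged significant and the corrected one would not.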
Key points 5

In 2×2 tables with small frequencies, Yates's correction can have a marked effect on the p value.

This correction is not needed for larger tables or when the χ^{2} does not reach statistical significance.
Summary
There are many types of nonparametric test and several are frequently used in the medical journals. In trying to choose the correct test it is important to bear in mind the type of data you are dealing with and the nature of the question.
The χ^{2} test is a versatile and commonly used method for comparing distribution and looking for associations between groups of unpaired nominal data. It does however have its limitations, particularly when dealing with small samples. In these cases Yates's correction may need to be included, or the Fisher's exact test carried out instead, depending upon the data.
Quiz

Complete table 14.

Harrison et al studied the association between head injuries and five types of facial injury sustained by cyclists.^{7} How many degrees of freedom (df) would there be in calculating the χ^{2} statistic?

Egbert takes time off to coordinate a defibrillation course for personnel from St Heartsinc and Deathstar General. Having dispatched several instructors to hospital with third degree burns, he concludes that several of the doctors appear to be dangerous (table 15). Determine if there is a difference between the hospitals using a 5% level for the null hypothesis.

De Vos et al studied the impact of gender on do not attempt resuscitation (DNAR) orders in hospital (table 16).^{8} Determine if there is an association between these variables using a 5% level for the null hypothesis.

One for you to try on your own. Stancin et al conducted a study on the acute psychosocial impact of paediatric orthopaedic trauma.^{9} They wanted to compare patients with (group 1) and without (group 2) an accompanying head injury. There were 80 patients in group 1 and 28 in group 2. To determine compatibility between the groups they studied whether the patient was white or not. The results showed there were 46 white patients in group 1 and 24 in group 2. Is there an association between head injury and race?
Answers

See table 1.

4
df = (number of rows − 1) × (number of columns − 1)
Therefore df = (5−1) × (2−1) = 4

There is no significant difference between the two hospitals; χ^{2} = 0.23, df 1, p >0.05.

Using the χ^{2} test the expected frequencies for each cell need to be calculated (table 17).
For each cell, the [observed frequency − expected frequency]^{2}/expected frequency ([O−E]^{2}/E ) now needs to be calculated and added together (table 18):
The test statistic is the sum of all the ([O−E]^{2}/E) = 0.305. The degrees of freedom are (number of rows − 1) × (number of columns − 1) = 1. Using the χ^{2} distribution tables for this degree of freedom shows that the p value for this χ^{2} value is well above 0.05. It was therefore concluded that there was no association between DNAR orders and gender.
Further reading

Koosis D. A test of distributions. In: Statistics—a self teaching guide. 4th ed. New York: Wiley, 1997:209–29.

Stephens L. χ^{2} procedures. In: Theory and problems of beginning statistics. Schaum's outline series. New York: McGraw-Hill, 1998:249–71.

Swinscow T. The χ^{2} tests. Statistics from square one. London: BMJ, 1983:43–3.
Acknowledgments
The authors would like to thank Jim Wardrope for his help and support.