CONTINGENCY TABLES
June 18, 2020
Today, I learned about contingency tables, their use in statistics, and the different types of tables that can be constructed from them.
To begin with, let's see what a contingency table is:
It is a table that summarises a large data set by counting how many observations fall into each combination of two categorical variables.
Let us consider the following table:
| Gender | Smokers | Non-Smokers | Total |
| Male | 72 | 44 | 116 |
| Female | 34 | 53 | 87 |
| Total | 106 | 97 | 203 |
So, this table summarises an otherwise huge data set of 203 people, describing the relation between smoking status (smoker or non-smoker) and gender.
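To check the Total row and column of the table above, here is a minimal Python sketch. The nested-dictionary layout and the helper name `add_totals` are my own assumptions for illustration; only the four counts come from the table.

```python
# Counts from the table above: rows are gender, columns are smoking status.
counts = {
    "Male": {"Smokers": 72, "Non-Smokers": 44},
    "Female": {"Smokers": 34, "Non-Smokers": 53},
}

def add_totals(table):
    """Return row totals, column totals, and the grand total of a two-way table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand_total = add_totals(counts)
print(row_totals)   # {'Male': 116, 'Female': 87}
print(col_totals)   # {'Smokers': 106, 'Non-Smokers': 97}
print(grand_total)  # 203
```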
RELATIVE FREQUENCY CONTINGENCY TABLE
I will refer to Contingency Table as CT from now on.
This is another form of CT in which each cell holds a percentage of the total instead of a count.
Let's take the values from our previous table:
% value for cell x (x can be 1, 2, 3, 4, ...) = {(count value in cell x)/(Total no. surveyed)} * 100
Thus, the values in the cells would be:
Cell 1 : 72/203*100 = 35.47%
Cell 2 : 44/203*100 = 21.67%
Similarly, after finding the values for the other cells, I reconstructed the table:
| Gender | Smoker | Non-smoker | Total |
| Male | 35.47% | 21.67% | 57.14% |
| Female | 16.75% | 26.11% | 42.86% |
| Total | 52.22% | 47.78% | 100% |
In this table, 100% represents all 203 people that were surveyed.
To summarise, a Relative Frequency CT holds percentages of the grand total in its cells instead of counts.
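The percentage formula above can be sketched in a few lines of Python. The function name `relative_frequencies` and the dictionary layout are assumptions made for this example; the counts are the ones from the first table.

```python
# Counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

# Total number of people surveyed (203).
total = sum(sum(cells.values()) for cells in counts.values())

def relative_frequencies(table, total):
    """Convert each count to a percentage of the grand total, rounded to 2 places."""
    return {
        row: {col: round(n / total * 100, 2) for col, n in cells.items()}
        for row, cells in table.items()
    }

rel = relative_frequencies(counts, total)
print(rel["Male"])    # {'Smoker': 35.47, 'Non-smoker': 21.67}
print(rel["Female"])  # {'Smoker': 16.75, 'Non-smoker': 26.11}
```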
JOINT DISTRIBUTION
It is the distribution of our two recorded categorical variables taken together: the percentage for every combination of their categories.
The two variables we have considered in our example are: Gender and Smoking status
To calculate the joint distribution table, we find the percentage corresponding to Male smoker, Male non-smoker, Female smoker, and Female non-smoker.
Let us reconsider our previous relative frequency CT.
| Gender | Smoker | Non-smoker | Total |
| Male | 35.47% | 21.67% | 57.14% |
| Female | 16.75% | 26.11% | 42.86% |
| Total | 52.22% | 47.78% | 100% |
The four inner values (everything except the Total row and the Total column) represent the joint distribution for Male smoker, Male non-smoker, Female smoker, and Female non-smoker.
Adding all 4 of these values (35.47 + 21.67 + 16.75 + 26.11) gives 100 as the result.
This is because only these 4 categories, i.e. Male smoker, Male non-smoker, Female smoker, and Female non-smoker, are possible in the sample survey that we have conducted.
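As a quick check that the four joint percentages add up to 100, here is a small Python sketch. Keying the dictionary by `(gender, status)` pairs is just one convenient layout that I am assuming here; the percentages are left unrounded so the sum is not affected by rounding.

```python
import math

# Counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}
total = sum(sum(cells.values()) for cells in counts.values())

# Joint distribution: one unrounded percentage per (gender, status) combination.
joint = {
    (gender, status): n / total * 100
    for gender, cells in counts.items()
    for status, n in cells.items()
}

for (gender, status), pct in joint.items():
    print(f"{gender} {status}: {pct:.2f}%")

# The four combinations cover everyone surveyed, so they sum to 100.
assert math.isclose(sum(joint.values()), 100.0)
```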
MARGINAL DISTRIBUTION
When we find the distribution of each of our 2 categorical variables individually, these distributions are referred to as marginal distributions.
For example, the individual distributions of smoking status and of gender, obtained from their joint distribution, are called the marginal distribution of smoking status and the marginal distribution of gender.
Considering the same RFCT (relative frequency CT) again:
| Gender | Smoker | Non-smoker | Total |
| Male | 35.47% | 21.67% | 57.14% |
| Female | 16.75% | 26.11% | 42.86% |
| Total | 52.22% | 47.78% | 100% |
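Marginal distribution of smoking status: the Total row (52.22% smokers, 47.78% non-smokers)
Marginal distribution of gender: the Total column (57.14% male, 42.86% female)
Adding the two values of either marginal distribution (the Total row together, or the Total column together) gives 100 again.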
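The two marginal distributions can be computed directly from the counts, as in this rough Python sketch. The variable names `gender_marginal` and `status_marginal` are my own, chosen for illustration.

```python
# Counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}
total = sum(sum(cells.values()) for cells in counts.values())

# Marginal distribution of gender: each row total as a percentage of everyone.
gender_marginal = {
    gender: round(sum(cells.values()) / total * 100, 2)
    for gender, cells in counts.items()
}

# Marginal distribution of smoking status: each column total as a percentage.
status_counts = {}
for cells in counts.values():
    for status, n in cells.items():
        status_counts[status] = status_counts.get(status, 0) + n
status_marginal = {
    status: round(n / total * 100, 2) for status, n in status_counts.items()
}

print(gender_marginal)  # {'Male': 57.14, 'Female': 42.86}
print(status_marginal)  # {'Smoker': 52.22, 'Non-smoker': 47.78}
```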
CONDITIONAL DISTRIBUTION
It is the distribution of one variable, given a particular value of the other variable.
To find the conditional distribution for gender, we divide each cell value by the total of its row, so the percentages describe smoking status within each gender.
(For gender)
Cell 1: 72/116*100 = 62.07%
Cell 2: 44/116*100 = 37.93%
Similarly, computing the other cells, I formed the table for the conditional distribution given gender (each row adds up to 100%):
| Gender | Smoker | Non-smoker | Total |
| Male | 62.07% | 37.93% | 100% |
| Female | 39.08% | 60.92% | 100% |
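For smoking status, we instead divide each cell value by the total of its column (for example, 72/106*100 = 67.92%). Computing the other cells the same way, I formed the table for the conditional distribution given smoking status (each column adds up to 100%):
| Gender | Smoker | Non-smoker |
| Male | 67.92% | 45.36% |
| Female | 32.08% | 54.64% |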
| Total | 100% | 100% |
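Both conditional tables can be reproduced with a short Python sketch. The function names `conditional_by_row` and `conditional_by_column` are hypothetical helpers written for this post, not part of any library.

```python
# Counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

def conditional_by_row(table):
    """Divide each cell by its row total (distribution within each gender)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(n / row_total * 100, 2) for col, n in cells.items()}
    return result

def conditional_by_column(table):
    """Divide each cell by its column total (distribution within each smoking status)."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: round(n / col_totals[col] * 100, 2) for col, n in cells.items()}
        for row, cells in table.items()
    }

by_row = conditional_by_row(counts)
by_col = conditional_by_column(counts)
print(by_row["Male"])    # {'Smoker': 62.07, 'Non-smoker': 37.93}
print(by_row["Female"])  # {'Smoker': 39.08, 'Non-smoker': 60.92}
print(by_col["Male"])    # {'Smoker': 67.92, 'Non-smoker': 45.36}
print(by_col["Female"])  # {'Smoker': 32.08, 'Non-smoker': 54.64}
```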
I hope what I learned today was helpful to you too.
I am also inserting the video from YouTube where I learned these concepts.
This will surely help me to learn stats
hehehe i hope so
This reminded me of Class 11. Very well explained
thanks man!