CONTINGENCY TABLES

June 18, 2020

Today I learned about contingency tables, how they are used in statistics, and the different types of table that can be constructed from them.

To begin with, let's see what a contingency table is:

It is a table that summarises a large data set by counting how many observations fall into each combination of two categorical variables.

Let us consider the following table:
Gender    Smokers   Non-Smokers   Total
Male      72        44            116
Female    34        53            87
Total     106       97            203



So this table summarises an otherwise huge data set, describing the relationship between smoking status (smoker or non-smoker) and gender.
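
As a rough illustration of how the Total row and Total column come from the four cell counts, here is a minimal Python sketch. The names `counts`, `row_totals`, `col_totals` and `grand_total` are my own choices for this example and are not from the video.

```python
# Cell counts from the table above: rows are gender, columns are smoking status.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

# Row totals: one total per gender.
row_totals = {gender: sum(row.values()) for gender, row in counts.items()}

# Column totals: one total per smoking status.
col_totals = {}
for row in counts.values():
    for status, n in row.items():
        col_totals[status] = col_totals.get(status, 0) + n

# Grand total: everyone surveyed.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals and column totals both add up to the same grand total, which is a quick check that the table has been filled in consistently.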

RELATIVE FREQUENCY CONTINGENCY TABLE
I will refer to Contingency Table as CT from now on. 

This is another form of CT in which each cell records a percentage instead of a count.
Let's use the values from our previous table:
% value for cell x (x can be 1, 2, 3, 4, ...) = {(count value in cell x) / (total no. surveyed)} * 100
Thus, the values in the cells would be:
Cell 1: 72/203*100 = 35.47%
Cell 2: 44/203*100 = 21.67%
Similarly, after working out the other cells, I reconstructed the table:

Gender    Smoker    Non-smoker    Total
Male      35.47%    21.67%        57.14%
Female    16.75%    26.11%        42.86%
Total     52.22%    47.78%        100%

 
Here, in this table, 100% represents the 203 people that have been surveyed.
To summarise, a Relative Frequency CT holds percentages in its cells instead of counts.
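
The percentage formula can be tried out with a short Python sketch. This is only an illustration under my own naming (`counts`, `percent` and `relative` are hypothetical names), with values rounded to two decimals as in the table.

```python
# Cell counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

# Total number of people surveyed.
grand_total = sum(sum(row.values()) for row in counts.values())


def percent(count, total):
    """Return count as a percentage of total, rounded to 2 decimals."""
    return round(count / total * 100, 2)


# Relative frequency of every cell: cell count divided by the grand total.
relative = {
    gender: {status: percent(n, grand_total) for status, n in row.items()}
    for gender, row in counts.items()
}

for gender, row in relative.items():
    print(gender, row)
```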

JOINT DISTRIBUTION

It is the distribution of two categorical variables taken together, i.e. the percentage for every combination of their values.
The two variables we have considered in our example are: Gender and Smoking status

To build the joint distribution table, we find the percentage corresponding to Male smoker, Male non-smoker, Female smoker and Female non-smoker.
Let us reconsider our previous relative frequency CT.

Gender    Smoker    Non-smoker    Total
Male      35.47%    21.67%        57.14%
Female    16.75%    26.11%        42.86%
Total     52.22%    47.78%        100%

The four inner cells (everything except the Total row and the Total column) represent the joint distribution for Male smoker, Male non-smoker, Female smoker and Female non-smoker.
Adding all four of these values gives 100 as the result.
This is because only these four categories, i.e. Male smoker, Male non-smoker, Female smoker and Female non-smoker, are possible in the sample survey that we have conducted.
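
To check that the four joint values really cover the whole sample, here is a small Python sketch. It uses exact fractions so that rounding does not get in the way; the names `joint_counts` and `joint` are assumptions made for this example.

```python
from fractions import Fraction

# Counts for the four possible (gender, smoking status) combinations.
joint_counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}

total = sum(joint_counts.values())

# Joint distribution as exact fractions of the whole sample.
joint = {combo: Fraction(n, total) for combo, n in joint_counts.items()}

# The four combinations are the only ones possible, so they sum to exactly 1.
joint_sum = sum(joint.values())

for (gender, status), p in joint.items():
    print(f"{gender} {status}: {float(p) * 100:.2f}%")
print("Sum of joint distribution:", joint_sum)
```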

MARGINAL DISTRIBUTION

When we find the distribution of each of our two categorical variables individually, these distributions are referred to as marginal distributions.
For example, the individual distributions of smoking status and of gender, obtained from their joint distribution, are called the marginal distribution of smoking status and the marginal distribution of gender.

Considering the same RFCT (relative frequency CT) again:

Gender    Smoker    Non-smoker    Total
Male      35.47%    21.67%        57.14%
Female    16.75%    26.11%        42.86%
Total     52.22%    47.78%        100%

Marginal distribution of smoking status (the Total row): 52.22% smokers, 47.78% non-smokers

Marginal distribution of gender (the Total column): 57.14% male, 42.86% female

Adding the two values within either marginal distribution gives 100 again.
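
The two marginal distributions can also be computed directly from the counts. The Python sketch below is just one way to do it, and `gender_marginal` and `smoking_marginal` are names I made up for the illustration.

```python
# Cell counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of gender: each row total over the grand total.
gender_marginal = {
    gender: round(sum(row.values()) / total * 100, 2)
    for gender, row in counts.items()
}

# Marginal distribution of smoking status: each column total over the grand total.
smoking_marginal = {}
for status in ("Smoker", "Non-smoker"):
    column_total = sum(row[status] for row in counts.values())
    smoking_marginal[status] = round(column_total / total * 100, 2)

print(gender_marginal)
print(smoking_marginal)
```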

CONDITIONAL DISTRIBUTION

It is the distribution of one variable, given that the other variable takes a particular value (for example, smoking status among males only).
We divide the cell value by the total value of that row to find the conditional distribution.
(Given gender)
Cell 1: 72/116*100 = 62.07%
Cell 2: 44/116*100 = 37.93%
Similarly, computing the other cells, I formed the table for the conditional distribution given gender:

Gender    Smoker    Non-smoker    Total
Male      62.07%    37.93%        100%
Female    39.08%    60.92%        100%

Similarly, dividing each cell value by the total value of its column instead, I formed the table for the conditional distribution given smoking status:

Gender    Smoker    Non-smoker
Male      67.92%    45.36%
Female    32.08%    54.64%
Total     100%      100%
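
Both conditional tables can be reproduced with a few lines of Python. This is a minimal sketch under my own naming: `given_gender` divides by row totals and `given_status` divides by column totals, and neither name comes from the video.

```python
# Cell counts from the original contingency table.
counts = {
    "Male": {"Smoker": 72, "Non-smoker": 44},
    "Female": {"Smoker": 34, "Non-smoker": 53},
}

# Conditional on gender: divide each cell by its row total.
given_gender = {
    gender: {
        status: round(n / sum(row.values()) * 100, 2)
        for status, n in row.items()
    }
    for gender, row in counts.items()
}

# Column totals, needed for conditioning on smoking status.
col_totals = {
    status: sum(row[status] for row in counts.values())
    for status in ("Smoker", "Non-smoker")
}

# Conditional on smoking status: divide each cell by its column total.
given_status = {
    gender: {
        status: round(n / col_totals[status] * 100, 2)
        for status, n in row.items()
    }
    for gender, row in counts.items()
}

print(given_gender)
print(given_status)
```

In the first table each row adds up to 100%, and in the second each column does, which is an easy way to tell which variable was conditioned on.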

I hope what I learned today was helpful to you too.
I am also inserting the video from YouTube where I learned these concepts.
