CONTINGENCY TABLES

June 18, 2020


Today, I learned about contingency tables, their use in statistics, and the different types of tables that can be constructed from them.

To begin with, let's see what a contingency table is:

It is a table that summarises a large data set by showing how many observations fall into each combination of categories of two categorical variables.

Let us consider the following table:

Gender	Smokers	Non-Smokers	Total
Male	72	44	116
Female	34	53	87
Total	106	97	203

So, this table summarises an otherwise huge data set describing the relationship between smoking status (smoker or non-smoker) and gender.
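To see how such a table condenses raw data, here is a minimal Python sketch. It assumes the raw survey data is a list of (gender, smoking status) records, one per person. Since the original records are not available, the list below is rebuilt artificially from the counts in the table, purely for illustration.

```python
from collections import Counter

# Hypothetical raw survey records: one (gender, status) pair per person.
# These are rebuilt from the table's counts, not taken from a real data set.
records = (
    [("Male", "Smoker")] * 72
    + [("Male", "Non-smoker")] * 44
    + [("Female", "Smoker")] * 34
    + [("Female", "Non-smoker")] * 53
)

# Count how many people fall into each (gender, status) combination.
counts = Counter(records)

genders = ["Male", "Female"]
statuses = ["Smoker", "Non-smoker"]

# Print the contingency table with row and column totals.
print("Gender", *statuses, "Total", sep="\t")
for g in genders:
    row = [counts[(g, s)] for s in statuses]
    print(g, *row, sum(row), sep="\t")
col_totals = [sum(counts[(g, s)] for g in genders) for s in statuses]
print("Total", *col_totals, sum(col_totals), sep="\t")
```

Counting combinations with `Counter` is what turns 203 individual records into just four cell counts plus their totals.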

RELATIVE FREQUENCY CONTINGENCY TABLE

From now on, I will refer to a contingency table as a CT.

This is another form of CT in which each cell records its count as a percentage of the total number of people surveyed.

Let's apply this to the values of our previous table:

% value for cell x = (count in cell x / total number surveyed) * 100, where x = 1, 2, 3, 4, ...

Thus, the values in the cells are:

Cell 1: 72/203*100 = 35.47%

Cell 2: 44/203*100 = 21.67%

Similarly, after computing the other cells, I reconstructed the table:

Gender	Smoker	Non-smoker	Total
Male	35.47%	21.67%	57.14%
Female	16.75%	26.11%	42.86%
Total	52.22%	47.78%	100%
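The same calculation can be sketched in Python, applying the formula above to every cell at once. The cell counts are the ones from the original table; the dictionary layout is just one convenient choice, not the only way to store a CT.

```python
# Observed counts from the contingency table above.
counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}
total = sum(counts.values())  # 203 people surveyed

# % value for each cell = (count in cell / total surveyed) * 100,
# rounded to two decimal places as in the table.
relative = {cell: round(n / total * 100, 2) for cell, n in counts.items()}

for cell, pct in relative.items():
    print(cell, f"{pct:.2f}%")
```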

Here, 100% represents all 203 people who were surveyed.

To summarise, a relative frequency CT holds percentages of the grand total in its cells instead of raw counts.

JOINT DISTRIBUTION

It is the distribution of our two recorded categorical variables taken together: the percentage of people falling into each combination of their categories.

The two variables we have considered in our example are gender and smoking status.

To build the joint distribution, we find the percentage corresponding to each combination: male smoker, male non-smoker, female smoker and female non-smoker.

Let us reconsider our previous relative frequency CT:

Gender	Smoker	Non-smoker	Total
Male	35.47%	21.67%	57.14%
Female	16.75%	26.11%	42.86%
Total	52.22%	47.78%	100%

The four inner cells (35.47%, 21.67%, 16.75% and 26.11%) represent the joint distribution for male smokers, male non-smokers, female smokers and female non-smokers.

Adding these 4 values gives 100%.

This is because these 4 categories are the only ones possible in our survey: every person surveyed is exactly one of a male smoker, male non-smoker, female smoker or female non-smoker.
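A quick way to check this property is a small Python sketch using exact fractions, so that no rounding gets in the way. The counts are from the table; using `fractions.Fraction` is my own choice for the illustration.

```python
from fractions import Fraction

# Observed counts from the contingency table.
counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}
total = sum(counts.values())

# Joint distribution as exact proportions (multiply by 100 for percentages).
joint = {cell: Fraction(n, total) for cell, n in counts.items()}

# Every person falls into exactly one of the four combinations,
# so the joint proportions add up to exactly 1, i.e. 100%.
print(sum(joint.values()))  # prints 1
print({cell: f"{float(p) * 100:.2f}%" for cell, p in joint.items()})
```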

MARGINAL DISTRIBUTION

When we find the distribution of each of our 2 categorical variables on its own, these distributions are called marginal distributions.

For example, the distributions of smoking status alone and of gender alone, found from the joint distribution, are called the marginal distribution of smoking status and the marginal distribution of gender respectively.

Considering the same RFCT (relative frequency CT) again:

Gender	Smoker	Non-smoker	Total
Male	35.47%	21.67%	57.14%
Female	16.75%	26.11%	42.86%
Total	52.22%	47.78%	100%

Marginal distribution of smoking status: the Total row (52.22% smokers and 47.78% non-smokers).

Marginal distribution of gender: the Total column (57.14% male and 42.86% female).

Adding the two values of either marginal distribution gives 100% again.
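The marginal distributions come from summing the joint distribution across rows (for gender) or down columns (for smoking status). Here is a minimal Python sketch of that, again with exact fractions; the variable names are just illustrative.

```python
from fractions import Fraction

# Observed counts from the contingency table.
counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}
total = sum(counts.values())
joint = {cell: Fraction(n, total) for cell, n in counts.items()}

genders = ["Male", "Female"]
statuses = ["Smoker", "Non-smoker"]

# Marginal of gender: sum the joint proportions across each row.
marginal_gender = {g: sum(joint[(g, s)] for s in statuses) for g in genders}
# Marginal of smoking status: sum the joint proportions down each column.
marginal_status = {s: sum(joint[(g, s)] for g in genders) for s in statuses}

for name, dist in [("Gender", marginal_gender), ("Smoking status", marginal_status)]:
    print(name, {k: f"{float(v) * 100:.2f}%" for k, v in dist.items()})
    # Each marginal distribution adds up to 100% on its own.
    assert sum(dist.values()) == 1
```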

CONDITIONAL DISTRIBUTION

It is the distribution of one variable, given that the other variable takes a particular value (for example, smoking status given that the person is male).

To find the conditional distribution of smoking status given gender, we divide each cell value by the total of its row.

(Conditioning on gender)

Cell 1: 72/116*100 = 62.07%

Cell 2: 44/116*100 = 37.93%

Similarly, computing the other cells, I formed the table for the conditional distribution of smoking status given gender:

Gender	Smoker	Non-smoker	Total
Male	62.07%	37.93%	100%
Female	39.08%	60.92%	100%
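Here is a short Python sketch of the row-wise division, using the counts from the original table. Each row of the result adds up to 100%, because every row is divided by its own total.

```python
# Observed counts from the contingency table.
counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}
genders = ["Male", "Female"]
statuses = ["Smoker", "Non-smoker"]

# Condition on gender: divide each cell by its row (gender) total.
cond_given_gender = {}
for g in genders:
    row_total = sum(counts[(g, s)] for s in statuses)
    for s in statuses:
        cond_given_gender[(g, s)] = round(counts[(g, s)] / row_total * 100, 2)

for cell, pct in cond_given_gender.items():
    print(cell, f"{pct:.2f}%")
```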

Similarly, dividing each cell by its column total instead, I formed the table for the conditional distribution of gender given smoking status:

Gender	Smoker	Non-smoker
Male	67.92%	45.36%
Female	32.08%	54.64%
Total	100%	100%
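The column-wise version only changes which total we divide by. For example, the male non-smoker cell is 44/97*100, which rounds to 45.36%. A minimal Python sketch:

```python
# Observed counts from the contingency table.
counts = {
    ("Male", "Smoker"): 72,
    ("Male", "Non-smoker"): 44,
    ("Female", "Smoker"): 34,
    ("Female", "Non-smoker"): 53,
}
genders = ["Male", "Female"]
statuses = ["Smoker", "Non-smoker"]

# Condition on smoking status: divide each cell by its column total.
cond_given_status = {}
for s in statuses:
    col_total = sum(counts[(g, s)] for g in genders)
    for g in genders:
        cond_given_status[(g, s)] = round(counts[(g, s)] / col_total * 100, 2)

for cell, pct in cond_given_status.items():
    print(cell, f"{pct:.2f}%")
```

Here each column, rather than each row, adds up to 100%.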

I hope what I learned today is helpful to you too.

I am also inserting the YouTube video from which I learned these concepts.
