Unit 05 e-Portfolio Activity: Jaccard Coefficient

The Jaccard coefficient, also known as the Jaccard similarity index, is a statistic used for comparing the similarity and diversity of sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets:

$$ J(A,B) = \frac{|A \cap B|}{|A \cup B|} $$
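As a quick illustration of the formula, here is a minimal sketch using Python's built-in sets. The function name `jaccard` and the two example sets are illustrative assumptions and are not part of the activity data.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets (illustrative helper)."""
    union = a | b
    if not union:
        # The ratio is undefined when both sets are empty
        raise ValueError("Jaccard coefficient is undefined for two empty sets")
    return len(a & b) / len(union)

# Hypothetical example sets, chosen only to show the arithmetic
A = {"apple", "banana", "cherry"}
B = {"banana", "cherry", "date"}

print(jaccard(A, B))  # 2 shared items / 4 distinct items = 0.5
```

Identical sets give 1.0 and disjoint sets give 0.0, so the coefficient always lies between 0 and 1.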

To calculate the Jaccard coefficient for the given pairs, we compare six attributes for each individual: two symptoms ('Fever', 'Cough') and four test results ('Test-1', 'Test-2', 'Test-3', 'Test-4'). We treat 'Y' (Yes), 'P' (Positive), and 'A' (Abnormal) as positive indicators, and 'N' (No or Negative) as the negative indicator.
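The encoding described above can be sketched as a small mapping step. The names `POSITIVE` and `to_binary` are hypothetical helpers introduced here for illustration, and the sample row assumes Jack's results are listed in the order Fever, Cough, Test-1, Test-2, Test-3, Test-4.

```python
# Symbols treated as positive indicators: Yes, Positive, Abnormal
POSITIVE = {"Y", "P", "A"}

def to_binary(results):
    """Map each result symbol to 1 (positive indicator) or 0 (negative)."""
    return [1 if r in POSITIVE else 0 for r in results]

# Assumed attribute order: Fever, Cough, Test-1, Test-2, Test-3, Test-4
jack = ["Y", "N", "P", "N", "N", "A"]
print(to_binary(jack))  # [1, 0, 1, 0, 0, 1]
```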

For each pair, the intersection is the set of attributes on which both individuals have a positive indicator, and the union is the set of attributes on which at least one of them has a positive indicator. Attributes on which both individuals are negative are left out of both counts, so shared absences do not inflate the similarity.

Let's calculate the Jaccard coefficient for each pair:

(Jack, Mary):
- Intersection (both positive): Fever (Y), Test-1 (P)
- Union (at least one positive): Fever, Test-1, Test-2 (Mary: A), Test-3 (Mary: P), Test-4 (Jack: A)
- Jaccard coefficient = Size of Intersection / Size of Union = 2 / 5 = 0.4

(Jack, Jim):
- Intersection (both positive): Fever (Y), Test-4 (A)
- Union (at least one positive): Fever, Cough (Jim: P), Test-1 (Jack: P), Test-4
- Jaccard coefficient = Size of Intersection / Size of Union = 2 / 4 = 0.5

(Jim, Mary):
- Intersection (both positive): Fever (Y)
- Union (at least one positive): Fever, Cough (Jim: P), Test-1 (Mary: P), Test-2 (Mary: A), Test-3 (Mary: P), Test-4 (Jim: A)
- Jaccard coefficient = Size of Intersection / Size of Union = 1 / 6 ≈ 0.167

The Jaccard coefficients for the given pairs are:

(Jack, Mary): 0.4
(Jack, Jim): 0.5
(Jim, Mary): 0.167 (approximately)

In [1]:

# Define the results for each individual, in the order:
# Fever, Cough, Test-1, Test-2, Test-3, Test-4
jack = ['Y', 'N', 'P', 'N', 'N', 'A']
mary = ['Y', 'N', 'P', 'A', 'P', 'N']
jim = ['Y', 'P', 'N', 'N', 'N', 'A']

# Symbols that count as positive indicators (Yes, Positive, Abnormal)
positive = {'Y', 'P', 'A'}

# Function to calculate Jaccard coefficient
def jaccard_coefficient(individual1, individual2):
    # Reduce each attribute to a pair of booleans: is it positive for each person?
    pairs = [(x in positive, y in positive) for x, y in zip(individual1, individual2)]
    # Intersection: attributes where both individuals are positive
    intersection = sum(1 for a, b in pairs if a and b)
    # Union: attributes where at least one individual is positive
    union = sum(1 for a, b in pairs if a or b)
    return intersection / union

# Calculate Jaccard coefficient for each pair
jack_mary = jaccard_coefficient(jack, mary)
jack_jim = jaccard_coefficient(jack, jim)
jim_mary = jaccard_coefficient(jim, mary)

jack_mary, jack_jim, jim_mary

Out[1]:

(0.4, 0.5, 0.16666666666666666)
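One edge case the function above does not cover: if neither individual has any positive indicator, the union is empty and the division raises `ZeroDivisionError`. A minimal sketch of a guard follows, counting only attributes on which at least one individual is positive. The wrapper name `safe_jaccard`, its `positive` and `default` parameters, and the convention of returning 0.0 for two all-negative profiles are assumptions made for illustration; the activity itself does not specify how this case should be handled.

```python
def safe_jaccard(individual1, individual2, positive=("Y", "P", "A"), default=0.0):
    """Jaccard coefficient over positive indicators, guarded against an empty union."""
    both = 0    # attributes where both individuals are positive
    either = 0  # attributes where at least one individual is positive
    for x, y in zip(individual1, individual2):
        x_pos, y_pos = x in positive, y in positive
        both += x_pos and y_pos
        either += x_pos or y_pos
    # Fall back to the assumed default when there is nothing to compare
    return both / either if either else default

# Hypothetical all-negative profiles: no positive indicators on either side
print(safe_jaccard(["N", "N", "N"], ["N", "N", "N"]))  # 0.0
```

Returning a fixed default keeps a batch of pairwise comparisons from failing part-way through; raising an error or returning NaN would be equally defensible choices.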

Jaccard Coefficient Calculations