Chi-squared Test

What is the Chi-squared Test?

The Chi-squared test (in this form, the test of independence) is a statistical hypothesis test used to determine whether there is a significant association between two categorical variables in a sample. Its null hypothesis is that the two variables are independent. The test compares the observed frequencies in a contingency table with the frequencies that would be expected if that hypothesis held, and a small p-value is evidence of an association. The Chi-squared test is also commonly used for feature selection in machine learning, where it can help identify the features most relevant to a given classification task.
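The comparison between observed and expected frequencies can be written as a single statistic. In the formula below (notation introduced here for illustration), O_ij is the observed count in row i and column j of an r × c table, R_i and C_j are the row and column totals, and N is the grand total. The expected count E_ij is what independence would predict, and the statistic is compared against a chi-squared distribution with (r − 1)(c − 1) degrees of freedom.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```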

Example of using the Chi-squared Test in Python

Here’s a simple example of performing a Chi-squared test on a 2×3 contingency table using the SciPy library in Python:

import numpy as np
from scipy.stats import chi2_contingency

# Sample contingency table
observed = np.array([[10, 20, 30], [20, 30, 20]])

# Perform the Chi-squared test
chi2, p_value, dof, expected = chi2_contingency(observed)

print("Chi-squared statistic:", chi2)
print("P-value:", p_value)
print("Degrees of freedom:", dof)
print("Expected frequencies:", expected)

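This example demonstrates how to use the chi2_contingency function from the scipy.stats module to perform a Chi-squared test on a sample contingency table. The function returns the test statistic, the p-value, the degrees of freedom, and the array of expected frequencies under independence. For tables with a single degree of freedom (2×2), it applies Yates' continuity correction by default (correction=True).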
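To see what chi2_contingency computes, the sketch below recomputes the Pearson statistic by hand with NumPy and compares it with SciPy's result for the same table. The helper name chi_squared_by_hand and the significance level alpha = 0.05 are illustrative choices, not part of SciPy. Because this table has 2 degrees of freedom, SciPy applies no continuity correction, so the two results should agree.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def chi_squared_by_hand(observed):
    """Hypothetical helper: Pearson chi-squared test of independence for a 2-D table."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    grand_total = observed.sum()
    # Expected counts under independence: E_ij = R_i * C_j / N
    expected = np.outer(row_totals, col_totals) / grand_total
    # Sum of squared deviations, each scaled by its expected count
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    # p-value: upper-tail probability of the chi-squared distribution
    p_value = chi2.sf(statistic, dof)
    return statistic, p_value, dof, expected


observed = np.array([[10, 20, 30], [20, 30, 20]])
stat, p, dof, expected = chi_squared_by_hand(observed)

# Cross-check against SciPy (no Yates correction, since dof > 1)
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed)
print("Matches SciPy:", np.isclose(stat, ref_stat), np.isclose(p, ref_p), dof == ref_dof)

alpha = 0.05  # illustrative significance level
print("Reject independence" if p < alpha else "Fail to reject independence")
```

For this table the statistic comes out at about 6.60 with 2 degrees of freedom (p ≈ 0.037), so at the 5% level the sketch reports evidence of an association between the two variables.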

Additional resources on the Chi-squared Test: