To construct a chi-squared test, we first need to pick two columns from the original data frame and split each of them into two categories.
For the categorical variables, we use the donors' profession and their household value as recorded in the 1850 census. The hypothesis we want to test is whether being a farmer is related to having a household value of $2000 or more. To compare these two categorical variables, we first build a 2x2 matrix and tally every donor's record, so that we can see the total count in each combination of categories.
# Creates an empty matrix with 2 rows and 2 columns
table_Profession_HouseholdValue <- matrix(nrow = 2, ncol = 2)
# Names the rows "value>=2000" and "value<2000"
row.names(table_Profession_HouseholdValue) <- c("value>=2000", "value<2000")
# Names the columns "pro_f" (farmer) and "pro_nf" (not a farmer)
colnames(table_Profession_HouseholdValue) <- c("pro_f", "pro_nf")
table_Profession_HouseholdValue
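# Create one counter per combination of categories, each starting at 0
frich <- 0    # farmer, household value >= 2000
nfrich <- 0   # not a farmer, household value >= 2000
fnrich <- 0   # farmer, household value < 2000
nfnrich <- 0  # not a farmer, household value < 2000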

The table above shows the initial state of our matrix, in which every cell is still empty (NA).
Next, we use an if-else statement inside a for-loop that runs through the whole dataset of donors' biographical records, so that we can determine whether each donor is a farmer and whether their household has a value of $2000 or more. We record the total for each combination of categories by adding 1 to the corresponding counter whenever a donor matches it.
for (i in seq_len(nrow(data))) {
  # Check whether the donor is a farmer with a household value of at least 2000
  if (data$Census_Profession[i] == "farmer" && data$Census_Household_Value[i] >= 2000) {
    # Add 1 to frich if both conditions are met
    frich <- frich + 1
    # Otherwise, check whether the donor is a farmer with a household value below 2000
  } else if (data$Census_Profession[i] == "farmer" && data$Census_Household_Value[i] < 2000) {
    # Add 1 to fnrich if both conditions are met
    fnrich <- fnrich + 1
    # Otherwise, check whether the donor is not a farmer with a household value of at least 2000
  } else if (data$Census_Profession[i] != "farmer" && data$Census_Household_Value[i] >= 2000) {
    # Add 1 to nfrich if both conditions are met
    nfrich <- nfrich + 1
  } else {
    # Remaining case: not a farmer and household value below 2000, so add 1 to nfnrich
    nfnrich <- nfnrich + 1
  }
}
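The same four counts can also be obtained without an explicit loop. The following is only a minimal sketch, assuming the column names used above; the `donors` data frame and its values are made up for illustration, and `value_group`, `profession_group`, and `counts` are hypothetical names that do not appear in the original analysis.

```r
# Hypothetical toy data standing in for the donor data frame (values are made up)
donors <- data.frame(
  Census_Profession = c("farmer", "farmer", "merchant", "farmer", "lawyer", "merchant"),
  Census_Household_Value = c(2500, 800, 3000, 2000, 500, 1200)
)

# Derive the two binary categories as factors with a fixed level order,
# so that rows and columns always appear in the same order as in the matrix above
value_group <- factor(
  ifelse(donors$Census_Household_Value >= 2000, "value>=2000", "value<2000"),
  levels = c("value>=2000", "value<2000")
)
profession_group <- factor(
  ifelse(donors$Census_Profession == "farmer", "pro_f", "pro_nf"),
  levels = c("pro_f", "pro_nf")
)

# Cross-tabulate: rows are the value groups, columns are the profession groups
counts <- table(value_group, profession_group)
counts
```

One difference to keep in mind: `table()` drops rows with missing values by default, whereas an `if` condition that evaluates to `NA` stops with an error, so the two approaches can behave differently when the census fields are incomplete.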
After the for-loop finishes, we can check the four counters and store them in the empty matrix we created earlier.
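frich
fnrich
nfrich
nfnrich
# Store the values in the appropriate cells of the matrix
# (rows: household value group, columns: profession group)
table_Profession_HouseholdValue[1,1] <- frich    # value>=2000, farmer
table_Profession_HouseholdValue[1,2] <- nfrich   # value>=2000, not a farmer
table_Profession_HouseholdValue[2,1] <- fnrich   # value<2000, farmer
table_Profession_HouseholdValue[2,2] <- nfnrich  # value<2000, not a farmer
table_Profession_HouseholdValue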

Then we can run the chi-squared test of independence between profession and household value.
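The call itself is not shown in the text. A minimal sketch of what it might look like, using base R's `chisq.test()`, is given here; the counts in `tab` are invented purely for illustration and are not the study's results, and `tab` and `result` are hypothetical names. With the matrix built above, the corresponding call would be `chisq.test(table_Profession_HouseholdValue)`.

```r
# Illustrative counts only (NOT the study's data); matrix() fills column by column
tab <- matrix(
  c(20, 15,   # pro_f:  value>=2000, value<2000
    30, 35),  # pro_nf: value>=2000, value<2000
  nrow = 2, ncol = 2,
  dimnames = list(c("value>=2000", "value<2000"), c("pro_f", "pro_nf"))
)

# Pearson's chi-squared test of independence on the 2x2 table.
# For 2x2 tables chisq.test() applies Yates' continuity correction by default
# (correct = TRUE); pass correct = FALSE to turn it off.
result <- chisq.test(tab)
result

# Individual pieces of the result
result$statistic  # the X-squared statistic
result$parameter  # degrees of freedom (1 for a 2x2 table)
result$p.value    # the p-value compared against the significance level
```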

Since the p-value we obtained is 1, which is much larger than the significance level of 0.05, we fail to reject the null hypothesis that whether a donor is a farmer is unrelated to whether their household value is $2000 or more.
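The decision rule used here can be written out explicitly. This is a sketch under stated assumptions: the counts are invented for illustration (they are not the study's data), and `tab`, `alpha`, `result`, and `decision` are hypothetical names. It also prints the expected counts, since the chi-squared approximation is usually considered reliable only when they are not too small (a common rule of thumb is at least 5 per cell).

```r
# Illustrative counts only (NOT the study's data)
tab <- matrix(c(20, 15, 30, 35), nrow = 2, ncol = 2)

alpha <- 0.05                # significance level used in the text
result <- chisq.test(tab)    # chi-squared test of independence

# Expected counts under the null hypothesis of independence
result$expected
# Rule-of-thumb check that every expected count is at least 5
all(result$expected >= 5)

# Reject the null hypothesis only when the p-value falls below alpha
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```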