In class, we used a dataset of daily new COVID-19 cases and one climate indicator, average temperature, to test our hypothesis that daily new cases and average temperature are linearly related.
First, we load and read the original dataset that we want to analyze. To make sure we have the correct dataset, we use the head() function to get an overview of its first few rows (especially useful when the dataset is extremely large!). Then we visualize the data frame so that we can see the relationship between the two variables intuitively. To do this, we use the plot() function to create a scatter plot with the independent variable, average temperature, on the x-axis and the dependent variable, daily new COVID-19 cases, on the y-axis.
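The loading and plotting steps can be sketched as follows. This is only a minimal illustration: the file name `covid_temperature.csv` and the column names `avg_temp` and `new_cases` are assumptions, and the data frame below is synthetic stand-in data, not the class dataset.

```r
# Hypothetical file name; with the real data you would read it in like this:
# covid <- read.csv("covid_temperature.csv")

# Synthetic stand-in data so the sketch runs on its own (NOT the class data).
# Column names avg_temp and new_cases are assumed for illustration.
set.seed(1)
covid <- data.frame(
  avg_temp  = runif(100, min = 0, max = 30),   # average temperature
  new_cases = rpois(100, lambda = 200)         # daily new cases
)

# Look at the first six rows to check the data was read as expected
head(covid)

# Scatter plot: independent variable on x, dependent variable on y
plot(covid$avg_temp, covid$new_cases,
     xlab = "Average temperature",
     ylab = "Daily new cases",
     main = "Daily new cases vs. average temperature")
```

With the real dataset, only the read.csv() line and the column names would need to change.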
From the plot, the two variables do not seem to have a clear linear relationship. To test our hypothesis, we calculate the squared correlation coefficient (R^2) between average temperature and daily new cases: we apply the cor() function and square the result (raise it to the power of 2, not multiply it by 2). We got a result of about 0.105. This indicates that a linear model fits the data poorly: only about 10.5% of the variation in daily new cases can be explained by the independent variable, average temperature.
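The R^2 calculation can be sketched like this. Again the data are synthetic and the variable names `temp` and `cases` are assumptions, so the number printed will not match the 0.105 from class. The sketch also compares the result with the R^2 reported by lm(), which should agree for a simple linear regression with one predictor.

```r
# Synthetic stand-in data (NOT the class data); names are assumed
set.seed(1)
temp  <- runif(100, min = 0, max = 30)
cases <- 200 - 2 * temp + rnorm(100, sd = 50)

# Pearson correlation coefficient between the two variables
r <- cor(temp, cases)

# R^2 is the correlation squared (r^2), not r * 2
r_squared <- r^2
print(r_squared)

# Cross-check: fit a simple linear model and read its R^2
fit <- lm(cases ~ temp)
print(summary(fit)$r.squared)

# TRUE when the two values agree up to numerical tolerance
print(all.equal(r_squared, summary(fit)$r.squared))
```

An R^2 close to 0 means the fitted line explains little of the variation in the dependent variable, while a value close to 1 means it explains most of it.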