Reflection 3 – DCS 204

In Lab 3, we analyze donations to the Bates Fund from each class between 1940 and 2020. To understand the main characteristics of the dataset, we use exploratory data analysis (EDA). We first calculate the mean, maximum, minimum, range, and standard deviation of the data. Next, to see the distribution of total donations more intuitively, we visualize the data with a histogram. The histogram shows that the tallest bars sit on the left side of the x-axis, in the buckets for small donation totals, while large donation totals are rare. Overall, the distribution of total donations from 1940 to 2020 is skewed to the right: the long tail of large donations pulls the mean above the median.
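To make the link between the summary statistics and the right skew concrete, here is a minimal sketch using only Python's standard library. The `summarize` helper and the donation totals are hypothetical, made-up numbers chosen for illustration, not the Bates Fund data, and the lab itself may have computed these values with a different tool. The point is that a few very large totals drag the mean far above the median.

```python
import statistics

def summarize(donations):
    """Hypothetical helper: the summary statistics mentioned in the lab, plus the median."""
    lo, hi = min(donations), max(donations)
    return {
        "mean": statistics.mean(donations),
        "median": statistics.median(donations),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "std": statistics.stdev(donations),  # sample standard deviation
    }

# Made-up class totals (NOT the real dataset): many small totals, a few very large ones.
totals = [1200, 1500, 1800, 2000, 2100, 2500, 3000, 3200, 45000, 90000]
stats = summarize(totals)
print(stats["mean"], stats["median"])  # mean 15230, median 2300.0
print(stats["mean"] > stats["median"])  # right skew: mean exceeds median
```

Here two large totals pull the mean to 15230 even though half of the classes gave 2300 or less, which is the same pattern the histogram shows visually.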

For the visualization decisions, we adjust the histogram's settings by changing the binning and the range of the x-axis. First, so that it can be compared with the histogram of goal donations we plot later, we set the maximum of the x-axis to match that later histogram. Second, we adjust the binning so that the histogram reads more clearly than with the default settings and uses the same bins as the goal-donations histogram.
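Sharing the x-axis range and binning matters because two histograms are only directly comparable when their bars cover the same intervals. The sketch below is an illustrative, standard-library version of that idea: the `histogram_counts` helper, the bin count, the x-axis maximum, and both lists of values are hypothetical, not the settings or data used in the lab. Plotting libraries expose the same choice; for example, matplotlib's `hist` accepts `bins` and `range` arguments.

```python
def histogram_counts(values, n_bins, x_max, x_min=0):
    """Hypothetical helper: count values into n_bins equal-width buckets on [x_min, x_max].

    Values outside [x_min, x_max] are dropped, as they would fall off a plot
    whose x-axis limits are fixed.
    """
    width = (x_max - x_min) / n_bins
    edges = [x_min + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        if x_min <= v <= x_max:
            # The last bin is closed on the right so that x_max itself is counted.
            idx = min(int((v - x_min) / width), n_bins - 1)
            counts[idx] += 1
    return edges, counts

# Made-up values for illustration only.
totals = [1200, 1500, 1800, 2000, 2100, 2500, 3000, 3200, 45000, 90000]
goals = [5000, 5000, 10000, 10000, 15000, 20000, 25000, 50000]

# The same bin count and x-axis maximum for both histograms keeps the bars aligned.
N_BINS, X_MAX = 10, 100000
edges_t, counts_t = histogram_counts(totals, N_BINS, X_MAX)
edges_g, counts_g = histogram_counts(goals, N_BINS, X_MAX)
print(edges_t == edges_g)  # True: identical bin edges
print(counts_t)  # [8, 0, 0, 0, 1, 0, 0, 0, 0, 1]
print(counts_g)  # [2, 3, 2, 0, 0, 1, 0, 0, 0, 0]
```

Because both histograms use identical edges, each bar in one lines up with the bar covering the same donation interval in the other.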

Overall, I think a histogram is a very useful visualization when conducting exploratory data analysis. I hope to learn more data analysis tools in the coming weeks!
