Geography 176A
Lab 02: COVID-19 Pandemic
Data
Question 1
Steps
1. See Data tab
2. Making a California Subset
caldata <- data %>%
filter(state=='California') %>%
group_by(date) %>%
mutate(newcases = cases - lag(cases)) %>%
ungroup()Simple, easy four lines of code to understand. Simply put, I took the data, filtered to California, grouped each county by date, created the new cases variable, and ungrouped the whole thing for later analysis.
3. Generating Two tables
table1 <- caldata %>%
group_by(county) %>%
summarise(cases=sum(cases)) %>%
arrange(-cases) %>%
head(5)
table2 <- caldata %>%
group_by(county) %>%
summarise(newcases=sum(newcases)) %>%
arrange(-newcases) %>%
head(5)
tables<-kable(table1, caption = 'Top 5 Cases Counts by County', col.names = c("County", "Cases"))
tables| County | Cases |
|---|---|
| Los Angeles | 24790654 |
| Riverside | 5010717 |
| Orange | 4599288 |
| San Bernardino | 4265567 |
| San Diego | 3939250 |
4. & 5. See data tab
Here we needed the first two rows to be skipped as the developer included two “title” rows which made the inputting of the data in R weird.
6. Exploring the Data
Using these basic functions we can determine which fields we want to join. For example, State is the same for both my caldata and popdata sets. Similarly, but not exactly the same, both data sets have a FIPS code, popdata refers to it as FIPStxt while caldata refers to it as just FIPS, and is only a 4 digit code when necessary (the 0 in some of the FIPS have been removed in the caldata set). We also know that there are 3,273 entries and 165 different variables to describe each entry.