Libraries needed

library(tidyverse)
library(knitr)
library(readxl)
library(zoo)

Data

data <- read.csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
popdata <- read_xls('../data/PopulationEstimates.xls', skip = 2)

Question 1

Steps

1. See Data tab

2. Making a California Subset

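caldata <- data %>% 
  filter(state == 'California') %>%
  # group by county so lag() compares each county with its own previous day
  group_by(county) %>% 
  arrange(date, .by_group = TRUE) %>% 
  mutate(newcases = cases - lag(cases)) %>% 
  ungroup()

A short pipeline that is easy to follow. I took the data, filtered to California, grouped the rows by county and sorted them by date, created the new cases variable as each day's count minus the previous day's count for the same county, and ungrouped the result for later analysis. Grouping by county matters here: grouping by date would make lag() subtract a different county's count on the same day.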
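To make the new cases calculation concrete, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration only; the column names date, county and cases mirror the ones used above.

```r
library(dplyr)

# Made-up running totals for two hypothetical counties over three days
toy <- data.frame(
  date   = rep(c("2020-03-01", "2020-03-02", "2020-03-03"), each = 2),
  county = rep(c("A", "B"), times = 3),
  cases  = c(1, 5, 4, 5, 9, 8)
)

toy_new <- toy %>%
  group_by(county) %>%
  arrange(date, .by_group = TRUE) %>%        # order each county's rows by date
  mutate(newcases = cases - lag(cases)) %>%  # today's total minus yesterday's
  ungroup()

toy_new
```

The first row of each county has no previous day, so its newcases value is NA.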

3. Generating Two tables

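table1 <- caldata %>% 
  group_by(county) %>% 
  # cases is a running total in the NYT file, so this adds up every daily total per county
  summarise(cases = sum(cases)) %>% 
  arrange(desc(cases)) %>% 
  head(5)

table2 <- caldata %>% 
  group_by(county) %>% 
  # the first row of each lag() group has no previous value, so drop NAs when summing
  summarise(newcases = sum(newcases, na.rm = TRUE)) %>% 
  arrange(desc(newcases)) %>% 
  head(5)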

tables <- kable(table1, caption = 'Top 5 Case Counts by County', col.names = c("County", "Cases"))

tables

Top 5 Case Counts by County
County	Cases
Los Angeles	24790654
Riverside	5010717
Orange	4599288
San Bernardino	4265567
San Diego	3939250
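The figures in the table above come from summing cases over every date. Because cases in the NYT county file is a running total, an alternative is to read each county's total from the most recent date instead. A minimal sketch of that alternative on made-up data follows; the toy data frame and its values are assumptions, not results from this lab.

```r
library(dplyr)

# Made-up running totals for two hypothetical counties over two days
toy <- data.frame(
  date   = as.Date(rep(c("2020-03-01", "2020-03-02"), each = 2)),
  county = rep(c("A", "B"), times = 2),
  cases  = c(1, 5, 4, 6)
)

latest <- toy %>%
  filter(date == max(date)) %>%   # keep only the most recent day
  select(county, cases) %>%
  arrange(desc(cases)) %>%
  head(5)

latest
```

On this toy data the latest-date totals are 6 and 4, whereas summing over dates would give 11 and 5.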

4. & 5. See Data tab

Here the first two rows had to be skipped (skip = 2) because the spreadsheet's author included two “title” rows above the column headers, which otherwise get in the way of reading the column names into R.
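The effect of skipping title rows can be sketched without the Excel file. The example below uses base R's read.csv on inline text as a stand-in for read_xls; the two title lines, the column names and the values are all hypothetical.

```r
# Inline text standing in for a spreadsheet with two title rows (made-up content)
raw <- "Population estimates for illustration
Source: hypothetical
FIPStxt,State,Area_Name,POP_2019
06037,CA,Los Angeles County,100
06073,CA,San Diego County,50"

# skip = 2 drops the two title lines so the third line becomes the header;
# reading FIPStxt as character keeps its leading zero
pop_toy <- read.csv(text = raw, skip = 2,
                    colClasses = c(FIPStxt = "character"))

names(pop_toy)
```

Without skip = 2, the first title line would be treated as the header row.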

6. Exploring the Data

Using these basic functions we can determine which fields we want to join on. For example, State appears in both my caldata and popdata sets. Both data sets also have a FIPS code, though not in identical form: popdata refers to it as FIPStxt, while caldata refers to it as just FIPS and stores it without the leading 0, so some codes appear with only 4 digits. We also know that popdata has 3,273 entries and 165 different variables describing each entry.
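One way to reconcile the two FIPS fields before joining is to pad the shorter code back to five digits. This is a minimal sketch on made-up rows, not the join used in the lab; the data frames, the pop column and the lowercase fips name are assumptions.

```r
library(dplyr)

# Made-up rows: fips stored as an integer (leading zero lost), FIPStxt as text
cases_toy <- data.frame(fips = c(6037L, 6073L), cases = c(10, 4))
pop_small <- data.frame(FIPStxt = c("06037", "06073"), pop = c(100, 50))

joined <- cases_toy %>%
  mutate(fips = sprintf("%05d", fips)) %>%          # restore the leading zero
  inner_join(pop_small, by = c("fips" = "FIPStxt")) # match on the 5-digit code

joined
```

Padding on the cases side keeps FIPStxt untouched, so the population table can still be joined to other sources that use the five-character code.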

Geography 176A

Lab 02: COVID-19 Pandemic