Load dataset

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
joined_data <- read.csv("C:/Users/27977/Downloads/433/joined_data.csv")

Introduction:

I chose a dataset containing the unemployment rate, GDP, and education level of each state in the US. It combines several categories of state-level data, from education to unemployment, which makes it comprehensive enough to give a data-driven picture of the country. The data are recorded monthly from Dec 2020 to Nov 2021, so changes over that time span can be shown through data visualization.
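Because year_mon is stored as text such as "Dec 2020", a quick check of the time span can help confirm the Dec 2020 to Nov 2021 range. The following is a minimal sketch on a toy vector rather than the real file. The helper name parse_year_mon is hypothetical, and the sketch assumes every label is a three-letter English month abbreviation, a space, and a four-digit year.

```r
# Hypothetical helper: turn labels like "Dec 2020" into Date objects
# (first day of the month). Assumes the form "Mon YYYY" with English abbreviations.
parse_year_mon <- function(x) {
  month_num <- match(substr(x, 1, 3), month.abb)  # "Dec" -> 12
  year_num  <- as.integer(substr(x, 5, 8))        # "2020" -> 2020
  as.Date(sprintf("%04d-%02d-01", year_num, month_num))
}

# Toy labels standing in for joined_data$year_mon
labels <- c("Dec 2020", "Jan 2021", "Feb 2021", "Nov 2021")
dates  <- parse_year_mon(labels)

range(dates)           # earliest and latest month present
length(unique(dates))  # number of distinct months
```

On the real data, the same helper could be applied to joined_data$year_mon, which should yield 12 distinct months if the description above holds.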

head(joined_data)
##   X year_mon      state Administered_Dose1_Pop_Pct Series_Complete_Pop_Pct
## 1 1 Dec 2020    Alabama                          0                       0
## 2 2 Dec 2020     Alaska                          0                       0
## 3 3 Dec 2020    Arizona                          0                       0
## 4 4 Dec 2020   Arkansas                          0                       0
## 5 5 Dec 2020 California                          0                       0
## 6 6 Dec 2020   Colorado                          0                       0
##   value Population_over_25 High_School_or_Higher_Population
## 1 50536            3360058                          2926985
## 2 77640             484058                           452968
## 3 58945            4944540                          4331542
## 4 47597            2036456                          1781463
## 5 75235           26937872                         22636359
## 6 72331            3974943                          3672723
##   High_School_or_Higher_Pct Bachelor_or_Higher_Population
## 1                    0.8711                        885357
## 2                    0.9358                        146157
## 3                    0.8760                       1492158
## 4                    0.8748                        475367
## 5                    0.8403                       9428484
## 6                    0.9240                       1695602
##   Bachelor_or_Higher_Pct Advanced_Population Advanced_Pct Nominal_GDP_2021
## 1                 0.2635              337382       0.1004           243555
## 2                 0.3019               56574       0.1169            54020
## 3                 0.3018              561120       0.1135           400156
## 4                 0.2334              168182       0.0826           143438
## 5                 0.3500             3538760       0.1314          3290170
## 6                 0.4266              637777       0.1604           416937
##   Nominal_GDP_2020 Annual_GDP_Change Annual_GDP_Change_Pct
## 1           229831             13724                0.0387
## 2            52864              1156                0.0334
## 3           377476             22680                0.0450
## 4           131818             11620                0.0416
## 5          3189703            100467                0.0149
## 6           400041             16896                0.0338
##   Real_GDP_Growth_Rate_Pct GDP_per_capita_2021 GDP_per_capita_2020
## 1                    0.068               48475               46875
## 2                    0.054               69336               72263
## 3                    0.074               55954               51865
## 4                    0.069               47629               43691
## 5                    0.063               83213               80727
## 6                    0.079               72212               69475
##   Pct_of_National_2021 Pct_of_National_2020 unemployment_rate
## 1               0.0107               0.0106               4.7
## 2               0.0024               0.0025               6.5
## 3               0.0178               0.0173               6.8
## 4               0.0062               0.0061               4.9
## 5               0.1477               0.1462               9.3
## 6               0.0186               0.0183               6.9

Create the first graph

# Keep three states and only the 2021 months
new <- joined_data %>%
  filter(state %in% c("Wisconsin", "California", "New York")) %>%
  filter(grepl("2021", year_mon))

# Order year_mon by first appearance so the x-axis is chronological rather than
# alphabetical (this relies on the rows already being in chronological order)
new %>%
  mutate(year_mon = ordered(year_mon, levels = unique(year_mon))) %>%
  ggplot(aes(x = year_mon, y = unemployment_rate, group = state, colour = state)) +
  geom_line() +
  theme_classic() +
  xlab("Date") +
  ylab("Unemployment Rate") +
  ggtitle("Unemployment Rate of Selected States in US") +
  theme(legend.title = element_text(size = 4), legend.text = element_text(size = 4))
## Warning: Removed 6 row(s) containing missing values (geom_path).


The first graph plots the unemployment rate of selected states in the US over time during the COVID-19 pandemic. The unemployment rate rose to an unprecedented level because of the pandemic, causing more than just financial problems all over the world. Plotting the unemployment rate makes it easy to see how unemployment changed during the pandemic, when it peaked, and how policy might have influenced it. To distinguish the states visually and show the continuous change over time, I used geom_line with a different color for each state, converted year_mon to an ordered factor so that the months appear in chronological order on the x-axis, and used theme to reduce the size of the legend.
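The peak month for each state can also be read off numerically, and the rows behind the "missing values" warning can be listed directly. The following is a rough sketch in base R on made-up toy values, not the real data; only the column names state, year_mon, and unemployment_rate come from the dataset.

```r
# Toy data standing in for the filtered data frame; the values are made up.
toy <- data.frame(
  state = rep(c("California", "Wisconsin"), each = 3),
  year_mon = rep(c("Jan 2021", "Feb 2021", "Mar 2021"), times = 2),
  unemployment_rate = c(9.0, 8.5, NA, 4.0, 4.2, 3.9)
)

# Rows that geom_line() would drop because the rate is missing
missing_rows <- toy[is.na(toy$unemployment_rate), ]

# Peak month per state: within each state, keep the row with the largest rate
complete <- toy[!is.na(toy$unemployment_rate), ]
peaks <- do.call(rbind, lapply(split(complete, complete$state), function(d) {
  d[which.max(d$unemployment_rate), ]
}))

missing_rows
peaks
```

Listing the missing rows first shows whether the gaps fall in particular months or states, which matters when interpreting where a line in the plot breaks off.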

Create the second graph

# Keep one row per state; otherwise geom_col() stacks the monthly rows and
# inflates each bar (assumes the rate is the same in every month for a state)
joined_data %>%
  distinct(state, Bachelor_or_Higher_Pct) %>%
  ggplot(aes(x = Bachelor_or_Higher_Pct, y = reorder(state, Bachelor_or_Higher_Pct), fill = state)) +
  geom_col() +
  labs(title = "Bachelor or higher degree rate of each state in the US", x = "rate", y = "state") +
  theme(panel.grid = element_blank(), legend.position = "none")

The second graph shows the share of the population with a bachelor's degree or higher in each state in the US. Massachusetts and Colorado stand out with noticeably higher rates than other states. To make the states easy to identify and compare, I used geom_col with a different color for each state and sorted the bars by rate with the reorder function, then used theme to remove the grid lines and hide the legend, since the state names already appear on the axis.
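Since the dataset has one row per state per month, and geom_col stacks rows that share the same axis position by default, it may be worth checking that the plotted data has a single row per state. The following is a small sketch on toy values (states "A" and "B" are placeholders), assuming the rate is identical in every monthly row for a given state.

```r
# Toy data: two placeholder states, three monthly rows each,
# with a constant rate per state
toy <- data.frame(
  state = rep(c("A", "B"), each = 3),
  year_mon = rep(c("Jan 2021", "Feb 2021", "Mar 2021"), times = 2),
  Bachelor_or_Higher_Pct = rep(c(0.30, 0.40), each = 3)
)

# Bar length a stacked bar would show if all monthly rows were plotted: the sum
stacked_total <- tapply(toy$Bachelor_or_Higher_Pct, toy$state, sum)

# One row per state keeps the bar length equal to the actual rate
per_state <- unique(toy[, c("state", "Bachelor_or_Higher_Pct")])

stacked_total
per_state
```

If the deduplicated table has more rows than there are states, the rate varies across months for at least one state, and an explicit summary (for example the latest month) would be needed instead.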