I wasn’t satisfied with the Shiny app that I made for Portfolio 2, and I felt that the dataset I used didn’t offer anything important, so for Portfolio 3 I decided to switch things up and use a different dataset from

I’m currently working for the Minnesota Twins as a Business Strategy and Sales Operations intern, so in this portfolio I’m going to visualize the hitting stats of one of the Twins hitters. The player I chose is Byron Buxton, the Twins’ center fielder, since he’s currently one of the more recognizable players on the team and he had a good start last season before suffering injuries. The dataset can be downloaded from here, and you must download the CSV file for the code to run properly:

For the 1st static visualization, I made a pitch location chart for Buxton’s 2021 season. I loaded the dataset and drew a strike zone from the x and z coordinates, then recoded the pitch_type variable into full pitch descriptions. The axes have 2 labels: “Feet Above Homeplate” for the y-coordinate and “Feet From Homeplate” for the x-coordinate. The 1st scatterplot contains all of the pitches that Buxton faced in 2021, and the 2nd scatterplot contains only the pitches that he got a hit off of. Based on the 2nd scatterplot, Buxton hits inside pitches with great power, especially high-and-inside fastballs and low-and-inside sliders.

library(tidyverse)  # read_csv(), dplyr verbs, ggplot2

Buxton <- read_csv("")
##Drawing The Strike Zone
x <- c(-.95,.95,.95,-.95,-.95)
z <- c(1.6,1.6,3.5,3.5,1.6)

#store in dataframe
sz <- data.frame(x,z)

##Changing Pitch Names
pitch_desc <- Buxton$pitch_type

pitch_desc[which(pitch_desc=='CH')] <- "Changeup"
pitch_desc[which(pitch_desc=='CU')] <- "Curveball"
pitch_desc[which(pitch_desc=='FC')] <- "Cutter"
pitch_desc[which(pitch_desc=='FF')] <- "Four-Seam"
pitch_desc[which(pitch_desc=='FS')] <- "Split-Finger"
pitch_desc[which(pitch_desc=='FT')] <- "Two-Seam"
pitch_desc[which(pitch_desc=='KC')] <- "Knuckle-Curve"
pitch_desc[which(pitch_desc=='SI')] <- "Sinker"
pitch_desc[which(pitch_desc=='SL')] <- "Slider"
library(viridis)  # scale_color_viridis()
ggplot() +
  geom_path(data = sz, aes(x=x, y=z)) +
  coord_equal() +
  geom_point(data = Buxton, aes(x = plate_x, y = plate_z, size = release_speed, color = pitch_desc)) +
  scale_size(range = c(-1.0,2.5))+
  scale_color_viridis(discrete = TRUE, option = "C") +
  labs(size = "Speed",
       color = "Pitch Type",
       title = "Byron Buxton - Pitch Chart") +
  ylab("Feet Above Homeplate") +
  xlab("Feet From Homeplate") +
  theme(plot.subtitle=element_text(face="plain", hjust= -.015, vjust= .09, colour="#3C3C3C", size = 12)) +
  theme(axis.text.x=element_text(vjust = .5, size=11,colour="#535353",face="bold")) +
  theme(axis.text.y=element_text(size=11,colour="#535353",face="bold")) +
  theme(axis.title.y=element_text(size=11,colour="#535353",face="bold",vjust=1.5)) +
  theme(axis.title.x=element_text(size=11,colour="#535353",face="bold",vjust=0)) +
  theme(panel.grid.major.y = element_line(color = "#bad2d4", size = .5)) +
  theme(panel.grid.major.x = element_line(color = "#bdd2d4", size = .5)) +
  theme(panel.background = element_rect(fill = "white")) 

# Keep only the pitches that ended in a hit (Statcast spells the event "home_run")
hits <- Buxton %>%
  filter(events %in% c("single", "double", "triple", "home_run"))
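The 2nd scatterplot described above could be drawn along these lines. This is only a rough sketch: the `toy` data frame is a made-up stand-in for the Statcast CSV, `toy_hits` and `hits_plot` are names I picked for illustration, and the event names assume Statcast's underscore spelling (for example "home_run").

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the Statcast export; real values come from the CSV
toy <- data.frame(
  plate_x = c(-0.5, 0.3, 0.8, -1.2),
  plate_z = c(2.0, 3.1, 1.8, 2.5),
  events  = c("single", "strikeout", "home_run", "field_out")
)

# Strike zone outline, using the same coordinates as the first chart
sz <- data.frame(x = c(-.95, .95, .95, -.95, -.95),
                 z = c(1.6, 1.6, 3.5, 3.5, 1.6))

# Keep only the pitches that ended in a hit
toy_hits <- toy %>%
  filter(events %in% c("single", "double", "triple", "home_run"))

# Plot the hit locations over the strike zone
hits_plot <- ggplot() +
  geom_path(data = sz, aes(x = x, y = z)) +
  geom_point(data = toy_hits, aes(x = plate_x, y = plate_z, color = events)) +
  coord_equal() +
  labs(x = "Feet From Homeplate", y = "Feet Above Homeplate", color = "Hit Type")

print(hits_plot)
```

With the real data, `toy` would be replaced by the `Buxton` data frame and the same filter would apply.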

For the 2nd static visualization, I made a silhouette statistic plot based on the lecture notes, plotting 2 variables: release_pos_x (horizontal release position of the ball, measured in feet from the catcher’s perspective) and release_pos_z (vertical release position of the ball, measured in feet from the catcher’s perspective). I ran k-means clustering with K = 5; the clustering also uses release_speed as an input.

I also plot a histogram of the silhouette statistics within each cluster. Because the silhouette statistics for cluster 3 are generally higher than those for clusters 1, 2, and 4, we can conclude that cluster 3 is well-defined. Cluster 5 is right behind cluster 3 in terms of being well-defined.
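To make the comparison between clusters concrete, the average silhouette width per cluster can be computed directly. Here is a minimal sketch on made-up toy data (not the Statcast file), assuming the `cluster` package is installed; `toy`, `fit`, and `mean_sil` are names I chose for illustration.

```r
library(cluster)  # silhouette()

set.seed(1)
# Hypothetical toy data: two well-separated groups of release points (feet)
toy <- rbind(
  cbind(release_pos_x = rnorm(20, -2, 0.1), release_pos_z = rnorm(20, 6, 0.1)),
  cbind(release_pos_x = rnorm(20,  2, 0.1), release_pos_z = rnorm(20, 5, 0.1))
)

# Cluster the points, then compute one silhouette width per observation
fit <- kmeans(toy, centers = 2, nstart = 10)
sil <- silhouette(fit$cluster, dist(toy))

# Average silhouette width per cluster; values near 1 suggest a tight,
# well-separated cluster, values near 0 suggest overlap with a neighbor
mean_sil <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
print(mean_sil)
```

Applied to the real clustering, the same `tapply()` call on the silhouette column would give a single number per cluster to set beside the histograms.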

library(tidyverse)
library(broom)    # augment()
library(cluster)  # silhouette()

Buxton <- read_csv("savant_data.csv") %>%
  mutate(id = row_number())

# Run k-means on the release variables and attach cluster labels and silhouette widths
cluster_Buxton <- function(data, K) {
  x <- data %>%
    select(matches("release_speed|release_pos_x|release_pos_z"))
  kmeans(x, centers = K) %>%
    augment(data) %>% # creates column ".cluster" with cluster label
    mutate(silhouette = silhouette(as.integer(.cluster), dist(x))[, "sil_width"])
}

cur_id <- 2
Buxton5 <- cluster_Buxton(Buxton, K = 5)
obs_i <- Buxton5 %>%
  filter(id == cur_id)
ggplot(Buxton5, aes(x = release_pos_x, y = release_pos_z, col = .cluster, size = silhouette)) +
  geom_point(data = obs_i, size = 5, col = "black") + 
  geom_point() +
  scale_color_brewer(palette = "Set2") +
  scale_size(range = c(5, 1))

ggplot(Buxton5) +
  geom_histogram(aes(x = silhouette), binwidth = 0.05) +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 20)) + 
  facet_grid(~ .cluster)