My dataset is titled “UFC Fights (2010-2020) with Betting Odds” and comes from Kaggle. I chose it because I am an avid UFC fan and participated in organized sparring in high school. I also thought it was topical, given the rise of gambling as more states legalize it. I wanted to see whether winners followed any pattern with respect to betting odds, and where fights were held, to gauge whether the sport is growing internationally.
I first made a histogram of the winners’ betting odds in title fights, broken down by weight class. Unsurprisingly, most winners were the favorites. However, I expected the distribution to lean even more heavily toward favorites, and I was surprised by the number of upsets. Heavyweight had many upsets, while lighter weight classes such as featherweight had comparatively fewer. To prepare the data, I mutated a new column holding the winner’s odds, since the dataset only had separate odds columns for each fighter (R_odds and B_odds) and a column for the result (Winner). I also mutated a column indicating whether the winner was the favorite or the underdog. Betting odds are not very intuitive (in American odds, the favorite’s odds are negative and the underdog’s are positive), so I colored the bars by whether the winner was the favorite or the underdog. I also added a vertical line at 0 to mark the cutoff between favorite and underdog. Finally, I added more tick marks to the x-axis so individual odds values are easier to read.
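The two derived columns described above can be sketched in base R. This is an illustrative sketch, not the original analysis code: the toy rows are made up, and it assumes the Winner column holds "Red" or "Blue" to indicate which corner (the R_ or B_ columns) won.

```r
# Hypothetical toy rows standing in for the Kaggle data; the column names
# follow the dataset, but the values are invented for illustration
fights <- data.frame(
  R_odds = c(-250, 150, -120),
  B_odds = c(200, -175, 100),
  Winner = c("Red", "Red", "Blue"),  # assumption: Winner is "Red" or "Blue"
  stringsAsFactors = FALSE
)

# Odds of whichever corner won the bout
fights$winner_odds <- ifelse(fights$Winner == "Red", fights$R_odds, fights$B_odds)

# In American odds the favorite is priced negative, so a winner with
# negative odds was the favorite and one with positive odds an underdog
fights$winner_status <- ifelse(fights$winner_odds < 0, "Favorite", "Underdog")

print(fights)
```

Because this rule only checks the sign of the winner's odds, it assumes one fighter is priced negative and the other positive; a pick'em bout with both sides listed at negative odds would need a different rule, such as comparing the winner's odds to the loser's.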
Next, I made a strip plot of host location over the years. I first had to convert the date variable, which was stored as text, into a date format R could understand. I also converted the location into a factor with levels ordered by how many times each location hosted. This placed the most frequent location at the top of the plot and the rarest at the bottom, making the most popular locations easy to spot. The plot shows that the United States hosted by far the most fights. I was surprised to see Canada in third place, as I had thought of it as a rarer location. I also highlighted the locations where title fights took place.
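The date conversion and frequency-ordered factor can be sketched in base R as below. The rows are hypothetical, and the month/day/year date format is an assumption; the format string would need to match however dates actually appear in the file. Because ggplot2 draws the first level of a discrete y-axis at the bottom, the levels here run from least to most frequent so that the most frequent country ends up at the top.

```r
# Hypothetical rows; the real file may use a different date format
events <- data.frame(
  date = c("3/14/2020", "10/6/2018", "11/17/2018", "5/4/2019", "9/22/2018", "6/8/2019"),
  country = c("USA", "Canada", "Argentina", "USA", "USA", "Canada"),
  stringsAsFactors = FALSE
)

# Convert the character dates to Date objects (assumed month/day/year)
events$date <- as.Date(events$date, format = "%m/%d/%Y")
events$year <- as.integer(format(events$date, "%Y"))

# Count events per country, sorted from least to most frequent
host_counts <- sort(table(events$country))

# Use that order as factor levels; ggplot2 puts the first level at the
# bottom of a discrete y-axis, so the most frequent country lands on top
events$country <- factor(events$country, levels = names(host_counts))

levels(events$country)
```

In the tidyverse, `forcats::fct_infreq()` (most frequent level first) followed by `forcats::fct_rev()` gives the same ordering, apart from how ties are broken.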
Looking at how often different countries hosted, I could see that more “exotic” locations have only recently begun hosting fights. Countries such as Chile and Argentina did not host their first event until 2018. The sport does seem to be growing internationally, and the plot of host locations shows that it now spans the globe. However, no fights in the dataset took place in Africa, which is somewhat surprising given the number of African fighters in the UFC.
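One way to check when each country first hosted is to take the earliest year per country. The sketch below uses base R's `aggregate()` on hypothetical rows and assumes a year column like one derived from the parsed dates; it is an illustration, not the original analysis.

```r
# Hypothetical events: one row per fight, with host country and year
events <- data.frame(
  country = c("USA", "Chile", "USA", "Argentina", "Canada", "Chile"),
  year = c(2010, 2018, 2015, 2018, 2011, 2019),
  stringsAsFactors = FALSE
)

# Earliest year in which each country appears as a host
first_year <- aggregate(year ~ country, data = events, FUN = min)

# Countries that hosted for the first time in 2018 or later
recent_hosts <- first_year[first_year$year >= 2018, ]
print(recent_hosts)
```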
For reference, the dataset's columns as read by readr are: R_fighter, B_fighter, date, location, country, Winner, weight_class, and gender (character); R_odds and B_odds (double); and title_bout (logical).