Vector Data

Manipulating and visualizing spatial vector data.

Kris Sankaran (UW Madison)
2023-01-10

Reading, Recording, Rmarkdown

  1. As mentioned previously, vector data are used to store geometric spatial data. Specifically, there are 7 types of geometric information that are commonly used, as given in the figure below.
include_graphics("../activities/figure/sf-classes.png")

  1. We can construct these geometric objects from scratch. For example, starting from the defining coordinates, we can use st_point to create a point object,
# make a point
p <- st_point(c(5, 2))
plot(p)

st_linestring to create a linestring,

# make a line
linestring_matrix <- rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2), c(3, 2))
p <- st_linestring(linestring_matrix)
plot(p)

and st_polygon to create a polygon.

# make a polygon
polygon_list <- list(rbind(c(1, 5), c(2, 2), c(4, 1), c(4, 4), c(1, 5)))
p <- st_polygon(polygon_list)
plot(p)

  1. Different geometries can be combined into a geometry collection, using sfc.
point1 <- st_point(c(5, 2))
point2 <- st_point(c(1, 3))
points_sfc <- st_sfc(point1, point2)
plot(points_sfc)

  1. Real-world vector datasets are more than just these geometries — they also associate each geometry with some additional information about each feature. We can add this information to the geometries above by associating each element with a row of a data.frame. This merging is accomplished by st_sf, using geometry to associate a raw st_geom each row of a data.frame.
lnd_point <- st_point(c(0.1, 51.5))                
lnd_geom <- st_sfc(lnd_point, crs = 4326)         
lnd_attrib = data.frame(                        
  name = "London",
  temperature = 25,
  date = as.Date("2017-06-21")
  )

lnd_sf = st_sf(lnd_attrib, geometry = lnd_geom)

Visualization

  1. Vector data can be directly plotted using base R. For example, suppose we want to plot the boundaries of India, within it’s local context. We can use the world dataset, provided by the spData package. Each row of the world object contains both the boundary of a country (in the geom column) and information about its location and population characteristics.
data(world)
head(world)
Simple feature collection with 6 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
Geodetic CRS:  WGS 84
# A tibble: 6 × 11
  iso_a2 name_…¹ conti…² regio…³ subre…⁴ type  area_…⁵     pop lifeExp
  <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <dbl>   <dbl>   <dbl>
1 FJ     Fiji    Oceania Oceania Melane… Sove…  1.93e4  8.86e5    70.0
2 TZ     Tanzan… Africa  Africa  Easter… Sove…  9.33e5  5.22e7    64.2
3 EH     Wester… Africa  Africa  Northe… Inde…  9.63e4 NA         NA  
4 CA     Canada  North … Americ… Northe… Sove…  1.00e7  3.55e7    82.0
5 US     United… North … Americ… Northe… Coun…  9.51e6  3.19e8    78.8
6 KZ     Kazakh… Asia    Asia    Centra… Sove…  2.73e6  1.73e7    71.6
# … with 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>,
#   and abbreviated variable names ¹​name_long, ²​continent,
#   ³​region_un, ⁴​subregion, ⁵​area_km2

This makes the plot, using dplyr to filter down to just the row containing the India geometry.

india_geom <- world %>%
  filter(name_long == "India") %>%
  st_geometry()

plot(india_geom)

  1. Using base R, we can also layer on several vector objects, using add = TRUE.
world_asia <- world %>%
  filter(continent == "Asia")

plot(india_geom, expandBB = c(0, 0.2, 0.1, 1), col = "gray", lwd = 3)
plot(st_union(world_asia), add = TRUE)

  1. We can also use tm_polygons in tmap. To change the coordinates of the viewing box, we can set the bbox (bounding box) argument.
bbox <- c(60, 5, 110, 40)
tm_shape(world_asia, bbox = bbox) +
  tm_polygons(col = "white") +
  tm_shape(india_geom) +
  tm_polygons()

  1. We can also encode data that’s contained in the vector dataset.
tm_shape(world_asia, bbox = bbox) +
  tm_polygons(col = "lifeExp") +
  tm_polygons()

  1. Even in this more complex setup, where we work with background images and vector data rather than standard data.frames, we can still apply the kinds of visual encoding ideas that we are familiar with. For example, we can still color code or facet by fields in the vector dataset. To illustrate, we revisit the bus route data from the last lecture and distinguish between buses operated by the cities of Madison vs. Monona. Before plotting, we fetch the underlying data.
Sys.setenv(MAPBOX_API_KEY="pk.eyJ1Ijoia3Jpc3JzMTEyOCIsImEiOiJjbDYzdjJzczQya3JzM2Jtb2E0NWU1a3B3In0.Mk4-pmKi_klg3EKfTw-JbQ")
basemap <- cc_location(loc= c(-89.401230, 43.073051), buffer = 15e3)
Preparing to download: 16 tiles at zoom = 12 from 
https://api.mapbox.com/v4/mapbox.satellite/
bus <- read_sf("https://uwmadison.box.com/shared/static/5neu1mpuh8esmb1q3j9celu73jy1rj2i.geojson")

tm_shape(basemap) +
  tm_rgb() +
  tm_shape(bus) +
  tm_lines(col = "#bc7ab3", size = 1)

Note that operator is the field containing information about which city is operating the buses. We can color code the routes by this attribute.

tm_shape(basemap) +
  tm_rgb() +
  tm_shape(bus) +
  tm_lines(col = "operator", size = 1) +
  tm_layout(legend.bg.color = "white")

Alternatively, we can facet.

tm_shape(basemap) +
  tm_rgb() +
  tm_shape(bus) +
  tm_lines(col = "operator", size = 1) +
  tm_facets("operator")

Citation

For attribution, please cite this work as

Sankaran (2023, Jan. 10). STAT 436 (Spring 2023): Vector Data. Retrieved from https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week07-02/

BibTeX citation

@misc{sankaran2023vector,
  author = {Sankaran, Kris},
  title = {STAT 436 (Spring 2023): Vector Data},
  url = {https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week07-02/},
  year = {2023}
}