Spatial Data Formats

An overview of common formats, with illustrative examples.

Kris Sankaran (UW Madison)
03-08-2021

  1. Spatial data come in two main formats: vector and raster. We’ll examine them in detail in the next few lectures, but this lecture motivates the high-level distinction and gives a few examples. It also shows how to read and write data to and from these formats.

Vector Data

  1. Vector data formats are used to store geometric information, like the locations of hospitals (points), trajectories of bus routes (lines), or boundaries of counties (polygons). It’s useful to think of the associated data as being spatially enriched data frames, with each row corresponding to one of these geometric features.

  2. Vector data are usually stored in .geojson, .wkt, .shp, or .topojson formats. Standard data.frames are not enough on their own, because they would drop important spatial metadata, like the Coordinate Reference System (explained in the fourth lecture this week).

  3. In R, these formats can be read using read_sf in the sf package. They can be written using the write_sf function. Here, we’ll read in a vector dataset containing the boundaries of lakes in Madison.

library(sf)     # read_sf(), write_sf()
library(dplyr)  # %>% and select()

lakes <- read_sf("https://uwmadison.box.com/shared/static/duqpj0dl3miltku1676es64d5zmygy92.geojson")

lakes %>%
  dplyr::select(id, name, geometry)
Simple feature collection with 10 features and 2 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -89.54084 ymin: 42.94762 xmax: -89.17699 ymax: 43.2051
geographic CRS: WGS 84
# A tibble: 10 x 3
   id          name                                           geometry
   <chr>       <chr>                                     <POLYGON [°]>
 1 relation/1… Lake Mon… ((-89.37974 43.0714, -89.37984 43.07132, -89…
 2 relation/3… Lake Men… ((-89.46885 43.08266, -89.46864 43.08258, -8…
 3 relation/3… Upper Mu… ((-89.31364 43.04483, -89.31361 43.04464, -8…
 4 relation/4… Hook Lake ((-89.33198 42.94909, -89.33161 42.94866, -8…
 5 relation/4… Lake Win… ((-89.4265 43.05514, -89.4266 43.05511, -89.…
 6 relation/6… Lake Wau… ((-89.32949 42.99166, -89.32908 42.99187, -8…
 7 relation/7… Lake Keg… ((-89.2648 42.9818, -89.26399 42.98126, -89.…
 8 relation/9… Lower Mu… ((-89.28011 42.98486, -89.2794 42.98481, -89…
 9 way/214157… Goose La… ((-89.53957 42.97967, -89.53947 42.97955, -8…
10 way/287217… Brazee L… ((-89.18562 43.19529, -89.18533 43.19531, -8…
#write_sf(lakes, "output.geojson", driver = "GeoJSON")
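To make the "spatially enriched data frame" idea and the read_sf/write_sf pair concrete without downloading anything, here is a minimal sketch on two made-up points. The names site_a and site_b and their coordinates are hypothetical, and the temporary file path is only for illustration.

```r
library(sf)

# Hypothetical points near Madison (illustrative coordinates, not real sites)
pts <- data.frame(
  name = c("site_a", "site_b"),
  lon = c(-89.40, -89.38),
  lat = c(43.07, 43.08)
)

# Turn the plain data.frame into an sf object; crs = 4326 is WGS 84 lon/lat
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Write to a temporary GeoJSON file, then read it back
path <- tempfile(fileext = ".geojson")
write_sf(pts_sf, path, driver = "GeoJSON")
round_trip <- read_sf(path)

st_is_longlat(round_trip)     # the lon/lat CRS travelled with the file
st_drop_geometry(round_trip)  # an ordinary table: geometry and CRS are gone
```

The last line shows what would be lost if we stored these data in a standard data.frame: the attribute columns survive, but the geometry and its CRS do not.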

We’ll discuss plotting in the next lecture, but for a preview, this is how you can visualize the lakes using ggplot2.
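A minimal version of such a plot, assuming ggplot2's geom_sf() layer (not necessarily the code behind the original figure), could look like this. The fill color is an arbitrary choice.

```r
library(sf)
library(ggplot2)

lakes <- read_sf("https://uwmadison.box.com/shared/static/duqpj0dl3miltku1676es64d5zmygy92.geojson")

# geom_sf() draws each polygon and uses the object's CRS to set up the axes
p <- ggplot(lakes) +
  geom_sf(fill = "lightblue")
p
```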

With a little extra effort, we can overlay the features onto public map backgrounds (these are often called “basemaps”).

  4. There is a surprising amount of public vector data available online. Using this query[1], I’ve downloaded the locations of all hospital clinics in Madison.
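library(ggmap)  # ggmap() draws a downloaded basemap image

clinics <- read_sf("https://uwmadison.box.com/shared/static/896jdml9mfnmza3vf8bh221h9hlvh70v.geojson")

# `satellite` is the basemap from the overlay example above; the code that downloads it is not shown
# how would you overlay the names of the clinics, using geom_text?
ggmap(satellite) +
  geom_sf(data = clinics, col = "red", size = 2, inherit.aes = FALSE) +
  coord_sf(crs = st_crs(3857))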

Using this query, I’ve downloaded all the bus routes in Madison.

bus <- read_sf("https://uwmadison.box.com/shared/static/5neu1mpuh8esmb1q3j9celu73jy1rj2i.geojson")

ggmap(satellite) +
  geom_sf(data = bus, col = "#bc7ab3", size = .5, inherit.aes = FALSE) +
  coord_sf(crs = st_crs(3857))

For the boundaries of the lakes above, I used this query.

Many organizations prepare GeoJSON data themselves and make it publicly available, e.g., the boundaries of rivers or glaciers. Don’t worry about how to visualize these data yet; I just want to give some motivating examples.

Raster Data

  1. Raster data give a measurement along a spatial grid. You can think of them as spatially enriched matrices, where the metadata specify the location on Earth associated with each entry of the matrix.

  2. Raster data are often stored in GeoTIFF (.tif) format. They can be read using the brick function in the raster package and written using writeRaster.
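The "spatially enriched matrix" description can be made literal: start from an ordinary matrix, attach an extent and a CRS, and you have a raster. The sketch below uses a made-up 3 x 4 matrix and a hypothetical bounding box, and round-trips it through a temporary GeoTIFF with writeRaster() and brick().

```r
library(raster)

# Hypothetical 3 x 4 grid of values (e.g., a toy elevation surface)
m <- matrix(1:12, nrow = 3, ncol = 4)

# Attach spatial metadata: the bounding box (in degrees) and the CRS
r <- raster(m, xmn = -89.5, xmx = -89.1, ymn = 42.9, ymx = 43.2,
            crs = "+proj=longlat +datum=WGS84 +no_defs")

# Write to a temporary GeoTIFF, then read it back as a RasterBrick
path <- tempfile(fileext = ".tif")
writeRaster(r, path, format = "GTiff")
r2 <- brick(path)

dim(r2)  # rows, columns, layers
res(r2)  # cell size in x and y, in degrees
```

Here the matrix entries are the measurements, and the extent plus CRS are the metadata that pin each cell to a place on Earth.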

library(raster)  # brick(), raster(), writeRaster()

shanghai <- brick("https://uwmadison.box.com/shared/static/u4na56w3r4eqg232k2ma3eqbvehfiaoq.tif")
shanghai
class      : RasterBrick 
dimensions : 8192, 8192, 67108864, 3  (nrow, ncol, ncell, nlayers)
resolution : 2.7e-06, 2.7e-06  (x, y)
extent     : 121.6565, 121.6786, 30.96283, 30.98494  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : https://uwmadison.box.com/shared/static/u4na56w3r4eqg232k2ma3eqbvehfiaoq.tif 
names      : u4na56w3r4eqg232k2ma3eqbvehfiaoq.1, u4na56w3r4eqg232k2ma3eqbvehfiaoq.2, u4na56w3r4eqg232k2ma3eqbvehfiaoq.3 
min values :                                  0,                                  0,                                  0 
max values :                              65535,                              65535,                              65535 
# writeRaster(shanghai, "output.tif", format = "GTiff")
  3. Some of the most common types of public raster data are satellite images or derived measurements, like elevation maps. For example, the code below displays an image of a neighborhood outside Shanghai.
library(RStoolbox)  # provides ggRGB()
ggRGB(shanghai, stretch = "lin")

There’s actually quite a bit of information in this image. We can zoom in…

ggRGB(shanghai, stretch="lin") +
  coord_fixed(xlim = c(121.66, 121.665), ylim = c(30.963, 30.968))
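Setting xlim and ylim in coord_fixed() only changes the visible window; the plotted object is still the whole brick, and ggRGB() downsamples large images (see its maxpixels argument). If you want a genuinely smaller raster, crop() keeps only the cells inside an extent. A toy sketch on a made-up 10 x 10 grid follows; for the image above, the analogous call would be crop(shanghai, extent(121.66, 121.665, 30.963, 30.968)).

```r
library(raster)

# Hypothetical 10 x 10 raster on the unit square, with cell values 1..100
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 1, ymn = 0, ymx = 1)
values(r) <- 1:100

# crop() keeps only the cells inside the extent (xmin, xmax, ymin, ymax)
small <- crop(r, extent(0, 0.5, 0, 0.5))
dim(small)
```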

Here are elevation data for Zion National Park.

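library(raster)    # raster() and the as.data.frame() method for rasters
library(magrittr)  # %>%
library(ggplot2)

f <- system.file("raster/srtm.tif", package = "spDataLarge")
zion <- raster(f) %>%
  as.data.frame(xy = TRUE)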
ggplot(zion) +
  geom_raster(aes(x = x, y = y, fill = srtm)) +
  scale_fill_gradient(low = "white", high = "black") +
  coord_fixed()
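The as.data.frame(xy = TRUE) step is what lets geom_raster() work: it turns the raster into one row per cell, with the cell center in x and y and the measurement in a value column (srtm above). The same conversion on a tiny made-up 2 x 3 grid:

```r
library(raster)

# Hypothetical 2 x 3 raster covering x in [0, 3] and y in [0, 2]
r <- raster(nrows = 2, ncols = 3, xmn = 0, xmx = 3, ymn = 0, ymx = 2)
values(r) <- 1:6

# One row per cell: x and y are cell centers, the last column holds the values
df <- as.data.frame(r, xy = TRUE)
df
```

Cells are listed row by row starting from the top-left corner, so the first row has center (0.5, 1.5).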

Installation

  1. A note about R packages: for historical reasons, spatial packages in R rely on a few external C/C++ libraries, like GDAL and PROJ. Since these are not part of R itself, they need to be installed before the corresponding R packages. The process differs across operating systems, and it can be frustrating, especially when the R packages don’t recognize the underlying system installation. I recommend following the instructions on this page and reaching out early if you have any issues.
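Once sf installs, one quick way to check that it found the system libraries is to print the versions it was built against with sf_extSoftVersion(). This is a diagnostic sketch, not a substitute for the installation instructions.

```r
library(sf)

# Versions of the external libraries (GEOS, GDAL, PROJ, ...) that sf links to
versions <- sf_extSoftVersion()
print(versions)
```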

  [1] It can be constructed easily using the wizard.