An overview of common formats, with illustrative examples.
knitr::opts_chunk$set(cache = FALSE, message = FALSE, warning = FALSE, echo = TRUE, eval = TRUE)
Vector data formats are used to store geometric information, like the locations of hospitals (points), trajectories of bus routes (lines), or boundaries of counties (polygons). It’s useful to think of the associated data as being spatially enriched data frames, with each row corresponding to one of these geometric features.
Vector data are usually stored in .geojson
, .wkt
, .shp
, or .topojson
formats. Standard data.frames cannot be used because then important spatial
metadata would be lost, like the Coordinate Reference System (to be explained in
the fourth lecture this week).
In R, these formats can be read using read_sf
in the sf
package. They can
be written using the write_sf
function. Here, we’ll read in a vector dataset
containing the boundaries of lakes in Madison.
lakes <- read_sf("https://uwmadison.box.com/shared/static/duqpj0dl3miltku1676es64d5zmygy92.geojson")
lakes %>%
dplyr::select(id, name, geometry)
Simple feature collection with 10 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -89.54084 ymin: 42.94762 xmax: -89.17699 ymax: 43.2051
Geodetic CRS: WGS 84
# A tibble: 10 × 3
id name geometry
<chr> <chr> <POLYGON [°]>
1 relation/1997948 Lake Monona ((-89.37974 43.0714, -89.37984 43.…
2 relation/3361036 Lake Mendota ((-89.46885 43.08266, -89.46864 43…
3 relation/3447583 Upper Mud Lake ((-89.31364 43.04483, -89.31361 43…
4 relation/4090134 Hook Lake ((-89.33198 42.94909, -89.33161 42…
5 relation/4231143 Lake Wingra ((-89.4265 43.05514, -89.4266 43.0…
6 relation/6183370 Lake Waubesa ((-89.32949 42.99166, -89.32908 42…
7 relation/7083727 Lake Kegonsa ((-89.2648 42.9818, -89.26399 42.9…
8 relation/9668475 Lower Mud Lake ((-89.28011 42.98486, -89.2794 42.…
9 way/21415784 Goose Lake ((-89.53957 42.97967, -89.53947 42…
10 way/28721778 Brazee Lake ((-89.18562 43.19529, -89.18533 43…
#write_sf(lakes, "output.geojson", driver = "GeoJSON")
We’ll discuss plotting in the next lecture, but for a preview, this is how you can visualize the lakes using ggplot2.
lakes <- lakes %>%
group_by(id) %>%
mutate(
longitude = st_coordinates(geometry)[1, 1],
latitude = st_coordinates(geometry)[1, 2]
)
tm_shape(lakes) +
tm_polygons(col = "#00ced1")
With a little extra effort, we can overlay the features onto public map backgrounds (these are often called “basemaps”).
# you can get your own at https://account.mapbox.com/access-tokens/
Sys.setenv(MAPBOX_API_KEY="pk.eyJ1Ijoia3Jpc3JzMTEyOCIsImEiOiJjbDYzdjJzczQya3JzM2Jtb2E0NWU1a3B3In0.Mk4-pmKi_klg3EKfTw-JbQ")
basemap <- cc_location(loc= c(-89.401230, 43.073051), buffer = 15e3)
Preparing to download: 16 tiles at zoom = 12 from
https://api.mapbox.com/v4/mapbox.satellite/
tm_shape(basemap) +
tm_rgb() +
tm_shape(lakes) +
tm_polygons(col = "#00ced1")
There is a surprising amount of public vector data available online. Using this query1, I’ve downloaded locations of all hospital clinics in Madison.
Using this query, I’ve downloaded all the bus routes.
For the boundaries of the lakes above, I used this query.
Many organizations prepare geojson data themselves and make it publicly available; e.g., the boundaries of rivers or glaciers. Don’t worry about how to visualize these data at this point — I just want to give some motivating examples.
Raster data give a measurement along a spatial grid. You can think of them as spatially enriched matrices, where the metadata says where on the earth each entry of the matrix is associated with.
Raster data are often stored in tiff
format. They can be read in using the
rast
function in the terra
library, and can be written using
writeRaster
.
shanghai <- rast("https://uwmadison.box.com/shared/static/u4na56w3r4eqg232k2ma3eqbvehfiaoq.tif")
#writeRaster(shanghai, "output.tiff", driver = "GeoTIFF")
Some of the most common types of public raster data are satellite images or derived measurements, like elevation maps. For example, the code below shows an image of a neighborhood outside Shanghai.
|---------|---------|---------|---------|
=========================================
|---------|---------|---------|---------|
=========================================
There’s actually quite a bit of information in this image. We can zoom in…
Here are is data on elevation in Zion national park.
gdal
and proj
. Since these
command line programs are not themselves a part of R, they need to be installed
before the corresponding R packages. The process will differ from operating
system to operating system, and the experience can be frustrating, especially
when the R packages don’t recognize the underlying system installation. I
recommend following the instructions on this
page and reaching out early if you have any
issues.
It can be constructed easily using the wizard↩︎
For attribution, please cite this work as
Sankaran (2023, Jan. 10). STAT 436 (Spring 2023): Spatial Data Formats. Retrieved from https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week07-01/
BibTeX citation
@misc{sankaran2023spatial, author = {Sankaran, Kris}, title = {STAT 436 (Spring 2023): Spatial Data Formats}, url = {https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week07-01/}, year = {2023} }