An overview of common formats, with illustrative examples.
Vector data formats are used to store geometric information, like the locations of hospitals (points), trajectories of bus routes (lines), or boundaries of counties (polygons). It’s useful to think of the associated data as being spatially enriched data frames, with each row corresponding to one of these geometric features.
Vector data are usually stored in .geojson, .wkt, .shp, or .topojson formats. Standard data.frames cannot be used because then important spatial metadata would be lost, like the Coordinate Reference System (to be explained in the fourth lecture this week).
In R, these formats can be read using read_sf in the sf package. They can be written using the write_sf function. Here, we’ll read in a vector dataset containing the boundaries of lakes in Madison.
lakes <- read_sf("https://uwmadison.box.com/shared/static/duqpj0dl3miltku1676es64d5zmygy92.geojson")
lakes %>%
dplyr::select(id, name, geometry)
Simple feature collection with 10 features and 2 fields
geometry type: POLYGON
dimension: XY
bbox: xmin: -89.54084 ymin: 42.94762 xmax: -89.17699 ymax: 43.2051
geographic CRS: WGS 84
# A tibble: 10 x 3
id name geometry
<chr> <chr> <POLYGON [°]>
1 relation/1… Lake Mon… ((-89.37974 43.0714, -89.37984 43.07132, -89…
2 relation/3… Lake Men… ((-89.46885 43.08266, -89.46864 43.08258, -8…
3 relation/3… Upper Mu… ((-89.31364 43.04483, -89.31361 43.04464, -8…
4 relation/4… Hook Lake ((-89.33198 42.94909, -89.33161 42.94866, -8…
5 relation/4… Lake Win… ((-89.4265 43.05514, -89.4266 43.05511, -89.…
6 relation/6… Lake Wau… ((-89.32949 42.99166, -89.32908 42.99187, -8…
7 relation/7… Lake Keg… ((-89.2648 42.9818, -89.26399 42.98126, -89.…
8 relation/9… Lower Mu… ((-89.28011 42.98486, -89.2794 42.98481, -89…
9 way/214157… Goose La… ((-89.53957 42.97967, -89.53947 42.97955, -8…
10 way/287217… Brazee L… ((-89.18562 43.19529, -89.18533 43.19531, -8…
#write_sf(lakes, "output.geojson", driver = "GeoJSON")
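To make the read/write round trip concrete without a download, here is a minimal sketch. The two points and their names are made up for illustration (they are not part of the lakes data), and the file is written to a temporary path.

```r
library(sf)

# Hypothetical example data: two made-up sites in longitude / latitude.
pts <- data.frame(
  name = c("site_a", "site_b"),
  lon = c(-89.40, -89.38),
  lat = c(43.07, 43.08)
)

# Promote the data.frame to an sf object; crs = 4326 declares WGS 84.
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Write to a temporary GeoJSON file, then read it back.
path <- tempfile(fileext = ".geojson")
write_sf(pts_sf, path)
pts_back <- read_sf(path)

# The geometry column and the CRS come back with the data.
pts_back
st_crs(pts_back)
```

The plain data.frame pts has no place to record a CRS, which is the metadata that the sf object carries through the file and back.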
We’ll discuss plotting in the next lecture, but for a preview, this is how you can visualize the lakes using ggplot2.
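A minimal sketch of what such a plot can look like is below. As an assumption made so that the example runs offline, it uses the North Carolina counties shapefile that ships with sf as a stand-in; replacing nc with the lakes object from above should work the same way. The colors are arbitrary choices.

```r
library(sf)
library(ggplot2)

# Stand-in data (an assumption): the counties shapefile bundled with sf.
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# geom_sf() draws whatever geometry the sf object holds, here polygons.
p <- ggplot(nc) +
  geom_sf(fill = "lightblue", color = "grey30") +
  theme_minimal()
p
```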
With a little extra effort, we can overlay the features onto public map backgrounds (these are often called “basemaps”).
Using this query, I’ve downloaded all the bus routes.
For the boundaries of the lakes above, I used this query.
Many organizations prepare geojson data themselves and make it publicly available; e.g., the boundaries of rivers or glaciers. Don’t worry about how to visualize these data at this point — I just want to give some motivating examples.
Raster data give a measurement along a spatial grid. You can think of them as spatially enriched matrices, where the metadata specify the location on the earth associated with each entry of the matrix.
Raster data are often stored in tiff format. They can be read in using the brick function in the raster library, and can be written using writeRaster.
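To illustrate the "matrix plus metadata" idea on a small scale, the sketch below builds a raster from a made-up 3 x 4 matrix, writes it to a temporary GeoTIFF, and reads it back. The values and the extent are hypothetical and unrelated to the image read in next.

```r
library(raster)

# Hypothetical measurements: a 3 x 4 matrix of made-up values.
m <- matrix(1:12, nrow = 3, ncol = 4)

# Attach spatial metadata: the extent (in degrees) and the CRS.
r <- raster(
  m,
  xmn = 121.0, xmx = 121.4,
  ymn = 30.0, ymx = 30.3,
  crs = "+proj=longlat +datum=WGS84"
)

# The cell size follows from the extent and the matrix dimensions.
res(r)
extent(r)

# Round trip through a temporary GeoTIFF file.
path <- tempfile(fileext = ".tif")
writeRaster(r, path, format = "GTiff")
r_back <- brick(path)
dim(r_back)
```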
shanghai <- brick("https://uwmadison.box.com/shared/static/u4na56w3r4eqg232k2ma3eqbvehfiaoq.tif")
shanghai
class : RasterBrick
dimensions : 8192, 8192, 67108864, 3 (nrow, ncol, ncell, nlayers)
resolution : 2.7e-06, 2.7e-06 (x, y)
extent : 121.6565, 121.6786, 30.96283, 30.98494 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : https://uwmadison.box.com/shared/static/u4na56w3r4eqg232k2ma3eqbvehfiaoq.tif
names : u4na56w3r4eqg232k2ma3eqbvehfiaoq.1, u4na56w3r4eqg232k2ma3eqbvehfiaoq.2, u4na56w3r4eqg232k2ma3eqbvehfiaoq.3
min values : 0, 0, 0
max values : 65535, 65535, 65535
#writeRaster(shanghai, "output.tiff", format = "GTiff")
ggRGB(shanghai, stretch="lin")
There’s actually quite a bit of information in this image. We can zoom in…
ggRGB(shanghai, stretch="lin") +
coord_fixed(xlim = c(121.66, 121.665), ylim = c(30.963, 30.968))
Here are data on elevation in Zion National Park.
f <- system.file("raster/srtm.tif", package = "spDataLarge")
zion <- raster(f) %>%
as.data.frame(xy = TRUE)
ggplot(zion) +
geom_raster(aes(x = x, y = y, fill = srtm)) +
scale_fill_gradient(low = "white", high = "black") +
coord_fixed()
The spatial R packages used here depend on two command line programs, gdal and proj. Since these command line programs are not themselves a part of R, they need to be installed before the corresponding R packages. The process will differ from operating system to operating system, and the experience can be frustrating, especially when the R packages don’t recognize the underlying system installation. I recommend following the instructions on this page and reaching out early if you have any issues.
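One quick way to check whether R can see the system libraries, once sf is installed, is to ask sf which versions it was linked against. This is only a diagnostic sketch; the exact set of names in the result may vary across sf versions.

```r
library(sf)

# Named character vector of the external library versions sf is linked to.
versions <- sf_extSoftVersion()
versions
```

If loading sf fails, or the reported versions are not the ones you installed, that usually points to the kind of mismatch between the R packages and the system installation described above.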
It can be constructed easily using the wizard.