Learn the basic concepts for creating vega-lite plots, and see how the library supports interactivity.
Reading, Recording, Rmarkdown, Observable
Vega-Lite is the javascript equivalent of ggplot2. It provides a grammar of graphics, where layers can be composed on top of one another. The main reason for knowing vega-lite (opposed to only ggplot2) is that it can support flexible interaction. Also, it’s a good gateway toward understanding richer data visualization packages, like d3
.
We’re going to construct this graph step-by-step, just like in the ggplot2 lecture. I realize that this is a bit repetitive, but I want to make it obvious just how similar ggplot2 and vega-lite are.
The first thing to do is import the necessary libraries. The arquero library (aq
) makes it easier to manage data in javascript – the filtering and printing steps would be more difficult without it.
import {vl} from '@vega/vega-lite-api'
import { aq, op } from '@uwdata/arquero'
Now we can preview and filter the raw data.
= aq.fromCSV(await FileAttachment("gapminder.csv").text()) // I used the file upload button to add this file
data = data.filter(d => d.year == 2000) // filter to only 2000
data2000 .view() data2000
Let’s create a visual mark, without any encodings. Like in our first ggplot2
figure, the mark is collapsed into a single point at the origin. Notice the analogy betwen vega-lite’s markPoint()
and ggplot2’s geom_point()
.
.markPoint()
vl.data(data2000)
.render()
We can deconstruct the original figure to see data fields are mapped to visual properties,
We can build this up one step at a time. To encode the fertility and life expectancy fields, we use vl.x()
and vl.y()
in an encode()
call. One subtlety is that it’s necessary to specify the data type of each field. fieldQ
means that the field is “Quantitative”. We’ll review data types in the next lecture.
.markPoint()
vl.data(data2000)
.encode(
.x().fieldQ("fertility"), // what happens if you change this to fieldN()?
vl.y().fieldQ("life_expect")
vl
).render()
The same idea can be used to encode the country cluster and population fields. fieldN
specifies that the cluster field is a Nominal (i.e., categorical) variable.
Hint: How do you know which commands encode which fields? It’s often good to keep the vega-lite-api and vega-lite documentation close at hand.
.markPoint()
vl.data(data2000)
.encode(
.x().fieldQ("fertility"),
vl.y().fieldQ("life_expect"),
vl.color().fieldN("cluster"), // what happens if you change this to fieldO()?
vl.size().fieldQ("pop")
vl
).render()
We’ve made headway in our visualization task, but a more critical eye will help us improve the display. It will also give us an initial look at interactivity. The main things we ought to improve in this display are,
For (1), let’s replace the open rings with filled circles. We can pass options to markPoint
in the same way that we passed options to ggplot geom
layers. For (2), we can modify the default size range by adding on a .scale()
to the vl.size()
encoding – this is similar to adding a custom scale in ggplot2.
.markPoint({filled: true})
vl.data(data2000)
.encode(
.x().fieldQ("fertility"),
vl.y().fieldQ("life_expect"),
vl.color().fieldN("cluster"),
vl.size().fieldQ("pop").scale({range: [25, 1000]}) // try changing the ranges
vl
).render()
For (3), we can add a .title()
to fields for which we want to customize titles, and for (4), we can modify the tick count in the .legend()
for size
.
.markPoint({filled: true})
vl.data(data2000)
.encode(
.x().fieldQ("fertility").title("Fertility"),
vl.y().fieldQ("life_expect").title("Life Expectancy"),
vl.color().fieldN("cluster"),
vl.size().fieldQ("pop").scale({range: [25, 1000]}).legend({tickCount: 3}).title("Population"),
vl
).render()
Now to (5). The idea is to think of a tooltip as another type of encoding – just like coordinate positions and colors, a label is a way of transforming a field in an abstract dataset into a visual property we can observe on a screen.
.markPoint({filled: true})
vl.data(data2000)
.encode(
.x().fieldQ("fertility").title("Fertility"),
vl.y().fieldQ("life_expect").title("Life Expectancy"),
vl.color().fieldN("cluster"),
vl.size().fieldQ("pop").scale({range: [25, 1000]}).legend({tickCount: 3}).title("Population"),
vl.tooltip().fieldN("country"),
vl.order().fieldQ('pop').sort('descending') // if you remove this, large points prevent mouseover on the points below them
vl
).render()
One point I want you to take away from both this and the ggplot2 introductions is that building a plot is an iterative process. Once you develop your eye for good visual design, you will be able to take any initial plot and make it more effective.
Finally, for a taste of what’s to come, here’s a version of the same plot that let’s us compare many years over time.
Don’t worry about the code implementation, but in case you’re curious, it’s only a few extra lines over what we’ve already implemented.
{// define a slider selector object
let selectYear = vl.selectSingle("select").fields("year")
.init({year: 1955})
.bind(vl.slider().min(1955).max(2005).step(5).name("Year"))
// update the plot depending on the value of the slider
return vl.markPoint({filled: true})
.data(data)
.select(selectYear) // refers to the slider bar
.transform(vl.filter(selectYear))
.encode(
.x().fieldQ("fertility").title("Fertility").scale({domain: [0, 9]}), // hard code the axes ranges
vl.y().fieldQ("life_expect").title("Life Expectancy").scale({domain: [0, 90]}),
vl.color().fieldN("cluster"),
vl.size().fieldQ("pop").scale({domain: [0, 1200000000], range: [25, 1000]}).legend({tickCount: 3}).title("Population"),
vl.tooltip().fieldN("country"),
vl.order().fieldQ('pop').sort('descending')
vl
).render()
}