Faceting (Part 2)

A look at faceting in vega-lite.

Kris Sankaran
02-02-2021


In these notes, we’ll see how to facet using vega-lite. We’ll work with the weather data, which records daily precipitation and temperature for NYC and Seattle over several years.

robservable("@krisrs1128/examples-of-faceting", include = 4, height = 280)

The most direct way to facet is to use the row and column encoding channels. For example, here we facet histograms of Seattle’s daily maximum temperature by weather type, with one column per type.

{
  const seattle = weather.filter(d => d.location == "Seattle");
  const colors = {
    domain: ['drizzle', 'fog', 'rain', 'snow', 'sun'],
    range: ['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52'] // weather themed colors
  };
  
  return vl.markBar()
    .data(seattle)
    .encode(
      vl.x().fieldQ('temp_max').bin(true).title('Temperature (°C)'), // bin so each bar counts days in a temperature range
      vl.y().count().title("Number of Days"),
      vl.color().fieldN("weather").scale(colors),
      vl.column().fieldN("weather")
    )
    .width(175)
    .height(120)
    .render()
}
robservable("@krisrs1128/examples-of-faceting", include = 5, height = 200)
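To make the idea concrete, here is a minimal plain-JavaScript sketch (no vega-lite involved) of what a column facet computes. It splits the rows by the `weather` field and then builds the same histogram inside each group. The helper names `facetBy` and `histogram`, the 5 °C bin width, and the toy rows are all hypothetical illustrations. They are not part of vega-lite or the real dataset.

```javascript
// Sketch of a column facet: partition rows by one nominal field, then
// compute the same summary (here, a tiny histogram) within each partition.

// Group rows into a Map keyed by the value of `field` (one entry per panel).
function facetBy(rows, field) {
  const panels = new Map();
  for (const row of rows) {
    const key = row[field];
    if (!panels.has(key)) panels.set(key, []);
    panels.get(key).push(row);
  }
  return panels;
}

// Count rows per bin of width `step` on a quantitative field.
// Keys are bin start values, e.g. 20 for the bin [20, 25).
function histogram(rows, field, step) {
  const counts = new Map();
  for (const row of rows) {
    const binStart = Math.floor(row[field] / step) * step;
    counts.set(binStart, (counts.get(binStart) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical toy rows, not taken from the real weather data.
const toyRows = [
  { weather: "rain", temp_max: 8.3 },
  { weather: "rain", temp_max: 11.7 },
  { weather: "sun", temp_max: 21.1 },
  { weather: "sun", temp_max: 24.4 },
  { weather: "sun", temp_max: 22.8 },
];

// One histogram per weather type, like one column of panels in the chart.
const panels = facetBy(toyRows, "weather");
for (const [weather, subset] of panels) {
  console.log(weather, Object.fromEntries(histogram(subset, "temp_max", 5)));
}
```

vega-lite chooses its own bin boundaries, so its panels won’t match a fixed 5 °C width. The split-then-summarize structure is the same, though.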

We can also facet by weather type and city at the same time, using column to split the panels by weather and row to split them by location.

{
  const colors = {
    domain: ['drizzle', 'fog', 'rain', 'snow', 'sun'],
    range: ['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52'] // weather themed colors
  };
  
  return vl.markBar()
    .data(weather)
    .encode(
      vl.x().fieldQ('temp_max').bin(true).title('Temperature (°C)'), // bin so each bar counts days in a temperature range
      vl.y().count().title("Number of Days"),
      vl.color().fieldN("weather").scale(colors),
      vl.column().fieldN("weather"),
      vl.row().fieldN("location")
    )
    .width(175)
    .height(120)
    .render()
}
robservable("@krisrs1128/examples-of-faceting", include = 6)
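For reference, the sketch below writes out a Vega-Lite JSON spec that roughly matches the chart above, with temp_max binned so that each bar counts the days in a temperature range. It shows that vl.column() and vl.row() become ordinary `column` and `row` entries of the encoding block, alongside x, y, and color. The named data source `"weather"` is an assumption made for the sketch. The exact JSON that vega-lite-api emits may also differ in small details.

```javascript
// Hypothetical hand-written Vega-Lite spec, roughly equivalent to the vl chain
// above. Data comes from an assumed named source called "weather".
const gridSpec = {
  data: { name: "weather" },
  mark: "bar",
  width: 175, // size of each panel
  height: 120,
  encoding: {
    // binned temperature, so each bar counts days in a temperature range
    x: { field: "temp_max", type: "quantitative", bin: true, title: "Temperature (°C)" },
    y: { aggregate: "count", type: "quantitative", title: "Number of Days" },
    color: {
      field: "weather",
      type: "nominal",
      scale: {
        domain: ["drizzle", "fog", "rain", "snow", "sun"],
        range: ["#aec7e8", "#c7c7c7", "#1f77b4", "#9467bd", "#e7ba52"],
      },
    },
    // the facet channels sit in the same encoding block as x, y, and color
    column: { field: "weather", type: "nominal" },
    row: { field: "location", type: "nominal" },
  },
};

console.log(JSON.stringify(gridSpec, null, 2));
```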

There is an alternative approach that is more general. The approach above defined the facet inside the encoding of a single markBar. Sometimes, though, we want to facet a plot made from more than one type of mark (e.g., a line overlaid on points). In that case, we need a facet call that applies to the full plot specification. This is analogous to facet_wrap and facet_grid in ggplot2, where one faceting call applies to several geom layers simultaneously. Notice that the data call has to come after the facet. Otherwise, the facet won’t know which data it should split (try moving the .data() call above .facet()).

{
  const colors = {
    domain: ['drizzle', 'fog', 'rain', 'snow', 'sun'],
    range: ['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52']
  };
  
  return vl.markBar()
    .encode(
      vl.x().fieldQ('temp_max').bin(true).title('Temperature (°C)'), // bin so each bar counts days in a temperature range
      vl.y().count().title("Number of Days"),
      vl.color().fieldN("weather").scale(colors)
    )
    .width(175)
    .height(120)
    .facet({column: vl.field("weather"), row: vl.field("location")})
    .data(weather)
    .render()
}
robservable("@krisrs1128/examples-of-faceting", include = 7)
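Written out as Vega-Lite JSON, the facet-operator version has a different shape. The sketch below roughly matches the chart above, again with temp_max binned. Here `data` and `facet` sit at the top level, and the mark and encoding move into an inner `spec` that is drawn once per cell. That is why the data call belongs outside the facet. As before, the named data source `"weather"` is an assumption. The custom color scale is omitted for brevity, and vega-lite-api’s exact output may differ.

```javascript
// Hypothetical hand-written Vega-Lite spec, roughly equivalent to the
// .facet(...).data(weather) chain above.
const facetSpec = {
  // data lives at the top level, next to the facet definition
  data: { name: "weather" },
  facet: {
    column: { field: "weather", type: "nominal" },
    row: { field: "location", type: "nominal" },
  },
  // the inner spec is repeated in every cell; it has a mark and encoding but no data
  spec: {
    width: 175,
    height: 120,
    mark: "bar",
    encoding: {
      x: { field: "temp_max", type: "quantitative", bin: true, title: "Temperature (°C)" },
      y: { aggregate: "count", type: "quantitative", title: "Number of Days" },
      color: { field: "weather", type: "nominal" }, // custom scale omitted
    },
  },
};

console.log("top-level keys:", Object.keys(facetSpec));
console.log("inner spec has its own data?", "data" in facetSpec.spec);
```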

Here’s an example that shows why this is useful. How do the minimum and maximum temperatures vary between NYC and Seattle? In addition to showing the range between the minimum and maximum temperatures as a ribbon, we may want to show their midpoint as its own line. For NYC alone, we can make this plot using vl.layer to superimpose the two marks:

{
  const nyc_ = weather.derive({temp_mid: d => 0.5 * (d.temp_min + d.temp_max)})
    .filter(d => d.location == "New York")
  
  // ribbon layer
  const tempMinMax = vl.markArea({opacity: 0.3}).encode(
    vl.x().month('date'), // month timeUnit: pools each calendar month across all years
    vl.y().average('temp_max').title('Temperature (°C)'),
    vl.y2().average('temp_min')
  );

  // line layer
  const tempMid = vl.markLine().encode(
    vl.x().month('date'),
    vl.y().average('temp_mid')
  );

  // overlay
  return vl.layer(tempMinMax, tempMid)
    .data(nyc_)
    .render();
}
robservable("@krisrs1128/examples-of-faceting", include = 8, height = 220)
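A side note on the midpoint line: averaging is linear, so the monthly average of temp_mid equals the midpoint of the monthly averages of temp_min and temp_max. The line should therefore sit exactly halfway up the ribbon. The plain-JavaScript sketch below checks this on a few hypothetical toy rows. The helper `monthlyAverages` is our own illustration. It reads the month straight from the ISO date string, which simplifies how vega-lite’s month time unit handles dates and time zones.

```javascript
// Hypothetical toy rows (not from the real dataset); dates are ISO strings.
const toyRows = [
  { date: "2012-01-03", temp_min: 1.0, temp_max: 7.0 },
  { date: "2013-01-15", temp_min: -3.0, temp_max: 5.0 },
  { date: "2012-07-04", temp_min: 18.0, temp_max: 30.0 },
];

// Add the midpoint column, mirroring the derive() step in the chart code.
const withMid = toyRows.map((d) => ({ ...d, temp_mid: 0.5 * (d.temp_min + d.temp_max) }));

// Average each field within calendar months, pooling years together
// (a simplified stand-in for vl.x().month('date') combined with average()).
function monthlyAverages(rows, fields) {
  const groups = new Map();
  for (const d of rows) {
    const month = Number(d.date.slice(5, 7)); // "2012-01-03" -> 1
    if (!groups.has(month)) groups.set(month, []);
    groups.get(month).push(d);
  }
  const result = {};
  for (const [month, group] of groups) {
    result[month] = {};
    for (const f of fields) {
      result[month][f] = group.reduce((sum, d) => sum + d[f], 0) / group.length;
    }
  }
  return result;
}

const avg = monthlyAverages(withMid, ["temp_min", "temp_max", "temp_mid"]);

// In every month, the average midpoint equals the midpoint of the averages.
for (const [month, a] of Object.entries(avg)) {
  console.log(month, a.temp_mid, 0.5 * (a.temp_min + a.temp_max));
}
```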

Now we want to make this plot for both cities, overlaying the two types of marks within each facet. We can no longer specify .column() or .row() within the .encode() calls for each mark, because vega-lite doesn’t allow row or column channels inside the layers of a layered chart. Instead, we specify the facet at the end, after we’ve already overlaid the two layers.

{
  const weather_ = weather.derive({temp_mid: d => 0.5 * (d.temp_min + d.temp_max)});

  // same ribbon, colored by location
  const tempMinMax = vl.markArea({opacity: 0.3}).encode(
    vl.x().month('date'),
    vl.y().average('temp_max').title('Temperature (°C)'),
    vl.y2().average('temp_min'),
    vl.color().fieldN("location")
  );

  // same line, colored by location
  const tempMid = vl.markLine().encode(
    vl.x().month('date'),
    vl.y().average('temp_mid'),
    vl.color().fieldN("location")
  );

  return vl.layer(tempMinMax, tempMid)
    .width(300)
    .height(200)
    .facet({column: vl.field("location")})
    .data(weather_)
    .render();
}
robservable("@krisrs1128/examples-of-faceting", include = 9, height = 230)
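To see why the facet has to wrap the whole layered chart, here is a hand-written Vega-Lite spec that roughly matches the final plot. The facet and data sit at the top level. The inner spec is a layer of two unit specs, and neither carries a row or column channel. The sketch makes two assumptions. First, the data is referenced by a hypothetical named source `"weather"`. Second, temp_mid is computed with Vega-Lite’s `calculate` transform rather than arquero’s derive(), a substitution made only so the spec is self-contained.

```javascript
// Hypothetical hand-written Vega-Lite spec, roughly equivalent to the layered,
// faceted chart above. temp_mid comes from a "calculate" transform (our swap
// for arquero's derive()); vega-lite-api's exact output may differ.
const layeredFacetSpec = {
  data: { name: "weather" },
  transform: [{ calculate: "0.5 * (datum.temp_min + datum.temp_max)", as: "temp_mid" }],
  facet: { column: { field: "location", type: "nominal" } },
  spec: {
    width: 300,
    height: 200,
    layer: [
      {
        // ribbon between the monthly average minimum and maximum
        mark: { type: "area", opacity: 0.3 },
        encoding: {
          x: { field: "date", timeUnit: "month", type: "temporal" },
          y: { field: "temp_max", aggregate: "average", type: "quantitative", title: "Temperature (°C)" },
          y2: { field: "temp_min", aggregate: "average" },
          color: { field: "location", type: "nominal" },
        },
      },
      {
        // line through the monthly average midpoint
        mark: "line",
        encoding: {
          x: { field: "date", timeUnit: "month", type: "temporal" },
          y: { field: "temp_mid", aggregate: "average", type: "quantitative" },
          color: { field: "location", type: "nominal" },
        },
      },
    ],
  },
};

// Faceting happens once, at the top; no layer carries a row or column channel.
const layersWithFacetChannels = layeredFacetSpec.spec.layer.filter(
  (l) => "row" in l.encoding || "column" in l.encoding
).length;
console.log(layersWithFacetChannels);
```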