STAT 436 (Spring 2023): Design Process Case Study

Kris Sankaran

Reading, Recording, Rmarkdown

How does a visualization expert go about creating a data visualization? Perhaps the most useful lesson from this reading is that good visualizations don’t materialize out of thin air – there is always a creative process involved, steps where it’s unclear what the final result will be, even for a data visualization genius like Shirley Wu. We’re lucky that she has documented this process for us, so that we might be able to take away a few lessons for our own reflection.
Her visualization, “655 Frustrations of Data Visualization”, is based on an online data visualization survey. It had 45 questions (“How many years have you been doing data visualization? What percent of your day is focused on data prep work? …). There are 981 responses, probably mostly submitted by the survey initiator’s internet following.

A few entries from the data visualization survey. The full data are publicly available [here](https://github.com/data-visualization-society/data_visualization_survey/blob/master/data/cleaned_survey_results_2017.csv).

Figure 1: A few entries from the data visualization survey. The full data are publicly available here.

Question Formulation

At the start of the project – before writing code – there is the problem of choosing a guiding question. Her initial question was “Why might people leave the field?” This was a timely question, because, after a few years of high activity and visibility in industry, data visualization seemed to be cooling down. However, this question was not directly answerable with the data at hand, so instead, she focused on the proxy question, “Do you want to spend more time or less time visualizing data in the future?”
- This is an important lesson: there is often a distinction between what we really want to know and what the data can tell us.
- We will revisit this theme of asking sharper questions in the next reading. Data analysis should be driven by curiosity about the world, not simply the data that happen to be conveniently accessible.
To see what within the data are relevant to this guiding question, she then conducted an exploratory analysis¹, studying the marginal distributions of all the available questions. For example, visualizing the “percentage of time” questions, it became clear that most people worked on a mix of multiple data-related tasks in their work – the particular mix might help understand whether people want to stay in the field.

Figure 2: Example exploratory displays of the ‘percentage of time’ questions on the data science survey.

The exploratory analysis also revealed that there are some questions that would not be useful for answering the guiding question. For example, many of the qualitative responses were not useful². It’s easy to feel responsible for visualizing all the data that are available, but this is not necessary. It’s far more important to focus on the guiding question(s).

Initial Design

The initial design answered whether there is a relationship between (a) the survey respondent wanting to do more data visualization in the future, and (b) the current fraction of time spent on design. This was visually encoded using a stacked barchart. However, the display was not that informative, because most respondents wanted to do more data visualization in the future³.

Figure 3: The initial design used a bar chart to see whether experience was related to interest in further work in data visualization.

In this situation, it seemed like perhaps additional context would help. However, the resulting faceted barchart was difficult to make sense of, again because any relationships between variables were weak or nonexistent.

Figure 4: The faceted barchart did not add much information relevant to the guiding question.

Redesigns

At this point, two significant changes to the design were made, one to the question, and one to the design. The question was reframed from “do you want to continue working in data visualization” to “do you experience any frustrations with data visualization.” This question is still related to the guiding question, but shows much more variation across respondents. For the design, the encoding changed from a stacked barchart to a beeswarm plot. Unlike the barchart, which aggregates responses into bins, the beeswarm makes it possible to see every single respondent.

A redesigned plot. The left and right panels separate respondents with and without frustrations, vertical position encodes job role, and color gives number of years of experience.

Figure 5: A redesigned plot. The left and right panels separate respondents with and without frustrations, vertical position encodes job role, and color gives number of years of experience.

A few more refinements were made. Instead of placing those with and without frustrations far apart on the page, they were rearranged to share the same \(x\)-axis⁴. Also, instead of coloring circles by years of experience, color was used to represent the percentage of the day spent on data visualization. Again, these changes reflect sharpening of both design and questions.
In the final version of the static display, a boxplot was introduced to summarize the most salient characteristics of each beeswarm. Then, instead of just plotting the points in two parallel regions, they were made to “rise” and “fall” off the boxplots, depending on whether the respondents experienced frustrations. This kind of visual metaphor takes the visualization to another level; it becomes more than functional, it becomes evocative.

Figure 6: The final version of the static visualization.

Interpretation

Only at this point is interactivity introduced into the visualization. The interactivity is simple – views transition into one another depending on selected questions – but provides an effective alternative to simply faceting all pairs of questions.

Figure 7: The visualization above with interactivity added in.

Finally, this interactive visualization is used for an extensive exploration. This is often when the effectiveness of a visualization can be evaluated. Ultimately, visualization should help inform our body of beliefs, guiding the actions we take (either in the short or long-term). If it’s hard to draw these sorts of inferences, then a visualization is not particularly functional.

Figure 8: A screenshot of notes from the designer’s exploration of the resulting visualization.

To guide the reader, this investigative work was then incorporated into the visualization. These additional details allow the visualization to stand alone, it becomes a self-explanatory intellectual artifact.

Conclusion

Wrapping up, the final visualization is clearly the culmination of substantial intellectual labor over the course of weeks (if not months). The result is both beautiful and informative. This is an ideal to strive for – the crafting of data visualizations that can guide discovery and change.
One final note. It’s often useful to study the development of projects that you find interesting. Sometimes, authors share their code on github, or earlier versions are available through technical reports or recorded talks. This additional context can shed light on the overall inspiration and intention of the project, and especially when starting out, imitation can be an effective strategy for learning.

Using vega-lite!↩︎
Are there really 100x more pie charts than scatterplots in industry??↩︎
Perhaps an instance of selection bias.↩︎
The less our eyes have to travel across the page to make a comparison, the more efficient the visualization.↩︎

Design Process Case Study

Question Formulation

Initial Design

Redesigns

Interpretation

Conclusion

Citation