Design Process Case Study
Tracing the refinement of questions and design.
Kris Sankaran (UW Madison)
2022-12-28
Reading, Recording, Rmarkdown
How does a visualization expert go about creating a data visualization?
Perhaps the most useful lesson from this reading is that good visualizations
don’t materialize out of thin air – there is always a creative process
involved, with steps where it’s unclear what the final result will be, even for a
data visualization genius like Shirley Wu. We’re lucky that she has documented
this process for us, so that we might be able to take away a few lessons for our
own reflection.
Her visualization, “655 Frustrations of Data Visualization”, is based on an
online data visualization survey. The survey had 45 questions (“How many years
have you been doing data visualization?”, “What percent of your day is focused
on data prep work?”, …) and received 981 responses, probably mostly from the
survey initiator’s internet following.
- At the start of the project – before writing code – there is the problem of
choosing a guiding question. Her initial question was “Why might people leave
the field?” This was a timely question, because, after a few years of high
activity and visibility in industry, data visualization seemed to be cooling
down. However, this question was not directly answerable with the data at hand,
so instead, she focused on the proxy question, “Do you want to spend more time
or less time visualizing data in the future?”
- This is an important lesson: there is often a distinction between what we
really want to know and what the data can tell us.
- We will revisit this theme of asking sharper questions in the next reading.
Data analysis should be driven by curiosity about the world, not simply the
data that happen to be conveniently accessible.
- To see which parts of the data are relevant to this guiding question, she then
conducted an exploratory analysis, studying the marginal
distributions of all the available questions. For example, visualizing the
“percentage of time” questions made it clear that most people worked on a mix
of multiple data-related tasks – the particular mix might help explain whether
people want to stay in the field.
- The exploratory analysis also revealed that some questions would not help
answer the guiding question; many of the qualitative responses, for example,
fell into this category. It’s easy to feel responsible for visualizing all the
data that are available, but this is not necessary. It’s far more important to
focus on the guiding question(s).
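As a rough illustration of what “studying the marginal distributions” means, the sketch below counts the answers to each question separately, ignoring how questions relate to one another. The question names and responses are hypothetical toy values, not the actual survey data, and this is not the code used in the reading.

```python
from collections import Counter

# Hypothetical toy responses; the question names and answers are invented
# for illustration and are not taken from the actual survey.
responses = [
    {"years_experience": "0-2", "wants_more_dataviz": "more"},
    {"years_experience": "3-5", "wants_more_dataviz": "more"},
    {"years_experience": "0-2", "wants_more_dataviz": "same"},
    {"years_experience": "6+", "wants_more_dataviz": "more"},
    {"years_experience": "3-5", "wants_more_dataviz": None},  # skipped question
]


def marginal_counts(rows):
    """Count how often each answer appears, one question at a time."""
    marginals = {}
    for row in rows:
        for question, answer in row.items():
            if answer is None:  # ignore unanswered questions
                continue
            marginals.setdefault(question, Counter())[answer] += 1
    return marginals


marginals = marginal_counts(responses)
for question, counts in marginals.items():
    print(question, dict(counts))
```

Each resulting table could be drawn as a simple barchart. Questions whose marginal shows almost no variation are candidates to set aside.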
Initial Design
- The initial design answered whether there is a relationship between (a) the
survey respondent wanting to do more data visualization in the future, and (b)
the current fraction of time spent on design. This was visually encoded using a
stacked barchart. However, the display was not that informative, because most
respondents wanted to do more data visualization in the future.
- In this situation, it seemed that faceting the barchart to add context might
help. However, the resulting faceted barchart was difficult to make sense of,
again because any relationships between variables were weak or nonexistent.
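The weakness of the stacked barchart can be seen in the numbers behind it. The sketch below computes the share of each answer within each bin, which is what the segments of a stacked bar encode. The bins and answers are hypothetical toy values chosen only to mimic the situation described above, where one answer dominates everywhere.

```python
from collections import Counter, defaultdict

# Hypothetical (time_on_design_bin, future_preference) pairs, invented for
# illustration; they are not the survey's actual responses.
pairs = [
    ("low", "more"), ("low", "more"), ("low", "less"),
    ("high", "more"), ("high", "more"), ("high", "more"), ("high", "less"),
]


def stacked_proportions(pairs):
    """For each bin, return the share of each answer (one stacked bar per bin)."""
    counts = defaultdict(Counter)
    for group, answer in pairs:
        counts[group][answer] += 1
    return {
        group: {answer: n / sum(c.values()) for answer, n in c.items()}
        for group, c in counts.items()
    }


shares = stacked_proportions(pairs)
for group, segments in shares.items():
    print(group, segments)
```

When the bars look nearly identical across bins, as they do in this toy example, the encoding has little to show, no matter how it is styled.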
Redesigns
- At this point, two significant changes were made, one to the question and one
to the design. The question was reframed from “do you want to
continue working in data visualization” to “do you experience any frustrations
with data visualization.” This question is still related to the guiding
question, but shows much more variation across respondents. For the design, the
encoding changed from a stacked barchart to a beeswarm plot. Unlike the
barchart, which aggregates responses into bins, the beeswarm makes it possible
to see every single respondent.
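To make the beeswarm idea concrete, here is a minimal sketch of one simple way to lay out such a plot: each point keeps its data value on one axis and is nudged along the other axis until it no longer overlaps a point that has already been placed. This greedy layout is an assumption chosen for illustration; the original visualization may well have used a different method (for example, a force simulation), and the function name and `radius` parameter are hypothetical.

```python
import math


def beeswarm(values, radius=0.5):
    """Greedy beeswarm: keep each value as x, pick the y offset nearest 0 that avoids overlap."""
    diameter = 2 * radius
    placed = []  # (x, y) centres of circles already positioned
    for x in sorted(values):
        step = 0
        while True:
            # Try y = 0 first, then +d, -d, +2d, -2d, ...
            candidates = [0.0] if step == 0 else [step * diameter, -step * diameter]
            free = [
                y for y in candidates
                if all(math.hypot(x - px, y - py) >= diameter for px, py in placed)
            ]
            if free:
                placed.append((x, free[0]))
                break
            step += 1
    return placed


# Three respondents with the same value are stacked instead of hidden.
points = beeswarm([1.0, 1.0, 1.0, 3.0])
print(points)
```

Because every respondent gets its own circle, ties and clusters stay visible, whereas a barchart would collapse them into a single bar height.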
A few more refinements were made. Instead of placing those with and without
frustrations far apart on the page, they were rearranged to share the same
\(x\)-axis. Also, instead of coloring circles by
years of experience, color was used to represent the percentage of the day spent
on data visualization. Again, these changes reflect sharpening of both design
and questions.
In the final version of the static display, a boxplot was introduced to
summarize the most salient characteristics of each beeswarm. Then, instead of
just plotting the points in two parallel regions, they were made to “rise” and
“fall” off the boxplots, depending on whether the respondents experienced
frustrations. This kind of visual metaphor takes the visualization to another
level; it becomes more than functional, it becomes evocative.
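The boxplot that summarizes each beeswarm is built from a five-number summary of the underlying values. A minimal sketch, assuming two hypothetical groups of respondents with invented values (not the survey's data), and using the "inclusive" quartile method, which is only one of several conventions:

```python
import statistics

# Hypothetical values (e.g., percent of the day spent on some task) for two
# groups of respondents; the numbers are invented for illustration.
groups = {
    "frustrated": [10, 20, 30, 40, 50],
    "not_frustrated": [5, 15, 25, 35, 45, 55],
}


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


summaries = {name: five_number_summary(v) for name, v in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Overlaying this summary on the individual points gives the reader both the overall shape and every single respondent at once.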
Interpretation
- Only at this point is interactivity introduced into the visualization. The
interactivity is simple – views transition into one another depending on
selected questions – but provides an effective alternative to simply faceting
all pairs of questions.
- Finally, this interactive visualization is used for an extensive
exploration. This is often when the effectiveness of a visualization can be
evaluated. Ultimately, visualization should help inform our body of beliefs,
guiding the actions we take (either in the short or long-term). If it’s hard to
draw these sorts of inferences, then a visualization is not particularly
functional.
- To guide the reader, this investigative work was then incorporated into the
visualization. These additional details allow the visualization to stand alone;
it becomes a self-explanatory intellectual artifact.
Conclusion
Wrapping up, the final visualization is clearly the culmination of
substantial intellectual labor over the course of weeks (if not months). The
result is both beautiful and informative. This is an ideal to strive for – the
crafting of data visualizations that can guide discovery and change.
One final note. It’s often useful to study the development of projects that
you find interesting. Sometimes, authors share their code on GitHub, or earlier
versions are available through technical reports or recorded talks. This
additional context can shed light on the overall inspiration and intention of
the project. Especially when starting out, imitation can be an effective
strategy for learning.
Citation
For attribution, please cite this work as
Sankaran (2022, Dec. 28). STAT 436 (Spring 2023): Design Process Case Study. Retrieved from https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week14-2/
BibTeX citation
@misc{sankaran2022design,
author = {Sankaran, Kris},
title = {STAT 436 (Spring 2023): Design Process Case Study},
url = {https://krisrs1128.github.io/stat436_s23/website/stat436_s23/posts/2022-12-27-week14-2/},
year = {2022}
}