A look at the origins of the field.
What is the purpose of data analysis?
Tracing the refinement of questions and design.
Some major themes from STAT 479, in a nutshell.
Interpreting neurons by finding optimal inputs
Analyzing feature activations across datasets
A first look at activations in a deep learning model.
An introduction to compositional feature learning.
Evaluating the fit at particular observations in Bayesian models.
Simulating data to evaluate model quality.
The relationship between exploratory analysis and model development.
Discovering richer structure in partial dependence profiles.
An introduction to partial dependence profiles.
An application to a gene expression dataset.
Once we've fit a topic model, how should we inspect it?
Data preparation and model fitting code for topics.
An overview of dimensionality reduction via topics.
More examples of dimensionality reduction using PCA and UMAP.
An overview of the UMAP algorithm.
Visualizing and interpreting PCA.
Linear dimensionality reduction using PCA.
Examples of high-dimensional data.
How reliable are the results of a clustering?
Diagnostics for the quality of a clustering.
Visualizing table values, ordered by clustering results.
Clustering data at multiple scales using trees.
An introduction to clustering and how to manage its output.
Visualization of hierarchical structure using containment.
A scalable network visualization strategy.
The most common network visualization strategy.
Typical tasks and example network datasets.
Some strategies for interactively visualizing spatial data.
The projection problem, and how to check your CRS.
Storing spatially gridded information in rasters.
Manipulating and visualizing spatial vector data.
An overview of common formats, with illustrative examples.
Navigating across related time series.
Summaries of relationships between and within time series.
Approaches for visualizing seasonality.
Vocabulary for describing visual structure in time series.
A data structure for managing time series data.
A crash course on entity resolution, plus some other tips.
Which columns might help us understand extreme values?
Techniques to identify extreme values.
A deeper look at missing data, imputation, and characterization.
A look at how visualization can help characterize missing data.
An extended example of tidying a real-world dataset.
Using `separate`, `mutate`, and `summarise` to derive new variables for downstream visualization.
Tools for reshaping data into tidy format.
The definition of tidy data, and why it's often helpful for visualization.
A look at real-world examples of dynamic linking.
Combining faceting with dynamic queries.
An introduction to details-on-demand.
Using visualization to support query building.
A look at a fundamental building block for interactive visualization.
An extended example of faceting with data summaries.
Adapting the small multiples principle to fields that are not exactly parallel.
A look at faceting in vega-lite.
Using small multiples to create information-dense plots.
Examples of marks and their encodings in both ggplot2 and vega-lite.
Tying together the introductions to ggplot2 and vega-lite, using the common language of encodings.
Learn the basic concepts for creating vega-lite plots, and see how the library supports interactivity.
A discussion of ggplot2 terminology, and an example of iteratively refining a simple scatterplot.
How this course is structured, and how to follow along.