Microbial Response to Antibiotics

Introduction

library("knitr")
library("tidyverse")
library("phyloseq")
library("treelapse")

# minimal theme for ggplots
theme_set(theme_bw())
min_theme <- theme_update(
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.ticks = element_blank(),
  legend.title = element_text(size = 8),
  legend.text = element_text(size = 6),
  axis.text = element_text(size = 6),
  axis.title = element_text(size = 8),
  strip.background = element_blank(),
  strip.text = element_text(size = 8),
  legend.key = element_blank()
)
opts_chunk$set("fig.width" = 6.5, "fig.height" = 3.5)

data(abt)

In this vignette, we study the impact of antibiotics on microbial ecology, using data from (Dethlefsen and Relman 2011). Even non-scientists know that antibiotics kill bacteria – here we will try to describe the effects of these time courses on microbial ecology in a little more precision. For example, how quickly do the bacteria die off, does the population return to the initial states, and if so how long does that take? Also, are certain microbes more / less resistant to antibiotics, and do we see any kinds of competition or shifts in the the community’s position on the adaptive landscape?

We will answer these questions by studying time series of microbial abundances while the their host goes through a few separate periods of antibiotic treatment. The series can be arranged along the taxonomic tree; we can represent each node by either its sum or average across descendants. Instead of identifying each node with the full time series, we can alternatively conisder the vector of averages during the different periods (pre, during, and post-antibiotic) and see how these averages change across the different periods and across different subtrees.

Data Background

We give some preliminary views of the data, just to get some intuition about it. If you are familiar with this (type of) study, you can safely skip this section.

First, we look at how many samples there are, how many microbes were found, and what types of features were collected about the samples.

abt

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 2582 taxa and 162 samples ]
## sample_data() Sample Data:       [ 162 samples by 3 sample variables ]
## tax_table()   Taxonomy Table:    [ 2582 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 2582 tips and 2581 internal nodes ]

mapping <- sample_data(abt)
summary(mapping)

##  ind      condition       time      
##  D:56   Pre Cp :33   Min.   : 1.00  
##  E:52   1st Cp :15   1st Qu.:14.00  
##  F:54   1st WPC:18   Median :27.50  
##         Interim:49   Mean   :27.52  
##         2nd Cp :14   3rd Qu.:41.00  
##         2nd WPC:20   Max.   :56.00  
##         Post Cp:13

Evidently, there are about fifty timepoints collected across three individuals. Each timepoint is associated with a condition: before, between, or after different antibiotic time courses (Cp, for ciprofloxacin). Unfortunately, the treatments were not given at the same time for each person, see the figure below.

ggplot(mapping) +
  geom_tile(aes(x = time, y = ind, fill = condition)) +
  scale_fill_brewer(palette = "Set3")

If we had time be very careful, we might want to find a way to align these series. Instead, we will just analyze these three subjects separately.

Turning over to the RSV (microbial) abundances, we notice that the counts are highly skewed. We’re finding that the vast majority of microbes have very small abundnaces.

hist(asinh(as(otu_table(abt), "matrix")), main = "Raw RSV Counts")

This is actually pretty standard in microbiome data; we’ll use the filtering and transformation functions in phyloseq to down to the ~300 most abundant taxa and work with asinh transformed counts[^The asinh transformation is like taking logs, but is less agressive at smaller values. This is evident from the representation, $asinh(x) = (x + )].

abt <- abt %>%
  filter_taxa(function(x) sd(x) > 7.5, prune = TRUE) %>%
  transform_sample_counts(asinh)
abt

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 327 taxa and 162 samples ]
## sample_data() Sample Data:       [ 162 samples by 3 sample variables ]
## tax_table()   Taxonomy Table:    [ 327 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 327 tips and 326 internal nodes ]

hist(as(otu_table(abt), "matrix"), main = "Processed RSV Counts")

These are relatively generic checks we would do with any microbiome data set. Next, we consider some preparatory work specific to treelapse.

Data Preparation

Before we can produce any of the treelapse figures, we will need to generate the following data.

An edgelist specifying the taxonomy of microbes in the sample.
A values dataset giving the

tip_values <- t(get_taxa(abt))

taxa <- tax_table(abt) %>%
  as("matrix")
taxa <- gsub("_1", "", taxa)
taxa <- gsub("_2", "", taxa)
taxa <- gsub("uncultured", "", taxa)
taxa[taxa == ""] <- NA

incertae_ix <- which(taxa == "Incertae Sedis", arr.ind = TRUE)
for (parent in c("Erysipelotrichi_Erysipelotrichales", "Lachnospiraceae", "Ruminococcaceae")) {
  cur_parent_ix <- which(taxa == parent, arr.ind = TRUE)
  cur_ix <- incertae_ix[incertae_ix[, "row"] %in% cur_parent_ix[, "row"], ]
  taxa[cur_ix] <- paste0("Incerate Sedis_", parent)
}

edges <- taxa_edgelist(taxa)

subjects <- unique(mapping$ind)
values <- list()
for (i in seq_along(subjects)) {
  cur_ix  <- mapping$ind == subjects[i]
  for (fun in c("sum", "mean")) {
    cur_fun <- get(sprintf("tree_%s", fun))
    cur_values <- tree_fun_multi(edges, tip_values[cur_ix,, drop = FALSE], cur_fun)

    values <- c(
      values,
      list(
        data.table(
          "subject" = subjects[i],
          "type" = fun,
          "time" = mapping$time[cur_ix][cur_values$row],
          cur_values
        )
      )
    )
  }
}

## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50 
## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50 
## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50 
## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50 
## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50 
## Processing sample 10 
## Processing sample 20 
## Processing sample 30 
## Processing sample 40 
## Processing sample 50

values <- rbindlist(values)

cur_subject <- "D"
time_data <- values %>%
  filter(subject == cur_subject, type == "sum") %>%
  select(time, unit, value) %>%
  arrange(unit, time)

conditions <- mapping %>%
  filter(ind == cur_subject) %>%
  select(time, condition) %>%
  unique()

## Warning in class(x) <- c(subclass, tibble_class): Setting class(x) to multiple
## strings ("tbl_df", "tbl", ...); result will no longer be an S4 object

Visualization

Tree Sums

display_opts <- list(
  "margin" = list("ts_right" = 30, "ts_left" = 35, "tree_right" = 15, "tree_left" = 15),
  "size_min" = 1,
  "size_max" = 10
)
timebox_tree(
  time_data,
  edges,
  display_opts = display_opts
)

treebox(
  time_data,
  edges,
  display_opts = display_opts
)

Tree Averages

time_data <- values %>%
  filter(subject == cur_subject, type == "mean") %>%
  select(time, unit, value) %>%
  arrange(unit)

display_opts$size_min <- 0.5
timebox_tree(
  time_data,
  edges,
  display_opts = display_opts
)

treebox(
  time_data,
  edges,
  display_opts = display_opts
)

Per-Condition Sankey

condition_values <- values %>%
  left_join(conditions) %>%
  group_by(subject, unit, type, condition) %>%
  summarise(value = mean(value))

## Joining, by = "time"

sankey_data <- condition_values %>%
   filter(subject == cur_subject, type == "sum") %>%
   as.data.frame() %>%
   select(condition, unit, value)
colnames(sankey_data)[1] <- "group"
sankey_data$group <- gsub(" ", "", sankey_data$group)

doi_sankey(
  sankey_data,
  edges,
  width = 1000,
  display_opts = list(
    "leaf_width" = 5
  )
)

Microbial Response to Antibiotics

Kris Sankaran

2020-01-02