library("knitr")
library("tidyverse")
library("phyloseq")
library("treelapse")
# minimal theme for ggplots
theme_set(theme_bw())
min_theme <- theme_update(
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6),
axis.text = element_text(size = 6),
axis.title = element_text(size = 8),
strip.background = element_blank(),
strip.text = element_text(size = 8),
legend.key = element_blank()
)
opts_chunk$set("fig.width" = 6.5, "fig.height" = 3.5)
data(abt)
In this vignette, we study the impact of antibiotics on microbial ecology, using data from (Dethlefsen and Relman 2011). Even non-scientists know that antibiotics kill bacteria – here we will try to describe the effects of these time courses on microbial ecology in a little more precision. For example, how quickly do the bacteria die off, does the population return to the initial states, and if so how long does that take? Also, are certain microbes more / less resistant to antibiotics, and do we see any kinds of competition or shifts in the the community’s position on the adaptive landscape?
We will answer these questions by studying time series of microbial abundances while the their host goes through a few separate periods of antibiotic treatment. The series can be arranged along the taxonomic tree; we can represent each node by either its sum or average across descendants. Instead of identifying each node with the full time series, we can alternatively conisder the vector of averages during the different periods (pre, during, and post-antibiotic) and see how these averages change across the different periods and across different subtrees.
We give some preliminary views of the data, just to get some intuition about it. If you are familiar with this (type of) study, you can safely skip this section.
First, we look at how many samples there are, how many microbes were found, and what types of features were collected about the samples.
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 2582 taxa and 162 samples ]
## sample_data() Sample Data: [ 162 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 2582 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 2582 tips and 2581 internal nodes ]
## ind condition time
## D:56 Pre Cp :33 Min. : 1.00
## E:52 1st Cp :15 1st Qu.:14.00
## F:54 1st WPC:18 Median :27.50
## Interim:49 Mean :27.52
## 2nd Cp :14 3rd Qu.:41.00
## 2nd WPC:20 Max. :56.00
## Post Cp:13
Evidently, there are about fifty timepoints collected across three individuals. Each timepoint is associated with a condition: before, between, or after different antibiotic time courses (Cp, for ciprofloxacin). Unfortunately, the treatments were not given at the same time for each person, see the figure below.
ggplot(mapping) +
geom_tile(aes(x = time, y = ind, fill = condition)) +
scale_fill_brewer(palette = "Set3")
If we had time be very careful, we might want to find a way to align these series. Instead, we will just analyze these three subjects separately.
Turning over to the RSV (microbial) abundances, we notice that the counts are highly skewed. We’re finding that the vast majority of microbes have very small abundnaces.
This is actually pretty standard in microbiome data; we’ll use the filtering and transformation functions in phyloseq
to down to the ~300 most abundant taxa and work with asinh
transformed counts[^The asinh transformation is like taking logs, but is less agressive at smaller values. This is evident from the representation, $asinh(x) = (x + )].
abt <- abt %>%
filter_taxa(function(x) sd(x) > 7.5, prune = TRUE) %>%
transform_sample_counts(asinh)
abt
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 327 taxa and 162 samples ]
## sample_data() Sample Data: [ 162 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 327 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 327 tips and 326 internal nodes ]
These are relatively generic checks we would do with any microbiome data set. Next, we consider some preparatory work specific to treelapse.
Before we can produce any of the treelapse
figures, we will need to generate the following data.
values
dataset giving thetaxa <- tax_table(abt) %>%
as("matrix")
taxa <- gsub("_1", "", taxa)
taxa <- gsub("_2", "", taxa)
taxa <- gsub("uncultured", "", taxa)
taxa[taxa == ""] <- NA
incertae_ix <- which(taxa == "Incertae Sedis", arr.ind = TRUE)
for (parent in c("Erysipelotrichi_Erysipelotrichales", "Lachnospiraceae", "Ruminococcaceae")) {
cur_parent_ix <- which(taxa == parent, arr.ind = TRUE)
cur_ix <- incertae_ix[incertae_ix[, "row"] %in% cur_parent_ix[, "row"], ]
taxa[cur_ix] <- paste0("Incerate Sedis_", parent)
}
edges <- taxa_edgelist(taxa)
subjects <- unique(mapping$ind)
values <- list()
for (i in seq_along(subjects)) {
cur_ix <- mapping$ind == subjects[i]
for (fun in c("sum", "mean")) {
cur_fun <- get(sprintf("tree_%s", fun))
cur_values <- tree_fun_multi(edges, tip_values[cur_ix,, drop = FALSE], cur_fun)
values <- c(
values,
list(
data.table(
"subject" = subjects[i],
"type" = fun,
"time" = mapping$time[cur_ix][cur_values$row],
cur_values
)
)
)
}
}
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
## Processing sample 10
## Processing sample 20
## Processing sample 30
## Processing sample 40
## Processing sample 50
cur_subject <- "D"
time_data <- values %>%
filter(subject == cur_subject, type == "sum") %>%
select(time, unit, value) %>%
arrange(unit, time)
## Warning in class(x) <- c(subclass, tibble_class): Setting class(x) to multiple
## strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
condition_values <- values %>%
left_join(conditions) %>%
group_by(subject, unit, type, condition) %>%
summarise(value = mean(value))
## Joining, by = "time"
Dethlefsen, Les, and David A Relman. 2011. “Incomplete Recovery and Individualized Responses of the Human Distal Gut Microbiota to Repeated Antibiotic Perturbation.” Proceedings of the National Academy of Sciences 108 (Supplement 1): 4554–61.