background-image: url("") background-size: cover .center[ # Microbiome Studies: Design, Infrastructure, and Inference ### MSTP M2 Journal Club <br/> <br/> <br/> <br/> .large[Kris Sankaran | [krisrs1128.github.io/LSLab](krisrs1128.github.io/LSLab) | 28 November 2022 | <img src="figure/mstp.png" width=150/> ] ] --- ### Goals of Microbiome Analysis 1. A microbiome is an ecosystem at the microbial level. It can be described by taxonomic composition, cellular interactions, and biochemical environment. 1. We now have tools for quantitatively describing the structure of microbiomes across a variety of environments, including within human body sites. 1. What is the microbiome's function in human health and disease? Many studies try to answer this question, starting with structural descriptions. --- class: middle .center[ ## Motivating Examples ] --- ### Microbiome and Obesity .pull-left[Transplanting the microbiomes of obese individuals into germ-free mice causes dramatic weight gain [1; 2; 3].] .pull-right[ <div class="figure" style="text-align: center"> <img src="figure/mice-obesity.png" alt="Figure from [4]." width="407" /> <p class="caption">Figure from [4].</p> </div> ] --- ### Inflammatory Bowel Disease .pull-left[ A dysbiotic microbiome has been implicated in IBD, and microbiome-based interventions are actively being studied [5; 6]. ] .pull-right[ <div class="figure" style="text-align: center"> <img src="figure/ibd.jpeg" alt="Figure from [6]." width="374" /> <p class="caption">Figure from [6].</p> </div> ] --- ### Infant Microbiome Development .pull-left[ 1. An infant's microbiome is seeded during birth, and it goes through major reorganization during the first few years of life [7; 8]. 1. The structure of this microbiome is known to influence immune system development (this is the essence of the "Hygiene Hypothesis"). ] .pull-right[ <div class="figure" style="text-align: center"> <img src="figure/infant-microbiome.jpeg" alt="Figure from [7]." width="600" /> <p class="caption">Figure from [7].</p> </div> ] --- ### Gut-Brain Axis .pull-left[ 1. Animal models have given evidence of an association between the microbiome and the brain [9; 10]. 1. In humans, the microbiome has been linked to autism and Parkinson's Disease, though the direction of causality is unknown. 1. There are indications that biochemical products from microbes can penetrate the blood-brain barrier. ] .pull-right[ <div class="figure" style="text-align: center"> <img src="figure/zebrafish.svg" alt="Figure from [10]." width="400" /> <p class="caption">Figure from [10].</p> </div> ] --- class: middle .center[ ## Data Sources ] --- ### 16S rRNA Sequencing 1. The 16S rRNA region is conserved across Bacteria, but variable enough to be used to distinguish between closely related strains. 1. By selectively sequencing this region, we can investigate the taxonomic composition of a microbiome. 1. Since it doesn't require full genome sequencing, it is cheap both in terms of cost and computation. <img src="figure/16s-workflow.png" width="750" style="display: block; margin: auto;" /> --- ### Metagenomics 1. Instead of sequencing just one gene, we can sequence _all_ microbial genes present in a sample. 1. Genes more directly inform function, and variation in taxa might obfuscate functional unity. 1. We typically align reads to a reference database, but it is also possible to assemble reads into metagenomes de novo. <img src="figure/metagenome-workflow.png" width="750" style="display: block; margin: auto;" /> --- ### Metatranscriptomics 1. A gene may be present without being expressed. For this level of detail, it is necessary to use metatranscriptomic sequencing. 1. We may see large expression changes in response to environmental stimuli even if taxonomic compositions remain stable. <img src="figure/metagenome-workflow.png" width="750" style="display: block; margin: auto;" /> --- ### Metabolomics 1. We can study the small molecules present in a sample using metabolomics. These are potential microbial inputs / products. 1. Spectroscopy creates quantitative signatures of the molecules present in a sample. Each peak in the signature corresponds to one molecular subcomponent. <img src="figure/metabolome-workflow.png" width="750" style="display: block; margin: auto;" /> --- class: middle .center[ ## Computational Infrastructure ] --- ### Interactivity 1. Before starting analysis, it is worthwhile to create a data structure that can be easily inspected and manipulated. 1. This interactive data structure bridges raw sequencing outputs and more formal statistical analysis. 1. `phyloseq` is a powerful container for microbiome data [11], and `MultiAssayExperiment` can encapsulate several omics simultaneously [12]. ```r library(phyloseq) data(GlobalPatterns) GlobalPatterns ``` ``` ## phyloseq-class experiment-level object ## otu_table() OTU Table: [ 19216 taxa and 26 samples ] ## sample_data() Sample Data: [ 26 samples by 7 sample variables ] ## tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ] ## phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ] ``` --- ```r library(genefilter) f <- filterfun(kOverA(10, 5)) gp_subset <- filter_taxa(GlobalPatterns, f, prune = TRUE) plot_heatmap(gp_subset) ``` <img src="20221128_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> --- ```r library(MultiAssayExperiment) data(miniACC) miniACC ``` ``` ## A MultiAssayExperiment object of 5 listed ## experiments with user-defined names and respective classes. ## Containing an ExperimentList class object of length 5: ## [1] RNASeq2GeneNorm: SummarizedExperiment with 198 rows and 79 columns ## [2] gistict: SummarizedExperiment with 198 rows and 90 columns ## [3] RPPAArray: SummarizedExperiment with 33 rows and 46 columns ## [4] Mutations: matrix with 97 rows and 90 columns ## [5] miRNASeqGene: SummarizedExperiment with 471 rows and 80 columns ## Functionality: ## experiments() - obtain the ExperimentList instance ## colData() - the primary/phenotype DataFrame ## sampleMap() - the sample coordination DataFrame ## `$`, `[`, `[[` - extract colData columns, subset, or experiment ## *Format() - convert into a long or wide DataFrame ## assays() - convert ExperimentList to a SimpleList of matrices ## exportClass() - save data to flat files ``` --- ### Annotation 1. We don't have to limit our data to raw sequencing outputs -- we can draw useful taxonomic, genomic, or metabolomic annotation from public databases. 1. Commonly used databases include, - Bacterial Taxonomy: `greengenes`, `MGRast` - Gene and Gene Pathways: `ReactomePA`, `rbioapi` - Metabolites: `GNPS`, `biodbHmdb` --- ### Replicability and Reproducibility 1. Literate programming refers to the use of code interspersed with descriptive text. Reports written in this way emphasize that code is written for people -- not machines -- and support study reproducibility. 1. Special emphasis should be placed on being able to rerun and understand analysis scripts. Statistical corrections are important, but secondary to ensuring exact reproducibility. 1. It is worth sharing both raw and preprocessed study data, as well as the code used to generate them. - Raw reads, deposited in the NCBI's sequencing read archive - Integrated `phyloseq` or `MultiAssayExperiment` objects --- ```r library(curatedMetagenomicData) curatedMetagenomicData("AsnicarF_2017.relative_abundance", dryrun = FALSE) ``` ``` ## $`2021-10-14.AsnicarF_2017.relative_abundance` ## class: TreeSummarizedExperiment ## dim: 298 24 ## metadata(0): ## assays(1): relative_abundance ## rownames(298): ## k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli ## k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_bifidum ## ... ## k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus_gordonii ## k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Aerococcaceae|g__Abiotrophia|s__Abiotrophia_sp_HMSC24B09 ## rowData names(7): superkingdom phylum ... genus species ## colnames(24): MV_FEI1_t1Q14 MV_FEI2_t1Q14 ... MV_MIM5_t2M14 ## MV_MIM5_t3F15 ## colData names(22): study_name subject_id ... lactating curator ## reducedDimNames(0): ## mainExpName: NULL ## altExpNames(0): ## rowLinks: a LinkDataFrame (298 rows) ## rowTree: 1 phylo tree(s) (10430 leaves) ## colLinks: NULL ## colTree: NULL ``` --- class: middle .center[ ## Statistical Methods ] --- ### Differential Analysis 1. We can quantify the strength of the association between taxonomic abundances and environmental features (e.g., treatment vs. control, deliberately perturbed timepoints). 1. The measurements for any one taxon can be quite skewed and sparse, so classical normal approximations are often quite poor. A number of tests that are designed specifically for the microbiome context [13; 14] 1. It is often worth trying several testing strategies and ensuring they give relatively consistent results. --- .pull-left[ ```r library(MicrobiotaProcess) data(mouse.time.mpse) mouse.time.mpse %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=time, first.test.alpha = 0.01) %>% mp_plot_diff_boxplot() ``` ] .pull-right[ <img src="20221128_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] --- ### Latent Structure 1. In contrast to taxon-level inference, we may be interested in variation across entire microbiota profiles. 1. Dimensionality reduction methods build a "map" across samples, where samples that have similar omic profiles are placed closer to one another. 1. This is useful because features are correlated. For example, though there may be many taxa present across samples, there may only be a few underlying compositional profiles. --- ```r ord <- ordinate(gp_subset) plot_ordination(gp_subset, ord, col = "SampleType") ``` <img src="20221128_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- ### Network Inference and Hotspots 1. A network across features can provide a scaffolding for differential analysis -- for example, it is more informative to look for differentially expressed gene pathways than just genes. 1. Often, we need to construct the network from scratch. Most methods for this rely on a "guilt-by-association" principle: taxa that are always present or absent together may be part of some ecologically meaningful module. --- ```r make_network(gp_subset, type = "taxa", max.dist = 0.6) %>% plot_network() ``` <img src="20221128_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> --- ### Longitudinal Analysis 1. Like the epigenome, the microbiome is dynamic, constantly changing in response to environmental shifts. 1. In studies where we have several measurements per host, we should model change across neighboring timepoints, rather than assuming all samples are independent. 1. Some useful packages are [15; 16; 17; 18]. --- <div class="figure" style="text-align: center"> <img src="figure/timeomics-overview.jpeg" alt="Figure from [18]." width="800" /> <p class="caption">Figure from [18].</p> </div> --- ### Questions for Discussion 1. Can you imagine any ways in which the microbiome may influence a system in which you are already interested? 1. Brainstorm a specific aim for a grant that investigates the role of the microbiome in your system of interest. What data sources or analysis strategies might be most important to your study, and why? --- ### References [1] P. J. Turnbaugh and J. I. Gordon. "The core gut microbiome, energy balance and obesity". In: _The Journal of physiology_ 587.17 (2009), pp. 4153-4158. [2] P. J. Turnbaugh, M. Hamady, T. Yatsunenko, et al. "A core gut microbiome in obese and lean twins". In: _nature_ 457.7228 (2009), pp. 480-484. [3] V. K. Ridaura, J. J. Faith, F. E. Rey, et al. "Gut microbiota from twins discordant for obesity modulate metabolism in mice". In: _Science_ 341.6150 (2013), p. 1241214. [4] A. W. Walker and J. Parkhill. "Fighting obesity with bacteria". In: _Science_ 341.6150 (2013), pp. 1069-1070. --- ### References [5] K. L. Glassner, B. P. Abraham, and E. M. Quigley. "The microbiome and inflammatory bowel disease". In: _Journal of Allergy and Clinical Immunology_ 145.1 (2020), pp. 16-27. [6] M. Lee and E. B. Chang. "Inflammatory Bowel Diseases (IBD) and the microbiome—Searching the crime scene for clues". In: _Gastroenterology_ 160.2 (2021), pp. 524-537. [7] M. R. Olm, D. Dahan, M. M. Carter, et al. "Robust variation in infant gut microbiome assembly across a spectrum of lifestyles". In: _Science_ 376.6598 (2022), pp. 1220-1223. [8] P. Ferretti, E. Pasolli, A. Tett, et al. "Mother-to-infant microbial transmission from different body sites shapes the developing infant gut microbiome". In: _Cell host & microbe_ 24.1 (2018), pp. 133-145. --- ### References [9] J. F. Cryan and S. K. Mazmanian. "Microbiota-brain axis: Context and causality". In: _Science_ 376.6596 (2022), pp. 938-939. [10] J. J. Bruckner, S. J. Stednitz, M. Z. Grice, et al. "The microbiota promotes social behavior by modulating microglial remodeling of forebrain neurons". In: _PLoS biology_ 20.11 (2022), p. e3001838. [11] P. J. McMurdie and S. Holmes. "phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data". In: _PloS one_ 8.4 (2013), p. e61217. [12] M. Ramos, L. Schiffer, A. Re, et al. "Software for the integration of multiomics experiments in bioconductor". In: _Cancer research_ 77.21 (2017), pp. e39-e42. --- ### References [13] N. Zhao, J. Chen, I. M. Carroll, et al. "Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test". In: _The American Journal of Human Genetics_ 96.5 (2015), pp. 797-807. [14] S. Wang, T. T. Cai, and H. Li. "Hypothesis testing for phylogenetic composition: a minimum-cost flow perspective". In: _Biometrika_ 108.1 (2021), pp. 17-36. [15] J. D. Silverman, H. K. Durand, R. J. Bloom, et al. "Dynamic linear models guide design and analysis of microbiota studies within artificial human guts". In: _Microbiome_ 6.1 (2018), pp. 1-20. [16] V. Bucci, B. Tzen, N. Li, et al. "MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses". In: _Genome biology_ 17.1 (2016), pp. 1-17. --- ### References [17] D. Ruiz-Perez, J. Lugo-Martinez, N. Bourguignon, et al. "Dynamic bayesian networks for integrating multi-omics time series microbiome data". In: _Msystems_ 6.2 (2021), pp. e01105-20. [18] A. Bodein, M. Scott-Boyer, O. Perin, et al. "timeOmics: an R package for longitudinal multi-omics data integration". In: _Bioinformatics_ 38.2 (2022), pp. 577-579.