Lectures and activities in this course will draw from the following collection of readings, which span the statistics, machine learning, and computational molecular biology communities. We have included both modern references and classic papers in genomics data analysis. Based on student interests, we will choose one to three papers each week.

Table of contents
  1. Scientific Context
  2. Data Management
    1. Data Structures
    2. Processing and Visualization
  3. Hypothesis Testing
    1. Overviews
    2. Proposals
  4. Prediction and Dimensionality Reduction
    1. Unsupervised
    2. Supervised
  5. Generative Modeling
  6. Networks
  7. Multi-Study Analysis
    1. Meta-Analysis
    2. Causality, Bayes, and Transfer Learning

Scientific Context

  1. Excerpt from “The Song of the Cell”
  2. Computational principles and challenges in single-cell data integration
  3. Integrative single-cell analysis
  4. Statistics requantitates the central dogma

Data Management

Data Structures

  1. Muon: multimodal omics analysis framework
  2. Software for the Integration of Multiomics Experiments in Bioconductor

Processing and Visualization

  1. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R
  2. Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq
  3. Fast, sensitive and accurate integration of single-cell data with Harmony
  4. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

Hypothesis Testing

Overviews

  1. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments
  2. A practical guide to methods controlling false discoveries in computational biology

Proposals

  1. Clipper: p-value-free FDR control on high-throughput data from two conditions
  2. Exaggerated false positives by popular differential expression methods when analyzing human population samples
  3. High-sensitivity pattern discovery in large, paired multiomic datasets
  4. A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk
  5. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments

Prediction and Dimensionality Reduction

Unsupervised

  1. Bayesian group factor analysis with structured sparsity
  2. Dimension reduction techniques for the integrative analysis of multi-omics data
  3. Multi-Omics Factor Analysis – a framework for unsupervised integration of multi-omics data sets
  4. A Generic Multivariate Framework for the Integration of Microbiome Longitudinal Studies with Other Data Types
  5. Jointly Embedding Multiple Single-Cell Omics Measurements
  6. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types
  7. Integrated principal components analysis
  8. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations
  9. Alignment of spatial genomics and histology data using deep Gaussian processes
  10. Inference after latent variable estimation for single-cell RNA sequencing data
  11. SPEAR: a Sparse Supervised Bayesian Factor Model for Multi-omic Integration
  12. Quantifying common and distinct information in single-cell multimodal data with Tilted-CCA

Supervised

  1. Sparse partial least squares regression for simultaneous dimension reduction and variable selection
  2. Feature selection for data integration with mixed multiview data
  3. Cooperative learning for multiview analysis
  4. Efficient multivariate linear mixed model algorithms for genome-wide association studies
  5. mixOmics: An R package for ‘omics feature selection and multiple data integration
  6. Joint association and classification analysis of multi-view data
  7. Multimodal Single-Cell Translation and Alignment with Semi-Supervised Learning
  8. Large-Margin Predictive Latent Subspace Learning for Multiview Data Analysis

Generative Modeling

  1. Simultaneous deep generative modelling and clustering of single-cell genomic data
  2. Joint probabilistic modeling of single-cell multi-omic data with totalVI
  3. Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression
  4. Spatially Aware Dimension Reduction for Spatial Transcriptomics
  5. Dynamic Bayesian Networks for Integrating Multi-omics Time Series Microbiome Data
  6. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO
  7. Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT
  8. FreeHi-C spike-in simulations for benchmarking differential chromatin interaction detection
  9. Multimodal Learning with Deep Boltzmann Machines

Networks

  1. Inferring Interaction Networks From Multi-Omics Data
  2. Network modeling in biology: statistical methods for gene and brain networks
  3. Computational inference of gene regulatory networks: Approaches, limitations and opportunities
  4. Interpretation of network-based integration from multi-omics longitudinal data
  5. Network Modeling of Topological Domains using Hi-C Data
  6. Integrative Spatial Single-cell Analysis with Graph-based Feature Learning

Multi-Study Analysis

Meta-Analysis

  1. Multiple-laboratory comparison of microarray platforms
  2. Meta-analysis methods for multiple related markers: Applications to microbiome studies with the results on multiple α‐diversity indices
  3. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses
  4. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models
  5. Accessible, curated metagenomic data through ExperimentHub

Causality, Bayes, and Transfer Learning

  1. A Unified Framework for Association Analysis with Multiple Related Phenotypes
  2. Anchor regression: Heterogeneous data meet causality
  3. Data denoising with transfer learning in single-cell transcriptomics
  4. scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning
  5. Integration and transfer learning of single-cell transcriptomes via cFIT
  6. INFIMA leverages multi-omics model organism data to identify effector genes of human GWAS variants