PSL Intensive Week

Machine Learning in Genomics

As part of PSL's special “transverse program”, we are organizing, with PSL's support, a one-week course on “Machine Learning in Genomics”. Priority is given to PSL Master 2 students. The course is also open to other students (Master's and PhD) and to researchers, subject to availability.

Important: Master's students should check with their program's administration whether this course can be used to validate one of their Master's courses.

Dates and location

Dates: 29 March to 2 April.

Location: Paris-Dauphine University.

Pre-register for the Course

Pre-registration is free but mandatory.

PSL students have priority if they pre-register before January 31st.

Course Content

This week-long course is divided into three types of classes: theory, practice and project. During the project sessions, students work in small groups on a case study of practical importance. The final examination consists of a short presentation of the projects.

Day 1: 29/03, Morning:

  • Biologists: 9h-12h Introduction to machine learning (Andrei Zinovyev)
  • Math/Machine Learning students: 9h-12h Introduction to genomics (Veronique Stoven)

Day 1: 29/03, Afternoon: Flora Jay, Jean Cury (population genetics)

  • 14h-15h Introduction to population genetics problems and datasets, with a focus on inferring selection or demography; presentation of inference methods based on summary statistics versus SNP data (approximate Bayesian computation, ABC; neural networks, NN).
  • 15h-17h Hands-on session
  • 17h-18h Presentation of the projects for the final evaluation and formation of groups

Day 2: 30/03, Morning: Camille and Franklin (disease variant identification)

  • 9h-10h: Introduction to genomic medicine and population sequencing; the problem of identifying functional variants relevant to human health; introduction to decision trees, random forests and neural networks, and their application to identifying genetic disease variants; validation methods
  • 10h-12h: Hands-on session using random forests and neural networks to train classifiers and identify candidate genetic variants involved in human diseases

Day 2: 30/03, Afternoon: Laura and Anais (multi-omics integration: dimensionality reduction)

  • 14h-15h Introduction to multi-omics integration in biology; multi-omics dimensionality reduction (with a special focus on matrix factorisation and a brief overview of autoencoders, AE) and the main existing tools
  • 15h-17h Hands-on session using MOFA to integrate multi-omics data
  • 17h-18h Groups working on project for final evaluation

Day 3: 31/03, Morning: Laura and Anais (multi-omics integration: Networks)

  • 9h-10h Introduction to network science; main networks in biology (inferred and measured networks); classical measures and algorithms; random walk with restart (RWR); MOGAMUN
  • 10h-12h Hands-on session: multiplex network topology, communities, RWR and visualisation with muxViz

Day 3: 31/03, Afternoon:

  • 14h-18h Q&A on the projects and general questions

Day 4: 01/04, Morning: Chloe and Vivien (feature selection, GWAS)

  • 9h-10h Genome-Wide Association Studies, multiple hypothesis testing, lasso
  • 10h-12h Hands-on session part 1: GWAS of two Arabidopsis thaliana phenotypes

Day 4: 01/04, Afternoon: Chloe and Vivien

  • 14h-15h Other regularizers (elastic net, graph-based regularization, multitask approaches)
  • 15h-17h Hands-on session part 2: GWAS of two Arabidopsis thaliana phenotypes
  • 17h-18h Groups working on project for final evaluation

Day 5: 02/04, Morning: Paul (image analysis, combining images with other omics)

  • 9h-10h: Introduction to imaging for the study of gene expression (microscopes, in situ hybridization, gene reporters…); methods for extracting information from large image datasets; coupling single-cell RNA-seq with microscopy; data integration as 1) a semi-supervised learning problem, 2) an optimal transport problem, 3) domain translation with autoencoders
  • 10h-12h Data integration from multiple heterogeneous datasets, or inference of spatio-temporal dynamics from single-cell RNA-seq

Day 5: 02/04, Afternoon: Paul, with a talk by Thomas Walter

  • 14h-15h Talk by Thomas Walter
  • 15h-17h Data integration from multiple heterogeneous datasets, or inference of spatio-temporal dynamics from single-cell RNA-seq
  • 17h-18h Groups working on project for final evaluation