Machine learning preparatory week @PSL
Discord server
The afternoons are dedicated to practical sessions in Python. Students will work largely on their own, with light supervision from the teachers. Students can use the Discord server chat to communicate, share information, code and data, and help each other during these sessions.
These practical sessions require Python 3 with the standard SciPy ecosystem, scikit-learn and PyTorch, and they use Jupyter notebooks. The easiest way to proceed is to use a Gmail account and run the notebooks remotely on Google Colab.
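Before the first lab, it can help to confirm that your environment (Colab or local) has everything listed above. The script below is a minimal sketch, not part of the course material: the package list is an assumption (`matplotlib` is included as a typical member of the SciPy ecosystem), and the function name `check_environment` is hypothetical. It only uses the standard `importlib` module plus `torch.cuda.is_available()` for the GPU check.

```python
# Minimal environment check (a sketch; the REQUIRED list is an assumption,
# not an official course requirements file).
import importlib
import sys

REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn", "torch"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    assert sys.version_info >= (3,), "Python 3 is required"
    for name, version in check_environment().items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
    # GPU check for the PyTorch session; on Colab, a GPU runtime is usually
    # selected via Runtime > Change runtime type.
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

Missing packages are reported rather than raising, so the script can be run in a fresh environment to see what still needs installing.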
Expected Program
This program is a first draft and may change. The basic pattern is: lectures in the morning and labs in the afternoon. Check for updates with the teachers on the first day. You are required to bring your computer for the practical sessions.
Day 1 (Wednesday August 28, 2024):
- 9:00–10:30: (course) Machine learning: recent successes.
- 11:00–12:30: (course) Introduction to machine learning.
- 14:00–17:30: (lab session) Introduction to Python and NumPy for data science.
Day 2 (Thursday August 29, 2024):
- 9:00–10:30: (course) Machine learning models (linear, trees, neural networks).
- 11:00–12:30: (course) Scikit-learn: estimation/prediction/transformation.
- 14:00–17:30: (lab session) Practice of scikit-learn.
Day 3 (Friday August 30, 2024):
- 9:00–12:30: (course) The linear model, optimization.
- 14:00–17:30: (lab session) Logistic regression with gradient descent.
Day 4 (Monday September 2, 2024):
- 9:00–10:30: (course) Introduction to deep learning.
- 11:00–12:30: (course) Introduction to unsupervised learning.
- 14:00–17:00: (lab session) Practical session.
Day 5 (Tuesday September 3, 2024):
- (course/lab session) Spark for ML, parts 1 and 2 (Dario Colazzo).
Lectures
Machine learning part
- Machine learning: history, applications, successes
- Introduction to machine learning
- Supervised machine learning models
- Scikit-learn: estimation and pipelines
- Optimization for linear models
- Optimization for machine learning
- Deep learning: convolutional neural networks
- Unsupervised learning
Spark and Machine Learning
Practical works
The links open the notebooks in Colab. You may also clone this repository and work locally.
- Wednesday: Python basics and the corrected notebook
- Thursday: Practice of Scikit-learn
  - Preliminaries
  - Intro (corrected)
  - Basic principles (corrected)
  - SVM
  - Regression forests (corrected)
  - PCA
  - Clustering
  - GMM
  - Validation (corrected)
  - Pipeline
- Friday: Optimization and the corrected notebook
- Monday: Classification with PyTorch and GPUs
Teachers
- Come Fiegel (ENSAE)
- Hugo Richard (Criteo)
- Dario Colazzo (Dauphine Université)
- Thierry Kirat (Dauphine Université)
Acknowledgements
The slides and notebooks were originally written by Pierre Ablin, Mathieu Blondel and Arthur Mensch.
Some of the material in this course was borrowed and adapted from other sources:
- The slides from “Deep learning: convolutional neural networks” are adapted from Charles Ollion and Olivier Grisel’s advanced course on deep learning (released under the CC-By 4.0 license).
- The first notebooks of the scikit-learn tutorial are taken from Jake VanderPlas's tutorial.
License
All the code in this repository is made available under the MIT license unless otherwise noted.
The slides are published under the terms of the CC-By 4.0 license.