As part of PSL's special “transverse program”, the DHAI group, with the help of PSL, is organizing a special one-week course on “Digital Humanities Meet Artificial Intelligence”. Priority is given to PSL Master 2 students. The course is also open to other students (Master and PhD) and researchers, subject to availability.

This intensive training will cover theoretical, numerical and applied topics at the intersection of these two fields. The course has an innovative structure that combines theoretical lectures, practical sessions in computer labs, and projects in small groups.

Important: Master students should check with their master’s administration whether this course can be used to validate one of their master’s courses.

Dates and location

Dates: March 29 to April 2.

Location: TBA

Pre-register to the Courses

Pre-registration is free but mandatory.

PSL students have priority if they pre-register before January 31st.

Courses Content

This week-long course will be split into three types of classes: theory, practice and project. During the project session, students will work in small groups toward a case study of practical importance. The final examination of the course will be a short presentation of the projects.

List of courses:

Course 1: Léa Saint-Raymond, Quantitative data analysis and cartography.
Course 2: Mathieu Aubry, Computer vision for the humanities.
Course 3: Béatrice Joyeux-Prunel, Art History and AI: Data Science applied to Visual Contagions.
Course 4: Jean-Baptiste Camps and Thierry Poibeau, Introduction to Digital Philology and natural language processing.
Course 5: Ségolène Albouy and Matthieu Husson, History of astronomy and AI.

The practical sessions will feature:

Python programming for AI.
Digital Sources and Tools for the Humanities.

Examples of projects include:

Computer vision and digital heritage.
Computational table exploration for Alfonsine astronomy.
Automatic discovery of chronologic and geographic circulation of artistic styles.
Improving the GROBID software for semantic description of semi-structured data.
…

Schedule

TBA.

Detailed contents

Course 1: Léa Saint-Raymond, Quantitative data analysis and cartography:

This course offers training in computational data analysis. Students will receive the following theoretical and practical basics:

Basic statistics and econometrics
Factor analysis
Network Analysis
Cartography

At the end of this training, students will be able to explore a corpus of data and analyze it from a quantitative and relational perspective. They will master the following software:

R
QGis
Gephi
and, for the exploratory part, Palladio

The project will take as its starting point an exhaustive corpus of exhibitions that took place in Toulouse between 1907 and 1939. The students will question the social and prosopographic logics of the exhibitors (the “artistes méridionaux”), analyze the sale price of the exhibited works, and conduct a textual and geographical analysis of the titles.

Course 2: Mathieu Aubry, Computer vision for the humanities:

Introduction to Computer Vision with a specific focus on Deep Learning. We will introduce the basic principles of Machine Learning and Neural Networks for Computer Vision applications. We will outline the specific difficulties of applying these methods to historical and artistic data, standard use cases in the digital humanities (image search, document segmentation, image recognition), and examples of specific projects on artwork price prediction, historical watermark recognition, and pattern recognition and discovery in artwork datasets.

Course 3: Béatrice Joyeux-Prunel, Art History and AI: Data Science applied to Visual Contagions:

How do images circulate, and what makes a visual blockbuster? Is it possible to study whether cultures have converged through globalisation over the long term, and to study it through images? Cultural globalisation is often said to have begun as early as the 19th century and to have accelerated after 1945, and the circulation of information and images is said to have accelerated even more since the Internet. But little is known about the channels, factors, and speed of this globalisation. Are there times, places or cultures that have resisted cultural homogenisation? It is also difficult to explain why one image circulates more than another. Art history has been very interested in styles, but it does not deal much with globalisation other than through case studies. Today, the availability of unprecedented visual corpora, whose metadata allow us to know where and when an image was seen, printed, or purchased, makes it possible to attempt a global analysis of these phenomena.
In the course, Professor Joyeux-Prunel will present the digital strategies that have been deployed over the last ten years to study the global circulation of images; she will evaluate their results, their limits, and what remains to be done.
The workshop proposed in connection with the course aims to contribute to “what remains to be done”. Starting from digitized corpora of illustrated prints and images, and/or internet images from social networks, and applying vision algorithms (identification of duplicates, styles, and patterns), the group will identify the circulation of patterns on a worldwide (or at least European) scale. Then, using statistical and data visualisation methods, the group will analyse the history, the geography and the factors that may contribute to this global circulation of styles and visual patterns. Depending on the results obtained by the group, a prediction experiment could then be tested: predicting a potentially (statistically) “global” and “successful” picture, which would be launched on social networks to test its effectiveness.

Course 4: Jean-Baptiste Camps and Thierry Poibeau, Introduction to Digital Philology and natural language processing:

For the “automatic language processing” part, we will present named entities, a well-known notion that plays an important role in many Digital Humanities applications. We will show that the analysis of entities is far from being a solved problem, since systems still lack robustness in the face of the great diversity of data. Finally, we will present an overview of the analysis methods used (symbolic and statistical), with their respective advantages and disadvantages.
The computational philology course will focus on an introduction to stylometry, and more specifically to supervised methods. It will introduce general notions of how style can be measured, and will show how author profiles can be built to attribute disputed or anonymous texts, or to try to disentangle individual contributions in collaborative works.

Course 5: Ségolène Albouy and Matthieu Husson, History of astronomy and AI:

History of astronomy and AI: an overview: From Delambre (Histoire de l’astronomie ancienne, Paris, 1817) through Neugebauer (History of Ancient Mathematical Astronomy, Berlin, 1975), the history of astronomy produced in Europe has had a long relationship with quantitative methods. Towards the end of the 1960s, this association intensified with the use of computers to assess ancient observations and computation methods (Poulle and Gingerich, 1968). Today, the rise of digital humanities and its coupling with AI opens new possibilities that are being explored by different research projects. Based on a historiographical overview, this session will illustrate the current challenges emerging at the interface of history of astronomy, AI and digital humanities with respect to artificial vision, text analysis and reenactment of ancient computations. We will discuss how AI can tackle ambitious challenges relating to issues as diverse as mathematical questions, genealogy of sources, transcription, diagram vectorization, and more.
A step aside: from document to data: Based on various examples from the previous session, we will explore how different datasets, exploitable by artificial intelligence algorithms, can be built from historical sources to address questions of various kinds. We will discuss the issues raised by the transformation of a material document into its various digital representations: from the point of view of its acquisition, its modeling, its encoding, its “augmentation” or even its “simulation”. Finally, we will discuss what constitutes a pertinent training dataset from a human sciences perspective as well as from a machine learning perspective.
