Several image processing tools developed in the AdMiRe project make use of artificial intelligence based on machine learning. Examples of such tools are super-resolution and foreground extraction. What is one of the most important aspects of working with machine learning? Contrary to what many may think, it is not only the design of efficient models. A key issue is access to suitable datasets to train the model.

A model can only be as good as the data used to train it. Training a machine learning model therefore requires particular attention when capturing data. The dataset should cover many different configurations and be as large as possible. Although training a model with a small number of samples is faster and can lead to sufficient accuracy in some cases, in most cases it results in ineffective solutions because the dataset lacks diversity.

In the AdMiRe project, the main challenge is to build a dataset of high-resolution video, with corresponding ground truth where necessary. To achieve excellent foreground extraction, for instance, the silhouette of the extracted person should be as precise as possible, neither missing any part of the person nor including any portion of the background. Furthermore, the dataset should contain content in line with the typical use cases for which the tools are designed, in as many different conditions as possible: various illumination conditions, different types of cameras, different backgrounds, different postures of the foreground person, different actions and so on.
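One way to make the silhouette criterion concrete is to compare an extracted mask with its ground-truth mask pixel by pixel. The sketch below is only an illustration and is not the evaluation method used in AdMiRe; the function name `mask_metrics` and the toy masks are hypothetical. It assumes both masks are binary grids of the same size, where 1 marks the person and 0 the background. Precision drops when background is included in the silhouette, recall drops when part of the person is missing, and intersection over union (IoU) penalizes both.

```python
def mask_metrics(predicted, ground_truth):
    """Compare two binary masks given as equal-sized lists of rows.

    1 marks a foreground (person) pixel, 0 marks a background pixel.
    """
    true_pos = false_pos = false_neg = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for pred_px, true_px in zip(pred_row, true_row):
            if pred_px and true_px:
                true_pos += 1    # person pixel correctly extracted
            elif pred_px and not true_px:
                false_pos += 1   # background wrongly included
            elif true_px and not pred_px:
                false_neg += 1   # part of the person missed

    # Guard against empty masks to avoid division by zero.
    extracted = true_pos + false_pos
    actual = true_pos + false_neg
    union = true_pos + false_pos + false_neg
    return {
        "precision": true_pos / extracted if extracted else 1.0,
        "recall": true_pos / actual if actual else 1.0,
        "iou": true_pos / union if union else 1.0,
    }


# Toy 3 x 4 masks (hypothetical data, far smaller than a real video frame).
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
predicted = [
    [0, 1, 1, 1],  # one background pixel included
    [0, 1, 1, 0],
    [0, 0, 1, 0],  # one person pixel missed
]

metrics = mask_metrics(predicted, ground_truth)
print(metrics)
```

Scores of this kind are only meaningful if the ground-truth masks themselves are accurate, which is one reason the dataset needs precise ground truth.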


Example of typical content in the dataset produced by EPFL for the AdMiRe project. This video was captured with a webcam at a resolution of 1280 × 720 pixels and a frame rate of 29 frames per second, and was compressed with Apple ProRes at a bit rate of 65 Mb/s.

Lionel Desarzens / EPFL