CERN Academic Training subtitles' post-processing with MLLP

Project name

Correct automatically transcribed videos with the MLLP (Machine Learning Language Processing) tool

Project description

There are thousands of videos recorded at CERN via the IT recording and transcoding infrastructure.

We have colleagues who cannot hear. Elementary diversity awareness requires that we equip all CERN-made videos with subtitles. Moreover, prolonged teleworking due to COVID-19 deprived people with hearing impairments of the chance to read speakers’ lips and body language. This made the need for automatic live transcription more urgent.

A Call for Tender process recently ended with the selection of MLLP (Machine Learning Language Processing) for live and offline transcription and translation. The tool can produce high-quality subtitles for the CERN lectures.

Still, as with every automatic transcription, post-processing is necessary to equip the lecture backlog with flawless subtitles.

The student will use the MLLP interface to fix the automatically generated subtitles. He or she may also propose the integration of a dictionary of terms specific to a given discipline; e.g. the word “luminosity” means something different in particle physics from its common English usage.
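As an illustration of what such a dictionary of terms could look like in practice, here is a minimal Python sketch. It assumes the subtitles are available as plain text and that the dictionary maps frequently mis-transcribed phrases to the intended term. The glossary entries, the function name apply_glossary and the sample sentence are hypothetical; this is not part of the MLLP interface or its API.

```python
import re

# Hypothetical glossary: mis-transcribed phrase -> intended physics term.
# The entries are illustrative assumptions, not real MLLP output.
GLOSSARY = {
    "luminous city": "luminosity",
    "hicks boson": "Higgs boson",
}


def apply_glossary(text, glossary=GLOSSARY):
    """Replace each known mis-transcription in text, ignoring case."""
    for wrong, right in glossary.items():
        # Match the phrase only on word boundaries.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps the term literal (no escape handling).
        text = pattern.sub(lambda match, term=right: term, text)
    return text


cue = "The luminous city determines how many Hicks boson events we record."
print(apply_glossary(cue))
```

A per-discipline glossary of this kind would only suggest corrections; a human check of each subtitle remains necessary.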

Required skills

Excellent (C2) level of English, good knowledge of physics and computing terms, familiarity with media technologies for the editing part, team spirit when working with service managers, and organisational skills for dispatching the post-processed lectures to the sponsors for checking.

Good knowledge of French can be an asset, if time permits checking the French transcription as well. English is the absolute priority because the lectures are given in English. The French text is the result of automatic translation.

Learning experience

All the lectures are a wealth of knowledge and information in themselves.
The MLLP tool, developed by highly competent developers, offers excellent operational quality and performance.
All interactions with users and service managers will be rewarding experience for future professional engagements.

Project duration

2-3 months

Project area

Data Analytics Learning

Contact for further details

Maria Dimou

References

  1. The Academic Training lecture index https://indico.cern.ch/category/72/
  2. About MLLP https://ttp.mllp.upv.es/index.php?page=faq
  3. Example of MLLP transcribed video for post-processing https://video-player-sec.web.cern.ch/?mode=contribution&year=2022&id=1050132&origin=ceph

CERN group

IT-CDA

Status

Cancelled

Submitted by Maria Dimou on Monday, January 24, 2022 - 17:38.

Student info

CERN supervisor

Maria Dimou
