CERN Academic Training subtitles' post-processing with MLLP
Project name
Correct automatically transcribed videos with the MLLP (Machine Learning and Language Processing) tool
Project description
There are thousands of videos recorded at CERN via the IT recording and transcoding infrastructure.
We have colleagues who are deaf or hard of hearing. Elementary diversity awareness requires that we equip all CERN-made videos with subtitles. Moreover, prolonged teleworking during the COVID-19 pandemic deprived people with hearing impairments of the ability to read speakers' lips and body language. This made automatic live transcription all the more urgent.
A Call for Tender process recently concluded with the selection of MLLP (Machine Learning and Language Processing) for live and offline transcription and translation. The tool makes it possible to produce high-quality subtitles for CERN lectures.
Still, as with any automatic transcription, post-processing is necessary to give the backlog of lectures completely flawless subtitles.
The student will use the MLLP interface to fix the automatically generated subtitles. S/he may also propose how to integrate a dictionary of discipline-specific terms; for example, the word “luminosity” means something different in particle physics than in common English usage.
Required skills
Excellent (C2) level of English, good knowledge of physics and computing terms, familiarity with media technologies for the editing part, team spirit when working with service managers, and organisational skills for dispatching the post-processed lectures to the sponsors for checking. Good knowledge of French is an asset, if time permits checking the French subtitles as well. English is the absolute priority because the lectures are given in English. The French text is the result of automatic translation.
Learning experience
Each lecture is a wealth of knowledge and information in itself. The MLLP tool offers excellent operational quality and performance and was developed by highly competent developers.
Interacting with users and service managers will be valuable experience for future professional engagements.
Project duration
2-3 months
Project area
Data Analytics Learning
Contact for further details
Maria Dimou
References
- The Academic Training lecture index https://indico.cern.ch/category/72/
- About MLLP https://ttp.mllp.upv.es/index.php?page=faq
- Example of MLLP transcribed video for post-processing https://video-player-sec.web.cern.ch/?mode=contribution&year=2022&id=1050132&origin=ceph
CERN group
IT-CDA
Status
Cancelled
Submitted by Maria Dimou on Monday, January 24, 2022 - 17:38.