Deploy subtitles as a service for CERN videos

Project name

Deploy subtitles as a service for CERN videos

Project description

2021 version of the proposal

Once the tender selection process for a live and offline transcription and translation product is complete,
help post-process and automate the enrichment of the lecture backlog with subtitles.
See here for the categories of candidate lectures.



2020 version of the proposal (TECH approved by IT and HR, rejected by the service manager):

The CERN IT Collaboration, Devices & Applications group, and in particular its Digital Repositories (IT CDA/DR) and Integrated Collaboration (IT CDA/IC) sections, runs many highly visible and popular services. These services enable researchers and institutions to share and preserve their research data, software and publications, and to meet, present and record lectures, projects, plans and decisions of academic content and of very large experiment collaborations.

The CERN Document Server (CDS) is the official document repository for the laboratory and serves around 2 million visitors a year. Thousands of videos have been recorded at CERN via the CDA/IC recording and transcoding infrastructure. They are uploaded to, and viewable via, CDS or the recent videos portal.

We need to equip all CERN-made videos with subtitles. This project is about turning the transcription software, to be selected by the relevant CERN CDA service managers, into a scalable service that automatically generates subtitles and displays them in CDS for the CERN community. The process should be well integrated with our new video player, and the set-up should allow the content owner (lecture, meeting or conference organiser) to apply text corrections in a functional way.

Required skills

Programming, computer system management, media technologies, excellent (C2) level of English and French, team spirit in working with service managers, and organisational skills for disseminating the workflow to the content owners.

Learning experience

Programming, data management and storage methods used at CERN, some notions of computational and educational linguistics, diversity aspects, tips from the media world, work with very large collaborations, attractive web design.

Project duration

12-14 months

Project area

Data Analytics Learning

Contact for further details

Maria Dimou


Investigation on subtitles internal note (requires CERN login).

CERN group



Submitted by Maria Dimou on Wednesday, February 5, 2020 - 09:30.
Thesis type