e-learning - IT Collaboration, Devices & Applications - Insert subtitles in video tutorials

Project name

Insert subtitles in existing tutorials of the CERN Document Server e-learning collection

Project description

Use a free tool to convert existing plaintext files, which contain the exact scripts of our short online e-learning videos, into .vtt files, so that subtitles can be added to the videos.

Method:

  1. Open each Indico event in The List below. Each event contains a link to the recording and the script, attached as a .txt file.
  2. Save the .txt file locally.
  3. Open a text converter and load the file you just saved. The full workflow is described in detail HERE. A video tutorial on the process is available at https://indico.cern.ch/e/737353/. The basic steps are:
    1. Use Aegisub (http://www.aegisub.org/) to play the video and manually mark where each subtitle should start and end. Aegisub converts <script>.txt into <script>.srt.
    2. Use webvtt.org to convert <script>.srt into subtitles_en.vtt.
    3. Optional: upload the <video>.mp4 file (follow the CDS link on the same Indico page where you found the script) to https://cdslabs-qa.cern.ch/ (example: https://cdslabs-qa.cern.ch/record/2241210). Watch the video on cdslabs-qa.cern.ch to check that each subtitle appears during the correct time period, i.e. that the timestamp values in the .vtt file are correct. NB! cdslabs-qa is a temporary playground and may be unavailable. In any case, Aegisub gives sufficient assurance that subtitles and video frames are correctly synchronised.
    4. Check for strange characters with this tool. Strange characters can break the CDS index, so none should be left in by mistake.
  4. Upload the newly created .vtt file to the same Indico event for re-publishing on CDS. THIS ticket explores how this is done. This event contains the conclusions of the investigation on publishing, including the exact file paths in CERN MediaArchive and the corresponding CDS fields.
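Step 3.2 relies on webvtt.org for the .srt to .vtt conversion. As an offline alternative, the conversion can be sketched in a few lines of Python, assuming the .srt exported by Aegisub uses standard SubRip timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) with no positioning data. The two formats differ mainly in the mandatory `WEBVTT` header and the dot (instead of a comma) before the milliseconds. The script name `srt_to_vtt.py` is hypothetical.

```python
import re
import sys
from pathlib import Path

# A standard SubRip timing line, e.g. "00:00:01,000 --> 00:00:04,000".
# Assumption: no SRT positioning data follows the end timestamp.
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip (.srt) text to WebVTT (.vtt) text.

    Only timing lines change (comma -> dot before the milliseconds); cue
    numbers are kept as WebVTT cue identifiers and subtitle text is copied as is.
    """
    # Drop a UTF-8 byte-order mark and normalise Windows/old Mac line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for line in text.split("\n"):
        m = SRT_TIMING.match(line.strip())
        if m:
            start, start_ms, end, end_ms = m.groups()
            out.append(f"{start}.{start_ms} --> {end}.{end_ms}")
        else:
            out.append(line)
    # End the file with exactly one newline.
    return "\n".join(out).rstrip("\n") + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python srt_to_vtt.py script.srt subtitles_en.vtt
    srt_path, vtt_path = Path(sys.argv[1]), Path(sys.argv[2])
    vtt_path.write_text(srt_to_vtt(srt_path.read_text(encoding="utf-8")), encoding="utf-8")
```

Because only lines matching the timing pattern are rewritten, commas inside subtitle sentences are left untouched. A timing line with extra SRT positioning data would be copied unchanged and would still need manual attention.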

Videos containing 2 separate channels (camera and slides) cannot be equipped with subtitles until the collection is moved from its current CDS location to https://videos.cern.ch/, on a timescale decided by the CDS team. Nevertheless, we already have the .vtt files ready. Conclusions drawn after completion, and alternative methods for producing subtitles, can be found at: https://twiki.cern.ch/Edutech/AboutSubtitlesEntry#Description

Comment by Kyle Dawson: An alternative method requires uploading the videos to YouTube first. The videos do not need to be published to the outside world; they can be uploaded privately just for the purpose of creating subtitles. YouTube then quickly generates subtitles and places the timestamps correctly. This saves a lot of time, because timing the subtitles in Aegisub is slow. If there are a few typos, for example caused by the speaker's accent, the text can be edited manually. All of this is user-friendly and easy to do. The subtitles can then be exported straight into a .vtt file, which is again much quicker than using Aegisub, because no file converters are needed. Uploading the videos privately to YouTube and following this method is therefore much easier and more efficient than using Aegisub.
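Whichever route produces the .vtt file (Aegisub plus a converter, or a YouTube export), a quick automated check before uploading it to the Indico event can complement steps 3.3 and 3.4 of the method. The sketch below is a hypothetical helper (`check_vtt` is not an existing tool): it verifies the `WEBVTT` header, checks that each timing line is well formed and that cues neither end before they start nor go backwards in time, and flags suspicious characters. Which characters actually break the CDS index is not specified here, so the character rules (control characters, the U+FFFD replacement character, and non-ASCII characters that are not letters, such as typographic quotes) are an assumption to adjust as needed.

```python
import re
import sys
import unicodedata
from pathlib import Path

# WebVTT timestamp: optional hours, then mm:ss.ttt
TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
# Timing line, optionally followed by cue settings such as "align:start".
TIMING = re.compile(rf"^{TS}\s+-->\s+{TS}(?:\s+.*)?$")


def to_ms(h, m, s, ms):
    """Convert captured timestamp fields to milliseconds (hours may be None)."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_vtt(text: str) -> list[str]:
    """Return a list of problems found in WebVTT text (empty list = looks OK)."""
    problems = []
    lines = text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        problems.append("line 1: missing 'WEBVTT' header")
    previous_start = -1
    for n, line in enumerate(lines, start=1):
        m = TIMING.match(line.strip())
        if m:
            g = m.groups()
            start, end = to_ms(*g[:4]), to_ms(*g[4:8])
            if end <= start:
                problems.append(f"line {n}: cue ends before it starts")
            if start < previous_start:
                problems.append(f"line {n}: cue starts before the previous cue")
            previous_start = start
        elif "-->" in line:
            # e.g. SRT-style "00:00:01,000" left over after conversion
            problems.append(f"line {n}: malformed timing line")
        # Assumption: these character classes are the "strange characters" to avoid.
        for ch in line:
            if ch == "\ufffd":
                problems.append(f"line {n}: replacement character U+FFFD (bad encoding?)")
            elif unicodedata.category(ch) == "Cc" and ch != "\t":
                problems.append(f"line {n}: control character U+{ord(ch):04X}")
            elif ord(ch) > 127 and not ch.isalpha():
                problems.append(f"line {n}: unusual character {ch!r} (U+{ord(ch):04X})")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    issues = check_vtt(Path(sys.argv[1]).read_text(encoding="utf-8"))
    print("\n".join(issues) or "No problems found")
    sys.exit(1 if issues else 0)
```

Run as `python check_vtt.py subtitles_en.vtt` (hypothetical file name), it prints one line per problem and exits with status 1 if anything is found. Accented letters, as in the French subtitles, pass the check because they are letters. A file that is not valid UTF-8 makes `read_text` raise `UnicodeDecodeError`, which is itself a sign of an encoding problem.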

The List:

Video title  Indico Event
Active Presenter https://indico.cern.ch/e/574991/
CERNBox Client Installation https://indico.cern.ch/e/669067
CERNBox Share & Authenticated Share https://indico.cern.ch/e/667252
CERNBox Sync a Share https://indico.cern.ch/e/667253
CMS Glimos Instructions in french https://indico.cern.ch/e/588592
CMS Glimos Instructions in english https://indico.cern.ch/e/588590
CDS Functions https://indico.cern.ch/e/661557
CDS Introduction https://indico.cern.ch/e/661556
CDS Search Video https://indico.cern.ch/e/661559
CDS Submit Document https://indico.cern.ch/e/661558
CDS Upload Video https://indico.cern.ch/e/661560
EOS for Beginners https://indico.cern.ch/e/667379
Indico Conference Abstract Review https://indico.cern.ch/e/654658
Indico Conference Customisation https://indico.cern.ch/e/654665
Indico Conference Programme & Abstracts https://indico.cern.ch/e/654589
Indico Conference Registration https://indico.cern.ch/e/654661
Indico Lecture https://indico.cern.ch/e/631555
Indico Meeting https://indico.cern.ch/e/631554
Indico Reminders https://indico.cern.ch/e/654666
Indico Surveys https://indico.cern.ch/e/631560
Indico Vidyo use https://indico.cern.ch/e/656029
Indico Webcast/Recording booking https://indico.cern.ch/e/655293
Mac Self Service https://indico.cern.ch/e/577074
Mail2Print https://indico.cern.ch/e/577077
QuickTime https://indico.cern.ch/e/575600
Skype for Business https://indico.cern.ch/e/631571
Afs to EOS web migration https://indico.cern.ch/e/661564
Transcribe your video https://indico.cern.ch/e/658001
LHCathome Linux https://indico.cern.ch/e/506114
LHCathome Mac https://indico.cern.ch/e/536271
LHCathome Windows https://indico.cern.ch/e/539453

Required skills

The student should be comfortable with web browsing, searching and using a text editor.

Learning experience

Collaboration in a large technical group. Cross-project exchanges (e-learning and CDS).

Project duration

1 month

Project area

Learning

Contact for further details

Maria Dimou

References

  1. vtt syntax: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API

  2. Aegisub.org evaluation by Kyle Dawson: The Aegisub software is easy to use, and a '.txt' file can easily be converted into an '.srt' file. The process consists of loading the '.mp4' video into Aegisub and then adding the subtitles manually, by creating 'time slots' into which you place the corresponding subtitles from the '.txt' file. Because this is done manually, adding subtitles to longer videos can take some time. Once the subtitles are added to the video, the file can be saved as an '.srt' file. From this point on, there are no instructions on how to convert the '.srt' file into a '.vtt' file. We must therefore convert the '.srt' back into a '.txt' and use an online converter to turn it into a '.vtt'.

  3. Plaintext to vtt converter: http://www.vttcaptions.com/the-caption-generator.html Evaluation by Kyle Dawson: The VTTCaptions website is not easy to use and is not practical for the kind of task we intend to use it for. It splits the '.txt' file into sentences arbitrarily and estimates the duration of each sentence's time slot. You must then manually correct every time slot. This is not at all practical for our purposes.

  4. This page lists some free tools that help with creating vtt files: https://www.ustream.tv/blog/streaming-product-updates/webvtt-captioning-subtitle-support/#services

  5. Character encoding detection: https://nlp.fi.muni.cz/projects/chared/

  6. The CERN Document Server (CDS) collection where the tutorials are published: https://cds.cern.ch/collection/E-learning%20modules?ln=en

  7. Example of a video with subtitles in another CDS collection: https://videos.cern.ch/record/2245558 In the download panel on the right, you will see `Subtitles`. Clicking on this link shows the 2 available subtitle files: English and French. You can download either of them and open it with any text editor to see the format. When you play the video, a small icon in the bottom right corner indicates which subtitles are available.

  8. The location of the exact plaintext scripts to which timestamps must be added: https://indico.cern.ch/category/7442/

  9. CERN IT internal page of the CERN Child request for this project: https://espace.cern.ch/hr-child/2017/2018/Lists/Child%20of%20Staff%20Program%20Projects%202013/AllItems.aspx
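Reference 3 describes a generator that splits the script into sentences and estimates each sentence's time slot. The sketch below illustrates that kind of heuristic under our own assumption (not VTTCaptions' documented algorithm) that each sentence receives a share of the video duration proportional to its word count. It makes the limitation concrete: pauses, slides without narration and changes in speaking speed are ignored, so every cue would still need manual retiming, which is why Aegisub or YouTube were preferred. The function names `estimate_vtt` and `fmt` and the `duration_s` parameter are hypothetical.

```python
import re


def fmt(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp HH:MM:SS.mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def estimate_vtt(script: str, duration_s: float) -> str:
    """Split a plain-text script into sentences and spread them over the video.

    Each sentence gets a share of the total duration proportional to its word
    count. This is only a rough estimate: it ignores pauses, music and
    speaking speed, so the resulting timings need manual correction.
    """
    # Naive sentence split after ".", "!" or "?" (abbreviations like "e.g." break it).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s.strip()]
    words = [len(s.split()) for s in sentences]
    total_words = sum(words)
    total_ms = round(duration_s * 1000)
    out = ["WEBVTT", ""]
    elapsed_words = 0
    for i, (sentence, n) in enumerate(zip(sentences, words), start=1):
        # Integer arithmetic keeps cue boundaries contiguous and ends at total_ms.
        start = total_ms * elapsed_words // total_words
        elapsed_words += n
        end = total_ms * elapsed_words // total_words
        out += [str(i), f"{fmt(start)} --> {fmt(end)}", sentence, ""]
    return "\n".join(out)
```

For a two-sentence script of 2 and 4 words over a 10-second video, the first cue runs from 00:00:00.000 to 00:00:03.333 and the second from 00:00:03.333 to 00:00:10.000, regardless of when the speaker actually says them.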

CERN group

IT-CDA

Status

Accomplished

Submitted by Maria Dimou on Monday, June 4, 2018 - 18:10.
Student info
Student name

Kyle Richard Dawson

University

CERN Child

CERN supervisor

Maria Dimou

Thesis
Thesis type
Bachelor
Project started 11 Jun 2018
Project finished 06 Jul 2018
Defence status
other