e-learning - IT Collaboration, Devices & Applications - Insert subtitles in video tutorials
Project name
Insert subtitles in existing tutorials of the CERN Document Server e-learning collection
Project description
Use a free tool to convert the existing plaintext files, which contain the exact scripts of our short online e-learning videos, into .vtt files, so that subtitles can be added to the videos.
Method:
- Open each Indico event from The List below. Each event contains the link to the recording and the script as an attached file with a .txt extension.
- Copy the .txt file locally.
- Open a text converter and enter the file you just copied. The exact workflow in full detail is HERE. There is a video tutorial about the process at https://indico.cern.ch/e/737353/. The basic steps involve:
- Open Aegisub (http://www.aegisub.org/) to watch the video and manually select where to split the script into subtitles. Aegisub will convert the <script>.txt to <script>.srt.
- Use webvtt.org to convert the <script>.srt into subtitles_en.vtt.
- Optional: Upload the <video>.mp4 (follow the CDS link available from the same Indico page where you found the script) to https://cdslabs-qa.cern.ch/ (example: https://cdslabs-qa.cern.ch/record/2241210). Watch the video on cdslabs-qa.cern.ch to check that each subtitle appears during the correct part of the video, i.e. that the timestamp values in the .vtt file are correct. NB! cdslabs-qa is a temporary playground and can be unavailable! In any case, Aegisub gives sufficient assurance that the subtitles are synchronised with the video.
- Check for strange characters via this tool. Strange characters can break the CDS index, so they should not be left in by mistake.
- Upload the newly created .vtt file to the same event for CDS re-publishing. THIS ticket explores how this is done. This event contains the conclusions of the investigation on publishing and the exact filepaths in CERN MediaArchive and fields in CDS.
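The .srt to .vtt step in the method above can also be sketched locally. The Python below is a minimal sketch, not the converter used in this project: it assumes a well-formed SubRip file and relies on the two formats differing mainly in the `WEBVTT` header line and in the decimal separator of the timestamps (comma in .srt, dot in .vtt). The function name `srt_to_vtt` is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip (.srt) text to WebVTT (.vtt) text.

    Assumption: the input is a well-formed .srt file. The SRT cue numbers
    are kept, since WebVTT accepts an identifier line before each timing line.
    """
    # Drop a leading byte-order mark and normalise Windows line endings.
    lines = srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    out = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for line in lines:
        match = SRT_TIMING.match(line.strip())
        if match:
            start, start_ms, end, end_ms, rest = match.groups()
            # WebVTT uses a dot, not a comma, before the milliseconds.
            line = f"{start}.{start_ms} --> {end}.{end_ms}{rest}"
        out.append(line)
    return "\n".join(out).rstrip("\n") + "\n"
```

To use it, read the <script>.srt file as UTF-8 text, pass the contents to `srt_to_vtt`, and write the returned string to subtitles_en.vtt. The result should still be checked against the video, as in the optional step above.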
We cannot add subtitles to the videos that contain 2 separate channels (camera & slides) until the collection is moved from its current CDS location to https://videos.cern.ch/, on the timescale decided by the CDS team. Nevertheless, we do have the .vtt files ready. Conclusions after completion and alternative methods to produce subtitles can be found in: https://twiki.cern.ch/Edutech/AboutSubtitlesEntry#Description
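Because the .vtt files are prepared before they are published, the check for strange characters mentioned in the method can also be approximated locally. The Python below is a rough sketch and not the tool referred to above. What counts as "strange" here is an assumption: control, format, private-use and unassigned characters, plus the Unicode replacement character that usually signals a decoding error. The function name `find_suspicious_characters` is hypothetical.

```python
import unicodedata

# Unicode general categories treated as suspicious (an assumption):
# Cc = control, Cf = format (e.g. zero-width space), Co = private use,
# Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}


def find_suspicious_characters(text: str):
    """Return a list of (line number, column, character name) tuples."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue  # tabs are control characters but harmless here
            suspicious = unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
            # U+FFFD usually means the file was decoded with the wrong encoding.
            if suspicious or ch == "\ufffd":
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, col, name))
    return findings
```

Accented letters such as those in French text are not reported, since they are ordinary letters. An empty result means that nothing matching these assumptions was found, not that the file is guaranteed to be safe for the CDS index.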
Comment by Kyle Dawson: An alternative method requires that the videos first be uploaded to YouTube. You do not need to publish the videos to the outside world; you can upload them to YouTube privately just for the sake of making subtitles. YouTube then rapidly generates subtitles, applying the correct timestamps in the correct places. This saves a lot of time, as doing the same in Aegisub takes a while. Then, if there are a few typos due to accents in speech, you can manually edit the text. This is all very user friendly and easy to do. You can then export the subtitles STRAIGHT into a .vtt file, which again is much quicker than using Aegisub, as we don't need to use any file converters. Uploading the videos to YouTube privately and then following this method is much easier and more efficient than using Aegisub.
The List:
Required skills
The student should be comfortable with web browsing, searching and using an editor.
Learning experience
Collaboration in a large technical group. Cross-project exchanges (e-learning and CDS).
Project duration
1 month
Project area
Learning
Contact for further details
Maria Dimou
References
- vtt syntax: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API
- Aegisub.org evaluation by Kyle Dawson: The Aegisub software is easy to use and makes it easy to convert a '.txt' file into an '.srt' file. You load the '.mp4' video into Aegisub and then add the subtitles manually, by creating 'time slots' in which you add the desired subtitles from the '.txt' file. Because this is done manually, adding subtitles to longer videos can take some time. Once the subtitles are added to the video, the file can be saved as an '.srt' file. Beyond this point, there are no instructions on how to convert the '.srt' file into a '.vtt' file. We must therefore convert the '.srt' back into a '.txt' and use an online converter to get it into a '.vtt'.
- Plaintext to vtt converter: http://www.vttcaptions.com/the-caption-generator.html Evaluation by Kyle Dawson: The VTTCaptions website is not easy to use and is not practical for the type of tasks we intend to use it for. It randomly splits the '.txt' file into sentences and estimates the time slot of each sentence. You must then manually change each time slot to the correct time. This is not at all practical for our intended purposes.
- This page indicates some free tools to aid with the creation of vtt files: https://www.ustream.tv/blog/streaming-product-updates/webvtt-captioning-subtitle-support/#services
- Character encoding detection: https://nlp.fi.muni.cz/projects/chared/
- The CERN Document Server (CDS) collection where the tutorials are published: https://cds.cern.ch/collection/E-learning%20modules?ln=en
- Example of a video with subtitles in another CDS collection: https://videos.cern.ch/record/2245558 On the right side, in the download panel, you will see `Subtitles`. Clicking on this link shows the 2 available subtitles: English and French. You can download either of them and open it with any text editor to see the format. If you play the video, a small icon in the bottom right corner indicates the available subtitles.
- The location containing the exact plaintext scripts, which need to be preceded by timestamps: https://indico.cern.ch/category/7442/
- CERN IT internal page with the CERN Child request for this project: https://espace.cern.ch/hr-child/2017/2018/Lists/Child%20of%20Staff%20Program%20Projects%202013/AllItems.aspx
CERN group
IT-CDA
Status
Accomplished
Submitted by Maria Dimou on Monday, June 4, 2018 - 18:10.
Kyle Richard Dawson
CERN Child
Maria Dimou
Project finished 06 Jul 2018