Whether it’s a how-to video on YouTube, a short corporate training segment, or a feature-length documentary, producing video content is a major undertaking with lots of moving parts. As a video editor, you play a crucial role in the post-production process and shape what the consumer ultimately sees. Ideally, you devote most of your time to the artistic process of deciding what to include versus what to cut, but it is easy to get lost in technical processes and details.
One key way of both streamlining your editing process and expanding your viewership is by leveraging video transcriptions. In this guide, we will explore why video transcriptions should be a fundamental component of your work, how they can be used, and what tools are available to create them. Scroll down or click one of the sections below to read more.
Video transcription is the process of converting video content into a text format. Video transcriptions can then be used for a variety of purposes such as captioning video for a hearing-impaired audience, subtitling video with different languages, streamlining the post-production editing process, or creating text-only versions of video that are more easily indexed and therefore more discoverable online for search engine optimization (SEO) purposes.
Captioning is typically used to give deaf or hearing-impaired audiences as rich a viewing experience as possible. Video captioning assumes that viewers cannot hear the audio. Sometimes called same-language subtitling, video captioning renders in text not only dialogue but also other relevant audio content such as soundtracks and background noise.
Text usually appears white in a black box at the base of the screen, and non-dialogue content is typically presented in brackets (e.g., [knocking on door], [violin music begins]). Captions are time-synchronized so that the audience can read the text as that same content is being spoken on video.
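To make the time-synchronization concrete, here is a minimal sketch in Python of how a single caption entry might be represented. The timestamp style is the HH:MM:SS,mmm form used by SRT caption files; the caption data itself is invented for illustration:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm style used by SRT captions."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# A caption pairs its text (including bracketed audio cues) with start/end times,
# so the viewer reads it while the same content plays on screen.
caption = {
    "start": srt_timestamp(12.5),   # 00:00:12,500
    "end": srt_timestamp(15.0),     # 00:00:15,000
    "text": "[knocking on door]\nWho's there?",
}
```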
Video captions may be closed, open, or live. Closed captioning refers to captions that viewers can choose to turn on or off. In contrast, open captioning is always visible and cannot be turned off by viewers. Live captioning occurs during news, sporting events, or other live broadcasts. A stenographer listens to the broadcast and types what he or she hears into a specialized device and computer program so that captions appear just seconds after something is spoken.
Subtitles assume that the audience can hear the audio content but does not understand the dialogue because it is in an unfamiliar language. Subtitles translate dialogue content into a different language but don’t include descriptions of background noise, music, or other audio cues. For instance, an English-speaker viewing a French movie on Netflix can turn on subtitles to read all the dialogue in English. Like captions, subtitles can be either closed (i.e., optional) or open (i.e., permanent).
To summarize, captioning assists audiences who are deaf, hard-of-hearing, or who must mute a video’s audio; subtitling translates video content into a viewer’s native or preferred language. For this reason, audio cues and background noises are denoted in brackets in captions but omitted from subtitles.
Subtitles also tend to allow greater flexibility with fonts, colors, and positioning than captions do. While white text with a black rim or shadow is most common for subtitles, this can be altered. Similarly, while subtitles most commonly appear in the lower portion of the screen, their position is also more easily changed.
While captions and subtitles are the most common applications of video transcriptions, this text content can also be used outside of video editing. People often transcribe videos to improve the searchability of their content online. Because search engines do not index audio or visual content, publishing a transcript alongside a video helps potential viewers discover it more easily.
As a video editor, your work with video transcriptions can span a wide variety of sectors and specialties. Here are some of the most common uses of transcribed video:
As a video editor, you typically have to condense a large amount of raw footage into a much shorter finished piece. Video transcriptions help streamline this work: transcripts let you locate specific scenes or soundbites quickly and facilitate paper edits.
A paper edit is a time-coded list of the segments you want to incorporate in the order you plan on using them. This list can be paired with notes on associated footage you plan on including (e.g., B-roll footage of interviewee eating at a restaurant). Creating a good paper edit can be a major challenge, especially if you are juggling large quantities of interview footage. Accurate transcriptions make it easier to scan through, highlight, edit, and re-order content during paper edits.
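A paper edit as described above can be sketched as a simple time-coded data structure. The field names and clip details below are illustrative, not a standard format:

```python
# A paper edit: time-coded segments listed in the order they will be used,
# each paired with a note about the associated footage.
paper_edit = [
    {"start": "00:12:04", "end": "00:12:31", "tape": "interview_A",
     "note": "key quote on funding"},
    {"start": "00:03:10", "end": "00:03:18", "tape": "broll_02",
     "note": "B-roll: interviewee eating at a restaurant"},
]

def to_seconds(tc: str) -> int:
    """Convert an HH:MM:SS timecode to seconds for duration math or re-ordering."""
    h, m, s = (int(part) for part in tc.split(":"))
    return h * 3600 + m * 60 + s

# Total running time of the planned cut, in seconds.
total = sum(to_seconds(seg["end"]) - to_seconds(seg["start"]) for seg in paper_edit)
```

Keeping the paper edit in a structured form like this makes it trivial to re-order segments or check the running time before touching the timeline.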
Improving video accessibility for people with disabilities via captioning is not only business-savvy and the right thing to do, it’s also the law. The Americans with Disabilities Act (ADA) and Section 508 require that any content developed, purchased, or distributed by the federal government be accessible to people with disabilities. By creating captions from video transcriptions, you help ensure Section 508 compliance for the deaf and hearing-impaired community.
Video transcripts also play a major role in search engine optimization (SEO) because search engines do not index audio or video files. By transforming video to text, you improve its searchability. For instance, academics can transcribe conference presentations to increase exposure to their findings. Webinars, vlogs, speeches, sermons, and how-to videos are just some of the other source materials that gain SEO benefits from video transcription.
Social networks such as Facebook play videos without sound by default. Using video transcripts to create captions increases these embedded videos’ visibility, particularly when people view them in locations such as airports or hospitals where full volume viewing would be disruptive.
By improving both accessibility and visibility for videos, you ultimately increase total viewership.
File types - As a video editor, you may encounter a variety of file types when using transcripts, from plain-text documents (e.g., .txt, .docx) to time-coded caption formats such as SubRip (.srt) and WebVTT (.vtt).
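As a concrete example of a time-coded caption file type, here is a minimal Python sketch that renders caption entries in the widely supported SubRip (.srt) layout: a sequence number, a start and end timestamp, then the caption text. The caption content is invented for illustration:

```python
# Illustrative caption entries: (start, end, text) with SRT-style timestamps.
captions = [
    ("00:00:01,000", "00:00:03,500", "[violin music begins]"),
    ("00:00:04,000", "00:00:06,200", "Welcome back to the workshop."),
]

def to_srt(entries) -> str:
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{i}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)

print(to_srt(captions))
```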
Depending on the nature of your video editing project, you may use different types of tools to generate video transcriptions. The most basic options use speech recognition software to automatically transcribe content while the most sophisticated rely more heavily on human transcribers.
For small-scale projects, you may be able to tap into existing voice recognition software for free using your phone or computer. For instance, select “voice typing” in Google Docs on your computer while playing the video. Alternatively, use the microphone on a word processing app on your phone to transcribe the recording while it plays.
Automatic transcription can also be done using paid software programs (e.g., Adobe’s Premiere Pro, InqScribe), which can be purchased and downloaded onto your computer. Alternatively, you can upload your files to a web-based service (e.g., Trint, Rev) that uses AI-based automated transcription. These services’ rates vary depending on your content and any additional features you want (e.g., human editing of automated transcripts).
While applications like these can produce straightforward transcripts from video content, the transcripts they generate are difficult to use as captions or subtitles because they lack timestamps. If you’re publishing videos to a platform like YouTube, captions and subtitles will be generated automatically and you won’t have to transcribe the audio first; you will almost certainly have to edit them, however.
All these forms of speech recognition software share similar drawbacks: background noise, jargon, dialects and accents, slurring, mispronunciation, and mumbled words can all hurt accuracy. These tools almost always require manual editing to ensure the video is not misrepresented. While they may be adequate for a simple how-to video, they will be frustrating to use for something like a documentary.
Depending on the length and complexity of the video you are editing, manually editing an automated transcript may become so time-intensive that using video transcription services makes more sense. Subtitling services and human transcriptionists can provide much greater accuracy and offer more advanced tools to properly sync text with the corresponding visual content.
For instance, TranscriptionWing™ offers a 3-stage proofing process to ensure both the accuracy of the transcript and the precision of time-synchronization. Timestamps are typically included and human transcriptionists are far better at differentiating speakers, understanding subtle changes in accents, and describing important audio cues for captions. They can also offer greater customization and more easily accommodate different file types.
Creating accurate and effective subtitles and captions may sound straightforward, but the process is often more art than science. Captions must parallel the viewing experience of a hearing audience while remaining short enough to be readable. Subtitles and captions must match the timing of the dialogue and be positioned without blocking important imagery. Some core guidelines for using transcripts for captions and subtitles:
- Keep each caption to one or two lines so viewers can take it in at a glance.
- Synchronize each caption’s appearance and disappearance with the corresponding dialogue.
- Leave captions on screen long enough to be read comfortably before the next one appears.
- Position text so it does not cover faces, on-screen text, or other important imagery.
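The readability constraint can be checked mechanically. The sketch below flags captions that ask viewers to read faster than a chosen characters-per-second limit; the ~17 cps default is a common rule of thumb, not a universal standard:

```python
def too_fast(text: str, duration_s: float, max_cps: float = 17.0) -> bool:
    """Return True if the caption requires reading faster than max_cps
    characters per second for its on-screen duration."""
    return len(text) / duration_s > max_cps

# A short caption shown for 2.5 seconds is comfortably readable...
too_fast("Who's there?", 2.5)
# ...while a long caption that flashes by in 1 second is not.
too_fast("A much longer caption that flashes by.", 1.0)
```

A check like this is useful as a final pass over a caption file before delivery, catching entries that need to be split or retimed.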
If you’re not already using video transcriptions during your editing process or to create captions and subtitles, you’re missing a major source of time savings and a means of adding value to your work. Avoid headaches during paper edits, increase viewership of your work by expanding its accessibility, and preserve more time for your artistic eye and less for irritating technicalities. Video transcriptions are just a small component of your work, but they will move you one step closer to seeing the forest for the trees.