Transcribe videos with Google Colab and OpenAI Whisper

Google Colab provides capable GPUs even on the free tier. The T4 GPU used below has 16 GB of VRAM, of which Colab reports about 15 GB as usable. I found this very useful for two steps: first generating SRT subtitle files from Chinese YouTube videos, and then translating those Chinese SRT files into English SRT files.

Sample ffmpeg command to extract audio from video

ffmpeg -i input.webm -vn -acodec libmp3lame -ab 192k output.mp3
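Whisper decodes media through ffmpeg itself, so uploading the video directly should also work. Extracting the audio first mainly keeps the upload to Colab small.

If you have several videos, a small Python wrapper can build the same command for each file. This is a minimal sketch and not part of the original workflow. It makes these assumptions:

- ffmpeg is on your PATH.
- Each output should sit next to its input with an .mp3 extension.
- `ffmpeg_extract_audio_cmd` and `extract_audio` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video_path, bitrate="192k"):
    """Build the ffmpeg argument list matching the sample command (hypothetical helper)."""
    out_path = Path(video_path).with_suffix(".mp3")  # input.webm -> input.mp3
    cmd = [
        "ffmpeg",
        "-i", str(video_path),
        "-vn",                    # drop the video stream
        "-acodec", "libmp3lame",  # encode the audio as MP3
        "-ab", bitrate,           # audio bitrate, e.g. 192k
        str(out_path),
    ]
    return cmd, out_path


def extract_audio(video_path, bitrate="192k"):
    """Run ffmpeg for one file and return the path of the .mp3 it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd, out_path = ffmpeg_extract_audio_cmd(video_path, bitrate)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return out_path
```

For example, `for video in sorted(Path(".").glob("*.webm")): extract_audio(video)` converts every .webm file in the current directory. Like the command above, it does not pass -y, so ffmpeg will not silently overwrite an existing .mp3.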

First, visit https://colab.research.google.com/ and create a new notebook:

File
    > New notebook in Drive

Set the runtime type

Runtime
    > Change runtime type

Select Python 3 as the runtime type and T4 GPU as the hardware accelerator.

[Screenshot: 1749154186.png]
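Once the runtime is connected, it is worth confirming that the GPU is actually visible before installing anything. This also shows how much VRAM the runtime really has.

The sketch below is a minimal check, assuming PyTorch is available. Colab images normally ship with it, and openai-whisper depends on it. `describe_gpu` is a hypothetical helper name; running `!nvidia-smi` in a cell gives similar information from the shell.

```python
def describe_gpu():
    """Return a short description of the first CUDA GPU, or why none was found."""
    try:
        import torch  # normally preinstalled on Colab; also required by openai-whisper
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible; check Runtime > Change runtime type"
    props = torch.cuda.get_device_properties(0)
    # total_memory is reported in bytes; convert to GiB for readability
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM"


print(describe_gpu())
```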

Insert each code block below by clicking the + Code button in the UI and then typing the text into the new cell. Lines that begin with ! run in the OS shell, while the rest run in the selected interpreter (here Python 3). After typing each code block, press its run button:

[Screenshot: 1749139785.png]
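The split between `!` lines and Python lines is why the code blocks pass the uploaded file name through an environment variable. The shell itself cannot see Python variables, but it does inherit the process environment.

The sketch below reproduces that hand-off outside a notebook with `subprocess`. It assumes a POSIX shell. The file name is hypothetical and contains a space, to show why `$FNAME` is quoted.

```python
import os
import subprocess

# Hypothetical uploaded file name; the space shows why "$FNAME" is quoted.
filename = "my video.mp3"
os.environ["FNAME"] = filename  # same trick as in the code blocks

# Roughly what a notebook line `!echo "$FNAME"` does: run it through a shell
# that inherits os.environ, so the shell expands $FNAME itself.
result = subprocess.run('echo "$FNAME"', shell=True, capture_output=True, text=True)
print(result.stdout, end="")  # prints: my video.mp3
```

Inside Colab, IPython can also interpolate Python variables into `!` lines with braces, e.g. `!whisper "{filename}" --output_format srt --language zh --model turbo`. That would avoid the environment variable; the code blocks use the environment-variable route instead.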

The code blocks (run them in sequence):

!pip install openai-whisper

from google.colab import files
uploaded = files.upload()  # Click "Choose Files", then navigate to and select the file.
filename = list(uploaded.keys())[0]
print(f"Uploaded file: {filename}")

import os
os.environ['FNAME'] = filename
!whisper "$FNAME" --output_format srt --language zh --model turbo

Note

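Replace zh with the language spoken in the uploaded audio file (for example, en for English or ja for Japanese). If --language is omitted, Whisper tries to detect the language itself.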
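If you prefer to stay in Python instead of calling the CLI, openai-whisper also exposes `whisper.load_model` and `model.transcribe`. The transcription result includes a `segments` list with `start`, `end` and `text` for each segment.

The sketch below writes those segments as SRT by hand. It is an alternative to the CLI line above, not what the author ran. Note the following assumptions:

- `format_timestamp`, `segments_to_srt` and `transcribe_to_srt` are hypothetical helpers.
- Loading `"turbo"` assumes a recent openai-whisper release.
- The output should be close to the CLI's SRT but is not guaranteed to be identical.

```python
import os


def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segment dicts (start, end, text) into SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


def transcribe_to_srt(audio_path, language="zh", model_name="turbo"):
    """Transcribe with the Python API and write <name>.srt next to the input."""
    import whisper  # installed by the pip cell; imported lazily here

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    srt_path = os.path.splitext(audio_path)[0] + ".srt"
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
    return srt_path
```

In the notebook, calling `transcribe_to_srt(filename)` after the upload cell should leave an .srt file with the same base name, which is what the download cell looks for.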

filename_without_ext = os.path.splitext(os.environ['FNAME'])[0]
files.download(f"{filename_without_ext}.srt")

Note

The generated SRT file has the same base name as the uploaded file, with .srt as its extension.

Translate with Google Translate

At https://translate.google.com, the Documents tab only accepts .docx, .pdf, .pptx and .xlsx files. To work around this, I opened the SRT file in LibreOffice and saved it as .docx. Surprisingly, Google translated it in less than a second. I downloaded the translated file, converted it back to .txt with LibreOffice, and then renamed it to .srt.
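One risk of translating the whole SRT file is that the translator may alter cue numbers or timestamps, for example the `-->` arrows or the comma before the milliseconds. An alternative, not the author's method, is to send only the subtitle text for translation and then rebuild the SRT from the original timings.

This is a minimal sketch with these assumptions:

- The translated text comes back with exactly one line per subtitle, in the same order.
- Multi-line subtitles are joined into one line.
- `parse_srt`, `extract_text` and `merge_translation` are hypothetical helpers.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(srt_text):
    """Split SRT text into (timing, text) pairs; cue numbers are regenerated later."""
    srt_text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")  # normalize CRLF and BOM
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # A cue is: index line, timing line, then zero or more text lines.
        if len(lines) >= 2 and TIMING.match(lines[1]):
            text = " ".join(line.strip() for line in lines[2:])
            cues.append((lines[1], text))
    return cues


def extract_text(cues):
    """One subtitle per line, ready to be converted and translated."""
    return "\n".join(text for _, text in cues)


def merge_translation(cues, translated_text):
    """Rebuild SRT from the original timings and the translated lines."""
    translated = translated_text.rstrip("\n").splitlines()
    if len(translated) != len(cues):
        raise ValueError(f"expected {len(cues)} lines, got {len(translated)}")
    blocks = [
        f"{i}\n{timing}\n{line.strip()}"
        for i, ((timing, _), line) in enumerate(zip(cues, translated), start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

To use it, follow these steps:

1. Save `extract_text(parse_srt(...))` to a .txt file.
2. Convert and translate it as described above.
3. Convert the result back to .txt.
4. Pass its contents to `merge_translation` together with the same cues.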

[Screenshot: 1749152901.png]