subtitle-xtranslator

A Python script that extracts text from audio/video and translates the resulting subtitles using the Google Cloud and DeepL APIs.

This is a Python program that automates AI speech recognition and translation for video, using OpenAI's Whisper together with stable-ts (a variant slightly modified for subtitle work) and faster-whisper.

It combines the speech recognition and translation features above so that creating subtitles from video is convenient.

[Latest updates]

  • 2025-02-16 Added subtitle-translator.py for translation only, for cases where an .SRT file already exists. (It can be used like python .\subtitle-translator.py --source ja --target ko D:\temp\ja.srt. Alternatively, building an .exe as described in [Building a single exe] makes it a little more convenient to use.)

  • 2025-01-01 Fixed a subtitle-saving error with faster-whisper. After translation, a subtitle is now deleted if the same word appears 10 or more times within it, separated by spaces or commas. Removed the content related to the discontinued deepl-rapidapi. For faster-whisper, using https://github.com/Purfview/whisper-standalone-win seems to be a good choice. The following additional options are reportedly good: -m large-v2 --sentence -vad true --vad_method pyannote_v3 --compute_type=float16 --no_repeat_ngram_size 4 --ff_mdx_kim2 -hst 4 -bo 10 --ff_speechnorm. Depending on your PC and GPU, change --compute_type to int8 or switch to -m medium. Japanese can be specified with -l ja.

  • 2024-12-07 With Python 3.13.1, a wheel error occurred while installing Whisper. For now it is better to use 3.11.9.

  • 2024-11-29 If an error appears at run time saying it does not work with NumPy 2.0, it can be resolved with pip uninstall numpy followed by pip install "numpy<2.0".

  • 2023-11-17 Changed the web page layout of app.py and made it possible to process multiple MP3 files. Version 20231117a was tested on Windows 11, WSL Ubuntu, and Colab. On Colab, however, it works properly only the first time, so you need to choose "Restart runtime" and then run !python app.py again.

  • 2023-11-12 app.py, built as a WebUI (Gradio.app), can also be tested on Colab. A feature that shows progress in real time is in the works...

  • 2023-11-11 Made a very basic app for a WebUI. With the venv activated in Command Prompt or PowerShell (activate.bat or activate.ps1), run pip install gradio once, then run python app.py and Ctrl-click the link. Because the app copies the mp4 to a temporary folder before working on it, which takes a long time, it is better to convert to mp3 first with ffmpeg -i .\input.mp4 -vn -ab 128k .\output.mp3.

  • 2023-11-09 On Windows, running install_venv.bat, a batch file for Command Prompt, sets up the Python venv easily. (Note: this covers steps 3 to 5 of the installation instructions below. The steps before and after must be done manually.)

  • 2023-11-08 Added an .ipynb file so the script can be run on Google Colab. In Colab, a notebook loaded from GitHub can be edited after saving a copy, so you can fill in DEEPL_API_KEY and proceed from subtitle extraction through translation. Subtitle extraction works even without it. Change the audio_language and subtitle_language values as appropriate (ko, en, ja, fr, cn, and so on).

  • 2023-08-27 With faster-whisper support added, the medium model can run even on laptops with little VRAM (e.g. MX150 2GB), provided shared VRAM is available. However, cuDNN and cuBLAS are required. int8 is set as the default quantization. Although processing is slow, faster-whisper looks like a good choice when running on CPU only.

[Features]

  • Supports stable-ts, whisper, or faster-whisper for creating subtitles from video
  • Supports Google Cloud Translation (ADC or API key), Naver Papago translation (service discontinued), and the DeepL translation service
  • Supports removing meaningless short subtitles and repeated subtitles
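The short-subtitle and repetition filters listed above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names and the splitting pattern are assumptions. The threshold of 10 repeats follows the 2025-01-01 update note, and the length rule mirrors the --skip_textlength option, assuming that lines of that length or shorter are dropped.

```python
import re
from collections import Counter


def is_meaningless(text: str, skip_textlength: int = 1, max_repeats: int = 10) -> bool:
    """Return True if a subtitle line looks too short or too repetitive to keep."""
    stripped = text.strip()
    # Assumption: lines with at most `skip_textlength` characters are dropped.
    if len(stripped) <= skip_textlength:
        return True
    # Split on whitespace and commas, then count how often each word occurs.
    words = [w for w in re.split(r"[\s,]+", stripped) if w]
    if not words:
        return True
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count >= max_repeats


def filter_subtitles(lines):
    """Keep only the subtitle lines that pass both checks (hypothetical helper)."""
    return [line for line in lines if not is_meaningless(line)]
```

For example, filter_subtitles(["ok then", "!", "la la la la la la la la la la"]) keeps only "ok then".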

[Limitations]

  • Speech recognition is not perfect, so some speech may be missed or misrecognized. Professional use is not recommended.
  • Subtitle quality and count may differ between running the stable-ts or whisper commands directly, using this program, and using Whisper WebUI. (For reference, stable-ts is optimized for subtitle extraction but tends to miss more speech than original Whisper. On the other hand, it has fewer problems such as phantom speech extracted where there is none, meaningless repetition, or the later part of the audio not being extracted.)
  • If you use a paid translation API, test this script thoroughly beforehand and use it only once you have found no problems. No responsibility is taken for damage arising from latent bugs or unknown causes.

[Related program links]

[Usage]

To run this script, a few prerequisites must be prepared (see the section below: [Preparation on Windows 10/11]).

Below is the output of running -h to show the full usage.

usage: subtitle-xtranslator.py [-h] [--framework FRAMEWORK] [--model MODEL] [--device DEVICE]
                               [--audio_language AUDIO_LANGUAGE] [--subtitle_language SUBTITLE_LANGUAGE]
                               [--skip_textlength SKIP_TEXTLENGTH] [--translator TRANSLATOR]
                               [--text_split_size TEXT_SPLIT_SIZE]
                               audio [audio ...]

positional arguments:
  audio                 audio/video file(s) to transcribe

options:
  -h, --help            show this help message and exit
  --framework FRAMEWORK
                        name of the stable-ts, whisper or faster-whisper framework to use (default: stable-ts)
  --model MODEL         tiny, base, small, medium, large-v2, large-v3 model to use (default: medium)
  --device DEVICE       device to use for PyTorch inference (default: cuda)
  --audio_language AUDIO_LANGUAGE
                        language spoken in the audio, specify None to perform language detection (default: ja)
  --subtitle_language SUBTITLE_LANGUAGE
                        subtitle target language (default: ko)
  --skip_textlength SKIP_TEXTLENGTH
                        skip short text in the subtitles, useful for removing meaningless words (default: 1)
  --translator TRANSLATOR
                        none, google, papago, deepl-api(default: none)
  --text_split_size TEXT_SPLIT_SIZE
                        split the text into small lists to speed up the translation process (default: 1000)
  --condition_on_previous_text
                        if True, provide the previous output of the model as a prompt for the next window; disabling
                        may make the text inconsistent across windows, but the model becomes less prone to getting
                        stuck in a failure loop (default: False)
  --demucs              stable-ts only, whether to reprocess the audio track with Demucs to isolate vocals/remove
                        noise; pip install demucs PySoundFile; Demucs official repo:
                        https://github.com/facebookresearch/demucs (default: False)
  --vad                 stable-ts only, whether to use Silero VAD to generate timestamp suppression mask; pip install
                        silero; Official repo: https://github.com/snakers4/silero-vad (default: False)
  --mel_first           stable-ts only, process entire audio track into log-Mel spectrogram first instead in chunksif
                        audio is not transcribing properly compared to whisper, at the cost of more memory usage for
                        long audio tracks (default: False)
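The --text_split_size option suggests that subtitle text is sent to the translator in batches rather than one line at a time. A minimal sketch of such batching follows, assuming the size is a character budget per request; the helper name and the exact splitting rule are hypothetical and not taken from the script.

```python
def split_into_batches(lines, text_split_size=1000):
    """Group subtitle lines into batches whose total length stays within the budget."""
    batches, current, current_len = [], [], 0
    for line in lines:
        # Start a new batch when adding this line would exceed the budget.
        if current and current_len + len(line) > text_split_size:
            batches.append(current)
            current, current_len = [], 0
        current.append(line)
        current_len += len(line)
    # Flush the last, partially filled batch.
    if current:
        batches.append(current)
    return batches
```

A line longer than the budget still gets a batch of its own, so no subtitle is ever dropped or cut in the middle.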

The command below runs with each argument's default value stated explicitly. It is for extracting from a video in Japanese.

(venv) C:\Users\loginid> python .\subtitle-xtranslator.py --framework=stable-ts --model=medium --device=cuda --audio_language=ja --skip_textlength=1  '.\inputvideo1.mp4' '.\inputvideo2.mp4' '.\inputvideo3.mp4'

Since the command above simply spells out the defaults, the arguments can be omitted. Note that --translator defaults to none, so only subtitle extraction is performed, without translation.

(venv) C:\Users\loginid> python .\subtitle-xtranslator.py '.\inputvideo1.mp4' '.\inputvideo2.mp4' '.\inputvideo3.mp4'

To have the extracted subtitles translated into Korean automatically, use either --translator google or --translator deepl-api.

API keys for the translators are supplied through environment variables.

For Google, you can choose either ADC (Application Default Credentials, which creates a special file) or an API key; the explanation that follows covers the API key. ADC creates a credential file for cloud access on the local computer, so there is no API key to leak, although of course that file must not leak either. It is somewhat cumbersome, but setting the key as an environment variable as shown below seems better for security than embedding the API key in the source code. However, the variable only lasts for the current session, so it has to be set again after restarting the computer or reloading PowerShell and the venv.

To use Google Cloud Translation, create a new project in the Google Cloud console, select the Google Cloud Translation service, and either set up ADC or create an API key. Obtaining just an API key, without ADC, appears to be simpler (https://urame.tistory.com/entry/GoogleGCP-Translation-API%EB%B2%88%EC%97%AD-API-%EC%8B%A0%EC%B2%AD-%EB%B0%8F-PYTHON-%ED%85%8C%EC%8A%A4%ED%8A%B8).

Cost: Charges scale with the number of characters sent to Cloud Translation. For example, if you send 575,000 characters for processing in a month, you are charged $1.50. The first 500,000 characters are free, and the next 75,000 characters are billed at $20 per million characters (based on English text). 75,000 characters x $0.20 per 10,000 characters = $1.50
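The cost arithmetic above can be written as a small function. This is only a worked example using the figures quoted in the text (500,000 free characters, $20 per million characters); the function name is made up for illustration, and current pricing should be checked before relying on it.

```python
def google_translation_cost(chars_per_month: int,
                            free_chars: int = 500_000,
                            usd_per_million: float = 20.0) -> float:
    """Estimate the monthly charge in USD from the rates quoted in the text."""
    # Only the characters beyond the free allowance are billed.
    billable = max(0, chars_per_month - free_chars)
    return billable * usd_per_million / 1_000_000


# 575,000 characters: 75,000 billable characters at $20 per million.
print(google_translation_cost(575_000))  # 1.5
```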

(venv) C:\Users\loginid> Set-Item -Path env:GOOGLE_API_KEY -Value "your_api_key"

For DeepL, register as a developer and supply the API key you receive as an environment variable, as shown below. Up to 500,000 characters per month are free. In Command Prompt, the environment variable can be set with set DEEPL_API_KEY=your_api_key.

(venv) C:\Users\loginid> Set-Item -Path env:DEEPL_API_KEY -Value "your_api_key"
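On the Python side, a key set this way can be read back from the process environment. The sketch below shows one plausible way to do it with a clear error when the variable is missing; the helper name is hypothetical, and the script's own handling may differ.

```python
import os


def get_api_key(name: str) -> str:
    """Read an API key from the environment, failing early if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "set it in the current shell session before running the script."
        )
    return key
```

Calling get_api_key("DEEPL_API_KEY") or get_api_key("GOOGLE_API_KEY") at startup surfaces a forgotten key before any transcription work begins.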

To use DeepL API translation, the related package must be installed once beforehand.

(venv) C:\Users\loginid> pip install --upgrade deepl
(venv) C:\Users\loginid> Set-Item -Path env:DEEPL_API_KEY -Value "your_api_key" 

Here is an example. Suppose you want deepl-api as the --translator and stable-ts as the extraction method, with the stable-ts options demucs=True, vad=True, and mel_first=True. Then run the following. The video is in English.

(venv) C:\Users\loginid> Set-Item -Path env:DEEPL_API_KEY -Value "your_api_key" 
(venv) C:\Users\loginid> python .\subtitle-xtranslator.py --demucs --vad --mel_first --audio_language en --translator deepl-api --text_split_size 3000 'Y:\video_cut.mp4'

The stable-ts developer's tips say the following about demucs, vad, and mel_first.

  • demucs=True and vad=True are meant for music, but they also work for non-music audio.
  • If the audio is not transcribed properly compared to Whisper, try mel_first=True, at the cost of more memory usage for long audio tracks.

To use demucs and vad, the following packages must also be installed.

(venv) C:\Users\loginid> pip install wheel 
(venv) C:\Users\loginid> pip install demucs PySoundFile
(venv) C:\Users\loginid> pip install silero

If an error occurs during installation, you may need to install https://visualstudio.microsoft.com/ko/visual-cpp-build-tools/.

Note that demucs uses a lot of GPU memory when the video is long, may not work with 8GB of VRAM (e.g. a 2 hour 40 minute MP4 required 13GB of VRAM), and needs additional processing time (6 to 7 minutes). vad likewise scans the video from beginning to end, so it takes a long time.

A video showing actual extraction and translation with stable-ts and deepl-api is available here: https://www.youtube.com/watch?v=Orq6CGHw8Ag

[Preparation on Windows 10/11]

1. Install Python

Install a recent version of Python (per the 2024-12-07 update above, 3.13.1 produced a wheel error during the Whisper install, so 3.11.9 is the safer choice). The goal is for python to run when typed from anywhere in the Windows 11 Command Prompt or PowerShell.

https://www.python.org/downloads https://www.python.org/ftp/python/3.11.4/python-3.11.4-amd64.exe

2. Install CUDA

This is installed to take advantage of the NVIDIA graphics card.

To check whether CUDA is installed properly after installation, open PowerShell (the Windows PowerShell app) and run the nvidia-smi command.

https://developer.nvidia.com/cuda-toolkit https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda_12.2.1_536.67_windows.exe

To use faster-whisper, installing cuDNN is required. cuDNN can be downloaded from https://developer.nvidia.com/cudnn-downloads. Download and install https://developer.download.nvidia.com/compute/cudnn/9.6.0/local_installers/cudnn_9.6.0_windows.exe. With the default settings, the dll files are placed in C:\Program Files\NVIDIA\CUDNN\v9.6\bin. Copy these files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin (adjust the version folder to match your installed CUDA version).

cuBLAS, which is optionally installed for faster-whisper, can be installed with pip install nvidia-cublas-cu12 once the venv has been created in the steps below.

3. Run PowerShell

Press the Windows key and the R key, and the Run dialog appears on the left. Type "powershell" there and click OK to start PowerShell (it can also be started in various other ways).

In the PowerShell window, type python and press [Enter]; you should get a response like the following.

PS C:\Users\login_id> python

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

To leave the >>> prompt above, type exit().

4. Create a venv environment

Python packages get installed whenever they are needed, and installing them straight into the system Python can occasionally tangle things up and cause problems that are a real headache. If this is all you will use Python for it does not matter, but a virtual environment is still created so that removal is easy.

The example below installs into the user directory, but another drive or folder works too. (Note: work in a location whose path contains no Korean characters. If your Windows login name is in Korean, you need to install somewhere else.)

It takes about 4.5GB, so install it on a disk drive with enough space.

(Note: the path must not contain Korean characters.)

PS C:\Users\login_id> python -m venv venv 
PS C:\Users\login_id> .\venv\Scripts\Activate.ps1

If the .ps1 script does not run, start PowerShell as administrator and run the command below once.

PS C:\WINDOWS\system32> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

That completes the virtual environment setup. On the first run a security prompt appears; choose Always. When the venv is activated successfully, the prompt changes.
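Besides the changed prompt, the state of the environment can also be checked from Python itself. The following sketch uses only the standard library: comparing sys.prefix with sys.base_prefix is the usual way to detect a venv, and the ASCII check reflects the warning above about Korean characters in the path. Both helper names are made up for this example.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def path_is_ascii(path) -> bool:
    """True when the path contains only ASCII characters."""
    return str(Path(path)).isascii()


print("venv active:", in_virtualenv())
print("venv path is ASCII-only:", path_is_ascii(sys.prefix))
```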

5. Install the GPU version of PyTorch and related packages

Install the GPU version of torch (if installation fails due to insufficient memory, add --no-cache-dir).

(venv) PS C:\Users\login_id> pip install torch==2.2.1+cu121 --extra-index-url https://download.pytorch.org/whl/cu121

To check that it installed correctly, start python and write a short program. (Note) __version__ has two underscores on each side of the word.

(venv) PS C:\Users\login_id> python
Python 3.11.4 [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.2.1+cu121
>>> exit()
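The version string only shows that a CUDA build of torch was installed. To see whether the GPU is actually usable, a check along the following lines may help. It is a small sketch that relies on torch.cuda.is_available() and torch.cuda.get_device_name(), and it returns a message instead of crashing when torch cannot be imported; the function name is hypothetical.

```python
def cuda_status() -> str:
    """Describe whether torch can be imported and whether it sees a CUDA device."""
    try:
        import torch
    except (ImportError, OSError) as err:
        # OSError covers DLL load failures on Windows.
        return f"torch could not be imported: {err}"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but CUDA is not available"


print(cuda_status())
```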

If importing torch fails with an error like the one below, additionally install https://aka.ms/vs/16/release/vc_redist.x64.exe and try again.

(venv) PS C:\Users\login_id> python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
                 It can be downloaded at https://aka.ms/vs/16/release/vc_redist.x64.exe
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\login_id\venv\Lib\site-packages\torch\__init__.py", line 133, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\login_id\venv\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies.

With this in place, install the packages that will be needed from here on.

(venv) PS C:\Users\login_id> pip install -U openai-whisper 
(venv) PS C:\Users\login_id> pip install -U stable-ts 
(venv) PS C:\Users\login_id> pip install -U google-cloud-translate

Alternatively, the bundled requirements.txt can be used.

(venv) PS C:\Users\login_id> pip install -r requirements.txt 

For reference, testing so far has mostly used stable-ts 2.6.2. Since it is upgraded continually, one approach is to install the latest version and switch to a specific version only if the results are unsatisfactory.

pip install stable-ts==2.6.2

If you plan to use faster-whisper, also run the commands below. The compute type is fixed to int8; to change it, edit the code on line 518 of subtitle-xtranslator.py, model = WhisperModel(model_name, device=device, compute_type="int8"), replacing int8 with float16 or another supported type.

pip install nvidia-cublas-cu12
pip install faster-whisper
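As an illustration of the compute_type choice described above, the sketch below picks a value based on the device. The helper and its rule of thumb are assumptions for this example, not the script's logic; "int8" and "float16" are compute types accepted by faster-whisper's WhisperModel.

```python
def pick_compute_type(device: str, prefer_low_memory: bool = True) -> str:
    """Choose a compute_type string for faster-whisper (hypothetical rule of thumb)."""
    if device == "cpu":
        # int8 keeps memory use and CPU cost down.
        return "int8"
    # On a GPU, int8 saves VRAM; float16 trades more VRAM for precision.
    return "int8" if prefer_low_memory else "float16"


# Usage sketch (requires faster-whisper and a working CUDA setup):
# from faster_whisper import WhisperModel
# model = WhisperModel("medium", device="cuda",
#                      compute_type=pick_compute_type("cuda", prefer_low_memory=False))
# segments, info = model.transcribe("input.mp3", language="ja")
```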

6. Install FFmpeg and create video subtitles from the Python interpreter

An external program is needed because the audio has to be extracted from the video.

Download ffmpeg-release-essentials.zip from https://www.gyan.dev/ffmpeg/builds/#release-builds, unzip it, and copy the executables to your working directory or to a location listed in the Path environment variable. Simply copying them under C:\Users\login_id\venv\Scripts is the easiest.
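Whether ffmpeg can be found on the Path can be verified from Python with shutil.which. The sketch below also assembles the audio-extraction command mentioned in the 2023-11-11 update note (-vn drops the video stream, -ab 128k sets the audio bitrate). Both function names are made up for this example, and the command is only built here, not executed.

```python
import shutil


def find_ffmpeg():
    """Return the full path to ffmpeg if it is on PATH, otherwise None."""
    return shutil.which("ffmpeg")


def build_extract_command(src: str, dst: str, bitrate: str = "128k"):
    """Build the ffmpeg argument list that extracts the audio track of src into dst."""
    return ["ffmpeg", "-i", src, "-vn", "-ab", bitrate, dst]


if find_ffmpeg() is None:
    print("ffmpeg was not found on PATH")
else:
    # Pass this list to subprocess.run(..., check=True) to perform the conversion.
    print(build_extract_command("input.mp4", "output.mp3"))
```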

7. Try extracting from a short video with Python code

Here is a test run on one short video (in practice some warnings appear along the way, but they do not affect operation).

(venv) PS C:\Users\login_id> python
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import stable_whisper
>>> model = stable_whisper.load_model("small", device="cuda")
>>> result = model.transcribe(verbose=True, word_timestamps=False, language="ko", audio="20220902_131203.mp4")
[00:12.800 --> 00:16.080]  아 아까 딱 찍었어야 되는데
[00:19.580 --> 00:21.580]  불행랑을 치는 걸 찍었어야 되는데
[00:30.000 --> 00:34.980]  진출하되겠지
>>> result.to_srt_vtt("20220902_131203.srt")
Saved: C:\Users\login_id\20220902_131203.srt
>>> exit()

word_timestamps=True is the default and enables highlighting each word as it is spoken. The small model was used because the graphics card has 2GB of VRAM, and a few phrases were misrecognized (불행랑 should be 줄행랑, and the last line is just the sound of waves turned into a subtitle). With 8GB of VRAM, medium can be used.
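For readers curious about what ends up in the .srt file, the sketch below converts (start, end, text) segments like the ones printed above into SRT text. It is a plain-Python illustration of the SRT layout (index line, HH:MM:SS,mmm timestamps, text, blank line), not the stable-ts implementation, and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(12.8, 16.08, "first line"), (19.58, 21.58, "second line")]))
```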

8. Download and use subtitle-xtranslator.py

If git is installed, download it as shown below. Otherwise, go to https://github.com/sevengivings/subtitle-xtranslator, where a "<> Code" button appears on the right. Clicking it offers a Download ZIP menu item for downloading an archive, which you can unzip in a suitable location and use.

(venv) C:\Users\login_id> git clone https://github.com/sevengivings/subtitle-xtranslator

(Note) To see the guidance messages in Korean, the locale directory from the archive is also required.

[Building a single exe]

So far the script has been run with python .\subtitle-xtranslator.py. Since that is somewhat inconvenient, let's build an exe file and copy it to venv\Scripts so it can be run from any drive or directory.

(venv) C:\Users\login_id> pip install pyinstaller
(venv) C:\Users\login_id> pyinstaller --onefile .\subtitle-xtranslator.py 

Copy the resulting C:\Users\login_id\dist\subtitle-xtranslator.exe to any location on the Windows PATH. It can then be run from anywhere (independently of the venv).

An exe built this way is about 2.7GB, so both building it and launching it take a long time (1 to 2 minutes), which makes it not very practical.
