In the editor, add an option to download a local AI transcription model and use it to add captions to a Studio Mode recording.
This will bake the captions directly into the video frames.
The feature should live in a new captions tab in the editor.
If the local transcription AI model has not been downloaded yet, prompt the user to download it first.
In the new captions tab, if the model exists and the video has audio, add a "Generate captions" button.
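The two conditions above can be sketched as a small state function. This is only an illustration: the names `CaptionsTabState` and `captions_tab_state` are hypothetical, and the `NoAudio` state is an assumption, since the request does not say what the tab should show for a video without audio.

```rust
/// What the captions tab should display. All names here are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum CaptionsTabState {
    /// The local transcription model is missing: prompt the user to download it.
    PromptModelDownload,
    /// Assumption: the model exists but the video has no audio track,
    /// so there is nothing to transcribe and no button is shown.
    NoAudio,
    /// The model exists and the video has audio: show "Generate captions".
    ReadyToGenerate,
}

/// Decide what the captions tab shows from the two conditions in the request.
/// The model check comes first, so the download prompt takes priority.
pub fn captions_tab_state(model_downloaded: bool, has_audio: bool) -> CaptionsTabState {
    if !model_downloaded {
        CaptionsTabState::PromptModelDownload
    } else if !has_audio {
        CaptionsTabState::NoAudio
    } else {
        CaptionsTabState::ReadyToGenerate
    }
}
```

Keeping this decision in one pure function would make it easy to unit test without involving the UI.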
The transcribed caption text should be editable, and so should its styling (how the text looks on the video).
Rendering the text on the GPU would probably be best achieved with glyphon.
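One possible data model for editable captions is sketched below. Every type, field, and method name is a hypothetical placeholder, not an existing API, and the half-open time interval is an assumption. It makes no glyphon calls. It only shows the data a renderer would need for each frame: which caption is active at a given timestamp, and how that caption should look.

```rust
/// One transcribed caption with its time range in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CaptionSegment {
    pub start_secs: f64,
    pub end_secs: f64,
    pub text: String,
}

/// User-editable look of the captions (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct CaptionStyle {
    pub font_family: String,
    pub font_size_px: f32,
    pub color_rgba: [u8; 4],
    pub background_rgba: [u8; 4],
}

/// All captions for one recording plus their shared style.
#[derive(Debug, Clone, PartialEq)]
pub struct CaptionTrack {
    pub segments: Vec<CaptionSegment>,
    pub style: CaptionStyle,
}

impl CaptionTrack {
    /// Replace the text of the segment at `index`.
    /// Returns false if there is no segment at that index.
    pub fn edit_text(&mut self, index: usize, new_text: &str) -> bool {
        match self.segments.get_mut(index) {
            Some(segment) => {
                segment.text = new_text.to_string();
                true
            }
            None => false,
        }
    }

    /// Find the caption to draw on the frame at `t_secs`, if any.
    /// Assumption: a segment is active on the half-open range [start, end).
    pub fn active_at(&self, t_secs: f64) -> Option<&CaptionSegment> {
        self.segments
            .iter()
            .find(|s| t_secs >= s.start_secs && t_secs < s.end_secs)
    }
}
```

With this shape, the editor would mutate `segments` and `style`, and the frame renderer would call `active_at` with each frame's timestamp before drawing the text.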