Audio & Video Generation

From a written line to a full clip.

AI for Sound

We were surprised by how natural these voices sound now. Modern audio models can clone a voice from just a few seconds of recording, read text aloud in many languages, or compose a short piece of music from a one-line description of the mood. We kept this section short; there are many more tools than we could cover here.

Two Audio Tools

ElevenLabs

Specializes in realistic voice synthesis and voice cloning. Often used for audiobook narration, dubbing, and accessibility features such as reading articles aloud.

Suno

Generates short songs, including lyrics and instrumental tracks, from a one-line description of the style and mood. Useful for background music in videos and presentations.

Sample Voiceover

A short English voiceover, ~20 seconds.

AI for Video

Video generation is the newest of the three areas. Today's tools can produce clips of a few seconds with believable motion, lighting, and camera movement. Longer clips are still difficult: it's expensive to generate even ten seconds at high quality.

Two Video Tools

Sora

OpenAI's text-to-video model. Generates short clips from a written prompt, with surprisingly stable physics and lighting. Available through ChatGPT for paying users.

Runway

Offers Gen-3 for text-to-video and image-to-video, plus a full web editor for trimming, masking, and replacing parts of a clip. Popular with independent filmmakers.

Sample Video

A short demo clip with synchronized English subtitles.

Subtitles & Accessibility

The video above includes a .vtt subtitle file linked through the <track> tag. WebVTT is the standard format for browser-native captions: it makes content accessible to viewers who are deaf or hard of hearing, supports translations, and helps search engines understand the clip's content.
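
To make the format concrete, here is a minimal sketch of how a .vtt file like the one described above could be generated. The names Cue, format_timestamp, and build_webvtt are hypothetical helpers invented for this illustration, not part of any library, and the sketch covers only the basic cue layout (a WEBVTT header followed by timed text), not styling or positioning.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical helper type for this sketch)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption text shown during the cue


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[Cue]) -> str:
    """Return the text of a WebVTT file containing the given cues."""
    # Every WebVTT file starts with the line "WEBVTT" followed by a blank line.
    lines = ["WEBVTT", ""]
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError("cue must end after it starts")
        if "-->" in cue.text:
            raise ValueError("cue text must not contain '-->'")
        # Timing line, then the caption text, then a blank line to end the cue.
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Cue(0.0, 2.5, "Welcome to the demo."),
        Cue(2.5, 6.0, "These captions are stored in a .vtt file."),
    ]
    print(build_webvtt(demo))
```

Saved under an assumed name such as demo.vtt, the output can be attached to a video with a <track> tag whose src attribute points at the file and whose kind, srclang, and label attributes describe it (for example kind="subtitles", srclang="en", label="English").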

Things to Watch For

Generated audio and video bring real risks: voice cloning can be used for impersonation, music can copy the style of real artists without permission, and short videos can still misrepresent events. Clips should always be labelled as AI-generated, and consent must be obtained before cloning a real person's voice.