From a written line to a full clip.
We were surprised how natural these voices sound now. Modern audio models can clone a voice from just a few seconds of recording, read text aloud in many languages, or compose a short piece of music from a one-line description of the mood. We kept this section short; there are many more tools out there than we could cover here.
Specializes in realistic voice synthesis and voice cloning. Often used for audiobook narration, dubbing, and accessibility features such as reading articles aloud.
Generates short songs, including lyrics and instrumental tracks, from a one-line description of the style and mood. Useful for background music in videos and presentations.
A short English voiceover, ~20 seconds.
Video generation is the newest of the three areas. Today's tools can produce clips of a few seconds with believable motion, lighting, and camera movement. Longer clips are still difficult: it's expensive to generate even ten seconds at high quality.
OpenAI's text-to-video model. Generates short clips from a written prompt, with surprisingly stable physics and lighting. Available through ChatGPT for paying users.
Offers Gen-3 for text-to-video and image-to-video, plus a full web editor for trimming, masking, and replacing parts of a clip. Popular with independent filmmakers.
A short demo clip with synchronized English subtitles.
The video above includes a .vtt subtitle file linked through the <track> tag. WebVTT is the standard format for browser-native captions: it makes content accessible to viewers who are deaf or hard of hearing, supports translations, and helps search engines understand the clip's content.
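To give a feel for what a .vtt file contains, here is a minimal sketch in Python that builds one from a list of cues. The function names (format_timestamp, build_webvtt) and the sample cue text are our own illustrative choices, not part of any library, and the sketch assumes plain single-line cue text with no styling or positioning settings.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the WebVTT timestamp form hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples.

    Hypothetical helper for illustration: it numbers the cues and
    rejects any cue whose end time is not after its start time.
    """
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        lines.append(str(index))  # optional cue identifier
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line closes each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample cues invented for this example.
    demo = [
        (0.0, 2.5, "Welcome to our demo clip."),
        (2.5, 6.0, "These subtitles come from a WebVTT file."),
    ]
    print(build_webvtt(demo))
```

Saving the printed text as a .vtt file and pointing the <track> tag at it should be enough for a browser to show the captions, as long as the cue times match the clip.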
Generated audio and video bring real risks: voice cloning can be used for impersonation, music can copy the style of real artists without permission, and short videos can still misrepresent events. Clips should always be labelled as AI-generated, and consent must be obtained before cloning a real person's voice.