ByteDance’s OmniHuman can make a photo talk, sing, and move – Here’s how it works

Chinese tech giant ByteDance has developed a new AI system that can turn a single photo into a video of a person talking, singing, and moving naturally. Called OmniHuman, this technology takes AI-generated media to the next level, going beyond older models that could only animate faces or upper bodies.

How OmniHuman works

OmniHuman’s secret lies in its massive training data and smart AI design. Researchers fed it over 18,700 hours of human video footage, using a unique “omni-conditions” approach. This means it simultaneously learns from text, audio, and body movements, making the animations much more natural.
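The paper's code is not public, so the sketch below is purely illustrative; the function name, embedding width, and zero-placeholder scheme are assumptions. It shows the general idea behind "omni-conditions" training: each clip may carry any mix of conditioning signals (text, audio, pose), and the model fuses whatever is available rather than discarding partially annotated footage.

```python
EMBED_DIM = 4  # hypothetical shared embedding width per modality


def fuse_conditions(text=None, audio=None, pose=None):
    """Combine whichever conditioning embeddings are present into one vector.

    Absent modalities are filled with a zero placeholder, so the same
    model can train on clips with any subset of annotations instead of
    requiring every clip to have all three signals.
    """
    fused = []
    for signal in (text, audio, pose):
        if signal is None:
            fused.extend([0.0] * EMBED_DIM)  # placeholder for missing signal
        else:
            fused.extend(float(x) for x in signal)
    return fused  # length: 3 * EMBED_DIM


# A clip with audio and pose but no text caption still yields a usable input:
cond = fuse_conditions(audio=[1.0] * EMBED_DIM, pose=[0.5] * EMBED_DIM)
print(len(cond))  # 12
```

Training on mixed-annotation data this way is one plausible reason the approach scales to such a large corpus: clips that lack a caption or a pose track still contribute.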

“Human animation has improved a lot recently,” ByteDance researchers said in a paper published on arXiv. “But existing methods still struggle to scale up and be used widely.”

By using multiple input types, the team made the AI more efficient and improved its ability to create lifelike human movements.

OmniHuman can create videos of people giving speeches, playing music, and performing other activities with surprisingly realistic motion. Compared with older models, it is far better at syncing body movement with speech.

The race for next-gen AI video tech

ByteDance’s breakthrough comes as tech giants like Google, Meta, and OpenAI are also working on AI-generated video. These companies see huge potential for entertainment, education, and digital avatars.

Google’s Veo 2: Launched in December 2024, Veo 2 is an advanced AI video model capable of producing high-quality 4K clips that align with user prompts. It offers extensive camera controls, allowing for more detailed and realistic motion.

OpenAI’s Sora: Released in December 2024, Sora enables users to generate videos from text descriptions, creating imaginative and realistic scenes. It has gained attention for its ease of use but has faced competition from newer models like Veo 2.

Runway’s Gen-3 Alpha: Available since September 2024, Gen-3 Alpha offers filmmakers a cost-effective and time-saving way to create AI-generated videos. It provides advanced camera control features that empower creators to direct AI-generated videos with cinematic precision.

Experts say this tech could revolutionise industries like filmmaking, online learning, and virtual communication. But there are also concerns about misuse in creating deepfakes or misleading content.

ByteDance researchers plan to share more details at an upcoming computer vision conference, though they haven’t said when. As AI-generated content keeps advancing, tools like OmniHuman show how quickly technology is changing how we create and consume media.
