What is Sora? Even the AI experts aren’t sure

OpenAI has once again set jaws dropping with the announcement of Sora, an AI model that can generate video from text prompts. The demonstration videos released by OpenAI at the end of last week look stunning (with the odd obvious glitch), but they’ve also sparked disagreement among AI experts over precisely what Sora is and what it’s capable of.

What has OpenAI said about Sora?

OpenAI hasn’t said a great deal about the underlying technology behind Sora, but its blog post announcing the AI video service did give a few clues as to what’s going on.

The company claims Sora can “generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

The technology is based on a diffusion model “which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps”.
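That denoising loop can be illustrated with a toy sketch. This is not OpenAI’s implementation – the real system uses a learned neural network to predict the noise at each step – so the stand-in predictor below (which simply estimates the noise as the gap to a known target frame) is purely illustrative:

```python
import numpy as np

# Toy illustration of the diffusion idea: start from "static noise"
# and gradually remove predicted noise over many steps.
# NOTE: the noise predictor here is a stand-in for demonstration;
# a real diffusion model learns this predictor from data.

def denoise(target_frame, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    frame = rng.standard_normal(target_frame.shape)  # pure noise to start
    for t in range(steps):
        predicted_noise = frame - target_frame        # stand-in predictor
        frame = frame - predicted_noise / (steps - t)  # strip a fraction of it
    return frame

target = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # a pretend video frame
result = denoise(target)
print(np.abs(result - target).max())  # residual shrinks towards zero
```

Each pass removes a fraction of the estimated noise, so the frame converges from random static towards the target over the course of the loop – the same broad shape as the process OpenAI describes.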

OpenAI admits the technology isn’t perfect, as is evident from many of the demonstration videos that the company released. Body parts occasionally look deformed, objects randomly appear and disappear, and the motion of people moving in scenes can be oddly robotic.

The company says Sora “may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.”

Related: OpenAI’s building GPT-5 – but it hasn’t got a clue what it will do

Sora: a “simulator of many worlds”

Even with its flaws, some AI experts are excited by what they see. Jim Fan, senior AI research scientist at graphics firm Nvidia, is convinced Sora is a much bigger deal than OpenAI’s generative art service, DALL·E.

“If you think OpenAI Sora is a creative toy like DALLE… think again,” Fan tweeted. “Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, ‘intuitive’ physics, long-horizon reasoning, and semantic grounding, all by some denoising and gradient maths.”

Fan added: “I won’t be surprised if Sora is trained on lots of synthetic data using [the video game graphics engine] Unreal Engine 5. It has to be!”

He goes on to analyse one of OpenAI’s demo videos, in which two pirate ships battle in a cup of coffee, explaining why it demonstrates that Sora is much more sophisticated than many are willing to believe.

Fan says the video shows how the “simulator instantiates two exquisite 3D assets”, namely the pirate ships, each of which has a different design. The video shows the ships avoiding each other’s paths, and captures the fluid dynamics of the coffee, down to details such as the foam around the ships’ hulls. What’s more, “the simulator takes into account the small size of the cup compared to oceans, and applies tilt-shift photography to give a ‘minuscule’ vibe”.

He added: “The semantics of the scene does not exist in the real world, but the engine still implements the correct physical rules that we expect.”

Fan predicts that once OpenAI can “add more modalities and conditioning, then we have a full data-driven UE [Unreal Engine] that will replace all the hand-engineered graphics pipelines”.

Hallucination hype

Other experts are much less convinced that Sora is learning physics. Gary Marcus, who sold his AI company to Uber and is co-author of the book Rebooting AI, claims that Sora is an “image prediction engine” and that the glitches seen in the demo videos that OpenAI has released are evidence of that.

In a Substack article entitled “Sora’s surreal physics”, Marcus points to several examples in different videos where objects simply appear or disappear, where chairs levitate in mid-air, or where animals perform physics-defying feats.

“We will, I am certain, see more systemic glitches as more people have access [to Sora],” he writes.

“And importantly, I predict that many will be hard to remedy. Why? Because the glitches don’t stem from the data, they stem from a flaw in how the system reconstructs reality. One of the most fascinating things about Sora’s weird physics glitches is that most of these are NOT things that appear in the data. Rather, these glitches are in some ways akin to LLM ‘hallucinations’, artifacts from (roughly speaking) decompression from lossy compression. They don’t derive from the world.”

Marcus claims that more training data won’t fix the problem, nor is there any way to tell an AI system to obey the laws of physics. “Sora is fantastic,” Marcus writes, “but it is akin to morphing and splicing, rather than a path to the physical reasoning we would need for AGI [artificial general intelligence]. It is a model of how images change over time, not a model of what entities do in the world.”

Barry Collins

Barry has 20 years of experience working on national newspapers, websites and magazines. He was editor of PC Pro and is co-editor and co-owner of BigTechQuestion.com. He has published a number of articles on TechFinitive covering data, innovation and cybersecurity.
