Sora (text-to-video model)

Imagine typing a simple phrase and watching a hyper-realistic, dynamic video materialize before your eyes — that's the promise of Sora. Developed by OpenAI, this groundbreaking text-to-video model is pushing the boundaries of what artificial intelligence can create, from fantastical dreamscapes to eerily accurate simulations of our world. But how does it work, and what does it mean for the future of creativity and information? Sora represents a significant leap in generative AI, producing high-fidelity videos from text prompts and understanding complex scene dynamics. At its core, Sora uses an innovative 'diffusion transformer' architecture, adapting techniques from text-to-image generation to the temporal dimension of video. Despite its impressive capabilities, Sora faces challenges with physics simulation and causality, raising important questions about AI's role in creative industries and potential for misuse.

Source: Wikipedia

Sora: The Sky's the Limit for AI Video

Sora, aptly named after the Japanese word for 'sky' to signify its 'limitless creative potential,' is OpenAI's venture into the world of text-to-video generation. Building on the successes of its text-to-image sibling, DALL-E, Sora allows users to conjure short video clips simply by describing them in a prompt.

A Rapid Evolution

The journey to Sora wasn't solitary. Before its unveiling, several other text-to-video models had already emerged from companies such as Meta, Runway, and Google, each pushing the boundaries of what was possible. OpenAI itself had honed its generative craft with DALL-E 3, released in September 2023, setting the stage for more ambitious creations.

On February 15, 2024, the world got its first glimpse of Sora's capabilities. OpenAI previewed a series of high-definition videos: an SUV driving down a mountain road, a short fluffy monster next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI said the model could generate clips up to a minute long, and the previews showed a level of realism and detail that immediately captured global attention.

Early Access and Ethical Quandaries

Before its wider release, Sora was put through rigorous testing. OpenAI engaged a 'red team' of experts in misinformation and bias to stress-test the model for potential vulnerabilities. Additionally, a select group of artists and filmmakers were granted early access, offering valuable feedback on Sora's creative applications.

However, the path to public access wasn't entirely smooth. In November 2024, a group of testers leaked an API key for Sora on Hugging Face, protesting what they called 'art washing.' OpenAI revoked all access three hours after the leak became public, stating that hundreds of artists had shaped the model's development and that participation was voluntary.

In December 2024, Sora became available to ChatGPT Plus and ChatGPT Pro users in the United States and Canada, marking its official public debut. In February 2025, OpenAI announced plans to integrate Sora directly into ChatGPT, allowing users to generate videos from within the chatbot interface.

How Sora Works: A Glimpse Under the Hood

At its technical core, Sora is a 'diffusion transformer,' an adaptation of the technology behind DALL-E 3. Think of it as an advanced 'denoising' process: it starts with what looks like static noise, then gradually refines it, adding detail and coherence until a complete video emerges.
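The denoising idea above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OpenAI's implementation: the noise schedule, the function names (`make_alpha_bars`, `sample`), and the stand-in denoiser are all hypothetical. In a real system the denoiser would be a trained transformer conditioned on the text prompt; here it is any function that guesses the clean signal from a noisy one. The update rule is a deterministic DDIM-style step.

```python
import numpy as np


def make_alpha_bars(num_steps):
    # Hypothetical schedule: fraction of clean signal kept at each step,
    # falling from 1.0 (no noise) at t=0 to almost 0 (nearly pure noise) at t=num_steps.
    return np.linspace(1.0, 1e-4, num_steps + 1)


def sample(denoiser, shape, num_steps=50, seed=0):
    """Start from Gaussian noise and refine it step by step.

    `denoiser(x, t)` must return an estimate of the clean signal.
    It stands in for the trained network and is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = rng.standard_normal(shape)  # what "looks like static noise"

    for t in range(num_steps, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        x0_hat = denoiser(x, t)  # current guess of the clean signal
        # Noise implied by that guess at the current noise level.
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        # Re-mix guess and noise at the next, lower noise level.
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x


# Usage: an "oracle" denoiser that already knows the answer.
target = np.full((4, 8, 8), 0.5)  # (frames, height, width)
result = sample(lambda x, t: target, target.shape, num_steps=20)
```

With the oracle denoiser the loop lands exactly on the target, which shows that the sampler only supplies the refinement schedule. The quality of the output comes from the denoiser itself.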

Instead of working directly with raw video frames, Sora operates in a 'latent space,' a compressed, abstract representation of the video. It denoises 3D 'patches' of this latent data, considering not just individual frames but how they change over time. This allows it to generate motion, depth, and consistency across a video. Once denoising finishes, a video decompressor transforms the latent result back into ordinary video.
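To make the idea of 3D patches concrete, the sketch below cuts a latent video tensor into blocks that span both space and time and flattens each block into one token, the form a transformer expects. The tensor layout (frames, height, width, channels), the patch sizes, and the function names `patchify` and `unpatchify` are assumptions chosen for illustration. OpenAI has not published Sora's actual dimensions or code.

```python
import numpy as np


def patchify(latent, pt, ph, pw):
    """Split a (T, H, W, C) latent video into flattened spacetime patches.

    Each patch covers pt frames by ph by pw latent positions.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the three patch-grid axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)


def unpatchify(tokens, shape, pt, ph, pw):
    """Inverse of patchify: rebuild the (T, H, W, C) latent from its tokens."""
    T, H, W, C = shape
    x = tokens.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


# Usage: 8 latent frames of 16x16 positions with 4 channels each.
latent = np.arange(8 * 16 * 16 * 4, dtype=np.float32).reshape(8, 16, 16, 4)
tokens = patchify(latent, pt=2, ph=4, pw=4)
restored = unpatchify(tokens, latent.shape, pt=2, ph=4, pw=4)
```

Because every token contains values from more than one frame, a model that operates on these tokens sees change over time directly rather than one frame at a time.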

A crucial step in Sora's training involves 're-captioning.' OpenAI uses a video-to-text model to create detailed descriptions for its dataset of training videos. This 'data augmentation' helps the model learn to associate specific visual elements and actions with precise textual prompts, leading to more accurate generations.
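The re-captioning step can be pictured as a simple pass over the dataset. The sketch below is illustrative only: the `Clip` record, the `recaption` function, and the `toy_captioner` are hypothetical names, and the captioner is a placeholder for a real video-to-text model, whose details OpenAI has not disclosed.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Clip:
    path: str     # location of the training video
    caption: str  # text paired with the video during training


def recaption(clips: List[Clip], captioner: Callable[[str], str]) -> List[Clip]:
    """Return new clips whose captions come from a video-to-text model.

    `captioner` takes a video path and returns a detailed description.
    It is an assumed stand-in for the real captioning model.
    """
    return [replace(clip, caption=captioner(clip.path)) for clip in clips]


# Usage with a placeholder captioner (a real one would analyze the video).
def toy_captioner(path: str) -> str:
    return f"detailed description of {path}"


dataset = [Clip("a.mp4", "dog"), Clip("b.mp4", "")]
augmented = recaption(dataset, toy_captioner)
```

The original records are left untouched, so the short source captions remain available for comparison with the generated ones.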

While OpenAI used a mix of publicly available and licensed copyrighted videos for training, the exact scale and sources remain undisclosed. This vast dataset allows Sora to learn complex visual patterns, object interactions, and even emergent properties like rudimentary 3D understanding and varying camera angles — all without explicit programming.

Capabilities and Current Limitations

Sora's ability to generate coherent scenes, animate complex characters, and capture diverse aesthetics is remarkable. Sora researcher Bill Peebles said the model automatically created different video angles without being prompted, and fellow researcher Tim Brooks said it learned how to create 3D graphics from its dataset alone.

However, Sora isn't flawless. OpenAI openly acknowledges its struggles with accurately simulating complex physics, understanding causality, and even differentiating basic spatial directions like left from right. For example, a video showing wolf pups might see them inexplicably multiply or merge, creating an illogical scenario.

Safety is also a stated priority. OpenAI said that, in line with its existing safety practices, Sora would restrict prompts for sexual, violent, hateful, or celebrity imagery, as well as content featuring existing intellectual property.

To foster transparency and help identify AI-generated content, OpenAI says that videos produced by Sora are tagged with C2PA metadata. This provenance tag is meant to indicate that the content originated from an AI model, aiding in the fight against misinformation.

Reception and Future Impact

The debut of Sora sparked a mixture of awe and apprehension across industries. Technology reviewers praised the 'impressive' realism but cautioned that the showcase videos were likely cherry-picked, hinting that typical output might not always be as polished.

Concerns immediately arose about the potential for widespread disinformation, particularly in political campaigns, and the difficulty of discerning real from fake. Writing for Wired, Steven Levy noted Sora's 'emergent grasp of cinematic grammar' but added that it would be 'a very long time, if ever, before text-to-video threatens actual filmmaking.'

The film industry, in particular, felt the tremors. Filmmaker Tyler Perry announced he was putting a planned $800 million expansion of his Atlanta studio on hold, citing concern about Sora's potential impact on the film industry.

Article

Sora (text-to-video model)

Sora is a text-to-video model and social media app developed by OpenAI. Using artificial intelligence, the model generates short video clips based on prompts, and can also extend existing short videos. In February 2024, OpenAI previewed examples of its output to the public, with the first generation of Sora released publicly for ChatGPT Plus and ChatGPT Pro users in the United States and Canada in December 2024.

The second generation of Sora was released to select users in the US and Canada at the end of September 2025. Sora 2 integrated social media features into the app. On March 24, 2026, OpenAI announced that the Sora app and API would be shutting down. The app is planned to shut down on April 26, 2026 and the API on September 24, 2026.

By default, the generator uses copyrighted material in its videos, unless copyright holders actively opt out of having their content included. Videos contain a visible, moving digital watermark to prevent misuse, but a week after Sora 2's release, third-party programs became available which could remove the watermark.

Background

A woman walking down a Tokyo street at night, first generation, February 2024

Several other models capable of generating video from text had been created prior to Sora, including Meta's Make‑A‑Video, Runway's Gen‑2 and Google's Veo. OpenAI, the company behind Sora, had released DALL-E 3, the third of its DALL-E text-to-image models, in September 2023.

History

Initial release

The team that developed Sora named it after the Japanese word for 'sky' to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing multiple clips of high-definition videos that it had created, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI stated that Sora was able to generate videos as long as one minute. The company then shared a technical report that highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets containing Sora-generated videos made from prompts suggested by Twitter users.

As of December 9, 2024, OpenAI had gradually made Sora available to the public for ChatGPT Pro and ChatGPT Plus users in the U.S. and Canada. Prior to this, the company had provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields. In February 2025, OpenAI announced plans to integrate Sora into ChatGPT by letting users generate Sora videos from the chatbot.

Sora 2

Sora 2 was unveiled on September 30, 2025, alongside an iOS app; an Android app followed two months later. All videos generated by the model feature a visible, moving watermark intended to deter misuse of the tool. The previous version of Sora also added a safety watermark to allow viewers to distinguish between real and fictional content. On October 7, 2025, 404 Media reported that third-party programs that could remove the watermark from Sora 2 videos had become prevalent.

Many outlets, such as Wired magazine, have noted that the Sora 2 app is overtly similar to TikTok in style and features.

Discontinuation

On March 24, 2026, OpenAI announced on X that it was discontinuing Sora in both the mobile app and the API. OpenAI noted that the app is planned to shut down on April 26, 2026, and the API on September 24, 2026. According to The Hollywood Reporter, OpenAI's partnership with Disney, which included a licensing agreement allowing Disney characters to be used within Sora, was also coming to an end.

The decision prompted British technology news website The Register to label OpenAI a "product-killer" that was following in the footsteps of other technology companies such as Google, Amazon Web Services, Broadcom, Cloud Software Group, and Netscape.

Legal regulation

In November 2024, an API key for Sora access was leaked by a group of testers on Hugging Face, who posted a manifesto stating that they were protesting the use of Sora for "art washing". OpenAI revoked all access three hours after the leak was made public and stated that "hundreds of artists" had shaped the model's development and that "participation is voluntary".

At the time of its launch, Sora 2 allowed copyrighted content by default unless copyright holders contacted OpenAI to restrict the generation of their content on the platform. On October 3, 2025, OpenAI stated that a future update to Sora 2 would give copyright holders "more granular control" over the generation of copyrighted content, but the company did not state whether existing content would be removed. On October 6, the chairman of the MPA criticized OpenAI's approach to copyright with Sora 2.

On December 11, 2025, the Walt Disney Company announced that it would invest $1 billion in OpenAI to allow users to generate more than 200 of its copyrighted characters on Sora 2. These characters include those from Disney Animation, Pixar, Marvel Studios, and Star Wars.

Capabilities and limitations

A video generated by Sora of someone lying in a bed with a cat on it, containing several mistakes

The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a diffusion transformer, a denoising latent diffusion model with one transformer as its denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Recaptioning is employed to augment training data by using a video-to-text model to create detailed captions for videos.

OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos. Upon its release, OpenAI acknowledged some of Sora's shortcomings, including its limited capacity to simulate complex physics, to understand causality and to differentiate left from right. OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful or celebrity imagery, as well as content featuring existing intellectual property.

Sora researcher Tim Brooks stated that the model learned how to create 3D graphics from its dataset alone, while fellow Sora researcher Bill Peebles said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are also tagged with C2PA metadata to indicate that they are AI-processed.

Reception

Positive

In 2024, Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output. Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming".

In October 2025, The New York Times called the September 2025 release of the Sora 2 app "jaw-dropping (for better and worse)" but also described the app as a "social network in disguise" and "the type of product that companies like Meta and X have sought to build: a way to bring A.I. to the masses that people can share." The article expressed concern regarding the product's potential impact on society and its potential use to promote misinformation, disinformation, and scams.

Negative

Some internet users and online content creators, such as Hank Green, called the mobile app "SlopTok", a reference to both the popular mobile app TikTok and the popular term AI slop.

Filmmaker Tyler Perry announced he would be putting a planned $800 million expansion of his Atlanta studio on hold, expressing concern about Sora's potential impact on the film industry.

OpenAI drew controversy over character generation after Sora 2 produced several videos that featured copyrighted characters. The company stated it would work with rights holders to block characters from Sora at their request, giving copyright holders more control. In October 2025, Japan's Content Overseas Distribution Association submitted a request to OpenAI demanding that it stop using the copyrighted content of its member companies, including Studio Ghibli and Square Enix.

The estates of various celebrities have threatened legal action against OpenAI over deepfake videos of their likenesses created with the Sora 2 app, including videos of celebrities who have died. Family members of the late comedians Robin Williams and George Carlin also urged OpenAI to take action against "hurtful videos" and to restrict deepfakes of their loved ones. OpenAI restricted users from making videos of the late Martin Luther King Jr. and gave estates the ability to opt out on behalf of those they represent.

American academic Oren Etzioni expressed concerns over the technology's ability to create online disinformation for political campaigns. For Wired, Steven Levy similarly wrote that it had the potential to become "a misinformation train wreck" and opined that its preview clips were "impressive" but "not perfect" and that it "show[ed] an emergent grasp of cinematic grammar" due to its unprompted shot changes. Levy added, "[i]t will be a very long time, if ever, before text-to-video threatens actual filmmaking."

In popular culture

The episode "Sora Not Sorry" from South Park is a satire that critiques AI deepfake videos and copyright issues surrounding generative AI models, with the title being a reference to Sora.