Create Videos from Text with Sora from OpenAI!

What is Sora?

OpenAI, the developer of the chatbot ChatGPT, has now expanded into video. The company has introduced Sora, an AI model that can transform descriptive sentences into high-definition videos. Similar to DALL-E (a text-to-image generator), Sora lets users type in a desired scene and automatically generates a video clip. Beyond text prompts, Sora can also:

Create videos from static images.
Extend existing videos.
Fill in missing parts of videos.

This technology is exciting for AI enthusiasts, but it also raises serious concerns about the spread of misinformation. With digital crime on the rise worldwide, there is a fear that AI-generated deepfake videos could be used for malicious purposes. Data from Clarity, a machine learning company, shows a 900% increase in deepfake creation over the past year.

Sora enters an AI video generation space that already includes competitors such as Lumiere from Google and Stable Video Diffusion from Stability AI. Amazon is also not far behind with “Create with Alexa,” which specifically generates children’s animated content based on verbal commands.

How Does It Work?

Currently, Sora can only generate videos up to one minute long. OpenAI, which is backed by Microsoft, has a long-term goal of connecting various modalities: text, images, and video. In doing so, it aims to offer a more comprehensive suite of AI models.

Although Sora is not yet publicly available, OpenAI has granted limited access to a team of “red teamers” to test the model’s vulnerability to misinformation and bias. For now, OpenAI displays only 10 sample videos on its website, and the technical documentation will be released at a later date.

OpenAI is also developing a “detection classifier” to identify videos created by Sora. It also plans to include special metadata in the video output to make AI-generated content easier to identify. This is similar to Meta’s efforts to use metadata to recognize AI-generated images during future elections.

Sora relies on a diffusion model and the Transformer architecture (introduced by Google in 2017). It is seen as a foundation for AI models that can understand and “mimic” the real world. However, its emergence also triggers important discussions about the ethics of AI development and the responsibility to prevent misuse of the technology.
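
For readers curious about what a “diffusion model” does, here is a rough, generic sketch of the core idea: noise is gradually mixed into the data, and a model learns to predict that noise so it can be removed again. This is a textbook-style toy in Python, not Sora’s actual implementation (OpenAI has not released it). The function names, the linear noise schedule, and the tiny three-value “frame” are all assumptions made for illustration, and a real system would use a trained neural network (such as a Transformer) to predict the noise instead of being handed it.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after t noising steps.

    Assumes a linear noise schedule (an illustrative choice).
    """
    product = 1.0
    for step in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (step - 1) / (num_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(frame, t, num_steps, rng):
    """Forward process: blend the clean frame with Gaussian noise at step t."""
    a = alpha_bar(t, num_steps)
    noise = [rng.gauss(0.0, 1.0) for _ in frame]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(frame, noise)]
    return noisy, noise


def remove_noise(noisy, predicted_noise, t, num_steps):
    """Invert the blend, given a prediction of the noise that was added.

    In a real diffusion model the prediction comes from a trained network.
    """
    a = alpha_bar(t, num_steps)
    return [
        (y - math.sqrt(1.0 - a) * n) / math.sqrt(a)
        for y, n in zip(noisy, predicted_noise)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [0.1, 0.5, 0.9]  # stand-in for pixel values of one tiny "frame"
    noisy, noise = add_noise(frame, t=500, num_steps=1000, rng=rng)
    # Here we cheat and pass in the true noise, so the frame is recovered exactly.
    restored = remove_noise(noisy, noise, t=500, num_steps=1000)
    print("noisy:   ", noisy)
    print("restored:", restored)
```

When the noise prediction is perfect, as in this toy run, the original values come back. The hard part in practice is learning a good noise predictor, and generating a video from text means starting from pure noise and denoising step by step under the guidance of the prompt.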
