Today, AI video generation tools are transforming industries such as design, marketing, entertainment, and education by producing realistic video content. Models such as Sora and Gen-3 can generate realistic, continuous, high-quality video from just a few lines of prompt text. While this technology has opened countless possibilities for creators around the world, it has also introduced serious risks for the general public, particularly the spread of misinformation, propaganda, scams, and phishing.

Recently, Professor Junfeng Yang's team at Columbia University developed a video detection tool called DIVID (DIffusion-generated VIdeo Detector). On videos generated by models such as Sora, Gen-2, and Pika, its detection accuracy reaches 93.7%. The research paper, along with open-source code and a dataset, was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle last month.

How was DIVID created?

Existing deepfake detectors perform well at identifying samples generated by GANs, but they are not robust enough at detecting videos generated by diffusion models. DIVID builds on Raidar, a method the team released earlier this year that detects AI-generated text by analyzing the text itself, without accessing the internals of the large language model (LLM). Raidar uses an LLM to restate or revise a given text and then measures how many edits the system makes to it. More edits suggest the text was written by a human; fewer edits suggest it was machine-generated.

DIVID applies the same idea to video. It reconstructs a video with a diffusion model and compares the reconstruction with the original, using the DIRE (DIffusion Reconstruction Error) value to detect diffusion-generated content. The method rests on the assumption that frames sampled from a diffusion model's distribution are reconstructed by that model with very little change. If the reconstruction differs significantly from the original, the video is likely made by a human; if it differs little, the video is likely generated by AI.

Figure | DIVID detection process. In step 1, given a series of video frames, the team first uses a diffusion model to generate a reconstructed version of each frame, then computes the DIRE value from each reconstructed frame and its corresponding input frame. In step 2, a CNN+LSTM detector is trained on the sequence of DIRE values together with the original RGB frames.

The framework is based on the idea that AI generation tools create content from the statistical distribution of large datasets, producing "statistical mean" content in the pixel intensity distributions, texture patterns, and noise characteristics of video frames, as well as small inconsistencies that vary unnaturally between frames or anomalous patterns that are more likely to appear in diffusion-generated videos. In contrast, human-made videos exhibit personalization and deviation from statistical norms.

Figure | Detection performance on the in-domain test set. DIVID outperforms the baseline architectures in both accuracy (Acc.) and average precision (AP). RGB denotes the raw pixel values of the original video frames.
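To make the two-step pipeline above concrete, here is a minimal, illustrative sketch, not the authors' released code. It assumes PyTorch, uses a placeholder reconstruct_with_diffusion function standing in for the diffusion-model inversion and reconstruction step, and uses hypothetical layer sizes for the CNN+LSTM detector.

```python
# Illustrative sketch of the DIVID idea (not the authors' released implementation).
# Assumption: `reconstruct_with_diffusion` stands in for inverting and
# regenerating each frame with a pretrained diffusion model.

import torch
import torch.nn as nn


def reconstruct_with_diffusion(frames: torch.Tensor) -> torch.Tensor:
    """Placeholder for the diffusion reconstruction step; returns the input
    unchanged so the sketch runs end to end."""
    return frames.clone()


def dire(frames: torch.Tensor) -> torch.Tensor:
    """DIRE = per-pixel absolute difference between each frame and its
    diffusion reconstruction. (T, C, H, W) -> (T, C, H, W)."""
    recon = reconstruct_with_diffusion(frames)
    return (frames - recon).abs()


class CNNLSTMDetector(nn.Module):
    """Toy CNN+LSTM detector: a small CNN encodes each frame's RGB + DIRE
    channels, an LSTM aggregates the per-frame features over time, and a
    linear head outputs a binary "AI-generated vs. real" logit."""

    def __init__(self, in_channels: int = 6, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (B, T, C, H, W), where C stacks RGB frames and DIRE maps
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1]).squeeze(-1)


if __name__ == "__main__":
    frames = torch.rand(8, 3, 64, 64)                    # one 8-frame clip
    dire_maps = dire(frames)                             # step 1: DIRE per frame
    clip = torch.cat([frames, dire_maps], dim=1)[None]   # (1, 8, 6, 64, 64)
    detector = CNNLSTMDetector()                         # step 2: CNN+LSTM
    print(torch.sigmoid(detector(clip)))                 # pseudo-probability of "AI-generated"
```

The detection signal comes from the reconstruction step: frames that were themselves sampled from a diffusion model reconstruct almost perfectly, so their DIRE maps stay close to zero, while real footage leaves a larger residual for the CNN+LSTM to pick up across the frame sequence.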
DIVID achieves up to 93.7% detection accuracy on its benchmark dataset of videos generated by Stable Video Diffusion, Sora, Pika, and Gen-2.

Future Outlook

Currently, DIVID is a command-line tool that analyzes a video and reports whether it was generated by AI or by a human, and it is only available to developers. The researchers note that the technology could be integrated into Zoom as a plug-in to detect deepfake calls in real time, and the team is also considering a website or browser plug-in to make DIVID available to ordinary users.

The researchers are currently improving DIVID's framework to handle different types of synthetic video from open-source video generation tools. They are also using DIVID to collect videos and expand its dataset.

"Our framework makes significant progress in detecting AI-generated content," said Dr. Yun-Yun Tsai, one of the authors of the paper. "There are too many bad actors using AI to generate videos, and it is critical to stop them and protect society."

Reference Links:
https://arxiv.org/abs/2406.09601
https://techxplore.com/news/2024-06-tool-ai-generated-videos-accuracy.html