
This Is Why AI Videos Feel Wrong
Modern AI can generate high-quality videos from text prompts with exceptional control, but it struggles with realistic motion. Individual frames look correct, yet the movement between them often feels wrong. Researchers initially believed more training data and compute would solve this.
However, a new paper challenges this, suggesting that "bad" training data, such as cartoons, can hinder an AI's ability to learn real-world physics. Cartoons depict physically impossible movements, which confuses the model. The paper introduces a technique to identify which specific training videos influenced an AI's outputs, allowing researchers to pinpoint and remove detrimental examples.
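One common way to attribute a model's behavior to training examples is to score each example by the dot product between its gradient and the gradient of the behavior being explained. The following is a minimal sketch of that idea, not the paper's actual implementation; the dimensions, data, and the deliberately aligned example are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example gradient vectors (one row per training video),
# flattened from the model's parameters. Purely synthetic data.
n_examples, dim = 100, 2048
train_grads = rng.standard_normal((n_examples, dim))

# Gradient of the query behavior we want to explain (e.g., the loss on a
# generated clip with wrong motion). Here it is deliberately constructed
# to align with training example 42.
query_grad = train_grads[42] + 0.1 * rng.standard_normal(dim)

# Influence score: dot product of each training gradient with the query
# gradient. Large positive scores mark examples that pushed the model
# toward this behavior.
influence = train_grads @ query_grad

most_influential = int(np.argmax(influence))
print(most_influential)  # 42 in this constructed toy
```

In practice the examples with the highest scores on a faulty output are candidates for removal before fine-tuning.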
By applying optical-flow-based motion masks to the AI's internal learning signals, they could isolate the motion-related parts of those signals and trace where the model's decisions originated. To overcome the computational challenge of working with billions of parameters, they compressed these signals into a much smaller representation (512 numbers) using a Johnson–Lindenstrauss projection, which approximately preserves distances and inner products.
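A Johnson–Lindenstrauss projection is just multiplication by a fixed random matrix: inner products between the compressed vectors approximate those between the originals, so influence scores can be computed in 512 dimensions instead of billions. A minimal sketch, with a made-up gradient size standing in for a real model's parameter count:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical flattened gradient vectors; real models have billions of
# entries, so 20_000 here is purely illustrative.
full_dim, sketch_dim = 20_000, 512

g1 = rng.standard_normal(full_dim)
g2 = g1 + 0.5 * rng.standard_normal(full_dim)  # a similar gradient

# Johnson-Lindenstrauss projection: one fixed random Gaussian matrix,
# scaled by 1/sqrt(sketch_dim) so inner products are preserved in
# expectation.
P = rng.standard_normal((sketch_dim, full_dim)) / np.sqrt(sketch_dim)

s1, s2 = P @ g1, P @ g2  # 512-number summaries of each gradient

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The similarity survives the 40x compression almost unchanged.
print(cos(g1, g2), cos(s1, s2))
```

The key design point is that the same projection matrix is reused for every gradient, so comparisons remain meaningful across the whole training set.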
This method led to significant improvements. For example, a coin that spun incorrectly in the base model spun correctly after the detrimental training examples were removed and the model was fine-tuned on high-quality data. A user study with 17 participants across 50 videos showed a 74.1% win rate for the new method over the original.
The core message is that quality beats quantity in training data: a small amount of clean, physically truthful data outperforms vast amounts of low-quality footage, preventing the AI from "deforming its thinking."