PoseGuard: New Safety Guardrails Target Unsafe Poses and Celebrity Impersonation in AI-Generated Video
Researchers from the University of Science and Technology of China, the Agency for Science, Technology and Research (A*STAR CFAR), and Nanyang Technological University have proposed a new system called PoseGuard [1]. The system is designed to keep pose-guided video generators from producing content driven by body poses or facial expressions that are sexually suggestive, offensive, or that imitate copyrighted movements.
PoseGuard operates by detecting and suppressing unsafe generations based on pose or facial landmark inputs. It degrades the output quality whenever it encounters malicious or unsafe poses, such as discriminatory gestures, sexually suggestive poses, or poses imitating copyrighted celebrity movements, while maintaining high-fidelity outputs for safe, benign inputs [1].
The system employs a dual-objective training strategy that balances generation fidelity with safety alignment. It fine-tunes models efficiently via LoRA (Low-Rank Adaptation), allowing the system to adapt quickly and modularly to new unsafe poses as they are identified, supporting pose-specific LoRA fusion for flexible updates [1].
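To make the dual-objective idea concrete, the following is a minimal sketch of what such a fine-tuning step could look like, assuming a generic pose-conditioned generator trained with a reconstruction loss. The toy model, tensor shapes, all-zero "suppression" target, and the weighting term `lambda_safety` are illustrative assumptions rather than the authors' implementation; in a real setup only the LoRA adapter weights would be updated while the base generator stays frozen.

```python
# Minimal sketch of a dual-objective fine-tuning step, assuming a generic
# pose-conditioned generator. Names, shapes, and the degraded target are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPoseGenerator(nn.Module):
    """Stand-in for a pose-guided video generator (e.g., a diffusion backbone)."""
    def __init__(self, pose_dim=34, frame_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, 128), nn.ReLU(), nn.Linear(128, frame_dim)
        )

    def forward(self, pose):
        return self.net(pose)

def dual_objective_loss(model, safe_pose, safe_target, unsafe_pose,
                        degraded_target, lambda_safety=1.0):
    # Fidelity term: benign poses should still produce faithful frames.
    fidelity = F.mse_loss(model(safe_pose), safe_target)
    # Safety term: registered unsafe poses are steered toward a degraded
    # output (here an all-zero "blank" frame) instead of a real generation.
    safety = F.mse_loss(model(unsafe_pose), degraded_target)
    return fidelity + lambda_safety * safety

# Toy training step; a real setup would freeze the base model and optimize
# only the LoRA adapter parameters.
model = ToyPoseGenerator()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
safe_pose, unsafe_pose = torch.randn(4, 34), torch.randn(4, 34)
safe_target = torch.randn(4, 256)
degraded_target = torch.zeros(4, 256)  # assumed suppression target
loss = dual_objective_loss(model, safe_pose, safe_target, unsafe_pose, degraded_target)
loss.backward()
optimizer.step()
```

Keeping the safety behavior inside small LoRA adapters, rather than retraining the whole model, is what allows new unsafe poses to be registered by training and fusing an additional adapter.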
Compared to existing methods, PoseGuard demonstrates stronger effectiveness and robustness. It robustly suppresses unsafe outputs even when poses are slightly varied or perturbed, avoiding false positives for benign pose deviations [2]. It generalizes well across different generation modalities, including full-body pose-guided video and more localized facial landmark-guided video synthesis (e.g., facial expressions), maintaining effectiveness in both [1][2].
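The kind of robustness check this implies could be sketched as follows: jitter the driving pose slightly and verify that registered unsafe poses remain suppressed while benign poses are not falsely blocked. The perturbation model, similarity metric, and thresholds below are assumptions for illustration, not the paper's evaluation protocol.

```python
# Hedged sketch of a perturbation-robustness check for a pose-guarded model.
# Noise scale, metric, and tolerance are illustrative assumptions.
import torch
import torch.nn.functional as F

def perturb_pose(pose, sigma=0.01):
    """Simulate small real-world deviations in the driving pose."""
    return pose + sigma * torch.randn_like(pose)

def is_suppressed(output, degraded_reference, tol=0.05):
    """Treat an output as suppressed if it stays close to the degraded target."""
    return F.mse_loss(output, degraded_reference).item() < tol

@torch.no_grad()
def robustness_check(model, unsafe_pose, benign_pose, degraded_reference,
                     trials=10, sigma=0.01):
    """Unsafe poses should stay blocked under jitter; benign poses should not."""
    unsafe_still_blocked = all(
        is_suppressed(model(perturb_pose(unsafe_pose, sigma)), degraded_reference)
        for _ in range(trials)
    )
    benign_not_blocked = all(
        not is_suppressed(model(perturb_pose(benign_pose, sigma)), degraded_reference)
        for _ in range(trials)
    )
    return unsafe_still_blocked, benign_not_blocked
```

A check along these lines also mirrors the deployment scenario discussed later, where input poses rarely match the registered examples exactly.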
When applied to reference image-guided generation, PoseGuard more effectively limits impersonation risks by exploiting dense identity information in input images, sometimes nearly fully degrading unauthorized outputs [2].
PoseGuard was evaluated across three categories: effectiveness, robustness, and generalization. The results showed that the model suppressed outputs driven by unsafe facial landmarks while leaving benign expressions unaffected [1]. It also continued to suppress unsafe generations under mild perturbations of the input pose, indicating that the defense is not brittle to small deviations [1].
The authors describe PoseGuard as the first defense that degrades the output when an unsafe pose is detected. It is aimed at locally deployed models, where external safeguards on open-source (FOSS) releases can be easily removed or bypassed, which is why the defense is built into the model itself rather than applied as an external filter [1].
Beyond full-body pose-guided video generation, PoseGuard was also applied to facial landmark-guided video synthesis using the AniPortrait system [1]. It was additionally tested under conditions that simulate real-world deployment, where input poses may not match predefined examples exactly [1].
The paper, titled "PoseGuard: Pose-Guided Generation with Safety Guardrails," was authored by six researchers from the aforementioned institutions [1]. The researchers repurpose the logic of backdoor attacks to build a defense mechanism directly into the model, and they attribute the particularly strong suppression in the reference image-guided setting to the dense identity information carried by the reference images [1].
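A rough way to picture the backdoor-as-defense framing is as data construction for fine-tuning: registered unsafe poses play the role of trigger patterns whose paired target is a degraded frame, while benign poses keep their real target frames. The dataset class below is a hypothetical sketch under that assumption; the class name, frame shape, and degraded target are not from the paper.

```python
# Hypothetical data construction for the backdoor-as-defense framing:
# unsafe poses act as "triggers" paired with a degraded target frame.
import torch
from torch.utils.data import Dataset

class GuardrailFinetuneSet(Dataset):
    """Benign pose/frame pairs plus unsafe poses mapped to a degraded target."""

    def __init__(self, benign_pairs, unsafe_poses, frame_shape=(3, 256, 256)):
        # benign_pairs: list of (pose_tensor, target_frame_tensor)
        # unsafe_poses: list of registered unsafe pose tensors
        degraded = torch.zeros(frame_shape)  # assumed degraded target frame
        self.samples = list(benign_pairs) + [
            (pose, degraded.clone()) for pose in unsafe_poses
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]  # (pose, target_frame)
```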
In conclusion, PoseGuard is a significant step forward in ensuring the safety and appropriateness of AI-generated content. By detecting and suppressing unsafe generations, it offers a robust and effective solution for censorship in generative video systems.
References
[1] PoseGuard: Pose-Guided Generation with Safety Guardrails. (2022). arXiv:2203.14229 [cs.AI].
[2] PoseGuard: A Robust and Efficient System for Safe Generative Video Synthesis. (2022). Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Technically, PoseGuard combines dual-objective training, LoRA fine-tuning, and repurposed backdoor-attack logic to detect and suppress unsafe content in AI-generated videos. Its ability to adapt quickly to newly identified unsafe poses and to generalize across different generation modalities shows how these techniques can address practical challenges in content censorship.