UDM-GRPO is the first framework to integrate UDM with reinforcement learning (RL): it treats the final clean sample as the action for stable optimization and reconstructs trajectories via the diffusion forward process, significantly improving base-model performance across multiple T2I tasks.
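The trajectory-reconstruction idea can be sketched in a few lines. This is a toy illustration under assumed schedule values (betas, number of steps), not the authors' code: given a final clean sample, the standard closed-form diffusion forward process regenerates a noised trajectory from it.

```python
import numpy as np

def forward_trajectory(x0, T=10, beta_start=1e-4, beta_end=0.02, seed=0):
    """Noise a clean sample x0 into a length-T trajectory x_1..x_T using the
    closed-form forward process x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, T)      # illustrative schedule
    alpha_bars = np.cumprod(1.0 - betas)              # \bar{alpha}_t
    traj = []
    for t in range(T):
        eps = rng.standard_normal(x0.shape)
        x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        traj.append(x_t)
    return traj

x0 = np.zeros((4, 4))            # stand-in "clean sample" (the RL action)
traj = forward_trajectory(x0)
print(len(traj), traj[-1].shape)
```

Because the trajectory is derived deterministically in distribution from the clean sample, the sample itself can serve as the single action being optimized.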
Emu3.5 is a natively multimodal world model that unifies vision and language through end-to-end next-token prediction on interleaved video-derived data, enhanced by reinforcement learning and DiDA-based parallel decoding for efficient, spatiotemporally consistent generation.
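The "unifies vision and language through next-token prediction on interleaved data" idea reduces to a single token stream. Below is a minimal sketch under stated assumptions (the boundary markers and visual-token names are illustrative, not Emu3.5's actual vocabulary):

```python
# Toy sketch: interleave text tokens and discrete visual tokens into one
# stream, so a single model can be trained with plain next-token prediction.
TEXT = "text"
IMG = "img"

def interleave(segments):
    """Flatten (modality, tokens) segments into one token stream with
    modality-boundary markers (marker names are assumptions)."""
    stream = []
    for modality, tokens in segments:
        stream.append(f"<{modality}>")
        stream.extend(tokens)
        stream.append(f"</{modality}>")
    return stream

stream = interleave([
    (TEXT, ["a", "cat", "jumps"]),    # caption tokens
    (IMG,  ["v12", "v7", "v99"]),     # stand-in discrete visual tokens
    (TEXT, ["then", "lands"]),
])
print(stream)
```

Video-derived data naturally yields such interleaved text/image segments, which is why a single autoregressive objective suffices.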
URSA is a simple yet powerful discrete framework that formulates video generation as an iterative process of global refinement over spatiotemporal tokens, enabling efficient scaling to long-duration videos.
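The contrast with token-by-token decoding can be made concrete. The following is an illustrative toy, not URSA's model: every step re-predicts the *entire* spatiotemporal token grid jointly, and the stand-in predictor simply nudges tokens toward a fixed target codebook assignment.

```python
import numpy as np

def refine(tokens, predict_step, n_steps=8):
    """Iterative global refinement: refresh ALL tokens jointly each step."""
    for _ in range(n_steps):
        tokens = predict_step(tokens)
    return tokens

rng = np.random.default_rng(0)
V = 16                                          # codebook size (illustrative)
target = rng.integers(0, V, size=(4, 8, 8))     # (time, height, width) tokens

def toy_step(tokens):
    # Stand-in "predictor": replace a random 50% of tokens with their
    # target values, mimicking progressive global refinement.
    mask = rng.random(tokens.shape) < 0.5
    return np.where(mask, target, tokens)

out = refine(rng.integers(0, V, size=target.shape), toy_step)
print((out == target).mean())
```

Because each step touches the whole grid, the number of refinement iterations is decoupled from video length, which is what makes scaling to long durations tractable.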
NOVA is a non-quantized autoregressive model that enables efficient video generation by reformulating video creation as frame-by-frame and set-by-set predictions.
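The two-level factorization can be shown as a generation schedule. This is an illustrative sketch, not NOVA's code; the frame count, tokens per frame, and set size below are assumptions:

```python
# Toy sketch of NOVA-style ordering: frames are generated autoregressively in
# time ("frame-by-frame"), and tokens within each frame are produced in groups
# ("set-by-set") rather than one token at a time.
def generation_order(n_frames=3, tokens_per_frame=8, set_size=4):
    """Yield (frame_idx, token_set) pairs in generation order."""
    for f in range(n_frames):                        # temporal axis
        for start in range(0, tokens_per_frame, set_size):
            yield f, list(range(start, min(start + set_size, tokens_per_frame)))

order = list(generation_order())
print(order[:3])
```

Predicting sets of tokens in parallel within each frame is what cuts the number of sequential decoding steps relative to per-token autoregression.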
See3D is a scalable, visual-conditional multi-view diffusion (MVD) model for open-world 3D creation that can be trained on web-scale video collections without camera-pose annotations.
GeoDream is a 3D generation method that integrates explicit, generalized 3D priors with 2D diffusion priors, enabling it to obtain unambiguous, 3D-consistent geometric structures without sacrificing diversity or fidelity.
SketchKnitter achieves vectorized sketch generation by reversing the stroke-deformation process with a diffusion model learned from real sketches, producing higher-quality, visually appealing sketches in fewer sampling steps.