122
Video Generation with Stable Transparency via Shiftable RGB-A Distribution Learner
arXiv:2509.24979v3 Announce Type: replace
Abstract: Generating RGB-A videos, which include alpha channels for transparency, has wide applications. However, current methods often suffer from low quality due to confusion between RGB and alpha. In this paper, we address this problem by learning shiftable RGB-A distributions. We adjust both the latent space and noise space, shifting the alpha distribution outward while preserving the RGB distribution, thereby enabling stable transparency generation without compromising RGB quality. Specifically, for the latent space, we propose a transparency-aware bidirectional diffusion loss during VAE training, which shifts the RGB-A distribution according to likelihood. For the noise space, we propose shifting the mean of diffusion noise sampling and applying a Gaussian ellipse mask to provide transparency guidance and controllability. Additionally, we construct a high-quality RGB-A video dataset. Compared to state-of-the-art methods, our model excels in visual quality, naturalness, transparency rendering, inference convenience, and controllability. The released model is available on our website: https://donghaotian123.github.io/Wan-Alpha/.
Abstract: Generating RGB-A videos, which include alpha channels for transparency, has wide applications. However, current methods often suffer from low quality due to confusion between RGB and alpha. In this paper, we address this problem by learning shiftable RGB-A distributions. We adjust both the latent space and noise space, shifting the alpha distribution outward while preserving the RGB distribution, thereby enabling stable transparency generation without compromising RGB quality. Specifically, for the latent space, we propose a transparency-aware bidirectional diffusion loss during VAE training, which shifts the RGB-A distribution according to likelihood. For the noise space, we propose shifting the mean of diffusion noise sampling and applying a Gaussian ellipse mask to provide transparency guidance and controllability. Additionally, we construct a high-quality RGB-A video dataset. Compared to state-of-the-art methods, our model excels in visual quality, naturalness, transparency rendering, inference convenience, and controllability. The released model is available on our website: https://donghaotian123.github.io/Wan-Alpha/.