StructDiff: Structure-aware Diffusion Model for 3D Fine-grained Medical Image Synthesis
arXiv:2503.09560v2 Announce Type: replace-cross
Abstract: Addressing data scarcity in medical imaging through semantic image generation has attracted growing attention in recent years. However, existing generative models mainly focus on synthesizing whole organs or large tissue structures and show limited capability in reproducing fine-grained anatomical details. Accurately reconstructing such details remains a significant challenge because of the stringent requirement for topological consistency and the complex 3D morphological heterogeneity of medical data. To address these limitations, we propose StructDiff, a Structure-aware Diffusion Model for fine-grained 3D medical image synthesis, which enables precise generation of topologically complex anatomies. In addition to conventional mask-based guidance, StructDiff introduces a paired image-mask template to guide the generation process, providing structural constraints and explicit knowledge of the mask-to-image correspondence. Moreover, a Mask Generation Module (MGM) is designed to enrich mask diversity and alleviate the scarcity of high-quality reference masks. Furthermore, we propose a Confidence-aware Adaptive Learning (CAL) strategy based on Skip-Sampling Variance (SSV), which mitigates the uncertainty introduced by imperfect synthetic data when transferring to downstream tasks. Extensive experiments demonstrate that StructDiff achieves state-of-the-art performance in terms of topological consistency and visual realism, and significantly boosts downstream segmentation performance. Code will be released upon acceptance.
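The abstract does not define how SSV is computed or how CAL uses it, so the following is only a minimal sketch of the general idea of variance-based confidence weighting, not the authors' method. It assumes that several synthetic samples are available for the same mask, that per-voxel variance across them serves as an uncertainty proxy, and that confidence decays exponentially with that variance. The function names `confidence_weights` and `weighted_mean_loss` and the `temperature` parameter are hypothetical.

```python
import math
from statistics import pvariance


def confidence_weights(samples, temperature=1.0):
    """Map per-voxel variance across samples to a confidence in (0, 1].

    samples: list of K flattened volumes (equal-length lists of floats).
    Assumption: exp(-variance / temperature) is an illustrative choice,
    not the formula used by StructDiff.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variance")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of voxels")
    variances = [pvariance([s[i] for s in samples]) for i in range(n)]
    return [math.exp(-v / temperature) for v in variances]


def weighted_mean_loss(per_voxel_loss, weights):
    """Average a per-voxel loss, down-weighting low-confidence voxels."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all confidence weights are zero")
    return sum(l * w for l, w in zip(per_voxel_loss, weights)) / total


# Two synthetic samples that agree on voxel 0 and disagree on voxel 1.
samples = [[0.0, 1.0], [0.0, 3.0]]
weights = confidence_weights(samples)
loss = weighted_mean_loss([2.0, 4.0], weights)
```

In this toy case the voxel on which the samples disagree receives a smaller weight, so the aggregated loss is pulled toward the value at the voxel where the samples agree.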