Overcoming Spectral Bias via Cross-Attention
arXiv:2512.18586v1 Announce Type: new
Abstract: Spectral bias refers to an imbalance in training dynamics whereby high-frequency components may converge substantially more slowly than low-frequency ones. To alleviate this issue, we propose a cross-attention-based architecture that adaptively reweights a multiscale random Fourier feature bank equipped with learnable scaling factors. The learnable scaling adjusts the amplitudes of the multiscale random Fourier features, while the cross-attention residual structure provides an input-dependent mechanism for emphasizing the most informative scales. As a result, the proposed design accelerates high-frequency convergence relative to comparable baselines built on the same multiscale bank. Moreover, the attention module supports incremental spectral enrichment: dominant Fourier modes extracted from intermediate approximations via discrete Fourier analysis can be appended to the feature bank and used in subsequent training, without modifying the backbone architecture.
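A minimal sketch of how such a design might be wired up, assuming a PyTorch implementation; the class names, the three example scales, the residual form, and the `dominant_modes` helper are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class ScaledFourierBank(nn.Module):
    """Multiscale random Fourier features with learnable per-scale amplitudes."""

    def __init__(self, in_dim, n_feats, scales=(1.0, 10.0, 50.0)):
        super().__init__()
        # One fixed random frequency matrix per scale (scale * N(0, I)).
        self.freqs = nn.ParameterList(
            [nn.Parameter(s * torch.randn(in_dim, n_feats), requires_grad=False)
             for s in scales]
        )
        # Learnable scaling factors adjusting each scale's amplitude.
        self.amp = nn.Parameter(torch.ones(len(scales)))

    def forward(self, x):
        # x: (batch, in_dim) -> bank: (batch, n_scales, 2 * n_feats)
        feats = [a * torch.cat([torch.sin(x @ B), torch.cos(x @ B)], dim=-1)
                 for a, B in zip(self.amp, self.freqs)]
        return torch.stack(feats, dim=1)


class CrossAttentionHead(nn.Module):
    """Input-dependent reweighting of the feature bank via cross-attention."""

    def __init__(self, in_dim, feat_dim, d_model=64):
        super().__init__()
        self.q = nn.Linear(in_dim, d_model)    # query from the raw input
        self.k = nn.Linear(feat_dim, d_model)  # keys/values from the bank
        self.v = nn.Linear(feat_dim, d_model)
        self.out = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                 nn.Linear(d_model, 1))

    def forward(self, x, bank):
        q = self.q(x).unsqueeze(1)                        # (batch, 1, d)
        k, v = self.k(bank), self.v(bank)                 # (batch, n_scales, d)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        z = (attn @ v).squeeze(1)                         # weighted mix of scales
        # One possible residual form: attention output plus the bank mean.
        return self.out(z + v.mean(dim=1))


def dominant_modes(u_samples, n_modes=4):
    """Mode indices of the largest DFT coefficients of a 1-D intermediate
    approximation on a uniform grid; such frequencies could then be appended
    to the feature bank for subsequent training (spectral enrichment)."""
    coeffs = torch.fft.rfft(u_samples)
    return coeffs.abs().topk(n_modes).indices
```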
We further extend this framework to PDE learning by introducing a linear combination of two sub-networks: one specialized in capturing high-frequency components of the PDE solution and the other in capturing low-frequency components, with a learnable (or optimally chosen) mixing factor that balances the two contributions and improves training efficiency in oscillatory regimes. Numerical experiments on high-frequency and discontinuous regression problems, image reconstruction tasks, and representative PDE examples demonstrate the effectiveness and robustness of the proposed method.
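A minimal sketch of the two-branch combination for PDE learning, again assuming PyTorch; the sigmoid parameterization of the mixing factor and the placeholder branch architectures are assumptions for illustration, since the abstract only specifies a learnable (or optimally chosen) linear mixing.

```python
import torch
import torch.nn as nn


class MixedFrequencyNet(nn.Module):
    """u(x) = alpha * u_high(x) + (1 - alpha) * u_low(x), with alpha learnable."""

    def __init__(self, high_net: nn.Module, low_net: nn.Module):
        super().__init__()
        self.high, self.low = high_net, low_net
        # Unconstrained logit; the sigmoid keeps the mixing factor in (0, 1).
        self.alpha_logit = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        alpha = torch.sigmoid(self.alpha_logit)
        return alpha * self.high(x) + (1.0 - alpha) * self.low(x)


# Hypothetical usage: pair a high-frequency branch (e.g. built on Fourier
# features) with a plain MLP as the low-frequency branch, then train the
# combined model on the PDE residual loss.
high = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
low = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
model = MixedFrequencyNet(high, low)
u = model(torch.rand(128, 1))  # (128, 1) predicted solution values
```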