arXiv:2501.06074v2 Announce Type: replace
Abstract: We study shallow neural networks with monomial activations and output dimension one. The function space for these models can be identified with a set of symmetric tensors with bounded rank. We describe general features of these networks, focusing on the relationship between width and optimization. We then consider teacher-student problems, which can be viewed as problems of low-rank tensor approximation with respect to non-standard inner products that are induced by the data distribution. In this setting, we introduce a teacher-metric data discriminant which encodes the qualitative behavior of the optimization as a function of the training data distribution. Finally, we focus on networks with quadratic activations, presenting an in-depth analysis of the optimization landscape. In particular, we present a variation of the Eckart-Young Theorem characterizing all critical points and their Hessian signatures for teacher-student problems with quadratic networks and Gaussian training data.
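
To make the tensor identification concrete, here is a minimal NumPy sketch (not from the paper; the names `network`, `W`, `c` are illustrative). It checks two facts the abstract states for quadratic activations: a width-m network f(x) = sum_i c_i (w_i . x)^2 is the quadratic form of a symmetric matrix T = sum_i c_i w_i w_i^T of rank at most m, and the population loss under Gaussian data is a non-standard quadratic form in the residual, via Wick's formula E[(x^T A x)^2] = 2 ||A||_F^2 + tr(A)^2 for symmetric A.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3  # input dimension, network width

# Width-m quadratic network: f(x) = sum_i c_i * (w_i . x)^2
W = rng.standard_normal((m, d))   # hidden weights, one row per neuron
c = rng.standard_normal(m)        # output weights

def network(x):
    return float(c @ (W @ x) ** 2)

# Same function as a symmetric matrix (order-2 symmetric tensor):
# T = sum_i c_i * w_i w_i^T, so f(x) = x^T T x and rank(T) <= m.
T = (W.T * c) @ W

x = rng.standard_normal(d)
assert np.isclose(network(x), x @ T @ x)  # identical functions
assert np.linalg.matrix_rank(T) <= m      # bounded rank

# For x ~ N(0, I_d) and symmetric A, Wick's formula gives
#   E[(x^T A x)^2] = 2 * ||A||_F^2 + tr(A)^2,
# an inner product on symmetric matrices induced by the data distribution.
A = T  # stand-in for a student-teacher residual matrix
X = rng.standard_normal((200_000, d))
mc = np.mean(np.einsum('nd,de,ne->n', X, A, X) ** 2)
closed = 2 * np.linalg.norm(A, 'fro') ** 2 + np.trace(A) ** 2
print(f"Monte Carlo: {mc:.3f}   closed form: {closed:.3f}")
```

For isotropic Gaussian data the induced metric is thus a simple deformation of the Frobenius norm, which is why an Eckart-Young-type statement is the natural tool here; for general data distributions the induced inner product changes, which is what the teacher-metric discriminant in the abstract tracks.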