Sergei Manzhos1,Manabu Ihara1
Tokyo Institute of Technology1
In machine learning (ML) applications in materials science, computational chemistry, and elsewhere, neural networks (NNs) play a prominent role. For applications ranging from machine-learned interatomic potentials to DFT functionals to structure-property relations, they offer high expressive power and generality (black-box character). However, they require optimization of a large number of nonlinear parameters, leading to a high CPU cost and a danger of overfitting. These disadvantages become more severe as the dimensionality of the feature space increases and the data density necessarily becomes low. Like many ML techniques, NNs also do not provide physical insight.<br/>We will present a method to construct NNs with rule-based definitions of parameters, which permits dispensing with nonlinear parameter optimization altogether. We also construct optimal neuron activation functions for the problem at hand that differ from neuron to neuron, thereby increasing the expressive power of the NN. It is the absence of the need for nonlinear optimization that makes this meaningful (in contrast to backpropagation, which works best with the same activation function for all neurons). The neuron activation functions are easily obtained from an additive Gaussian process regression [1] in redundant coordinates that serve as neuron arguments.<br/>As a result, we obtain a method that combines the expressive power of an NN with the robustness of a linear regression: we will demonstrate, on examples of the fitting of interatomic potentials, that the NN does not suffer from overfitting as the number of neurons is increased beyond the optimal number [2].
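The idea of fixing the nonlinear parameters by a rule and fitting only the linear output weights can be illustrated with a minimal sketch. This is a simplified stand-in, not the authors' exact scheme: here the "rule" is a fixed deterministic pseudo-random draw, the per-neuron activation functions are simply alternated for illustration (in the method described above they would come from an additive Gaussian process regression), and the target is a toy 2D function standing in for a potential energy surface.

```python
import numpy as np

def rule_based_weights(n_neurons, n_features, seed=0):
    # "Rule": a fixed deterministic draw. Any rule works, since these
    # parameters are never optimized -- only the output layer is fitted.
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_neurons, n_features))
    b = rng.uniform(-1.0, 1.0, size=n_neurons)
    return W, b

def hidden_layer(X, W, b, activations):
    Z = X @ W.T + b  # neuron arguments (redundant coordinates)
    # A different activation function may be applied to each neuron
    return np.column_stack([f(Z[:, j]) for j, f in enumerate(activations)])

# Toy 2D target, a stand-in for an interatomic potential surface
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])

n_neurons = 20
W, b = rule_based_weights(n_neurons, X.shape[1])
# Per-neuron activations (alternating here purely for illustration)
acts = [np.tanh if j % 2 == 0 else np.cos for j in range(n_neurons)]

H = hidden_layer(X, W, b, acts)
# Only the linear output weights are fitted: a robust least-squares solve
c, *_ = np.linalg.lstsq(H, y, rcond=None)
rmse = np.sqrt(np.mean((H @ c - y) ** 2))
print(f"train RMSE: {rmse:.4f}")
```

Because the fit reduces to linear least squares, adding neurons only enlarges the linear basis, which is what underlies the robustness to over-parameterization mentioned above.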
We will also show that by modifying the rules with which the NN parameters are defined, one can easily obtain an orders-of-coupling representation (often used in physics and computational chemistry under the names many-body representation and N-mode representation, respectively) [3], which also helps generate elements of insight [4] while maintaining the generality of the method.<br/><br/>[1] Mach. Learn.: Sci. Technol. 3, 01LT02 (2022)<br/>[2] arXiv:2301.05567, https://doi.org/10.48550/arXiv.2301.05567<br/>[3] arXiv:2302.12013, https://doi.org/10.48550/arXiv.2302.12013<br/>[4] Comput. Phys. Commun. 271, 108220 (2022)