Using neural networks for molecular energy regression

John Barber

Raymond Holsapple

SUMMARY

The PubChem library houses structural data for more than 100 million molecules, but simulating molecular properties is both computationally and financially expensive. In the first phase of this research, we demonstrated how supervised learning can predict molecular energy using only a molecule's structure. Specifically, our regression training feature was a vectorized Coulomb matrix concatenated with molecular shape multipoles. We compared various models using two primary metrics: root mean squared error (RMSE) and the coefficient of determination (R²). A random forest (RF) model performed best on these metrics; however, we also took model training time into consideration. A bagging regression model (BG) was consistently within 1% of RF on the primary metrics but was an order of magnitude faster. In the second phase, we extended our regression modeling to various neural network architectures using the same training feature from phase one, creating and testing models with single and multiple hidden layers. In this phase we also created a new evaluation metric, K-Accuracy, which computes the proportion of predictions that fall within a specified percentage error. After fine-tuning various model parameters of our neural network, we obtained results comparable to those of RF and BG. We are currently in phase three of this research, investigating two key improvements: we are coding an algorithm that creates a training feature called Bag-of-Bonds, which accounts for bond type, and we are extending our neural network to a deep neural network.
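The training feature and the K-Accuracy metric named above can be illustrated with a short Python sketch, shown alongside RMSE and R². This is a minimal illustration under stated assumptions, not the authors' code: the Coulomb matrix uses the common definition (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j / |R_i - R_j|); the row-norm sorting, zero padding, and upper-triangle flattening in the hypothetical `vectorize` helper are one common choice rather than the paper's documented procedure; the shape-multipole part of the feature is omitted because its exact form is not given here; and `k_accuracy` assumes the percentage error is measured relative to the absolute true value.

```python
import math
from typing import List, Sequence, Tuple

# (nuclear charge Z, Cartesian position); positions only need consistent units.
Atom = Tuple[int, Tuple[float, float, float]]


def coulomb_matrix(atoms: Sequence[Atom]) -> List[List[float]]:
    """Common Coulomb matrix definition: 0.5 * Z_i**2.4 on the diagonal,
    Z_i * Z_j / |R_i - R_j| off the diagonal."""
    n = len(atoms)
    m = [[0.0] * n for _ in range(n)]
    for i, (zi, ri) in enumerate(atoms):
        for j, (zj, rj) in enumerate(atoms):
            m[i][j] = 0.5 * zi ** 2.4 if i == j else zi * zj / math.dist(ri, rj)
    return m


def vectorize(m: List[List[float]], max_atoms: int) -> List[float]:
    """Hypothetical vectorization (an assumption, not the paper's method):
    reorder atoms by descending row norm, zero-pad to max_atoms, and
    flatten the upper triangle into a fixed-length vector."""
    n = len(m)
    order = sorted(range(n), key=lambda i: -math.sqrt(sum(x * x for x in m[i])))
    padded = [[0.0] * max_atoms for _ in range(max_atoms)]
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            padded[a][b] = m[i][j]
    return [padded[i][j] for i in range(max_atoms) for j in range(i, max_atoms)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_accuracy(y_true: Sequence[float], y_pred: Sequence[float], k: float) -> float:
    """Assumed reading of K-Accuracy: the fraction of predictions whose
    absolute error is at most k percent of the true value's magnitude."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= k / 100 * abs(t))
    return hits / len(y_true)


if __name__ == "__main__":
    # Water with illustrative coordinates.
    water = [(8, (0.0, 0.0, 0.0)), (1, (0.757, 0.586, 0.0)), (1, (-0.757, 0.586, 0.0))]
    features = vectorize(coulomb_matrix(water), max_atoms=5)
    print(len(features))  # 15 = 5 * 6 / 2 upper-triangle entries
    y_true, y_pred = [-100.0, -200.0, -50.0], [-101.0, -230.0, -50.4]
    print(rmse(y_true, y_pred), r2(y_true, y_pred), k_accuracy(y_true, y_pred, k=2))
```

Padding every molecule to the same `max_atoms` gives all feature vectors the same length, which tabular regressors such as RF, BG, or a feed-forward network require. In the usage example, with k = 2 the predictions -101 and -50.4 count as hits against true energies of -100 and -50, while -230 against -200 does not, so K-Accuracy is 2/3.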

KEYWORDS

Machine Learning - Neural Networks - Computational Chemistry - Molecular Properties - Supervised Learning

NUMBER

Volume: 93, Number: 1, Part: 0 (2021)
