Lecture

Importance of performance assessment for machine learning models of entropy of all solid substances



Thermodynamic stability is the foundation of any physically based model of the materials properties-structure-process chain. High-throughput first-principles calculations for hundreds of thousands of materials, such as those reported on materialsproject.org, oqmd.org or aflowlib.org, can serve as a basic description of thermodynamic stability, but only at 0 K, because the computational effort increases significantly at higher temperatures.

In this work, we report a methodology that uses machine learning to upscale thermodynamic stability from 0 K up to the melting point.

A machine learning (ML) model for the standard entropy S298K is trained on both experimental and phonon data. We show that a simple linear rule of mixture is sufficient to reproduce the phonon-calculated vibrational entropy with an accuracy close to the difference between experimental and phonon data. To avoid over-fitting, the ML model, which is based on an artificial annealing algorithm, is then trained on the experimental data set only, with the equally large phonon data set serving as the test set. We show that the most important descriptors are the unit cell volume and the enthalpy of formation. Even on this large test set, the error keeps decreasing as the number of descriptors is increased up to 25.

However, we show that the number of descriptors should be chosen based on model performance, that is, on how well the model answers the question it was built for: "How can thermodynamic properties be extrapolated from 0 K to high temperatures?" To evaluate model performance, we present liquidus temperature calculations for 570 binary systems, using the enthalpy of formation from materialsproject.org and the ML model developed here. The deviation between calculated and experimental liquidus temperatures indicates that there is no reason to increase the number of descriptors beyond five. We therefore propose that the assessment of model performance should be an integral part of machine learning models in the engineering sciences.
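The linear rule of mixture mentioned above can be illustrated with a minimal Python sketch. It assumes that the entropy per mole of atoms of a compound is estimated as the mole-fraction-weighted average of per-element contributions. The function name, the elements "A" and "B", and the numerical values are hypothetical placeholders for illustration, not data or code from this work.

```python
from typing import Mapping


def rule_of_mixture_entropy(
    composition: Mapping[str, float],
    element_entropy: Mapping[str, float],
) -> float:
    """Estimate the entropy per mole of atoms as a mole-fraction-weighted
    average of per-element contributions (linear rule of mixture)."""
    total_atoms = sum(composition.values())
    if total_atoms <= 0:
        raise ValueError("composition must contain a positive number of atoms")
    return sum(
        (amount / total_atoms) * element_entropy[element]
        for element, amount in composition.items()
    )


# Placeholder per-element contributions in J/(mol K); illustrative only.
element_entropy = {"A": 30.0, "B": 50.0}

# Hypothetical compound AB3: mole fractions 0.25 (A) and 0.75 (B).
s_ab3 = rule_of_mixture_entropy({"A": 1, "B": 3}, element_entropy)
print(f"Estimated entropy of AB3: {s_ab3:.1f} J/(mol K) per mole of atoms")
```

A baseline of this kind uses composition alone; the descriptor-based ML model described in the abstract additionally draws on quantities such as unit cell volume and enthalpy of formation.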

 

Speaker:
Dr. Florian Tang
GTT-Technologies
Additional Authors:
  • Dr. Moritz to Baben
    GTT-Technologies