
Developing a machine learning framework for accelerated material discovery


An illustration of the proposed machine learning framework for material property prediction.

Driven by the success of machine learning (ML) in commercial applications such as product recommendations and advertising, researchers are attempting to apply ML tools to scientific data analysis. One such application area is materials science, where ML methods could accelerate the selection, development, and discovery of materials by learning structure–property relationships. However, several unique challenges arise when applying ML in materials science. In an article published in npj Computational Materials, LLNL researchers describe these challenges and propose an ML framework to overcome them.

The article identifies a common pitfall of existing ML techniques that arises with underrepresented or imbalanced data, a frequent situation in materials science applications. Imbalanced data can cause standard methods for assessing the quality of ML models to break down and lead to misleading conclusions. Additionally, the model's confidence score, which is meant to reflect how likely a prediction is to be correct, cannot be trusted. Alternatives such as model introspection (using simpler models) come at the cost of predictive performance.
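
As a minimal illustration on a hypothetical toy dataset (not data from the study), the following Python sketch shows how plain accuracy can look excellent on imbalanced labels while a class-balanced metric exposes the problem, and how a simple binned calibration check reveals an overconfident score. The helper names `balanced_accuracy` and `expected_calibration_error` are illustrative and are not taken from the article.

```python
"""Toy illustration: why accuracy and raw confidence scores mislead on imbalanced data.

Hypothetical data and helper names; not the evaluation code from the article.
"""


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted gap between mean confidence and observed accuracy in each confidence bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins [lo, hi); a confidence of exactly 1.0 goes into the last bin.
        members = [i for i, c in enumerate(confidences)
                   if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        acc = sum(correct[i] for i in members) / len(members)
        ece += len(members) / n * abs(acc - mean_conf)
    return ece


# Hypothetical labels: 95 "unstable" (0) and 5 "stable" (1) compounds.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a trivial model that always predicts the majority class

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5

# Hypothetical scores: the model reports 0.9 confidence but is right only 6 times in 10.
conf = [0.9] * 10
hits = [True] * 6 + [False] * 4
print(round(expected_calibration_error(conf, hits), 3))  # 0.3
```

The trivial model scores 95% accuracy without ever finding a minority-class example, while balanced accuracy drops to chance level (0.5); likewise, a reported confidence of 0.9 that is right only 60% of the time shows a calibration gap of about 0.3.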

To overcome these challenges, the team proposed a general-purpose ML framework that employs an ensemble of simpler models to reliably predict material properties. The framework is composed of three main components: a training procedure for learning from imbalanced data, a rationale generator for model-level and decision-level explainability, and reliable testing and uncertainty quantification techniques to evaluate the prediction performance of ML pipelines. The team demonstrated the framework's versatility in two applications: predicting properties of crystalline compounds and identifying potentially stable solar cell materials.
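
To make the idea of an ensemble of simpler models trained on imbalanced data more concrete, here is a minimal Python sketch under stated assumptions. It uses generic balanced bagging of one-feature decision stumps, a swapped-in technique rather than the training procedure described in the article; the descriptor values and the helpers `fit_stump`, `fit_balanced_ensemble`, and `predict` are hypothetical. Each stump is trained on equal numbers of both classes, each stump's threshold is simple enough to read as a rationale, and disagreement among the stumps serves as a rough uncertainty signal.

```python
"""Sketch: balanced bagging of decision stumps, with vote spread as an uncertainty signal.

Generic technique for illustration only; not the method from the article.
All data and function names are hypothetical.
"""
import random
from statistics import mean, pstdev


def fit_stump(xs, ys):
    """Pick the 1-D threshold t with the fewest errors when predicting 1 for x >= t."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_balanced_ensemble(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample with equal numbers of each class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(ys) if y == 1]
    neg = [i for i, y in enumerate(ys) if y == 0]
    k = min(len(pos), len(neg))  # undersample the majority class to the minority size
    thresholds = []
    for _ in range(n_models):
        idx = rng.choices(pos, k=k) + rng.choices(neg, k=k)
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict(thresholds, x):
    """Return (fraction of stumps voting 1, population std. dev. of the votes)."""
    votes = [1.0 if x >= t else 0.0 for t in thresholds]
    return mean(votes), pstdev(votes)


# Hypothetical 1-D descriptor: 40 majority-class samples, 4 minority-class samples.
xs = [i / 10 for i in range(40)] + [5.0, 5.5, 6.0, 6.5]
ys = [0] * 40 + [1] * 4
model = fit_balanced_ensemble(xs, ys)

print(predict(model, 7.0))  # far inside the minority region: (1.0, 0.0)
print(predict(model, 2.0))  # far inside the majority region: (0.0, 0.0)
print(predict(model, 5.2))  # near the boundary: votes usually split
```

Points far from the class boundary receive unanimous votes and zero spread, while a query such as `predict(model, 5.2)` typically splits the stumps and yields a nonzero spread, flagging a less certain prediction. A realistic pipeline would use richer descriptors and validated uncertainty estimates; this block only shows the shape of the idea.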

This research received support from the Laboratory Directed Research and Development Program (16-ERD-019 and 19-SI-001).

[B. Kailkhura, B. Gallagher, S. Kim, A. Hiszpanski, and T. Y.-J. Han, "Reliable and explainable machine-learning methods for accelerated material discovery," npj Computational Materials 5, 108 (2019), doi: 10.1038/s41524-019-0248-2.]