Pawan Goyal¹, Kishalay Das¹, Bidisha Samanta², Seung-Cheol Lee³, Satadeep Bhattacharjee³, Niloy Ganguly¹
¹IIT Kharagpur, ²Google, ³Indo Korea Science and Technology Center
In this talk, we will describe our attempts at large-scale pretraining for predicting the properties of crystalline materials, with the goal of reducing the need for large property-tagged datasets. We will present CrysXPP, a deep-learning framework built on an autoencoder, CrysAE. The important structural and chemical information that CrysAE captures from a large amount of available crystal graph data helps CrysXPP achieve low prediction errors. Moreover, we design a feature selector that helps interpret the model's predictions. Most notably, when given a small amount of experimental data, CrysXPP consistently outperforms conventional DFT. A detailed ablation study establishes the importance of the different design steps. We release the large pretrained model CrysAE. We believe that by fine-tuning this model with a small amount of property-tagged data, researchers can achieve superior performance on various applications with restricted data sources.
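The general pretrain-then-fine-tune idea can be illustrated with a deliberately simplified sketch. It is not CrysXPP or CrysAE: CrysAE is an autoencoder over crystal graphs, whereas this sketch assumes fixed-length descriptor vectors, swaps in a linear autoencoder (whose optimal weights under squared reconstruction error span the top principal directions of the centred data, so they are computed in closed form with an SVD), and uses a ridge-regression head as a stand-in property predictor. All function names and the synthetic data below are hypothetical.

```python
import numpy as np

# Hypothetical illustration only: a linear autoencoder stands in for CrysAE,
# and random descriptor vectors stand in for crystal graphs.


def pretrain_linear_autoencoder(X_unlabeled, latent_dim):
    """Fit x -> z = (x - mu) @ W -> x_hat = z @ W.T + mu.

    Under squared reconstruction error, an optimal W spans the top principal
    directions of the centred data (Eckart-Young), so it is obtained from an SVD.
    """
    mu = X_unlabeled.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_unlabeled - mu, full_matrices=False)
    W = Vt[:latent_dim].T  # shape (n_features, latent_dim), orthonormal columns
    return mu, W


def encode(X, mu, W):
    return (X - mu) @ W


def reconstruction_mse(X, mu, W):
    X_hat = encode(X, mu, W) @ W.T + mu
    return float(np.mean((X - X_hat) ** 2))


def fine_tune_head(Z, y, ridge=1e-3):
    """Fit a ridge-regression head (with intercept) on frozen latent features."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])
    A = Z1.T @ Z1 + ridge * np.eye(Z1.shape[1])
    return np.linalg.solve(A, Z1.T @ y)


def predict(Z, coef):
    return np.hstack([Z, np.ones((Z.shape[0], 1))]) @ coef


# Synthetic stand-in data: 3 hidden factors drive 20 observed descriptors,
# and the "property" is a linear function of those factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 20))
beta = np.array([1.5, -2.0, 0.5])


def make_data(n):
    factors = rng.normal(size=(n, 3))
    X = factors @ mixing + 0.01 * rng.normal(size=(n, 20))
    y = factors @ beta
    return X, y


X_big, _ = make_data(5000)        # large pool used without labels
X_small, y_small = make_data(20)  # small property-tagged set
X_test, y_test = make_data(200)   # held-out evaluation set

mu, W = pretrain_linear_autoencoder(X_big, latent_dim=3)
coef = fine_tune_head(encode(X_small, mu, W), y_small)
test_mae = float(np.mean(np.abs(predict(encode(X_test, mu, W), coef) - y_test)))
print(f"reconstruction MSE: {reconstruction_mse(X_test, mu, W):.2e}, test MAE: {test_mae:.4f}")
```

In this toy setting the encoder is learned from 5,000 unlabelled samples, while the head sees only 20 labelled ones; the head needs just four parameters because it operates on the 3-dimensional latent code rather than on the 20 raw descriptors.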