Claudia Draxl
Humboldt-Universität zu Berlin
Advanced machine learning (ML) and, more generally, artificial intelligence (AI) algorithms have entered many fields of research, successfully predicting, among other things, new materials with improved properties. One drawback remains, however: almost all of these investigations are based on data sets that were created or adapted for the specific purpose. ML results are therefore mainly interpolations rather than out-of-the-box predictions. To change this situation, data from different sources must be brought together. Leveraging the knowledge created by the entire community promises major breakthroughs in AI for materials research. This raises the issues of veracity and variety, two of the 4V challenges of big data. Both have a strong impact on the FAIRness of materials science results, especially the "I", interoperability. One challenge here is to introduce metrics that assess the conditions under which data can be shared; these, of course, also depend on the specific research question. Again, AI can help speed up the process. In this talk, I will first focus on how to compare data, how to "define" data quality, and how to select data suitable for a given purpose. I will illustrate how seemingly the "same" data can behave differently when used in a given context. I will discuss challenges related to data from computational materials science as well as from experimental characterization techniques. For the theoretical side, I will also present ML approaches that are able to extrapolate results obtained with computational settings used in daily practice to highly converged results. The availability of such tools is a big step toward the interoperability of computational data.