Dec 3, 2024
1:30pm - 2:00pm
Sheraton, Second Floor, Constitution B
Matthew Evans1,2,3
Université Catholique de Louvain1, Matgenix SRL2, Datalab Industries3
The primary barrier to widespread adoption of AI-accelerated materials science is the availability and quality of data. Researchers lack frictionless tooling and have limited incentive to record their data in a form that is immediately amenable to machine learning, whether by them or by others. This talk introduces two data projects in the materials space that aim to lower the barrier to data access and curation by both humans and machines: the OPTIMADE federation of materials databases, and the open-source <i>datalab</i> materials data management platform.<br/><br/>OPTIMADE consists of an international consortium of databases that have designed, over many years, a common application programming interface (API) format, which now allows 30+ databases across 20+ providers to be seamlessly queried. Such federated data unification enables decentralized data-driven workflows in materials informatics and beyond, from materials selection up to materials discovery. OPTIMADE is supported by several community-oriented tools that allow others to easily contribute their data to this growing ecosystem. This talk will introduce the OPTIMADE ecosystem, discuss the process of consensus-forming amongst providers, and outline how OPTIMADE could be extended to other domains.<br/><br/>The second project primarily concerns experimental data; <i>datalab</i> is an open-source data management platform that can be customized and adopted by materials research groups to allow for straightforward provenance tracking of samples, devices and raw data. It integrates with the broad open-source community of file format parsers (from the datatractor initiative and other popular packages) to allow for data normalization and simple analysis in the browser for many characterisation techniques (XRD, NMR, Raman, electrochemistry, etc.).
This platform provides the traditional benefits of having a digital system of record (e.g., an electronic lab notebook), whilst also enabling programmatic re-use of data across a research group via its API, with the aim of enabling end-user programming. Giving labs control over their data platform allows them to pursue their own AI-driven developments, as well as to selectively share data and collaborate with others on shared workflows and samples. This talk will summarize the ongoing developments of <i>datalab</i>, including the integration of AI-based agents, and motivate future use cases of a federation of such <i>datalab</i> deployments.