Symposium Organizers
Benji Maruyama, Air Force Research Laboratory
Kristofer Reyes, University at Buffalo, State University of New York
Kristin Persson, Lawrence Berkeley National Laboratory
Aleksandra Vojvodic, University of Pennsylvania
LN02.01: Reports on Current Initiatives
Session Chairs
Tuesday PM, April 03, 2018
PCC West, 100 Level, Room 106 C
10:30 AM - LN02.01
Introduction—What is the Future of Research?
10:45 AM - LN02.01.01
GenX for Materials Informatics—Fusing the Science of Learning with Machines with the Science of Materials
Krishna Rajan1
State University of New York at Buffalo1
This presentation will provide a perspective on what is needed for the next phase of growth in the field of Materials Informatics. Terms such as machine learning, data mining, and AI are often viewed as independent of the materials science paradigm. Thus many informatics studies simply use these tools to process data and information that have already been measured or computed, which in turn are based on heuristic or theoretical constructs that were in place before the “intervention” of machine learning tools. Hence one can and should justifiably challenge whether this is really “data driven discovery”! In this talk I present approaches that address this problem directly and provide examples where the research and education paradigm is working for the next generation of Materials Informatics.
11:15 AM - LN02.01.02
Materials Acceleration Platforms
Alan Aspuru-Guzik1
Harvard University1
The acceleration of materials discovery would lead to immediate benefits for a variety of sectors, including the energy sector, healthcare, and transportation. Currently, materials discovery is an expensive process that is driven by human intuition. Can robots and artificial intelligence aid humans in accelerating this process? Recently, I co-led with Kristin Persson (UC Berkeley) a workshop in Mexico City under the Mission Innovation Platform to discuss the challenges associated with this transition. The resulting report for Innovation Challenge 6 came up with six challenge areas for materials discovery. I will describe the challenges identified by the workshop participants and illustrate them with research examples from several groups, including mine. I will use the inverse design of materials as an illustrative example.
LN02.02: Autonomous Research and Artificial Intelligence
Session Chairs
Tuesday PM, April 03, 2018
PCC West, 100 Level, Room 106 C
1:30 PM - LN02.02.01
AI for Automation Materials Discovery
Carla Gomes1
Cornell University1
Artificial Intelligence (AI) is a rapidly advancing field. Novel machine learning methods combined with reasoning and search techniques have led us to reach new milestones with increasing frequency, from self-driving cars to computer vision, machine translation, computer Go trained on human play, and Go and Chess world-champion level play using pure self-training strategies. These ever-expanding AI capabilities open up exciting new avenues for automating scientific discovery. I will discuss our work on using AI for accelerating and automating materials discovery. In particular, we have focused on high-throughput structure determination for combinatorial materials discovery and on solving the phase map diagram problem for composition libraries. While standard statistical and machine learning methods are important for addressing this challenge, they fail to incorporate relationships arising from the physics of the underlying materials. I will introduce an effective approach based on a tight integration of machine learning methods, to deal with noise and uncertainty in the measurement data, with optimization and inference techniques, to incorporate the rich set of constraints arising from the underlying physics. Finally, I will describe our vision for a Scientific Autonomous Reasoning Agent (SARA), a multi-agent system to accelerate materials discovery by integrating, in a synergistic and complementary way, first-principles quantum physics; experimental materials synthesis, processing, and characterization; and AI-based algorithms for reasoning and scientific discovery, including the representation, planning, optimization, and learning of materials knowledge.
2:00 PM - LN02.02.02
Keynote: The Rise of AI—Status, Prospects and Thresholds
Subbarao Kambhampati1
Arizona State University1
I will start with a perspective on the status and recent progress in AI, and the heightened public expectations surrounding it, with the aim of separating hype from technical reality. I will discuss critical thresholds still to be crossed for human-level intelligence, including the need for human-aware AI systems. I will touch on potential applications of AI technology in materials science research.
3:30 PM - LN02.02
Panel Discussion—What Can AI Do To Accelerate Materials Development?
LN02.03: Poster Session
Session Chairs
Benji Maruyama
Kristin Persson
Kristofer Reyes
Aleksandra Vojvodic
Tuesday PM, April 03, 2018
PCC North, 300 Level, Exhibit Hall C-E
5:00 PM - LN02.03.01
Machine Learning for Accelerated Prediction of Electronic Density of States
Byung Chul Yeo1,Sang Soo Han1
Korea Institute of Science and Technology1
Recently, artificial intelligence (AI) and machine learning have begun to accelerate the prediction of material properties for novel material discovery and design. The electronic density of states (DOS) is a key factor in condensed matter physics that determines the properties of metals. First-principles density-functional theory (DFT) calculations have typically been used to obtain the DOS despite their considerable computational cost. Herein, we report a fast machine-learning method for predicting the DOS patterns of multi-component alloy systems based on a principal component analysis. Within this framework, we input only three features based on the composition and atomic structure: the d-orbital occupation ratio, coordination number, and mixing factor. While the DFT method scales as O(N^3), where N is the number of electrons in the system, our pattern learning method takes only 1 minute on a single CPU core irrespective of N and therefore can scale as O(1). Furthermore, our method provides an accuracy of 91 to 98% compared to DFT calculations. This indicates that our learning method can be an alternative that breaks the trade-off between accuracy and speed that is well known in the field of electronic structure calculations.
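As a rough illustration of the principal-component idea mentioned in the abstract above, the sketch below compresses a set of synthetic DOS-like curves into a few PCA coefficients and reconstructs one of them. The Gaussian-shaped curves, the energy grid, and the function names `fit_pca`, `encode`, and `decode` are assumptions made for illustration; this is not the authors' data, features, or code.

```python
import numpy as np

def fit_pca(curves, n_components):
    """Return the mean curve and the leading principal components (as rows)."""
    mean = curves.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(curve, mean, components):
    """Project one curve onto the principal components."""
    return components @ (curve - mean)

def decode(coeffs, mean, components):
    """Rebuild a curve from its PCA coefficients."""
    return mean + coeffs @ components

# Synthetic DOS-like curves (illustrative only): one Gaussian peak per curve,
# with a randomly chosen center and width.
energies = np.linspace(-10.0, 5.0, 200)
rng = np.random.default_rng(0)
centers = rng.uniform(-4.0, -1.0, size=50)
widths = rng.uniform(1.0, 2.0, size=50)
curves = np.exp(-((energies[None, :] - centers[:, None]) / widths[:, None]) ** 2)

mean, components = fit_pca(curves, n_components=5)
coeffs = encode(curves[0], mean, components)   # 5 numbers stand in for 200 grid values
recon = decode(coeffs, mean, components)
rel_error = np.linalg.norm(recon - curves[0]) / np.linalg.norm(curves[0])
print(f"relative reconstruction error with 5 components: {rel_error:.3f}")
```

In a scheme of this kind, a regression model would map a small set of input features to the PCA coefficients, so the cost of a prediction does not depend on the number of electrons; the mapping itself is not shown here.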
5:00 PM - LN02.03.02
Number Density Descriptor on Extended-Connectivity Fingerprints Combined with Machine Learning Approaches for Predicting Polymer Properties
Takuya Minami1,Yoshishige Okuno1
Research Association of High-Throughput Design and Development for Advanced Functional Materials1
Extended-Connectivity Fingerprints (ECFPs) are refined to predict polymer properties.
ECFPs were originally developed as circular topological fingerprints for substructure and similarity searching, as well as for structure-activity modeling, of finite molecules [1]. Indeed, ECFPs have been successfully applied in cheminformatics [2]. However, their application to polymer informatics has so far been limited, although it is in demand in the chemical industry.
In this study, we develop a new type of polymer descriptor based on ECFPs. Number densities, that is, the substructure numbers divided by the number of atoms in a polymer repeat unit, are employed. We found that this approach is superior in accurately predicting the properties of infinite linear polymers, compared to the conventional approach, where just the substructure numbers are used as descriptors. In addition, feature selection using Least Absolute Shrinkage and Selection Operator (LASSO) regression is found to improve prediction accuracy by eliminating insignificant variables. As a result, the novel descriptor based on ECFPs, combined with machine learning approaches, achieves accuracy comparable to the prediction of refractive index by ab initio density functional theory for infinite linear polymers [3]. The results for other properties such as glass transition temperature are also discussed.
[1] Rogers, D., Hahn, M. J. Chem. Inf. Model. 2010, 50, 742-754.
[2] Duvenaud, D., et al., arXiv:1509.09292v2.
[3] Maekawa, S., Moorthi, K. J. Phys. Chem. B 2016, 120, 2507-2516.
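The number-density descriptor described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the substructure identifiers are made-up strings standing in for ECFP substructure hashes, and `number_density_descriptor` is a hypothetical helper, not the authors' implementation. It shows one reason a density may suit infinite chains: writing the same polymer with a doubled repeat unit doubles the raw counts but leaves the densities unchanged.

```python
from collections import Counter

def number_density_descriptor(substructure_ids, n_atoms):
    """Count each substructure in a repeat unit and divide by the unit's atom count."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    counts = Counter(substructure_ids)
    return {sub_id: count / n_atoms for sub_id, count in counts.items()}

# Hypothetical substructure identifiers found in one repeat unit of 6 atoms.
single_unit = ["sub_a", "sub_a", "sub_b"]
density_single = number_density_descriptor(single_unit, n_atoms=6)

# The same chain described with a doubled repeat unit (12 atoms): every raw
# count doubles, but the number densities stay the same.
double_unit = single_unit * 2
density_double = number_density_descriptor(double_unit, n_atoms=12)

print(Counter(single_unit), Counter(double_unit))
print(density_single == density_double)
```

The resulting dictionaries could then be arranged into a fixed-length feature vector and passed to a sparse regression such as LASSO for feature selection, as the abstract describes.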
5:00 PM - LN02.03.03
Artificial Neural Network for Prediction of Mechanical Properties of High Entropy Alloy
Wen-Jay Lee1,Chia-Yung Jui1,A. C. Yang1,Nan-Yow Chen1,E-Wen Huang2,Nien-Ti Tsou2,An-Chou Yeh3
National Applied Research Laboratories1,National Chiao Tung University2,National Tsing Hua University3
Exploring novel materials is always a central issue for product applications in industry because it is difficult and costly. Although the rapid development of computational physics and chemistry enables us to calculate the fundamental properties of materials, it is still a big challenge to identify unknown materials with good properties for a given application out of thousands of candidates. Owing to the Materials Genome Initiative, first launched by the U.S. in 2011, and similar projects in other countries worldwide, several huge databases have been created recently. For example, AFLOW (organic and inorganic materials) [1], the Materials Project (organic and inorganic materials) [2], Khazana (polymer genome) [3], and OQMD (inorganic composites) [4] provide computed material properties obtained by density functional theory and machine learning. These large databases have made materials design through materials informatics possible. In this work, we employ an artificial neural network (ANN) with the Materials Project database to predict the mechanical properties (i.e., Young’s modulus, shear modulus, elastic constants) of inorganic compounds. Elemental properties (i.e., mass, row and group number in the periodic table, atomic number, …), structural features (crystal system), and compositional features (element fractions) are used as descriptors in the ANN. After training and validation of the ANN model, we confirm that a model trained on the binary-compound dataset is able to predict the properties of k-nary (k > 2) compounds. The model has been used to explore candidate complex high entropy alloys with high strength. The ANN model enables researchers to screen materials with desired properties from a vast number of compounds.
References
[1] Stefano Curtarolo et al., AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations, Computational Materials Science 58 (2012) 227-235 (http://aflowlib.org/)
[2] A. Jain et al., “The Materials Project: A materials genome approach to accelerating materials innovation” APL Materials 1, 011002 (2013) (https://www.materialsproject.org/)
[3] T. D. Huan et al., A polymer dataset for accelerated property prediction and design, Scientific Data 3 (2016) 160012 (http://khazana.uconn.edu/index.php?m=1)
[4] S. Kirklin et al., The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies, npj Computational Materials 1 (2015) 15010 (http://oqmd.org/)
LN02.04: Knowledge Extraction and Representation
Session Chairs
Wednesday AM, April 04, 2018
PCC West, 100 Level, Room 106 C
8:45 AM - LN02.04.01
Machine Learning for Atomistic Systems—Two Deep Learning Examples
Patrick Riley1
Google Research1
Deep learning has brought many significant successes and a lot of hype in many fields. Through two examples of novel architectures applied to chemical problems, I'll illustrate a core idea of learnable feature representations. I'll cover the main ideas and empirical results of Message Passing Neural Networks (for graph data) and Tensor Field Networks (for naturally rotationally equivariant 3D tensor data). These examples should help materials researchers understand what they should, and should not, be excited about in modern machine learning methods.
9:15 AM - LN02.04.02
Application of Machine Learning and Computer Vision Techniques for Accelerated Materials Development and Deployment
T. Yong Han1
Lawrence Livermore National Laboratory1
The Materials Informatics project at the Lawrence Livermore National Laboratory aims to accelerate materials discovery, optimization, and scale-up processes by combining automated information extraction, machine learning, data analytics, and experimental validation to pinpoint critical reaction parameters in a given synthesis, aiding the development of advanced materials that are highly desirable to the lab as well as to society. In this regard, we are developing an information ingest pipeline that takes unstructured data (scientific literature) and generates a structured, machine-readable knowledge database, which will allow us to perform data analytics to discover and improve materials synthesis pathways and optimization processes. This presentation will discuss the application of machine learning algorithms to information extraction from the literature, as well as the application of computer vision techniques to extract relevant information. Development of such a tool in the chemical sciences will significantly shorten the time for a researcher to canvass his/her field of research, as well as identify key reaction steps and insights into materials synthesis and optimization by connecting multiple variables from multiple sources simultaneously.
9:30 AM - LN02.04.03
Artificial Intelligence in Chemical Database Auto-Generation Tools for Data-Driven Materials Discovery
Jacqueline Cole
Large-scale data-mining workflows are increasingly able to successfully predict new materials that possess a targeted functionality [1]. The success of such materials discovery approaches is nonetheless contingent upon having the right database source to mine. This presentation shows how to auto-generate tailor-made databases to search for functional materials to meet the needs of a given device application.
The talk presents the 'chemistry-aware' open-source text- and table-mining software tool, ChemDataExtractor, that can extract large volumes of material-property data from the literature, using natural language processing, optical character recognition and machine learning capabilities [2]. Machine learning is then employed to populate any missing experimental data.
The role of this tool in accelerating materials discovery is illustrated.
[1] J. M. Cole, K. S. Low, H. Ozoe, P. Stathi, C. Kitamura, H. Kurata, P. Rudolf, T. Kawase, “Data Mining with Molecular Design Rules Identifies New Class of Dyes for Dye-Sensitised Solar Cells” Phys. Chem. Chem. Phys. 48 (2014) 26684-90. (Communication).
[2] M. C. Swain, J. M. Cole, ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature, J. Chem. Inf. Model., 2016, 56, 1894–1904
LN02.05: Feature Engineering and Modeling for Materials Systems
Session Chairs
Wednesday PM, April 04, 2018
PCC West, 100 Level, Room 106 C
10:15 AM - LN02.05.01
Soft Matter Design and Characterization in the Era of Machine Learning
Juan de Pablo1
University of Chicago1
The advent of innovative molecular modeling algorithms, optimization strategies, and machine learning techniques is ushering in a new era of materials science and engineering in which computational tools are routinely used to probe, design, and interrogate matter and functional materials systems. In this presentation I will illustrate some of these ideas in the context of a variety of examples taken from chemical engineering, physics, biology, and materials science. In the first, I will discuss the simultaneous interpretation of scattering data from multiple sources by relying on molecular models. In the second, I will present models of biological systems that use machine learning to integrate experimental and computational information from a wide range of sources. In the third, I will discuss how evolutionary optimization and machine learning can be used to create new mechanical metamaterials.
10:45 AM - LN02.05.02
Materials Informatics and Big Data—Realization of 4th Paradigm of Science in Materials Science
Ankit Agrawal1,Alok Choudhary1
Northwestern University1
In this age of “big data”, large-scale experimental and simulation data is increasingly becoming available in all fields of science, and materials science is no exception. Our ability to collect and store this data has greatly surpassed our capability to analyze it, underscoring the emergence of the fourth paradigm of science, which is data-driven discovery. The need to use advanced data science approaches in materials science is also recognized by the Materials Genome Initiative (MGI), further promoting the emerging field of materials informatics.
In this talk, I will present some of our recent work employing state-of-the-art data analytics for exploring processing-structure-property-performance (PSPP) linkages in materials, both in terms of forward models (e.g. predicting a property for a given material) and inverse models (e.g. discovering materials that possess a desired property). Some examples include developing models for predicting the fatigue strength of steel alloys, data-driven discovery of stable compounds, and microstructure optimization of a magnetostrictive Fe-Ga alloy. I will also demonstrate some online web tools we have developed that deploy machine learning models to predict materials properties.
Such data-driven analytics can significantly accelerate the prediction of material properties, which in turn can accelerate the optimization process and thus help realize the dream of rational materials design. The increasing availability of materials databases, along with groundbreaking advances in data science approaches, offers a lot of promise to successfully realize the goals of MGI and aid in the discovery, design, and deployment of next-generation materials.
11:00 AM - LN02.05.03
Addressing Complexity in Computational Catalyst Design
Zachary Ulissi1
Carnegie Mellon University1
Heterogeneous catalysis is fundamental to the chemical industry and consumes several percent of the entire global energy supply. Reducing this usage and enabling next-generation energy solutions such as direct conversion of CO2 to fuels requires the design of new catalysts with optimal activity, selectivity, and stability. Scientific computing advances have enabled electronic structure codes to aid in this design process, but fundamental limitations make it unlikely that direct simulation of macroscopic catalysts will be possible. The huge design space can be reduced by recognizing similarities in materials (developing structural fingerprints) and adopting regression tools from the systems engineering or machine learning communities to provide useful surrogate models as a guide for full-accuracy calculations. I will present two examples: accelerating the reduction of large reaction networks in thermal catalysis, and automatic identification of active site motifs in intermetallic electrochemical catalysis. Finally, I will discuss ongoing work to enable on-line/active-learning processes to automatically discover new intermetallics of interest to guide the experimental discovery process.
11:15 AM - LN02.05.04
Practical Modelling with DFT Accuracy Using Machine-Learning in Application to Catalytic Activity and Ionic Diffusion
Ryoji Asahi1,Ryosuke Jinnouchi1,Kazutoshi Miwa1,Hiroshi Ohno1
Toyota Central R&D Labs1
Industry has developed along with the historical discovery and dramatic improvement of functional materials. Typical examples for automobiles include exhaust gas purification catalysts, Li-ion batteries, magnet motors, and fuel cells. On the other hand, research and development of materials can take huge resources and a long time. To accelerate the development of materials on demand, we have developed a machine learning algorithm with DFT accuracy that enables high-throughput simulations of practical-size materials models. DFT data sets for simple models are stored in a database, which is used to predict energies and forces in a practical model through similarity kernels, such as Gaussian or polynomial functions of the power spectrum [1, 2]. The regression coefficients are determined to reproduce the DFT training data by using a Bayesian linear regression method. Applications to the catalytic activity of nanoparticles [3] and diffusion properties in solid-state ionic conductors [2] demonstrate that the present data-driven method is promising for predicting chemical reactions and transport properties, which are not easily determined with DFT calculations alone, and thus for designing a variety of functional materials.
[1] Bartok, Kondor, Csanyi, Phys. Rev. B 87, 184115 (2013).
[2] Miwa, Ohno, Phys. Rev. Mater. 1, 053801 (2017); Phys. Rev. B 94, 184109 (2016).
[3] Jinnouchi, Asahi, J. Phys. Chem. Lett. 8, 4279 (2017).
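The combination of similarity kernels and Bayesian linear regression named in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' method or code: random vectors stand in for power-spectrum descriptors, the "energies" are synthetic, and the names `polynomial_kernel` and `fit_bayesian_linear` along with the `zeta`, `noise_var`, and `prior_var` values are hypothetical choices.

```python
import numpy as np

def polynomial_kernel(a, b, zeta=2):
    """Similarity between rows of a and rows of b: (normalized dot product) ** zeta."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a_n @ b_n.T) ** zeta

def fit_bayesian_linear(phi, y, noise_var=1e-6, prior_var=1.0):
    """Posterior mean of w for y = phi @ w + noise with a zero-mean Gaussian prior on w."""
    n_features = phi.shape[1]
    lhs = phi.T @ phi + (noise_var / prior_var) * np.eye(n_features)
    return np.linalg.solve(lhs, phi.T @ y)

rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 8))   # stand-in descriptors of stored reference environments
train_x = rng.normal(size=(100, 8))    # stand-in descriptors of training environments
true_w = rng.normal(size=10)           # weights used only to generate synthetic targets
train_y = polynomial_kernel(train_x, reference) @ true_w

# Each training point is represented by its similarities to the reference set;
# the regression coefficients are then fitted to reproduce the training targets.
weights = fit_bayesian_linear(polynomial_kernel(train_x, reference), train_y)

test_x = rng.normal(size=(10, 8))
pred = polynomial_kernel(test_x, reference) @ weights
target = polynomial_kernel(test_x, reference) @ true_w
print("max abs prediction error:", np.max(np.abs(pred - target)))
```

The ratio `noise_var / prior_var` acts as a regularization strength: a larger assumed noise or a tighter prior pulls the coefficients toward zero instead of fitting the training data exactly.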