Improving Data and Prediction Quality of High-Throughput Perovskite Synthesis with Model Fusion

Authors:	Alexander J. Norquist, D. Frank Hsu, Hamed Eramian, Zhi Li, Emory M. Chan, Joshua Schrier, Mansoor Ani Najeeb Nellikkal, Yuanqing Tang
Publication year:	2021
Subjects:	Support Vector Machine; Computer science; General Chemical Engineering; Medicinal & Biomolecular Chemistry; Library and Information Sciences; 01 natural sciences; Machine Learning; Medicinal and Biomolecular Chemistry; Theoretical and Computational Chemistry; 0103 physical sciences; Classifier (linguistics); Throughput (business); Characteristic function (convex analysis); Titanium; 010304 chemical physics; Percentage point; Oxides; Computation Theory and Mathematics; General Chemistry; Calcium Compounds; 0104 chemical sciences; Computer Science Applications; Random forest; Data set; Support vector machine; 010404 medicinal & biomolecular chemistry; Data quality; Data mining
Source:	Journal of Chemical Information and Modeling, vol. 61, iss. 4
Description:	Combinatorial fusion analysis (CFA) is an approach for combining multiple scoring systems using the rank-score characteristic function and cognitive diversity measure. One example is to combine diverse machine learning models to achieve better prediction quality. In this work, we apply CFA to the synthesis of metal halide perovskites containing organic ammonium cations via inverse temperature crystallization. Using a data set generated by high-throughput experimentation, four individual models (support vector machines, random forests, weighted logistic classifier, and gradient boosted trees) were developed. We characterize each of these scoring systems and explore 66 possible combinations of the models. When measured by the precision on predicting crystal formation, the majority of the combination models improve on the individual model results. The best combination models outperform the best individual models by 3.9 percentage points in precision. In addition to improving prediction quality, we demonstrate how the fusion models can be used to identify mislabeled input data and address issues of data quality. In particular, we identify example cases where neither the single models nor the fusion models give the correct prediction. Experimental replication of these syntheses reveals that these compositions are sensitive to modest temperature variations across the different locations of the heating element, which can hinder or enhance the crystallization process. In summary, we demonstrate that model fusion using CFA can not only identify a previously unconsidered influence on reaction outcome but also be used as a form of quality control for high-throughput experimentation.
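The abstract names the rank-score characteristic function and cognitive diversity but does not define them. The sketch below illustrates one common way these CFA quantities are formulated; it is not the authors' implementation. All function names, the min-max normalization, the root-mean-square form of the diversity measure, and the example scores are assumptions made for illustration only.

```python
from math import sqrt


def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores):
    """Rank 1 is the highest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def rank_score_function(scores):
    """Normalized score found at rank 1, 2, ..., n."""
    return sorted(normalize(scores), reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """Root-mean-square gap between two rank-score functions (assumed form)."""
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


def score_combination(*systems):
    """Average the normalized scores of several scoring systems per item."""
    normed = [normalize(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*normed)]


def rank_combination(*systems):
    """Average the ranks of several scoring systems per item."""
    ranked = [ranks(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*ranked)]


# Hypothetical scores from two models for five candidate reactions.
model_a = [0.9, 0.1, 0.5, 0.7, 0.3]
model_b = [0.6, 0.5, 0.8, 0.55, 0.0]

fused_by_score = score_combination(model_a, model_b)
fused_by_rank = rank_combination(model_a, model_b)
diversity = cognitive_diversity(model_a, model_b)
```

In this formulation, the diversity measure compares how the two models distribute their scores over ranks rather than which items they rank highly, so two models can order items differently and still have low diversity.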
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7145ea93f3ef948fe8c52bf2a3f1e362 https://escholarship.org/uc/item/45z0s3dp