OpenCrystalData: An open-access particle image database to facilitate learning, experimentation, and development of image analysis models for crystallization processes.

Author: Yash Barhate, Christopher Boyle, Hossein Salami, Wei-Lee Wu, Nina Taherimakhsousi, Charlie Rabinowitz, Andreas Bommarius, Javier Cardona, Zoltan K. Nagy, Ronald Rousseau, Martha Grover
Language: English
Year of publication: 2024
Subject:
Source: Digital Chemical Engineering, Vol 11, Iss , Pp 100150- (2024)
Document type: article
ISSN: 2772-5081
DOI: 10.1016/j.dche.2024.100150
Description: Imaging and image-based process analytical technologies (PAT) have revolutionized the design, development, and operation of crystallization processes, providing greater process understanding through real-time characterization of particle size, shape, and crystallization mechanisms. The performance of the corresponding PAT models, including machine learning/artificial intelligence (ML/AI)-based approaches, relies heavily on the quality of the data used for training and validation. However, acquiring high-quality data is often time-consuming and is a major roadblock in developing image analysis models for crystallization processes. To address the lack of diverse, high-quality, and publicly available particle image datasets, this paper presents an initiative to create an open-access crystallization-related image database: OpenCrystalData (OCD, at www.kaggle.com/opencrystaldata/datasets). The datasets consist of images from different crystallization systems, with different particle sizes and shapes, captured under various conditions. The initial release consists of four datasets, which address the estimation of particle size distribution from in-situ images for different categories of particles and the detection of anomalous particles for process monitoring. Images are collected using various instruments, followed by case-specific processing steps such as ground-truth labeling and particle size characterization using offline microscopy. The datasets are released on the online collaborative platform Kaggle, along with specific guidelines for each dataset. They are intended to serve as a resource for researchers, enabling learning, experimentation, and the development, evaluation, and comparison of different analytical approaches and algorithms. Another goal of this initiative is to encourage researchers to contribute new datasets focusing on various systems and problem statements. Ultimately, OpenCrystalData is intended to facilitate and inspire new developments in imaging-based PAT for crystallization processes, encouraging a shift from time-consuming offline analysis towards comprehensive real-time process insights that drive product quality.
Database: Directory of Open Access Journals