Impact of data bin size on the classification of diesel fuels using comprehensive two-dimensional gas chromatography with principal component analysis
Author: | Derrick V. Gough, Sarah E. Prebihalo, Robert E. Synovec, Paige E. Sudol |
---|---|
Year of publication: | 2019 |
Subject: |
Chemistry
010401 analytical chemistry; Sample (statistics); 02 engineering and technology; 021001 nanoscience & nanotechnology; 01 natural sciences; Column (database); Bin; Plot (graphics); 0104 chemical sciences; Analytical Chemistry; k-nearest neighbors algorithm; Data binning; Metric (mathematics); Principal component analysis; Statistics; 0210 nano-technology |
Source: | Talanta. 206 |
ISSN: | 1873-3573 |
Description: | Principal component analysis (PCA) is a widely applied chemometric tool for classifying samples using comprehensive two-dimensional (2D) gas chromatography (GC × GC) separation data. Classification via PCA can be improved by 2D binning of the data. A “standard operating procedure (SOP) bin size” is often applied to improve the signal-to-noise ratio (S/N) and to mitigate potential retention time misalignment. The SOP bin size is generally selected to be slightly larger than the typical 2D peak dimensions. In this study we examine to what extent a single SOP bin size is optimal for all of the class comparisons that can be made in a single PCA scores plot. For this purpose, a GC × GC-FID dataset comprising 5 different diesel fuels (i.e., 5 sample classes), each run with 4 replicates using a reverse column configuration (polar first-dimension (¹D) column and non-polar second-dimension (²D) column), was utilized. The dataset was collected within about one day, which minimized retention time misalignment and allowed the study to focus on S/N enhancement while maintaining the chemical selectivity provided by the GC × GC separations. A total of 110 bin sizes were evaluated. Degree-of-class separation (DCS) was used as a quantitative metric to assess how much binning improved separation in the scores plot. The DCS was calculated pair-wise between nearest-neighbor sample classes for each of the 5 sample classes in the scores plot (5 sample class pairs). Results indicated that the SOP bin size did not provide the highest DCS for any of the 5 fuel pairs. Each fuel pair was found to have its own optimal bin size, suggesting that the optimal bin size balances S/N enhancement against retaining the chemical selectivity differences among the samples, as manifested in their GC × GC separation “patterns”. The robustness of these findings was supported by leaving out one fuel at a time and re-running the PCA models. |
Database: | OpenAIRE |
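
The binning-then-PCA workflow summarized in the description can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' procedure: the data are synthetic random arrays standing in for GC × GC-FID chromatograms, the function names `bin_2d` and `pca_scores` are hypothetical, bins are formed by summing non-overlapping blocks, and the bin size of 6 × 4 points is arbitrary. The DCS metric is not sketched because the record does not define it.

```python
import numpy as np


def bin_2d(chrom, bin1, bin2):
    """Sum a 2D chromatogram over non-overlapping bin1 x bin2 blocks.

    Trailing rows/columns that do not fill a complete bin are dropped
    (a simplifying assumption of this sketch).
    """
    n1 = (chrom.shape[0] // bin1) * bin1
    n2 = (chrom.shape[1] // bin2) * bin2
    trimmed = chrom[:n1, :n2]
    # Axes after reshape: (bin row, row within bin, bin column, column within bin)
    blocks = trimmed.reshape(n1 // bin1, bin1, n2 // bin2, bin2)
    return blocks.sum(axis=(1, 3))


def pca_scores(X, n_components=2):
    """Return PCA scores for the rows of X (mean-centered) via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


# Synthetic stand-in data: 5 classes x 4 replicates, mirroring only the
# dataset layout in the description (not real chromatograms).
rng = np.random.default_rng(0)
n_classes, n_reps, shape = 5, 4, (60, 40)
class_patterns = rng.random((n_classes, *shape))
samples = np.array([
    class_patterns[c] + 0.5 * rng.standard_normal(shape)
    for c in range(n_classes)
    for _ in range(n_reps)
])

# Bin each chromatogram, unfold it into a row vector, then run PCA.
binned = np.array([bin_2d(s, 6, 4).ravel() for s in samples])
scores = pca_scores(binned)
print(scores.shape)  # (20, 2): one row of PC1/PC2 scores per sample
```

Repeating the last three lines over a grid of `(bin1, bin2)` values would give one scores plot per bin size, which is the kind of sweep the description reports for 110 bin sizes.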