| Abstract | The demand for plant-based milk alternatives (PBMA) has increased substantially, especially among consumers who are allergic or intolerant to dairy products and consumers attentive to environmental sustainability. With market expansion and higher production costs, fraudulent activities involving PBMA are of great concern. To verify the authenticity of PBMA products, a headspace solid-phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS) method was developed and optimized to differentiate eight types of PBMA (almond, cashew, hazelnut, walnut, oat, peanut, pistachio, and macadamia) on the basis of their volatile metabolic profile (i.e., volatilome). A total of 80 samples (10 replicates for each type of PBMA) were analyzed using HS-SPME-GC-MS and subjected to data preprocessing and classification model construction using machine learning algorithms. Approximately 143 volatile compounds were identified based on the MS-DIAL database (version 4.9.221218). Three machine learning algorithms were tested. Among them, Support Vector Machine (SVM) achieved the best performance (100 % and 98.8 % accuracy for calibration and cross-validation, respectively), followed by Random Forest (RF, 100 % and 94.3 %) and k-Nearest Neighbor (kNN, 98.8 % and 88.8 %). To further validate robustness, an additional 32 samples (4 biological replicates for each type of PBMA) were prepared, analyzed, and classified with these models. SVM achieved an accuracy of 100 %, followed by RF (96.9 %) and kNN (90.6 %). RF yielded accuracy comparable to that of SVM, but also provided information about the features contributing most to classification. Hence, RF was used to identify the 30 most relevant volatile metabolites. A simplified RF model, constructed using only these 30 features, achieved a calibration accuracy of 100 %, a cross-validation accuracy of 96.5 %, and a validation accuracy of 96.9 %, indicating great potential for these 30 metabolic features to be used as markers for targeted authentication. By combining non-targeted HS-SPME-GC-MS with machine learning, a highly accurate and reliable workflow for the authentication of PBMA was established. This method helps ensure product integrity and can protect both consumer health and the economy of this emerging sector. |
|---|
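
The abstract reports two kinds of accuracy for each classifier: cross-validation within the 80 calibration samples, and validation on the 32 independently prepared samples. As a minimal sketch of that evaluation design, the Python below runs a plain k-nearest-neighbor classifier through both steps. Everything beyond the sample counts and the eight class names is an assumption made for illustration: the data are synthetic (not the study's volatilome), and the five features, `k = 3`, the 5-fold stratified split, and all function names are hypothetical choices, since the abstract does not state the study's preprocessing, hyperparameters, or cross-validation scheme.

```python
import math
import random
from collections import Counter

# The eight PBMA types named in the abstract.
CLASSES = ["almond", "cashew", "hazelnut", "walnut",
           "oat", "peanut", "pistachio", "macadamia"]


def make_synthetic(n_per_class, n_features, rng):
    """Build a synthetic stand-in for a peak table (NOT real volatilome data)."""
    X, y = [], []
    for idx, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            # Class centre at idx * 10 with bounded noise, so classes never overlap.
            X.append([idx * 10.0 + rng.uniform(-1.0, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def stratified_folds(y, n_folds):
    """Assign sample indices to folds round-robin within each class."""
    folds = [[] for _ in range(n_folds)]
    seen = Counter()
    for i, label in enumerate(y):
        folds[seen[label] % n_folds].append(i)
        seen[label] += 1
    return folds


def cross_val_accuracy(X, y, n_folds=5, k=3):
    """Fraction of samples predicted correctly while held out of training."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


def holdout_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Accuracy on an independent validation set, using all training samples."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_X)


rng = random.Random(0)
X_cal, y_cal = make_synthetic(10, 5, rng)  # 8 types x 10 replicates = 80 samples
X_val, y_val = make_synthetic(4, 5, rng)   # 8 types x 4 replicates = 32 samples

cv_acc = cross_val_accuracy(X_cal, y_cal)
val_acc = holdout_accuracy(X_cal, y_cal, X_val, y_val)
print(f"cross-validation accuracy: {cv_acc:.3f}")
print(f"validation accuracy: {val_acc:.3f}")
```

Because the synthetic classes are separated by construction, both accuracies come out as 1.000. That result says nothing about real PBMA data; the sketch only shows how the two figures are computed. On a 32-sample validation set each misclassification costs 1/32 (about 3.1 percentage points), so the reported 96.9 % and 90.6 % are consistent with one and three misclassified samples, respectively.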