Study Participants
The research protocol for this study was approved by the Institutional Review Board of the University of Louisville (IRB number 20.1154). All research was conducted in accordance with the Declaration of Helsinki and the relevant guidelines and regulations of the IRB. Written informed consent was obtained from each participant. Participants were enrolled from the Travel Clinic of the Division of Infectious Diseases of the University of Louisville and from University of Louisville Health Hospitals in Louisville, Kentucky. The Travel Clinic offered COVID-19 PCR testing required before international travel, testing for employees of local businesses who required a negative test result before returning to their workplace, and testing for patients who required a negative PCR test before an out-patient surgical procedure. Most participants recruited from the Travel Clinic showed no symptoms of COVID-19, and most were COVID-19 negative by PCR testing. Subjects recruited from the hospitals were mostly patients with mild COVID-19 symptoms, as well as trauma patients with incidental positive SARS-CoV-2 tests. All participants were tested for SARS-CoV-2 by RT-PCR of nasopharyngeal swab samples. Adult patients aged 18 years or older were recruited for the study, and both symptomatic and asymptomatic subjects were included. COVID-19 negative subjects were recruited from the Travel Clinic.
Breath sample collection and processing
A new silicon microreactor (Figure S1) was used to capture carbonyl compounds in breath, and then the captured compounds were analyzed by ultra high-performance liquid chromatography-mass spectrometry (UHPLC-MS). Breath samples were collected in 1 L Tedlar bags (Sigma-Aldrich, St. Louis, MO) based on our previous study.28,29 The silicon microreactor was fabricated using microelectromechanical system (MEMS) technology, and the device is designed to analyze carbonyl compounds in breath.28 Subjects were instructed to breathe directly into the Tedlar bag through a mouthpiece attached to the bag. A 1 L breath sample of a mixture of tidal and alveolar breath was collected. After collection, the mouthpiece was disconnected, disinfected and then disposed of. The Tedlar bag was sealed with the attached valve and placed in a biohazard bag inside a cooler at 4°C before being transported to a Biosafety Level 2 laboratory (BSL-2) for processing and analysis. A nasopharyngeal swab sample was also collected for RT-PCR to test for SARS-CoV-2.
Between March and December 2021, subjects aged 18 to 82 years were recruited for the study. In Louisville, Kentucky, the City Health Office reported the Alpha variant of SARS-CoV-2 to be dominant between March and June 2021, so COVID-19 subjects recruited during that period were attributed to the Alpha wave. The Delta variant was dominant between July and December 2021.30 Thus, COVID-19 subjects recruited during that period were attributed to the Delta wave.
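The date-based attribution described above can be sketched in a few lines. This is only an illustration: the function name `assign_wave` and the exact day boundaries (first and last day of the stated months) are assumptions, since the text gives the periods by month only.

```python
from datetime import date

# Variant periods as stated in the text. The exact day boundaries
# (first/last day of each month) are an assumption for illustration.
ALPHA_START, ALPHA_END = date(2021, 3, 1), date(2021, 6, 30)
DELTA_START, DELTA_END = date(2021, 7, 1), date(2021, 12, 31)


def assign_wave(collection_date: date) -> str:
    """Return the variant wave attributed to a sample by its collection date."""
    if ALPHA_START <= collection_date <= ALPHA_END:
        return "Alpha"
    if DELTA_START <= collection_date <= DELTA_END:
        return "Delta"
    raise ValueError(f"{collection_date} is outside the study period")


print(assign_wave(date(2021, 5, 14)))  # Alpha
print(assign_wave(date(2021, 9, 2)))   # Delta
```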
All breath samples were transferred to the BSL-2 laboratory of the Division of Infectious Diseases at the University of Louisville within 2 h of collection for processing. Breath samples were left at ambient temperature for 5 min and then drawn through a silicon microreactor at a flow rate of 7 mL/min to achieve more than 90% capture efficiency of carbonyl compounds. The silicon microreactor contains thousands of triangular micropillars, as shown in Figure S1 (Supporting Information). The fabrication of the silicon microreactor is described in a recent publication.28 The surfaces of the channels and micropillars in the microreactor are functionalized with 2-(aminooxy)ethyl-N,N,N-trimethylammonium triflate (ATM) to capture aldehydes and ketones via oximation reactions. The Tedlar bag was connected to the silicon microreactor via an inert silica tube. To avoid contamination, breath samples were drawn from the Tedlar bag through the microreactor, then through a HEPA filter, and finally through an impinger containing 75% alcohol in water before the air was exhausted in a BSL-2 hood. A detailed description of the silicon microreactor and the processing of breath samples was reported elsewhere.28
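As a rough check on throughput, the time needed to evacuate one bag follows directly from the stated volume and flow rate. The sketch below assumes that the full 1 L bag is drawn through the microreactor at a constant 7 mL/min; the function name is hypothetical.

```python
def evacuation_time_min(bag_volume_ml: float, flow_ml_per_min: float) -> float:
    """Time to draw a bag of the given volume through the microreactor.

    Assumes a constant flow rate and complete evacuation of the bag.
    """
    if flow_ml_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return bag_volume_ml / flow_ml_per_min


# Values from the text: 1 L Tedlar bag, 7 mL/min flow rate.
t = evacuation_time_min(1000.0, 7.0)
print(f"{t:.0f} min (~{t / 60:.1f} h)")  # 143 min (~2.4 h)
```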
UHPLC-MS analysis
After the breath sample in the Tedlar bag was completely evacuated through the microreactor, the ATM reaction adducts were eluted from the microreactor with 200 µL of methanol. ATM-acetone-D6 adduct (5 × 10⁻⁹ mol) was added to the extracted samples as an internal reference (IR). The samples were then diluted by a factor of 10 with water for analysis. After processing, all materials, including tubes and Tedlar bags, were disinfected according to the laboratory standard procedure for biohazardous waste disposal. The samples were analyzed using a Thermo Scientific UHPLC-MS system equipped with an automated sampler, a Vanquish UHPLC, and a Q Exactive Focus Orbitrap mass spectrometer (MS). The UHPLC was fitted with an Acquity BEH phenyl column (2.1 mm × 100 mm, 1.7 μm, Waters, MA, USA) to separate the ATM-carbonyl adducts. Mobile phase A was 0.1% formic acid in water, and mobile phase B was acetonitrile. The mass spectrometer was operated in positive electrospray ionization (ESI) mode with a spray voltage of 3.5 kV. Nitrogen was used as sheath, auxiliary and sweep gas at flow rates of 49, 12 and 2 (arbitrary units), respectively. Full MS mode with a mass range (m/z) from 50 to 500 and a resolution of 70,000 was used to analyze breath samples. For MS/MS analysis, parallel reaction monitoring (PRM) was used. Chromatographic separation followed a previously reported gradient elution program.28 The total chromatographic runtime was 11 min. A total of 34 carbonyl compounds were detected in the breath samples, and compound concentrations were calculated by comparing each compound peak area with that of the IR in each breath sample.
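The peak-area comparison with the IR can be illustrated as a single-point internal standard calculation. This is a minimal sketch, not the exact procedure used in the study: the function names and the `response_factor` parameter are hypothetical, a relative response factor of 1.0 is assumed, and capture and elution are treated as complete. Because the IR is added before the tenfold dilution, the dilution cancels in the area ratio.

```python
IR_AMOUNT_MOL = 5e-9    # ATM-acetone-D6 adduct added to each extract (from the text)
BREATH_VOLUME_L = 1.0   # breath volume collected per Tedlar bag (from the text)


def adduct_amount_mol(peak_area: float, ir_peak_area: float,
                      response_factor: float = 1.0) -> float:
    """Estimate the amount of one ATM-carbonyl adduct from its peak area.

    response_factor is a hypothetical relative MS response of the adduct
    versus the IR; 1.0 assumes both respond equally.
    """
    if ir_peak_area <= 0 or response_factor <= 0:
        raise ValueError("IR peak area and response factor must be positive")
    return (peak_area / ir_peak_area) * IR_AMOUNT_MOL / response_factor


def breath_concentration_nmol_per_l(peak_area: float, ir_peak_area: float,
                                    response_factor: float = 1.0) -> float:
    """Convert the captured amount to nmol per liter of exhaled breath."""
    amount = adduct_amount_mol(peak_area, ir_peak_area, response_factor)
    return amount * 1e9 / BREATH_VOLUME_L


# Illustrative peak areas only (not measured data): area ratio of 0.2.
print(round(breath_concentration_nmol_per_l(2.0e6, 1.0e7), 3))  # 1.0
```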
The carbonyl compounds detected in the UHPLC-MS chromatograms included saturated ketones and aldehydes, hydroxy-aldehydes, unsaturated 2-alkenals, and 4-hydroxy-2-alkenals.28 A total of 56 features, comprising 34 carbonyl compound concentrations and 22 derived features (compound ratios and sums), were used for statistical analysis (Table S1). The derived features included the sum of formaldehyde, acetaldehyde, and acetone, the sum of all other carbonyl compounds (OT), and the ratio of acetone to butanone. Data acquisition and processing were performed using Thermo Scientific Xcalibur version 4.4. To identify the chemical structure of most of the detected carbonyl compounds, ATM adduct standards were synthesized in-house and used for comparison of retention times and MS/MS spectra.28
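The three derived features named above can be computed from a per-sample table of concentrations as sketched below. Only these three are shown; the remaining derived features are listed in Table S1 and are not reproduced here. The function name, the output keys, and the example concentrations are hypothetical.

```python
MAJOR = ("formaldehyde", "acetaldehyde", "acetone")


def derived_features(conc: dict[str, float]) -> dict[str, float]:
    """Compute sums and ratios from one sample's carbonyl concentrations."""
    # Sum of the three most abundant carbonyls.
    sum_major = sum(conc[name] for name in MAJOR)
    # OT: sum of all other carbonyl compounds in the sample.
    other_total = sum(v for name, v in conc.items() if name not in MAJOR)
    # Ratio is undefined when butanone is absent or zero.
    butanone = conc.get("butanone", 0.0)
    ratio = conc["acetone"] / butanone if butanone > 0 else float("nan")
    return {
        "sum_major": sum_major,
        "OT": other_total,
        "acetone_to_butanone": ratio,
    }


# Illustrative concentrations only (arbitrary units, not measured data).
sample = {"formaldehyde": 2.0, "acetaldehyde": 3.0, "acetone": 10.0,
          "butanone": 0.5, "hexanal": 0.25}
print(derived_features(sample))
```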
Data and statistical analysis
There are several classification methods, including generalized partial least squares, support vector machines, random forests, and logistic regression models, that classify patients into disease and control groups based on breath analysis data.31 Prediction (classification) methods involve structured categorical outcomes and multiple structured or unstructured covariates.32,33 There is no model suitable for every situation. Therefore, it is important to identify a good model that takes into account sequential structured covariates for prediction. In addition, further progress is needed for proper identification of key carbonyl compounds through statistical and machine learning techniques.
In the analysis of a typical breath sample, molecular concentration data on several hundred endogenous and exogenous VOCs are usually obtained. It may not be necessary to use all of the detected VOCs for patient classification or for building a prediction model (i.e., training machine learning models and later using them to predict class labels). Therefore, it is reasonable to identify a subset of metabolic VOCs related to COVID-19 as key features for COVID-19 detection. The selection of key features (here, metabolic VOCs) from multiple VOCs is called feature selection in machine learning.34 In addition, it is necessary to determine the number of important VOCs (i.e., the feature size or dimension of the VOC data) to use in training classification models that predict the COVID-19 class of patients. Selecting important VOCs saves the time that would otherwise be spent on all VOCs present in breath samples: researchers can focus on a few VOCs instead of generating data on every VOC present in patients' breath samples.
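One simple way to perform such a selection is to rank features by how strongly they separate the two groups and keep the top few. The text does not state which selection procedure was used, so the sketch below is an assumption: it ranks features by the absolute Welch t-statistic computed on log2-transformed values. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for two independent samples (unequal variances).

    Requires at least two values per group and non-zero pooled variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(pos: dict[str, list[float]], neg: dict[str, list[float]],
                  top_k: int) -> list[str]:
    """Return the top_k feature names ordered by decreasing |t|."""
    scores = {name: abs(welch_t(pos[name], neg[name])) for name in pos}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Illustrative log2 values only (not measured data).
positive = {"acetone": [5.0, 5.2, 5.1, 4.9], "butanone": [2.0, 2.1, 1.9, 2.0]}
negative = {"acetone": [5.0, 5.1, 4.9, 5.2], "butanone": [1.0, 1.1, 0.9, 1.0]}
print(rank_features(positive, negative, top_k=1))  # ['butanone']
```

The choice of `top_k` corresponds to the feature size discussed above and would normally be tuned on the training data only.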
The data were first normalized using the logarithm (log2) method, and then a t-test was used for continuous variables and a chi-square test for categorical variables.35 A p-value less than 0.05 was considered a statistically significant difference at the 95% confidence level. All calculations were performed with SAS statistical software.36 A logistic regression model was employed for both univariate and multivariate regression. After normalization by the logarithmic and quantile methods, the data are no longer non-linear. The multivariate logistic prediction model is the most robust, especially when there are fewer covariates.32 Model performance was evaluated by the receiver operating characteristic (ROC) curve with area under the ROC curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Boxplots were used to visualize the difference between COVID-19 positive and negative groups. For all logistic regression models, a random subset of approximately 67% of the samples was used as the training dataset and the remaining 33% as the testing dataset.
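The evaluation steps can be illustrated with a short sketch covering the random 67%/33% split, the confusion-matrix metrics, and the AUC. The analysis in this study was performed in SAS; the Python below is only an illustration with hypothetical function names, and the model fitting itself is omitted. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import random


def split_train_test(samples, train_fraction=0.67, seed=0):
    """Randomly split samples into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and predicted probabilities only (not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
print(classification_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))  # 0.889
```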