Publication date: 29 November 2021
University: Universiteit Maastricht
ISBN: 978-94-6423-558-6

Volatile Organic Compounds Analysis

Summary

Summary and general discussion

This thesis focuses on the untargeted analysis of volatile organic compounds (VOCs) in the context of gastrointestinal health and disease. Breath analysis, being non-invasive, is an attractive prospect for disease diagnosis and monitoring. It is widely assumed that VOCs in breath reflect the metabolic state of the organism, but capturing that reflection and pinpointing its source remains an incredibly challenging task. In addition, results published in the literature are based on small patient numbers, which limits their statistical validity and makes it virtually impossible to study confounding factors [1]. Chapter 2 of this thesis examines the general influence of endogenous and exogenous factors on exhaled breath composition, to gain a better understanding of which potential confounding factors must be accounted for when studying exhaled VOCs. Thanks to the support of Top Institute Food and Nutrition, we had the opportunity to study confounding factors in breath samples from 1417 well-characterised participants of the LifeLine (LL) population cohort in the Netherlands, namely age, BMI, smoking, blood cell count, several metabolic parameters, and a group of 16 recorded medications. The results showed that smoking had the most significant impact on the VOC profile.

Age, gender and BMI influence an individual’s metabolism. Although they affect breath composition to a lesser extent than smoking, they should also be considered in study designs. In our study, no evidence was found that cholesterol or triglyceride levels influenced the overall VOC profile. The immune response is expected to influence VOCs, but surprisingly no significant differences were detected in relation to the counts of the various types of white blood cells. The latter may be explained by a lack of statistical power, as in this general population cohort the number of people with increased neutrophil counts was small (n=11). Medications can affect host and/or microbial metabolism and as such were expected to influence VOC profiles [2]. Within the available cohort set-up we were only able to compare the VOC profiles of women taking chemical contraceptives with those of women not taking them, and no significant impact on VOCs was noted. For all other medications, we could not compare individuals with a certain disease taking a medication with individuals with the same disease not taking it, so it was impossible to distinguish the effect of the disease from the effect of its treatment on VOCs. It is important to note that medication may affect the breath profile, and this information should therefore be considered in breath analysis studies [1]. The effects of external factors, i.e., age, gender, smoking, and the presence of anxiety and depression, were also examined in Chapter 5, where the use of VOCs as potential diagnostic biomarkers for IBS was investigated. No statistically significant correlations were observed, indicating that these factors did not confound the ability of the VOC profile to separate IBS patients from controls.

Chapter 3 and Chapter 4 explore whether and how exhaled breath is affected by dietary changes in healthy individuals. First, a set of 12 volatile compounds distinguished samples obtained during a gluten-free diet from those obtained during a normal diet (Chapter 3). Our findings indicate that a gluten-free diet had a reversible impact on the participants’ excreted metabolites visible in breath. Nine weeks after ending the gluten-free diet, the VOC profile had returned to the one measured prior to the intervention. In the research of Palma et al., a gluten-free diet appeared to be associated with a reduction in polysaccharide intake, and since undigested carbohydrates are considered a main source of energy for the commensal microbiota, this might explain the drift in gut microbial composition [3]. This in turn is expected to be associated with a change in the volatile metabolites produced by bacterial communities. Those metabolites, when absorbed into the bloodstream, can be further metabolized and finally excreted in urine or breath [4, 5]. An interesting observation was that even though the change in VOCs caused by the diet followed the same pattern, the dynamics of this change differed between individuals. Heinzmann et al., in research on the stability and robustness of human metabolic phenotypes in response to food challenges, highlight the importance of understanding the impact of individual differences in metabolic baseline and in response to dietary interventions. In their study, some individuals displayed greater stability of their metabolic profile than others [6]. It is therefore possible that the differences in the dynamics of change seen in our study relate to diverse individual metabolic phenotypes.

Secondly, in Chapter 4, we investigated whether breath analysis can indicate differences in the metabolism of two infant formulas that differed only in droplet size. In a double-blind, cross-over study, 29 healthy, non-smoking adult males consumed the two milk formulas and provided exhaled breath samples at various time points. The results showed significant differences in exhaled breath composition between the two formulas 240 minutes after ingestion, which corresponds to the moment food enters the small intestine. No significant changes were observed at earlier time points. We speculated that the difference may (in part) be related to differences in the digestion, absorption, or metabolism of nutrients and/or to differences in GI motility [7, 8]. A role for microbial activity could also be considered, but this was not investigated in that study. Although the exact origin of the discriminating compounds was not confirmed and the pathophysiological consequences are not clear, the findings do show the potential of VOC analysis in the fields of nutrition and metabolism.

Furthermore, monitoring the effect of diet through exhaled breath composition has potential, for example, to help check dietary compliance or to uncover different digestion or absorption patterns. Several simple commercial breath tests that aid the diagnosis of metabolic disturbances are already available. These include, for example, measurements of carbohydrate malabsorption and detection of small intestine bacterial overgrowth (SIBO). Enzyme activities and organ functions can also be assessed using stable isotope-labelled probes, such as 13C-labelled urea for the diagnosis of infection with the gastric bacterium Helicobacter pylori. An approach based on exogenous VOC (EVOC) probes was recently proposed by Owlstone Medical as a potential strategy to measure the activity of metabolic enzymes in vivo and thereby aid the development of breath-based diagnostic and prognostic tests. Exogenous volatiles used as a probe allow direct monitoring of the substrate (the probe itself) and of the products of its metabolism. A recent study by Ferrandino et al. successfully demonstrated the use of exhaled breath limonene as a biomarker for liver cirrhosis [9]. It was suggested that any exogenous VOC that is metabolised by the human body can offer an opportunity to assess metabolic enzyme or organ function [10]. Identifying specific conditions and diseases that can be targeted via this strategy opens the door to more targeted breath applications.

The diagnostic and monitoring potential of VOC analysis is described in Chapters 5 and 6. First, we determined a VOC profile that discriminates between clinically confirmed IBS patients, obtained from the clinical Maastricht IBS (MIBS) Cohort, and GI healthy controls. Next, we determined how this specific VOC pattern correlated with GI symptoms in the MIBS and in the LL DEEP general population cohort (Chapter 5). A set of 16 VOCs correctly predicted 89.4% of the IBS patients and 73.3% of the healthy controls (AUC=0.83). To our knowledge, this was the first time that a set of VOCs in exhaled air was able to predict the presence of a prevalent functional GI disorder, which could be considered an important first step in the design and development of reliable non-invasive biomarkers for IBS. The Random Forest analysis was validated twice to ensure the reliability of the classification and the correct selection of the set of discriminatory VOCs. One of the major advantages of the study was the relatively high number of well-characterised people included. As the diagnosis of IBS is based on symptoms (as defined by the Rome III criteria), our second aim was to test whether our VOC biomarker set would also correlate with the severity of GI symptoms. Here the results showed that the VOC biomarker set correlated moderately with a set of GI symptoms in the MIBS (r=0.55, p=0.0003). Since the functional GI symptoms observed in IBS patients are also rather common in the general population, we then confirmed that the VOC biomarker set showed a moderate but significant correlation with a set of GI symptoms in the general population (r=0.54, p=0.0004) [11]. The correlation with symptoms shows the promising potential of VOC analysis for evaluating treatment effects in IBS. As lifestyle factors, co-morbid diseases and medication use are associated with IBS, potential confounding factors should be included in the analysis, as also found in Chapter 2.
The Kruskal-Wallis test, however, showed no influence of possible confounding factors on the distinction between IBS patients and healthy controls. We have speculated that increased inflammation and oxidative stress, as well as microbiota changes, could be closely related to the discriminatory power of the set. Further in-depth research is still needed to investigate the origin of the discriminatory VOCs, to confirm their identity, and to investigate the potential link to the underlying causes of IBS. Furthermore, to move from the reported potential closer to clinical application, the results should be validated in an external cohort.
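The performance figures quoted above for the IBS classifier (the fraction of patients and of controls correctly predicted, and the AUC) can be computed from classifier output as in the following minimal sketch. The labels, scores and the 0.5 decision threshold are hypothetical illustration values, not data from the thesis, and the function names are chosen here for the example only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = patient, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen patient scores higher
    than a randomly chosen control (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for 4 patients (1) and 4 controls (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(labels, predicted)
area = auc(labels, scores)
print(sens, spec, area)  # 0.75 0.75 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises performance over all thresholds, which is why both kinds of figures are usually reported together.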

In Chapter 6, the diagnostic value of breath VOCs for monitoring mucosal inflammation in inflammatory bowel disease (IBD) was investigated. In 2005, Lechner et al. [12] already demonstrated the feasibility of exhaled air as a novel diagnostic tool in the differential diagnosis of GI diseases, but they used a limited number of IBD patients (n=10) and did not identify the significant compounds. To our knowledge, our study was at that time the first large prospective study evaluating the role of VOC profiles in exhaled air in relation to disease activity in Crohn’s Disease (CD) patients [13]. A total of 140 samples originated from an active disease stage, based on fecal calprotectin (FC >250 µg/g), and 135 samples from an inactive disease stage (clinical HBI score <4, serum CRP <5 mg/l and FC <100 µg/g). A third group consisted of samples from 110 healthy controls. A set of 10 discriminatory VOCs correctly predicted active CD in 81.5% of cases and remission in 86.4% (sensitivity 0.81, specificity 0.80, AUC 0.80). Among the tentatively identified discriminatory compounds, enhanced levels of alkanes, methylated alkanes and aldehydes were found, which may be connected to increased inflammation. Several studies show that oxygen-mediated injury through increased free radical production and impaired antioxidant defense systems plays an important role in the pathophysiology of IBD [14, 15]. Hydrocarbons, as the end products of lipid peroxidation, show low solubility in blood, are quickly excreted into breath after formation, and can be used to monitor the degree of oxidative damage [16]. Hydrocarbons and aldehydes were previously reported to be produced by the intestinal microbiota as well [17]. In recent work by Smolinska et al., a strong correlation between volatiles in breath and the fecal microbiome was shown in CD patients [18]. It was speculated that this may be due to anabolism/catabolism of volatile metabolites by microbes and/or stimulation/inhibition of microbial growth by metabolites.
Volatile metabolites such as short chain fatty acids (SCFAs; butyrate, propionate, acetate) and alcohols (ethanol and propanol) are products of bacterial fermentation, mainly of non-digestible oligo- and polysaccharides [19, 20]. SCFAs have been shown to have anti-inflammatory and anti-carcinogenic effects [21]. Additionally, acetone and isoprene came up as discriminatory compounds when comparing healthy controls with active CD or CD in remission (Chapter 6). Both isoprene and acetone were present in high abundance in every breath sample, and both have been found to be part of discriminating sets of compounds in a multitude of conditions [22-29]. As they are the result of ‘common’ biochemical processes in the body, the question remains whether they will ever be specific enough to distinguish one condition from another. Furthermore, the levels of these compounds change considerably for reasons unrelated to disease and are exhaled quite variably: their breath levels were reported to vary with inhalation from ambient air, diet, fasting, resting and exercise, and even circadian rhythms [23, 30-34]. These uncertainties are among the reasons why the mentioned VOCs, despite their reported discriminatory power, have not yet found their way into clinical applications. Inflammation, oxidative stress, metabolic changes caused by normal and pathological processes, and microbiome perturbations are all factors reported to be involved simultaneously in a multitude of conditions. What one observes in breath profiles may be affected by these factors collectively. However, it is not yet fully understood how specific VOCs or combinations thereof relate to shared underlying pathophysiological mechanisms rather than to the specific disease condition per se. Furthermore, more validation is needed, not only to study the prospective biomarker sets in new external cohorts, but also to test them against other diseases.
While the statistical significance or discriminatory power of exhaled breath is crucial for the development of a test, it is equally crucial to understand the biological origin of VOCs in order to study the specificity of their production and to aid the interpretation of results. In Chapter 7, we presented the validation of a system that allows VOCs to be measured in the headspace of an in vitro cultured cell line. Given our interest in GI disorders, we chose the human epithelial Caco-2 cell line, which has been widely used as a model for, among other things, gut barrier function. The in vitro system allows for the use of different cell lines as well as different experimental set-ups, including varying exposure times and treatment options, while preserving cell viability. High reproducibility of the collection system was confirmed by checking correlations (p≤0.0001) between replicate samples. When studying the influence of oxidative stress on VOC composition, a total of 10 VOCs showed either increased or decreased abundance in the headspace of the cell cultures due to the presence of the H2O2 stressor. An advantage of the developed system is that the released compounds accumulate and can be detected without affecting cell viability. Further, our study design ensured a relatively high number of replicate experiments (n≥20) and the inclusion of appropriate controls. Studying the relation between certain biochemical pathways and the excretion of VOCs in vitro has the benefit of an enhanced level of control, which is not as easily achieved in in vivo studies. In vitro, we can for example isolate the matrix effect much more easily by looking at the volatilome produced by cells with medium, by medium alone, by medium exposed to a trigger (e.g., H2O2) and by cells with medium exposed to that trigger. In this way we can pinpoint where the discriminatory compounds come from and avoid drawing wrong conclusions.
In Chapter 7, we investigated the exposure of cells to hydrogen peroxide, but in a similar manner we could study genetically altered metabolic pathways, cell co-cultures (e.g., with macrophages), exposure to other agents (either inducing oxidative stress and/or suppressing anti-inflammatory properties), or VOCs produced or consumed by pathogens. We presented a potential and a concept, but to benefit fully from in vitro experiments it is important to use standards to confirm the identified compounds and to study further the potential biological origin of the observed changes.

Challenges, choices, and other dilemmas in Breath Analysis

Different analytical and methodological approaches in VOCs analysis

There are many different approaches to collecting, measuring and analysing VOCs, in both breath and headspace analysis. Before designing any breath analysis study, several decisions must be made:
• Which fraction of the breath should be collected (e.g., upper airways air, late expiratory air, alveolar air or mixed air)?
• What volume is needed (e.g., 2/3 of the total collection, the last 150 ml, 500 ml)?
• How should the breath be collected (e.g., bags, syringes, Bio-VOC sampler, ReCIVA)?
• Which VOCs do we expect to capture (e.g., the entire spectrum for untargeted analysis, or a specific subset of volatiles)? What is the spectrum of volatility and polarity?
• Which types of sorbent should be used to trap the volatile compounds (e.g., Tenax TA&GR, Carbograph, Carboxen)?
• Under what conditions should the adsorption tubes be stored before chemical analysis, and for how long?
• What analytical equipment and method parameters would allow detection of these compounds (e.g., GC-MS, IMS, PTR-MS, SIFT-MS)?
• How can instrumental variation be controlled and Quality Controls be introduced?
• How should data pre-processing be performed?
• Which machine learning tools are most appropriate for the question posed?
• How can the unknown compounds be identified?
First, more information is given on the dilemmas and choices around breath collection, breath preconcentration, breath measurement and data analysis, with some advantages and disadvantages discussed. Then, the subject of standardisation and its meaning for VOC research is discussed.

Breath collection

Several air compartments of the lungs can be collected, i.e., upper airways air (commonly known as dead space air), late expiratory air, alveolar air and mixed air. Dead space air accounts for around 150 ml of air, closely resembles inhaled air, and is therefore the least informative about endogenously produced volatiles [35]. Collection of late expiratory air is defined in the least detail, and there is no standard practice in place [36]. As a result, different research groups discard different volumes of the initial breath, varying from the first few seconds of exhalation [37, 38], through removal of one-third [39] or half of each sample [40], to the use of a specific final volume (such as 100 ml [41] or 150 ml [42]). The actual volume of dead space can be calculated from the tidal volume of an individual and will vary with bronchoconstriction as well as with different breathing patterns [43, 44]. Simply discarding one part of the breath volume for all individuals, based on averaged theoretical calculations, is likely to introduce error. Alveolar air collection [45-47] is biologically the most relevant, as it concerns the compartment where gas exchange takes place. This air will contain the most relevant VOCs, originating from the body via the systemic circulation. For reliable alveolar air collection, simultaneous monitoring with capnography is highly recommended [48]. CO2 concentrations in exhaled air fluctuate during the different phases of breathing [36].
Relatively low levels of CO2 are present in Phase I, representing dead space air; they rise during the transitional Phase II (between dead space air and alveolar air) and finally reach a plateau, signaling the start of the alveolar Phase III. With a CO2 sensor, Phase III air can be collected reliably. Last, but not least, is the collection of total mixed air [11, 49], consisting of both dead space air and alveolar air. The one obvious advantage of this method is its simplicity. Less complex approaches often introduce less error and as such are worth considering, especially in large-scale studies. It should be noted that mixed air will contain more environmental contamination from dead space air and from the nose and mouth. This needs to be considered when planning collection, when measuring samples and later in the statistical analysis.

Figure 2. Schematic representation of air compartments in relation to the capnogram and breath phases. Modified from Miekisch et al., 2008 [50].

Sampling Devices

Many breath collection containers and devices are available to the scientific community. Most often used are polymer bags, including Tedlar [51-53] and Mylar bags [54]. Markes International has released the Bio-VOC sampler [42, 55] and Menssana Research the breath collection apparatus (BCA) [56], both for late expiratory air collection. Some research teams use glass vials and gas-tight syringes [57-59]. The ideal collection container should be durable, cost-effective, user-friendly, chemically inert, and impermeable both to environmental VOC contamination and to leakage of breath VOCs. In our research group, we have adopted mixed breath collection using 3-5 L Tedlar bags for adult study participants and 1 L bags for children. Tedlar bags are made from polyvinyl fluoride (PVF), which is inert to a wide range of compounds and has relatively good resistance to gas permeation.
In previous studies, we have shown that the contribution of dead space air does not affect the sensitivity of VOC measurement by gas chromatography time-of-flight mass spectrometry (GC-tof-MS) [35, 53]. This approach rests on the assumption that even though dead space air may introduce contamination, including environmental background compounds and compounds originating from the mouth and nose, these will be distributed across the study population and as such should not introduce significant differences between the studied groups. The ReCIVA Breath Sampler, available from Owlstone Medical, was developed in collaboration with multidisciplinary leaders in breath research who joined forces to tackle robustness and reliability issues around breath sampling devices. Today ReCIVA is used at over a hundred academic and clinical research sites worldwide and is a strong choice for reliable and user-friendly collection. Thanks to the incorporation of pressure and CO2 monitoring, it allows collection of VOCs from specific compartments, including alveolar air. The supply of clean air additionally reduces the exogenous background in a sample. With this type of collection, the exhaled VOCs are transferred directly to the thermal desorption tube. With other methods, that transfer is a subsequent, manually performed step in the sampling procedure. Prolonged periods between collection in the bag or other container and preconcentration of the VOCs onto the tube may lead to some loss of VOCs. At present, ReCIVA is the collection method of choice and is being used in several breath studies carried out by, among others, the Maastricht research team.

Breath preconcentration

After collection, preconcentration of the VOC sample is needed to detect VOCs present in very low concentrations, which is the case in the majority of studies. This can be done with thermal desorption (TD) tubes, needle trap devices (NTDs), or solid phase micro extraction (SPME). In their review, Lawal et al. showed that TD tubes are used in nearly half of all published studies [36]. While Carboxen-packed tubes are suitable for trapping highly volatile organic compounds (~C2–C4), Tenax sorbents might be a better choice for trapping the less volatile VOCs in breath (~C7–C15) [60]. The volatility of the compounds collected, their polarity, the sample volume and the humidity should all be considered when making the choice. Nowadays, one can compare and choose the sorbent, or construct a multi-bed tube, for a specific range of analytes across various sample volumes. In the procedure developed by our research team, after collection of breath in the Tedlar bag, stainless-steel two-bed sorption tubes filled with Carbograph 1TD/Carbopack X (Markes International, Llantrisant, UK) were connected at one end to the bag itself and at the other end to a vacuum pump. The content of the bag is ‘pulled’ into the tube. The adsorption tube can then be measured or stored for later analysis. At the time the research in this thesis was performed, samples were stored at room temperature. Since then, a study on the stability of 74 exhaled breath compounds suggested that analyzing samples by day 14 would minimise a potential gain or loss of 1-2 standard deviations in VOC concentrations [61]. When three storage temperatures were compared (4°C, 21°C and 37°C), 4°C was found to be optimal for compound stability [61]. Kang et al. examined the stability of compounds stored on dual-bed Tenax TA:Carbograph tubes for 12.5 months at -80°C. Their results showed that the maximum storage duration under these conditions is 1.5 months, with 94% of the VOCs being stable [62]. More research, evaluating a wider range of compounds, should be performed to confirm the preferred conditions for storing thermal desorption tubes. It is also important to understand the optimum purging requirement to remove water prior to storage, and whether and how this is affected by different storage temperatures.
Breath Measurements

GC-MS [53, 63, 64] is currently considered the gold standard technique for determining VOCs in breath, allowing detection of a wide range of compounds, complete profile recognition and identification of single VOCs. Identified VOCs are easier to compare across studies and can be related to underlying disease processes. Detailed parameters and a description of our approach are given throughout the chapters of this thesis and have been described more extensively by Van Berkel et al. [53]. GC-MS delivers a high level of reproducibility [63, 65] and highly accurate chromatograms, and it is especially valuable for analyses that aim to discover, identify and quantify compounds of interest. It plays a vital role in providing the evidence needed to support the clinical relevance of potential biomarkers before these can be considered for direct, online testing in the clinic. For such online testing, GC-MS devices are not suitable. Proton Transfer Reaction Mass Spectrometry (PTR-MS) [66] and Selected-Ion Flow Tube Mass Spectrometry (SIFT-MS) [67] can be used for real-time measurements and for monitoring rapidly occurring changes, such as the influence of exercise, heart rate or ventilation [35]. In the work of Amann et al., PTR-MS was described as a powerful alternative for online monitoring of biochemical reactions in the body [68]. Its biggest disadvantage is the inability to provide chemical identification of the compounds. That is not the case for SIFT-MS [67], which, although also based on chemical ionization, allows identification and quantification of VOCs.

Handling Omic- Data

One of the challenges of breathomics, as in other omics fields, is analyzing the tremendous amount of data. Breath samples contain thousands of volatiles, including the ones of interest, which come from endogenous sources and are connected to host metabolism, as well as exogenous ones, which originate from the inhaled air and its contaminants.
When it comes to statistical analysis, the goal is to choose the right tools to extract the information of interest and disregard what is irrelevant. Various machine learning methods are available nowadays to identify relevant and/or discriminating VOCs [69]. Before data can be analyzed, a preprocessing pipeline needs to be established to deal with spectral noise and differences in baseline [70]. Denoising, baseline correction, alignment across all samples, peak picking, automated peak matching, and conversion of peak areas into a data matrix are all steps needed to obtain a reliable basis for further analysis. This was described in detail by Smolinska et al. [69]. A major aim of machine learning in breathomics is to explore and understand data complexity. Several characteristics need to be considered. Within a breath data set, the number of detected compounds can reach thousands and will in general far exceed the number of subjects included. For further analysis, the number of variables needs to be reduced by removing redundant, repetitive, and irrelevant information. Before the actual variable selection strategy can be applied, compounds that are present in only a small proportion of the studied samples should be removed. In our case, compounds that were present in fewer than 20% of samples in at least one of the studied groups were eliminated, as they carry no representative value for the study objectives. As a subsequent step, an exploratory analysis by means of unsupervised approaches is a good way to gain an unbiased view of the data. Most frequently used is Principal Component Analysis (PCA), which allows visualization of the distribution of samples, detection of certain outliers, and spotting of possible groupings of data points and overall trends in the data. It extracts and displays the systematic variation in the data, provides a summary of all samples and shows how they are related to each other.
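The prevalence filter and exploratory PCA described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the thesis: the peak-area matrix and group labels are hypothetical, a compound counts as "present" when its value is above zero, and the 20% rule is read literally (a compound is dropped if it falls below 20% presence in any one group; the alternative reading, keeping compounds that reach 20% in at least one group, would need `|=` on an initially all-False mask instead).

```python
import numpy as np


def prevalence_filter(X, groups, min_fraction=0.2):
    """Return a boolean mask of compounds (columns) to keep.

    A compound is dropped when it is detected (value > 0) in fewer than
    `min_fraction` of the samples of at least one group.
    """
    keep = np.ones(X.shape[1], dtype=bool)
    for g in np.unique(groups):
        present = (X[groups == g] > 0).mean(axis=0)
        keep &= present >= min_fraction
    return keep


def pca_scores(X, n_components=2):
    """Project mean-centred data onto its first principal components (SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Hypothetical peak areas: 6 samples (rows) x 4 compounds (columns).
X = np.array([
    [5.0, 0.0, 2.0, 1.0],
    [4.0, 0.0, 3.0, 0.0],
    [6.0, 0.0, 2.5, 2.0],
    [1.0, 0.0, 7.0, 1.5],
    [2.0, 0.0, 8.0, 0.0],
    [1.5, 3.0, 6.5, 2.5],
])
groups = np.array(["control"] * 3 + ["patient"] * 3)

mask = prevalence_filter(X, groups)          # second compound is removed
scores, explained = pca_scores(X[:, mask])   # sample coordinates on PC1/PC2
print(mask, scores.shape, explained)
```

Plotting the two score columns against each other gives the usual PCA score plot in which outliers and groupings can be inspected; in practice the data would typically also be scaled or transformed before PCA.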
In PCA, the relations among the measured VOCs that drive the separation between samples can also be visualized. A more detailed explanation of PCA and its interpretation can be found in Chapter 3. The next step of the analysis is to use a priori knowledge, e.g., patient labels, treatment groups and other subject characteristics, in supervised analyses. This approach enables discriminatory compounds to be found and predictive models to be built. Although supervised analysis is powerful, it can lead to biased results when used improperly. To avoid overfitting, predictive algorithms need to be validated by means of different types of cross-validation within the studied set, by use of an internal validation set, by bootstrapping (mentioned later), or ideally in a new independent data set. Supervised techniques can be split into linear and non-linear analyses. Linear discriminant analysis (LDA) is a fast and powerful technique regularly applied in breath research. However, for it to be reliable, a few steps are required to counteract its limitations. Since LDA assumes a normal distribution of the data, and since it can only be applied if the number of samples is larger than the number of measured compounds, breath data require transformation to a normal distribution and substantial variable reduction. An example of a linear method suited to breath analysis is Partial Least Squares (PLS) and its classification version, PLS Discriminant Analysis (PLS-DA), which can be performed to find VOCs that discriminate between the studied groups. PLS-DA resembles PCA and is a latent variable method [71]. While the first principal component (PC) in PCA is constructed in the direction of the highest variance in the data set, the latent variable (LV) of PLS is built in the direction that explains the highest covariance between the VOC data and the examined class, leading to increased separation between classes. More about this approach can be found in Chapter 6 [72].
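The difference between the first principal component and the first PLS latent variable can be made concrete with a small numerical sketch. The data are simulated and hypothetical: one compound has a large variance unrelated to class, another has a small variance but shifts with class. The sketch only computes the first weight vector of a single-response PLS (proportional to the product of the transposed, centred data matrix and the centred class vector); it is not a full PLS-DA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 samples x 5 compounds; class coded as -1 / +1.
y = np.repeat([-1.0, 1.0], 20)
X = rng.normal(size=(40, 5))
X[:, 0] *= 5.0        # compound 0: large variance, unrelated to class
X[:, 1] += 2.0 * y    # compound 1: smaller variance, shifts with class

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First principal component: unit direction of highest variance in X.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# First PLS weight vector: unit direction maximising the covariance
# between the scores Xc @ w and the class vector.
w = Xc.T @ yc
lv1 = w / np.linalg.norm(w)

# Absolute covariance between the sample scores and the class vector.
cov_pc1 = abs((Xc @ pc1) @ yc) / (len(yc) - 1)
cov_lv1 = abs((Xc @ lv1) @ yc) / (len(yc) - 1)
print(cov_pc1, cov_lv1)
```

By construction the PLS direction never has a lower covariance with the class than the PCA direction, which is the reason class separation tends to be clearer on PLS score plots, and also one reason why PLS-DA needs careful validation.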
PLS-DA was successfully applied to discriminate healthy children from children with allergic asthma, as well as to recognise lung cancer [73, 74]. PLS-DA is a powerful technique; however, it only captures linear relationships in the data, is prone to overfitting and can be affected by outliers. In complex biological systems, nonlinear relations between compounds are often present. For example, disease subgroups, variations in disease stage and medication use can additionally alter metabolic profiles and generate relations that cannot be captured by the linear tools mentioned above [75]. Nonlinear statistical tools have more predictive power but are much more difficult to interpret [69]. Among the many nonlinear supervised methods used in pattern recognition, Random Forest (RF) is one of the most often applied. In RF, a large collection of uncorrelated trees, so-called weak classifiers, is built. Each tree is trained on a subset of the original samples drawn at random with replacement (a bootstrap sample; the overall procedure is known as bootstrap aggregating) and on randomly selected sets of compounds. The samples that were not selected into the training set of a tree (the so-called out-of-bag cases) are subsequently used to validate the model by calculating the out-of-bag error. The goal of RF is to build many weak classifiers that are as different from each other as possible. The number of trees, the number of samples in a terminal node and the number of random variables used at each node must be optimized [76]. Random Forest was used to select the most discriminatory compounds in Chapter 5 for IBS recognition, and in Chapter 6 to find compounds discriminating active CD from subjects in remission and from healthy subjects [13]. It was previously also used for the classification of patients with different stages of lung disease and for the diagnosis of COPD and bronchial carcinoma [77, 78].
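The two sources of randomness in Random Forest described above, bootstrap sampling of subjects and random selection of compounds, can be illustrated without building any trees. The sketch below only draws the samples and reports how many subjects end up out-of-bag; the numbers of samples, compounds and trees are hypothetical, and the tree fitting itself is deliberately left out.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement).

    Returns the in-bag indices (training set of one tree, duplicates
    allowed) and the out-of-bag indices (never drawn for this tree).
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def random_feature_subset(n_features, k, rng):
    """Pick k distinct compound indices to consider at one tree node."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
n_samples, n_features, n_trees = 200, 100, 500  # hypothetical sizes

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
features = random_feature_subset(n_features, k=10, rng=rng)
print(round(mean_oob, 3), features)
```

On average roughly a third of the subjects are out-of-bag for any given tree (the expected fraction approaches 1/e, about 0.37, for large sample sizes), which is why the out-of-bag error can serve as a built-in validation estimate.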
Supervised machine learning allows capturing trends in data and identifying the compounds responsible for the differences observed between studied groups. Nevertheless, it must be shown that these trends are not artefacts and can also be found in an independent data set. It is therefore extremely important to validate the obtained results [79]. It is recommended to have a sample size large enough to split the data from the start into a training set, which is used to train and optimize the algorithm and to find patterns in the data, and a validation set, which is used solely for checking the reliability of the findings. Such a validation is applied to both studies of a diagnostic nature described in Chapter 4 and Chapter 5. Ideally, one would also validate results in an external cohort, recruited at different time points and preferably at different clinical sites. Despite the major progress in breath research, none of the candidate biomarker sets reported in untargeted studies has reached the clinic yet. Several reasons can be put forward for this, and although they are hard to pinpoint exactly, lack of consistency between studies is an important factor. For instance, for diseases such as IBD and lung cancer, more than 40 VOCs have been proposed [80]. A recent review on VOCs in breath and faecal samples in IBS and IBD also showed little overlap [81]. This heterogeneity in results between studies can (in part) be explained by the varying procedures used for breath sampling and storage, the application of diverse detection platforms, and differences in data processing and predictive modelling. Furthermore, most of the biomarker sets were found in single-center studies and have not yet been externally validated in multi-center studies, making these biomarkers speculative and as such not suited to form the basis of a diagnostic tool in the clinic.
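Splitting the data from the start, as recommended above, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in Chapters 4 and 5: the function name, the 30% validation fraction, the stratification by class and the group sizes are all hypothetical.

```python
import random

def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Split sample indices into training and validation sets per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, validation = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * validation_fraction)
        validation.extend(idxs[:n_val])  # set aside, used only for the final check
        train.extend(idxs[n_val:])       # used to train and optimize the model
    return sorted(train), sorted(validation)

# Hypothetical cohort: 20 patients and 10 controls.
demo_labels = ["IBS"] * 20 + ["control"] * 10
train_idx, val_idx = stratified_split(demo_labels)
print(len(train_idx), "training samples,", len(val_idx), "validation samples")
```

Splitting per class keeps the proportion of patients and controls the same in both sets. The validation indices should not be touched during variable selection or parameter tuning.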
Standardization in Breath Research

It is important to consider standardization at every step of the process in each methodological approach and to validate the results in a multicenter study. Attempts are ongoing by the European Respiratory Society to define standard guidelines for sampling procedures [82]. Furthermore, within the International Association of Breath Research (IABR) [83], task forces are working on harmonizing breath sampling and analysis techniques, including QC standardization between labs. While some standardized practices could be implemented across different methodological platforms (e.g., minimum reporting requirements on the number of samples, the number of variables and demographics, including food, smoking and medication use, and consistent use of reporting units, to name a few), others will depend on the method and instrumentation used. Using a standard gas mixture as a reference to test equipment and any preprocessing steps was proposed by the European Respiratory Society as one of the priorities for future research in exhaled VOCs [83]. In recent work by Stavropoulos et al., the importance of batch analysis within and across studies was thoroughly explained [82].
