The sixth newsletter of the HEALS project is... ...
Using data fusion techniques, traditional health and exposure data derived from fixed monitoring networks will be supplemented by a range of emerging techniques and technologies such as agent-based models (ABM), mobile phone apps, environmental sensor-webs, micro-sensors and satellite remote sensing. In addition, we will considerably improve exposure modelling and phenotype identification using deterministic and probabilistic approaches, and will apply new epidemiological and statistical methods to relate modelled exposure to health outcomes. ABM will be informed by data on an individual’s behaviour within his/her environment (such as movement data within specific micro-environments) and on interactions between individuals around health-related behaviours and issues such as low socioeconomic status (SES). Using these parameters and the evolution of agents, simulations will produce detailed information on the emulated systems, which can be used to fill the gaps that exist in traditional datasets. This holistic approach is highly novel: it takes the best from existing monitoring and sensor technology and supplements it with computational modelling simulations where real-world data are unavailable at the spatial and temporal scales that the individual exposome requires. Although commonly used elsewhere, ABM and data fusion methods have not, to our knowledge, been applied in environmental epidemiology. This array of novel technologies, coupled with state-of-the-art environmental monitoring of chemical health stressors, will provide a complete and dynamic picture of external exposure to environmental chemicals.
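To make the agent-based idea concrete, the following minimal sketch simulates agents moving between micro-environments and accumulating a time-weighted exposure. It is an illustration only: the micro-environment names, concentrations and hourly transition probabilities are hypothetical placeholders, not HEALS data or the project's actual ABM.

```python
import random

# Hypothetical pollutant concentrations (ug/m3) per micro-environment.
CONCENTRATION = {"home": 10.0, "commute": 45.0, "office": 15.0, "outdoor": 30.0}

# Hypothetical hourly transition probabilities between micro-environments.
TRANSITIONS = {
    "home": {"home": 0.8, "commute": 0.15, "outdoor": 0.05},
    "commute": {"office": 0.6, "home": 0.4},
    "office": {"office": 0.85, "commute": 0.15},
    "outdoor": {"outdoor": 0.5, "home": 0.5},
}


def simulate_agent(hours, rng):
    """Return the time-weighted average concentration one agent encounters."""
    location = "home"
    total = 0.0
    for _ in range(hours):
        total += CONCENTRATION[location]
        options = TRANSITIONS[location]
        # Draw the next micro-environment according to the transition weights.
        location = rng.choices(list(options), weights=list(options.values()))[0]
    return total / hours


def simulate_population(n_agents, hours, seed=0):
    """Simulate independent agents with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [simulate_agent(hours, rng) for _ in range(n_agents)]


# One simulated week for 100 agents.
exposures = simulate_population(n_agents=100, hours=24 * 7)
```

The spread of values in `exposures` is the kind of simulated, individual-level information that could fill gaps left by fixed monitoring stations. A realistic model would also let agents interact and would draw movement rules from observed behaviour.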
Human biomonitoring (HBM) / Omics
HEALS also focuses on the main biological processes that govern the biological and physiological responses to toxicological insults from environmental xenobiotics. It thus introduces an integrated approach to health risk assessment, which attempts to draw the maximum benefit from the exposure information contained in the biomonitoring data collected throughout Europe. Biomonitoring studies performed so far in the EU (e.g. COPHES) indicate that children are exposed to several environmental contaminants/stressors, but there is still limited or inadequate epidemiologic evidence to support clear associations between environmental exposures and health outcomes. Major HBM limitations concern the sample size, the lack of information on sources of exposure, the choice of biomarkers of exposure, the lack of consideration of susceptibility and, last but not least, the great variability in the testing age and health outcomes considered. HEALS intends to help fill these knowledge gaps by taking advantage of data collected in on-going EU human biomonitoring and epidemiological studies (including twin studies). Relevant information on both exposure and health effects will be extracted from these data in order to identify and validate predictive biomarkers, which will then be applied in a pilot survey including singleton and twin cohorts across the EU.
This will be attained through the use of omics, primarily metabolomics and adductomics, supported by targeted transcriptomics and proteomics and coupled with physiology-based biokinetic modelling for data interpretation. Advanced bioinformatics and multivariate statistical techniques developed for genome-wide association studies will be used for environment-wide association studies. These will link environmental exposure and health status data collected and tested in population surveys that tackle key health endpoints of SCALE and the Parma Declaration, such as respiratory, neurodevelopmental and neurodegenerative disease, obesity, and childhood and type II diabetes (T2D). Finally, the tools brought to bear in HEALS will be combined in an integrated methodology that aims to make optimal use of the knowledge base already available for twin cohorts in the EU (including samples stored in dedicated biobanks), and to design and perform pilot surveys on children (mother-child cohorts, starting from pregnancy up to 3 years of age), including twins and matched singletons, in the countries participating in HEALS. In this way, new methods for estimating the environmental burden of disease (EBD) will be developed, using novel data for predictive biological monitoring. The HEALS approach takes the temporal dimension into account, thus improving the study of latent and epigenetic effects of early life exposures.
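As an illustration of how a genome-wide-style scan carries over to environmental exposures, the sketch below tests each exposure against a health outcome one at a time and corrects for multiple testing. It is only a sketch under stated assumptions: the data are synthetic, the exposure names are hypothetical, and the choice of a permutation test with a Bonferroni threshold is ours, not a method specified by HEALS.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_perm, rng):
    """Two-sided permutation p-value for the correlation between xs and ys."""
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def ewas_scan(exposures, outcome, alpha=0.05, n_perm=400, seed=0):
    """Test every exposure separately; flag those passing a Bonferroni threshold."""
    rng = random.Random(seed)
    threshold = alpha / len(exposures)
    results = {}
    for name, values in exposures.items():
        p = permutation_p_value(values, outcome, n_perm, rng)
        results[name] = (p, p < threshold)
    return results


# Synthetic data (assumption): only exposure_0 truly drives the outcome.
data_rng = random.Random(42)
n = 100
exposures = {
    f"exposure_{i}": [data_rng.gauss(0, 1) for _ in range(n)] for i in range(5)
}
outcome = [2.0 * x + data_rng.gauss(0, 1) for x in exposures["exposure_0"]]

results = ewas_scan(exposures, outcome)  # name -> (p_value, flagged)
```

A real analysis would adjust for covariates and account for correlation among exposures, which a one-at-a-time scan like this ignores.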
Internal dose modelling
A key novelty in HEALS is the integrated use of advanced computational tools supporting environmental and biological data analyses for comprehensive data interpretation. These tools include physiology-based biokinetic (PBBK) models, novel bioinformatics strategies for biomarker prediction, and advanced multivariate statistics for establishing links between exposure to environmental stressors and health status and for investigating causality.
Physiology-based biokinetic (PBBK) models translate external exposures from multiple routes into internal exposure metrics, addressing the effect of exposure route on overall bioavailability and the dependence on critical developmental windows of susceptibility, such as pregnancy, lactation and infancy. With regard to cumulative exposure, PBBK models can quantify the effect of interactions among mixture compounds at the level of metabolism; however, applications so far are limited to volatile organic compounds (VOCs), metals and homocysteine. Recently, efforts have also shifted towards the integration of whole-body physiology, disease biology and molecular reaction networks, as well as the integration of cellular metabolism into multi-scale whole-body models. PBBK models are also used for assimilating biomonitoring data through exposure reconstruction, i.e. by quantifying the exposure components related to observed biomarker levels. Several techniques have been developed for this purpose, ranging from exposure conversion factors (ECF) to maximum likelihood estimates coupled to PBBK modelling approaches with synthetic biomarker data using Bayesian statistics.
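The forward and reverse directions described above can be illustrated with a deliberately simplified example. The sketch below uses a single well-mixed compartment with first-order elimination, which is far simpler than a multi-compartment PBBK model. All parameter names and values are hypothetical, and the reverse step assumes the biomarker was measured at steady state.

```python
def simulate_concentration(intake_rate, absorbed_fraction, volume, k_elim,
                           hours, dt=0.01):
    """Forward calculation: Euler integration of dC/dt = F*I/V - k*C from C = 0.

    intake_rate (I): external intake, e.g. ug/h
    absorbed_fraction (F): fraction of intake reaching the compartment
    volume (V): volume of distribution, e.g. L
    k_elim (k): first-order elimination rate constant, 1/h
    """
    conc = 0.0
    steps = int(round(hours / dt))
    for _ in range(steps):
        conc += dt * (absorbed_fraction * intake_rate / volume - k_elim * conc)
    return conc


def steady_state_concentration(intake_rate, absorbed_fraction, volume, k_elim):
    """Analytical steady state of the same model: C_ss = F*I / (k*V)."""
    return absorbed_fraction * intake_rate / (k_elim * volume)


def reconstruct_intake(biomarker_conc, absorbed_fraction, volume, k_elim):
    """Reverse calculation: intake rate that yields the observed steady-state level."""
    return biomarker_conc * k_elim * volume / absorbed_fraction


# Hypothetical parameter values, chosen only for illustration.
simulated = simulate_concentration(5.0, 0.8, 40.0, 0.1, hours=200)
expected = steady_state_concentration(5.0, 0.8, 40.0, 0.1)
estimated_intake = reconstruct_intake(expected, 0.8, 40.0, 0.1)
```

Here the reverse calculation recovers the intake rate that was fed into the forward model. With several exposure routes or time-varying exposure there is no such closed-form inversion, which is where the statistical reconstruction techniques mentioned above come in.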
A long-standing hurdle to the widespread use of PBBK models for exposure/risk assessment is the lack of a standardised modelling framework. Several research groups are therefore developing generic PBBK models, either as stand-alone tools such as PK-Sim and Indus-Chem, or incorporated within integrated computational platforms for exposure assessment such as INTERA, TAGS and MENTOR. The development of generic PBBK models for many chemicals (including data-poor and new chemicals) is supported by recent advances in quantitative structure–activity relationships (QSARs) and quantitative structure–property relationships (QSPRs). The HEALS methodology advances the current state of the art by integrating the above elements through the development and validation of a generic lifetime (including pregnancy) multi-route PBBK model for chemical mixtures. Integrating this generic PBBK model into a wider modelling framework will allow forward calculations (internal exposure, BEs, BPADs) and reverse calculations (exposure reconstruction) that functionally link external exposure components (Stream 3), biomonitoring data (WP4) and omics (WP5).
Another key aspect of HEALS is the development of innovative bioinformatics strategies for biomarker prediction. The bioinformatics tools currently available for biomarker detection and analysis range from statistical approaches to data mining. The latter is the process of discovering valuable information in large amounts of data in the form of associations, patterns, changes or significant structures. Data mining can be descriptive or predictive, although the distinction between the two is in many cases unclear, mainly because the same tools can be used either way. Prediction itself can be achieved through classification or regression. The available techniques for both tasks are numerous, with decision trees, neural networks and support vector machines the most widely used.
In complex problems, model combination has been adopted to improve prediction performance. Combining domain-specific models into a cohesive analysis framework, or meta-model, is an approach that is attracting increasing attention. In multifactorial problems such as the exposome, a single model usually fails to learn and generalize efficiently over the entire training set. Building a meta-model instead allows each component model to specialize in certain types, or even subsets, of data. Meta-models will be used to integrate the results from descriptive and predictive data mining models. By post-processing the architecture of the derived meta-model and interpreting its inference mechanism, significant multivariate profiles that best describe the available data can be revealed. Systematic examination of these profiles can indicate the most prominent traits of robust candidates for predictive biomarkers. Based on the profile associations, a biomarker fusion scheme will be designed to improve prediction accuracy and personalized application.
WP4-WP6 results (biomonitoring, omics and PBBK modelling) from cohort studies, together with environmental exposures (Stream 3), establish the exposome. Deriving predictive biomarkers from these data requires pre-processing the large amount of data produced, discovering specific data patterns and/or clusters, creating a data model based on a training dataset and, finally, evaluating that model with regard to its validity and prediction capacity on the basis of test data. Advanced bioinformatics will be used to evaluate epigenetically-influenced and independent SNPs and the correlation of splicing/methylation.
WP7 will apply and enhance currently available bioinformatics to select the most relevant omics data for given exposure/disease pathways. For descriptive data mining, the FPGrowth and LPMiner algorithms will be used for pattern discovery, whereas for data clustering a number of available tools (K-means, self-organizing maps (SOM), graph-based clustering) will be used. For predictive data mining, techniques such as artificial neural networks (ANN), decision trees, support vector machines (SVM), K-nearest neighbors and Bayesian networks will be tested and integrated to improve upon the current state of the art. WP7 will also provide the methodological tools for integrating multiple omics biomarkers into a mechanistic description of toxicity pathway interactions in relation to external/internal exposure. This will be achieved by retrieving relevant pathway information from sources such as WikiPathways and subsequently developing from it systems biology pathway models for the endpoints identified in Stream 5, using the predictive bioinformatics approaches outlined above. Systems biology aims to understand how biological function, which is absent from isolated biomarkers, arises when they act as components of their system.
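Of the clustering tools listed above, K-means is the simplest to show in a few lines. The sketch below is a plain implementation of Lloyd's algorithm applied to made-up two-dimensional "biomarker profiles". The data and the choice of the first k points as initial centroids are assumptions for illustration, and a production analysis would more likely rely on an established library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical biomarker profiles forming two well-separated groups.
profiles = [
    (1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
labels, centroids = kmeans(profiles, k=2)
```

On this toy dataset the first three and last three profiles end up in separate clusters. Real omics data are high-dimensional and noisy, so results depend strongly on initialization, scaling and the choice of k.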
The DIAMONDS data infrastructure, an integrated metadata/data and analysis infrastructure for computational chemistry and genomics, will be used as the bioinformatics data integrator. It will be complemented by the Generic Study Capture Framework/dbNP infrastructure, an IT infrastructure capable of documenting human study metadata and data together with the associated omics data.
Environment-wide association studies
HEALS introduces a novel approach to defining causal associations between health status and environmental stressors through the integrated use of advanced statistical tools for environment-wide association studies (EWAS). Environmental factors that are correlated are not considered confounders; rather, they are covariates that are in “linkage disequilibrium” with each other. EWAS findings can then be used to identify further factors that may be in “disequilibrium”, for more detailed measurement and causal identification. Adjustment-variable selection in epidemiology can be broadly grouped into background knowledge-based and statistics-based approaches, and Directed Acyclic Graphs (DAGs) have become a core tool in the former. DAGs present assumed relationships between variables graphically and, based on these assumptions, identify the variables to adjust for in order to control confounding and other biases. An enhancement of DAG construction in epidemiology, the arrow-on-arrow representation of effect modification, has broadened their applicability. Applications include identifying the direction of unmeasured confounding bias, adjusting for socioeconomic status in occupational cancer studies, identifying the variables that need to be adjusted for in high-resolution spatial epidemiology, and identifying confounders of the metabolic syndrome. Recent applications include the assessment of complex gene-environment interactions, as well as showing that ozone in studies of temperature and mortality is a causal intermediate, affected by temperature and in turn affecting mortality, rather than a confounder. In addition, the combination of DAGs with the change-in-estimate procedure is a novel approach to adjustment-variable selection in epidemiology.
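The temperature, ozone and mortality example can be expressed as a small graph computation. The sketch below encodes an assumed DAG as an adjacency mapping and labels a covariate as a mediator (on a directed path from exposure to outcome) or a confounder (a cause of the exposure that also reaches the outcome by a path avoiding the exposure). This is a simplified rule of thumb, not the full back-door criterion. The `season` and `altitude` nodes and their edges are hypothetical additions for illustration.

```python
def descendants(graph, node, blocked=frozenset()):
    """All nodes reachable from `node` along directed edges, avoiding `blocked`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def classify(graph, exposure, outcome, covariate):
    """Label a covariate as 'mediator', 'confounder' or 'neither' (simplified)."""
    # Mediator: caused by the exposure and itself a cause of the outcome.
    if (covariate in descendants(graph, exposure)
            and outcome in descendants(graph, covariate)):
        return "mediator"
    # Confounder: causes the exposure and reaches the outcome
    # through at least one path that does not pass through the exposure.
    causes_exposure = exposure in descendants(graph, covariate)
    reaches_outcome_otherwise = outcome in descendants(
        graph, covariate, blocked={exposure}
    )
    if causes_exposure and reaches_outcome_otherwise:
        return "confounder"
    return "neither"


# Assumed causal structure; the 'season' and 'altitude' edges are hypothetical.
dag = {
    "season": ["temperature", "mortality"],
    "altitude": ["temperature"],
    "temperature": ["ozone", "mortality"],
    "ozone": ["mortality"],
}

roles = {
    name: classify(dag, "temperature", "mortality", name)
    for name in ("ozone", "season", "altitude")
}
```

Under these assumptions `ozone` is labelled a mediator, so adjusting for it would remove part of the temperature effect, whereas `season` is labelled a confounder and `altitude`, which affects mortality only through temperature, is neither.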
Bayesian inference-based modelling has recently been used for the minimization of covariates and determinants in epidemiology, on the grounds that such models offer a more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent in the field. Its applications include Bayesian regression models for analyzing longitudinal binary process data, with emphasis on dealing with missing data; Bayesian adjustment for covariate measurement errors; and, more specifically for health association studies, Bayesian sensitivity analysis for mismeasured and unobserved confounders.
In HEALS, the tools mentioned above will be combined to provide an integrative framework for identifying the covariates that affect the relevant health endpoints in light of the health surveys, as well as for constraining the magnitude and direction of bias parameters. In practice, HEALS ushers in “enviromics”, which attempts to bridge environmental factors and function through the collection and analysis of dynamic envirome data. The necessary elements of such an analysis are: i) setting the universe of cellular functions and envirome components, ii) collecting informative envirome data over time, and iii) systems-level analysis of dynamic envirome data to find relationships between environmental variables and cellular functions.