Working Package 12: Exposure and Health data management

LEADER: VTT
PARTNERS: UPMC, AUTH, USTUTT, VTT, UOWM, CERETOX, IDMEC-FEUP, FMUP, NCSRD, URV, UC
START MONTH: 4
END MONTH: 36

Objectives:

  1. To define the functionality and design the structure of the HEALS GeoDatabase platform
  2. To define the technical framework and system architecture
  3. To develop, implement and populate the HEALS platform through efficient integration and assimilation of all datasets collected/developed for HEALS

Description of work and role of partners:

The main objective of WP12 is to develop and operate the publicly available HEALS GeoDatabase platform, which will systematically support the collection of and access to all datasets collected/developed for the HEALS environment-wide association studies. The platform will enable users to manage and explore spatial data (when applicable), to process these data and to effectively visualize the results of spatially resolved models. It will be linked to a number of external database modules to access datasets including environmental as well as molecular biology/biochemistry and clinical data, in support of the Environment-Wide Association Studies (EWAS) on the populations studied in Stream 5. It will also be linked to the biomonitoring/omics/PBPK/bioinformatics data infrastructures developed in Stream 2. The HEALS database platform will effectively support the HEALS methodology for the construction of the individual exposome and the derivation of environment-wide association studies linking human exposure to chemical and physical stressors over an individual's lifetime with observed health outcomes. It will be connected to the ToxHub database of the HEROIC project to support European data integration.

The following specific tasks have been identified:

Task 12.1 Definition of functional specifications (VTT, UPMC, USTUTT, UOWM, CERETOX, IDMEC-FEUP, FMUP, NCSRD, URV)

The database will be designed to accommodate both geo-referenced and non-spatial data. Geo-referenced data (i.e. environmental, exposure, population, satellite and GPS sensor-based data) will be included to capture the spatial variability of exposure information and to support its spatial analysis using a Geographical Information System (GIS). Spatially differentiated analysis may support the development of more refined exposure and risk assessment and thus contribute to the development of more refined risk management measures. This is particularly important when policy-relevant conclusions need to be drawn. Non-spatial datasets will also be included through linkage to a number of publicly available databases, in order to retrieve the molecular biology/biochemistry and clinical data, either already existing or produced during the project, that are needed to perform EWAS on the population surveys addressed in Stream 5.

First steps include:

(a) the definition of the Database functionalities;

(b) the identification of the main information sources and

(c) the incorporation of the datasets that the GeoDatabase platform will include in order to implement the HEALS approach.

The results of this process will be analysed and discussed within the HEALS consortium during technical meetings, in close collaboration with all the other Streams. Based on these deliberations, the technical team of WP12 will define the HEALS platform functional specifications, which will guide the development of the overall HEALS database design. Special attention will be paid to compatibility with ToxHub (HEROIC) and IPCheM (JRC). A key functionality will be the ability to readily deliver relevant HEALS data to the IPCheM database.

Task 12.2 Definition of the technical framework and system architecture (VTT, AUTH, URV, IDMEC-FEUP, USTUTT, UOWM, UPMC, UC)

The technical framework of the HEALS platform will be defined according to the conceptual framework and the information collected in Task 12.1. The platform will be web-based, publicly available, flexible and interactive. Structurally it will include i) a Library containing significant documents and guidelines, as well as links to a number of external databases to access chemical, molecular biology/biochemistry and clinical datasets, and ii) a GeoDatabase, which will systematically support the collection of and access to all datasets collected/developed for the HEALS case studies and population surveys. Geo-referencing and clustering of data will follow the technical specifications of the EC INSPIRE initiative, and in particular the Environment and Health cluster specifications. In addition, the platform will be designed to report and display uncertainty across all computation stages and datasets composing the HEALS system.

The platform will be operationally linked to database modules incorporating internal HEALS and external datasets, such as the Human Metabolome Database (HMDB), which contains 40278 metabolite entries including both water-soluble and lipid-soluble metabolites. Additionally, 7761 protein (and DNA) sequences are linked to these metabolite entries. Information on pathways involved in both primary and secondary metabolism will be accessed through the MetaCyc database, and data on genome sequencing will be retrieved through the KEGG and GenBank databases; other databases include bioactivity screens of chemical substances (PubChem) and protein sequence databases (PDB, Swiss-Prot). Further, in order to link to cohort-specific omics data generated within the HEALS project, linkage to the bioinformatics data infrastructures developed under WP7 (dbNP and DIAMONDS) is foreseen. In addition, automated links to libraries such as EpiSuite and QSAR models will be developed to support the parameterization of the generic PBPK model developed in WP6 for known and new chemicals with limited information. Clearance/elimination kinetics will be retrieved from PopGen and the publicly available data from Simcyp, while plasma protein binding will be obtained from the ToxCast Phase I chemical library.

Functional on-line links with SES and EHES will allow the integration of high-quality national health data at European scale into the HEALS platform in support of the EWAS studies developed in WP13. The HEALS database will support extensive text, sequence, chemical structure and relational query searches. In order to compile the individual exposome, additional information will have to be included on other exogenous factors, such as the potential use of pharmaceutical drugs. Interactions elicited by the use of these types of compounds will need to be properly interpreted; thus links to databases such as DrugBank will be established. Distinguishing metabolites that result from the use of drugs from those that result from environmental toxicants will be facilitated by reference to the Toxin and Toxin Target Database (T3DB); their interpretation will be facilitated by the tools developed in WP7, utilizing data from the Small Molecule Pathway Database (SMPDB), an interactive, visual database containing more than 350 small molecule pathways found in humans.

The entire system will undergo tests for operational resilience to ensure that continuous service is available to the HEALS community and to the users of the HEALS methodology and datasets.

Task 12.3 Development and implementation of the HEALS platform (VTT, AUTH, NCSRD, URV, UC)

The HEALS platform will be developed and implemented on the basis of the analysis and design work performed in WP12 and in WP13. At first, all available spatial datasets will be geo-referenced to capture spatial variability and will be imported into the GeoDatabase. This will cover both the input data to the modelling tools and the modelling results. Close collaboration with WP8 is foreseen in order to include the Environmental Management System developed in WP8 in the GeoDatabase. For each dataset, a dedicated adapter will be developed that maps the existing data to the commonly agreed model. The system will be prepared to accommodate future adapters, for either locally or remotely stored datasets, and to integrate them into the HEALS platform. Close collaboration with Stream 5 partners will be ensured at this stage, so that all information collected in the various population studies is properly imported into the system. Molecular biology/biochemistry and clinical data will be retrieved from the existing databases identified in Task 12.2 and accessed through hyperlinks and query scripts.
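
As an illustration only, the adapter concept described above could be sketched in Python as follows. The common record fields, the two adapter classes, the source column names and the sample rows are all hypothetical; the commonly agreed model itself remains to be defined within the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common model: one observation, optionally geo-referenced."""
    dataset: str
    variable: str
    value: float
    unit: str
    lat: Optional[float] = None
    lon: Optional[float] = None


class AirQualityAdapter:
    """Hypothetical adapter for a geo-referenced source with its own column names."""
    dataset = "air_quality"

    def adapt(self, rows: Iterable[Dict[str, str]]) -> Iterator[CommonRecord]:
        for row in rows:
            yield CommonRecord(
                dataset=self.dataset,
                variable=row["pollutant"],
                value=float(row["conc_ugm3"]),
                unit="ug/m3",
                lat=float(row["y_wgs84"]),
                lon=float(row["x_wgs84"]),
            )


class BiomarkerAdapter:
    """Hypothetical adapter for a non-spatial source (no coordinates)."""
    dataset = "biomarkers"

    def adapt(self, rows: Iterable[Dict[str, str]]) -> Iterator[CommonRecord]:
        for row in rows:
            yield CommonRecord(
                dataset=self.dataset,
                variable=row["analyte"],
                value=float(row["result"]),
                unit=row["unit"],
            )


def load_all(sources: Iterable[Tuple[object, Iterable[Dict[str, str]]]]) -> List[CommonRecord]:
    """Run every (adapter, rows) pair and collect the records in the common model."""
    records: List[CommonRecord] = []
    for adapter, rows in sources:
        records.extend(adapter.adapt(rows))
    return records


# Invented sample rows, for illustration only.
records = load_all([
    (AirQualityAdapter(), [{"pollutant": "PM2.5", "conc_ugm3": "12.5",
                            "y_wgs84": "60.17", "x_wgs84": "24.94"}]),
    (BiomarkerAdapter(), [{"analyte": "cotinine", "result": "3.1", "unit": "ng/mL"}]),
])
```

In this sketch a future dataset would only require a new adapter class exposing the same `adapt` method; the loading code would stay unchanged.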

In parallel, a development track will focus on the data integration patterns needed for the Environment-Wide Association Studies on the populations identified in Stream 5. We will develop, or primarily use, open source libraries that allow merging, splitting, filtering and mapping of information, as well as basic computational functionality for on-line data handling and analysis.
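
The four integration operations named above (merging, splitting, filtering, mapping) can be illustrated with a short Python sketch that uses only the standard library. The subject identifiers, field names, values and thresholds are invented for illustration and do not correspond to any HEALS dataset.

```python
from collections import defaultdict

# Invented sample data: exposure estimates and health outcomes keyed by subject.
exposure = [
    {"subject": "s1", "pm25": 12.0},
    {"subject": "s2", "pm25": 30.0},
    {"subject": "s3", "pm25": 8.0},
]
health = [
    {"subject": "s1", "asthma": False},
    {"subject": "s2", "asthma": True},
]


def merge(left, right, key):
    """Merging: inner join of two lists of dicts on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def split(rows, key):
    """Splitting: partition rows into groups by the value of one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


merged = merge(exposure, health, "subject")

# Filtering: keep rows above an arbitrary illustrative threshold.
above_threshold = [row for row in merged if row["pm25"] > 20.0]

# Mapping: derive a new field from an existing one (arbitrary cut-off).
labelled = [
    {**row, "pm25_class": "high" if row["pm25"] > 25.0 else "low"}
    for row in merged
]

by_outcome = split(labelled, "asthma")
```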

At the end of the task, basic tools will allow operations to be performed on the complete set of available datasets, irrespective of their original source. The outcome of this integration effort will be sets of spatially resolved data used by the underlying GIS engine, together with other non-spatial information. The latter is a key component of the overall database design, since geo-referenced data (e.g. environmental, exposure, epidemiological, cohort, biomonitoring) will have to be connected with molecular biology/biochemistry data to unravel the individual exposome.

For the assessment and management of spatially resolved data, the HEALS platform will make use of a GIS. The GeoDatabase platform will enable the user to manage and explore spatial data, to process these data (e.g. spatial statistical analysis such as Exploratory Spatial Data Analysis, Principal Component Analysis, Cluster Analysis, Hot Spot Analysis), and to effectively visualize the results of spatially resolved models. Several query interfaces will be developed in order to integrate the database with tools for: a) automatic updating (update query), b) importing from other data formats or software (import query), c) exporting into other data formats (export query), d) selecting specific subsets of data (selection query), e) grouping records by means of aggregation functions (group by query).
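
The five query types listed above can be illustrated with a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in for the GeoDatabase engine, which is not specified here; the table, its columns and the sample values are invented for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (site TEXT, pollutant TEXT, value REAL)")

# (b) import query: load rows from another data format (here, CSV text).
csv_text = "site,pollutant,value\nA,NO2,40.0\nA,NO2,20.0\nB,NO2,10.0\n"
rows = [
    (r["site"], r["pollutant"], float(r["value"]))
    for r in csv.DictReader(io.StringIO(csv_text))
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)

# (a) update query: replace a stored value with a revised one.
conn.execute(
    "UPDATE measurement SET value = ? WHERE site = ? AND value = ?",
    (30.0, "A", 20.0),
)

# (d) selection query: extract a specific subset of the data.
selected = conn.execute(
    "SELECT site, value FROM measurement WHERE value > ? ORDER BY value",
    (15.0,),
).fetchall()

# (e) group by query: aggregate records per site.
grouped = conn.execute(
    "SELECT site, AVG(value) FROM measurement GROUP BY site ORDER BY site"
).fetchall()

# (c) export query: write the aggregated result to another format (CSV text).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["site", "mean_value"])
writer.writerows(grouped)
exported = out.getvalue()
```

A spatially enabled engine would add geometry columns and spatial predicates to the same five query patterns.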

Capturing, qualifying and quantifying uncertainty is key to the development of a robust data management system able to effectively support the association between exposure and health outcomes. To this end, the platform will support advanced mapping methods to quantitatively display the uncertainty associated with the datasets stored in the GeoDatabase.

Furthermore, meta-information allowing the user to identify the origin of the data, their uncertainty levels, the spatial and temporal scales of reference, and the format requirements for communication with the other data sources available in the system will be included in the GeoDatabase platform.
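
One possible shape for such a meta-information record is sketched below in Python. The field names follow the items listed above, but the class, the completeness check and the example values are assumptions made for illustration, not a HEALS or INSPIRE metadata schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical meta-information record for one dataset."""
    name: str
    origin: str            # who produced the data and how
    uncertainty: str       # stated uncertainty level
    spatial_scale: str     # spatial scale of reference
    temporal_scale: str    # temporal scale of reference
    exchange_format: str   # format required to exchange data with other sources


def missing_fields(meta: DatasetMetadata) -> list:
    """Return the names of fields left empty, so incomplete records can be flagged."""
    return [name for name, value in asdict(meta).items() if not str(value).strip()]


# Invented example values, for illustration only.
meta = DatasetMetadata(
    name="example_no2_grid",
    origin="hypothetical monitoring network",
    uncertainty="15 % relative standard deviation",
    spatial_scale="1 km grid",
    temporal_scale="annual mean",
    exchange_format="GeoTIFF",
)

# Serialised form that could accompany the dataset in a catalogue.
catalogue_entry = json.dumps(asdict(meta), sort_keys=True)
```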

Particular attention will be paid to the analysis of the spatial and temporal relations between various types of environment and health data: the definition of possible spatial relations (e.g. overlay, inclusion, proximity) and temporal relations (e.g. definition of the most suitable time interval for integrating information from different sources) will support the fusion/integration of different environment and health data. Given the importance of communicating results effectively to different end-users and stakeholders, it is suggested that the HEALS platform be implemented on a WebGIS system (commercial or open source) enabling easy access to data and visualization of results.
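
Two of the spatial relations mentioned above (inclusion and proximity) and one temporal relation (the overlap between two observation periods) can be sketched in plain Python as follows. The function names are hypothetical, the distance assumes a spherical Earth of radius 6371 km, and an operational system would rely on the spatial functions of the chosen GIS rather than on code of this kind.

```python
import math
from datetime import date


def point_in_polygon(x, y, polygon):
    """Inclusion: ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray from (x, y) towards +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_km(lat1, lon1, lat2, lon2):
    """Proximity: great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def overlap_days(start_a, end_a, start_b, end_b):
    """Temporal relation: number of days shared by two closed date intervals."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return max(0, (end - start).days + 1)


# Invented example: a square study area and two observation periods.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
is_inside = point_in_polygon(2, 2, square)
shared = overlap_days(date(2014, 1, 1), date(2014, 1, 31),
                      date(2014, 1, 20), date(2014, 2, 10))
```

The shared period returned by `overlap_days` is one simple way to judge whether two sources have enough common time coverage to be integrated.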