Working Package 8: Environmental data mining

PARTNERS: UPMC, IOM, USTUTT, VTT, TNO, CSIC, UOWM, CERETOX, IDMEC-FEUP, CNR, NCSRD
LEADER: UOM
START MONTH: 4
END MONTH: 24

Objectives:

  1. To identify, mine, collect and review existing datasets, including environmental and food contamination variables, for the different environmental health stressors considered in Stream 5, as well as ancillary data such as spatially resolved land use/cover data.
  2. To analyse the collated data in terms of collection method, data quality, availability and applicability domains.
  3. To store the collated datasets within an environmental data management system, making the data readily available to WP9 and WP11 and, ultimately, to the health impact assessment in the population studies executed in Stream 5.

Description of work and role of partners:

WP8 will develop an environmental data management system that permits the integration of data on emissions of stressors; concentrations of toxic substances in environmental media (outdoor and indoor air, soil, water), in food and in drinking water; and external exposures to environmental hazards.

The following specific tasks have been identified:

Task 8.1 Data collection (UOWM, USTUTT, VTT, CSIC, IDMEC-FEUP, TNO, CERETOX, CNR)

The environmental data sources needed to perform Environment-wide application studies in the areas covered by the population studies addressed in Stream 5 will be identified through a detailed review. The review will cover past and ongoing research and survey projects at both national and European level, EU-wide monitoring systems such as those managed by the EEA and ESA, and national and regional monitoring networks in the areas of interest to the Stream 5 population studies. The data that each HEALS partner can contribute will be assessed to establish which data are directly available within the consortium.

The main objective of this Task is to gather, collect and mine the environmental data from the information sources identified through the above review process, so that they can subsequently be stored in the environmental data management system developed in Task 8.3. The data collected, relevant to the groups of substances identified in Stream 5, will comprise, but will not be limited to, the following variables:

  1. Emission data and emission factors
  2. Satellite data for estimation of air pollution levels (data from and in collaboration with the GMES initiative)
  3. Pollution levels in different media (outdoor air, indoor air, soil, surface water, ground water)
  4. Pollution levels in food and drinking water
  5. Meteorological data to be used as input to air quality modelling
  6. Land use/land cover data for the estimation of emission inventories

Since the data collected at this stage should serve WP11 and WP12, the dataset will be completed and provided in a standard format compliant with the INSPIRE Directive.

Task 8.2 Quality Assessment / Quality Control (QA/QC) (UOWM, NCSRD, UPMC, URV, IOM)

The data collected in the previous task will be evaluated for quality and applicability through the activities foreseen in Task 8.2. Careful checking of all the data used, whether obtained from outside the project or derived within it, is mandatory. A wide range of methods will be used for this purpose, building on techniques already extensively applied in partner institutions. These will include:

  • consultation with data suppliers and past data users to identify any known gaps or uncertainties;
  • scrutiny of relevant metadata, describing the source and genealogy of the data (e.g. sampling methods, measurement procedures, analytical procedures, reporting);
  • establishment of clear data standards and criteria prior to data acquisition and use, so that data can be rejected if they are not of sufficient quality (e.g. in terms of coverage, sampling density, measurement accuracy, timeliness);
  • statistical screening of the data – e.g. to check for outliers, impossible values, anticipated correlations/patterns;
  • manual checking of subsamples of the data (e.g. to check for formatting errors);
  • intercomparison and triangulation against independent data sets and sources.
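
As an illustration of the statistical screening step listed above, the following is a minimal sketch only, not the procedure adopted by the project. It assumes that "impossible values" can be expressed as physical bounds (e.g. no negative concentrations) and that outliers are flagged with a modified z-score based on the median absolute deviation; the function name, the bounds and the threshold of 3.5 are hypothetical choices made for the example.

```python
from statistics import median


def screen_values(values, lower=0.0, upper=None, threshold=3.5):
    """Return (impossible, outliers) as lists of indices into `values`.

    `lower`, `upper` and `threshold` are illustrative assumptions,
    not criteria defined by the project.
    """
    impossible, plausible = [], []
    for i, v in enumerate(values):
        # Values outside the physical bounds are impossible, not merely unusual.
        if v < lower or (upper is not None and v > upper):
            impossible.append(i)
        else:
            plausible.append((i, v))

    if not plausible:
        return impossible, []

    # Robust centre and spread: median and median absolute deviation (MAD).
    med = median(v for _, v in plausible)
    mad = median(abs(v - med) for _, v in plausible)
    if mad == 0:
        # No spread among the plausible values, so nothing can be ranked as an outlier.
        return impossible, []

    # Modified z-score; 0.6745 rescales the MAD to match a standard deviation
    # for normally distributed data.
    outliers = [
        i for i, v in plausible
        if 0.6745 * abs(v - med) / mad > threshold
    ]
    return impossible, outliers


if __name__ == "__main__":
    daily_pm10 = [12.0, 14.5, 13.2, -1.0, 15.1, 13.8, 250.0, 12.9]
    print(screen_values(daily_pm10))
```

Flagged records would still need the manual checks and the consultation with data suppliers described in the list before being rejected.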

An important corollary of the above QA/QC activities will be the identification of gaps, which will need to be covered. Moreover, this will help the project team to optimally design the Pilot European Exposure and Health Examination Survey (EXHES) carried out in WP17.

Task 8.3 Designing and building the Environmental data management system (UOWM, VTT, TNO, CNR)

After data collection (Task 8.1) and quality control (Task 8.2), the data will be stored in a coherent environmental data management system (EDMS) for further use within the project. The EDMS is planned to store all the data collected in Task 8.1. The work in this task will start with the design of the database (Db) structure, which, besides accommodating HEALS' own datasets, should be able to retrieve data from the existing databases identified in Task 8.1 through suitable query scripts. The Db will be implemented in a standard database package such as MySQL, in order to ensure interoperability in data storage, management and exchange with the Geo-database platform developed in WP12.

All data structures will be relational, i.e. the data tables will be linked to each other by means of unique record identifiers (IDs). In this way, each record can be easily accessed and shared by different tables, which will ensure seamless integration with the Geo-database platform developed in WP12. In addition, all data will be geo-referenced by specifying the geographic coordinates of every observation, for both point and polygon spatial information, so that they are ready for analysis with the GIS technology developed in WP12. All data will also be uniquely coupled to a time reference (instant, hour, day, month, etc.), so that they can be investigated using time-series statistics and easily aggregated over different time scales (e.g. weekly or monthly averages).
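
The relational, geo-referenced and time-referenced structure described above could look roughly like the following sketch. It is illustrative only: SQLite (via Python's standard library) stands in for the MySQL package named in the text so that the example is self-contained, and the table names, column names and sample values are hypothetical rather than the project's actual schema. Only point locations are shown; polygon geometries would need an additional table or a spatial extension.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked by a site ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE site (
            site_id   INTEGER PRIMARY KEY,   -- unique record identifier
            name      TEXT NOT NULL,
            latitude  REAL NOT NULL,         -- geo-reference (point form)
            longitude REAL NOT NULL
        );
        CREATE TABLE observation (
            obs_id      INTEGER PRIMARY KEY,
            site_id     INTEGER NOT NULL REFERENCES site(site_id),
            pollutant   TEXT NOT NULL,
            observed_at TEXT NOT NULL,       -- ISO 8601 time reference
            value       REAL NOT NULL
        );
        """
    )
    # Hypothetical sample data, for illustration only.
    conn.execute("INSERT INTO site VALUES (1, 'Site A', 40.30, 21.79)")
    conn.executemany(
        "INSERT INTO observation (site_id, pollutant, observed_at, value) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "PM10", "2013-01-05T12:00:00", 20.0),
            (1, "PM10", "2013-01-20T12:00:00", 30.0),
            (1, "PM10", "2013-02-10T12:00:00", 40.0),
        ],
    )
    return conn


def monthly_means(conn):
    """Aggregate observations to monthly averages per site and pollutant."""
    return conn.execute(
        """
        SELECT s.name, o.pollutant,
               strftime('%Y-%m', o.observed_at) AS period,
               AVG(o.value)
        FROM observation AS o
        JOIN site AS s ON s.site_id = o.site_id
        GROUP BY s.name, o.pollutant, period
        ORDER BY s.name, o.pollutant, period
        """
    ).fetchall()


if __name__ == "__main__":
    for row in monthly_means(build_demo_db()):
        print(row)
```

Because every observation carries both a site ID and a timestamp, changing the `strftime` pattern (for instance to `'%Y-%W'` for weeks) is enough to aggregate on a different time scale.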

Several query interfaces will be developed in order to integrate the database with tools for: a) automatic updating (update query), b) importing from other data formats or software (import query), c) exporting into other data formats (export query), d) selecting specific subsets of data (selection query), e) grouping records by means of aggregation functions (group by query).
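
The five query types listed above can be sketched as follows. This is a hedged illustration rather than the planned interface: SQLite again stands in for MySQL, CSV is assumed as the external data format, and the single `observation` table and all function names are hypothetical.

```python
import csv
import io
import sqlite3


def open_db():
    """Create an in-memory database with one simplified observation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        " obs_id INTEGER PRIMARY KEY,"
        " pollutant TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn


def import_csv(conn, text):
    """b) Import query: load records from CSV text with a header row."""
    rows = [
        (r["pollutant"], float(r["value"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany(
        "INSERT INTO observation (pollutant, value) VALUES (?, ?)", rows
    )
    return len(rows)


def update_value(conn, obs_id, value):
    """a) Update query: correct the value of one record."""
    conn.execute(
        "UPDATE observation SET value = ? WHERE obs_id = ?", (value, obs_id)
    )


def select_subset(conn, pollutant):
    """d) Selection query: records for a single pollutant."""
    return conn.execute(
        "SELECT obs_id, value FROM observation "
        "WHERE pollutant = ? ORDER BY obs_id",
        (pollutant,),
    ).fetchall()


def group_means(conn):
    """e) Group-by query: mean value per pollutant."""
    return conn.execute(
        "SELECT pollutant, AVG(value) FROM observation "
        "GROUP BY pollutant ORDER BY pollutant"
    ).fetchall()


def export_csv(conn):
    """c) Export query: dump the table back to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["obs_id", "pollutant", "value"])
    writer.writerows(
        conn.execute(
            "SELECT obs_id, pollutant, value FROM observation ORDER BY obs_id"
        )
    )
    return buf.getvalue()


if __name__ == "__main__":
    db = open_db()
    import_csv(db, "pollutant,value\nPM10,20\nPM10,30\nNO2,40\n")
    update_value(db, 2, 40.0)
    print(select_subset(db, "PM10"))
    print(group_means(db))
    print(export_csv(db))
```

All values are passed as bound parameters rather than concatenated into the SQL text, which keeps the same functions safe to reuse when the input comes from external files.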

The structure of the EDMS will be compatible with the IPCHeM database of the JRC and the ToxHub platform of the HEROIC project, so that the collected data can feed into these databases during project execution and in the future, thus contributing to environmental data integration across Europe.