Scoring deliveries under the core data flows

It is proposed that the new, more harmonised approach to simple, yet still meaningful, evaluation criteria be based on the overall logic outlined below, making use of results from Reportnet's automatic data quality checking routines as far as possible.

Most core data flows use XML 1.0 as the data exchange format, which allows data quality to be checked automatically in Reportnet immediately after data upload. Over the years, data quality checks have been defined for most of the major data flows. It is assumed that automated data quality checks will be defined for each core data flow where they are still missing, while existing ones will be expanded and refined.

Future scoring for core data flows

The proposal is that deliveries under the core data flows will be scored with 0..4 points (earlier known as smileys), according to the table below. Following existing practice from the evaluation of the earlier set of priority data flows, the main components of the scoring are a delivery's timeliness and its data quality. Scoring results from individual core data flows will be aggregated by country, so that an average score for the country can be derived, typically expressed as achievement (performance) in percent. Additionally, average performance values can be calculated for a single data flow (across all countries), as well as for different country groupings (across all data flows). More details and definitions for the individual scoring categories are given further below.
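The aggregation described above can be sketched as follows. This is a hypothetical illustration, not an agreed implementation: the function name and the convention that 4 points correspond to 100% performance are assumptions based on the 0..4 point scale.

```python
# Hypothetical sketch of the proposed aggregation: each delivery scores
# 0..4 points, and a country's performance is the average score expressed
# as a percentage of the 4-point maximum. Names are illustrative.

MAX_POINTS = 4

def performance_percent(scores):
    """Average performance across a list of 0..4 delivery scores, in percent."""
    if not scores:
        raise ValueError("no scores to aggregate")
    return 100.0 * sum(scores) / (MAX_POINTS * len(scores))

# Example: one country's scores across three core data flows.
print(round(performance_percent([4, 3, 1]), 1))  # 66.7
```

The same function applies unchanged to the other aggregations mentioned: pass the scores of one data flow across all countries, or of a country grouping across all data flows.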

Point scoring table for evaluating core data flows

                     Data Quality
Timeliness           Basic test failed   Basic test passed   All tests passed
Serious delay                0                   0                   0
Small delay                  0                   1                   3
Timely delivery              0                   2                   4
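Read as a lookup, the scoring matrix can be sketched like this; the row and column labels are an assumed reading of the timeliness and data quality categories defined further down in this section.

```python
# Point matrix from the scoring table above; row and column labels are
# assumed readings of the timeliness and data quality categories.
POINTS = {
    "serious delay":   {"basic test failed": 0, "basic test passed": 0, "all tests passed": 0},
    "small delay":     {"basic test failed": 0, "basic test passed": 1, "all tests passed": 3},
    "timely delivery": {"basic test failed": 0, "basic test passed": 2, "all tests passed": 4},
}

def delivery_score(timeliness: str, quality: str) -> int:
    """Score a single delivery (0..4 points) from its two evaluation categories."""
    return POINTS[timeliness][quality]

print(delivery_score("timely delivery", "all tests passed"))  # 4
```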


The definitions for the categories used in the above matrix are given below. Largely similar definitions have been used in the past for the priority data flows. However, the earlier priority data flow criteria were substantially complex and varied, which often obscured the underlying principles and commonalities.


Timeliness

The timeliness criteria refer to the actual reporting date, compared with the reporting deadline and the length of the reporting cycle. The logic follows a process-oriented approach, taking into account how much effort is needed to handle a given delivery in subsequent data processing steps.

  • Timely Delivery: Envelope released on time in Reportnet, i.e. at the latest by midnight (UTC) of the reporting deadline.
  • Small Delay: The reporting delay is small enough that it is still possible to handle the delayed delivery in the regular data processing and publish the data according to schedule (together with punctual deliveries). Unless specified otherwise, "small delay" is defined as notably less than 10% of the reporting cycle, which typically translates into a maximum of 1 month (30 days) in the case of annual data reporting.
  • Serious Delay: The reporting delay exceeds the limit defined for "small delay". The essential difference is that the delivery arrived too late to be included in the regular, planned and scheduled data processing routines. Seriously delayed deliveries will either be ignored completely in further data processing or, if resources allow, handled in a separate ad-hoc process, leading to a partial update of the already published European dataset.
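The three timeliness categories can be sketched as a small classifier. This is an illustrative reading, not an official rule: in particular, interpreting "notably less than 10% of the reporting cycle" as roughly one twelfth of the cycle is an assumption, chosen because it yields the 30 days (1 month) stated for annual reporting.

```python
from datetime import date

def small_delay_limit_days(cycle_days: int) -> int:
    # Assumption: "notably less than 10% of the reporting cycle" is read
    # here as roughly one twelfth of the cycle, which gives the stated
    # 30 days (1 month) for an annual (365-day) reporting cycle.
    return cycle_days // 12

def timeliness_category(released: date, deadline: date, cycle_days: int = 365) -> str:
    """Classify a delivery as timely delivery, small delay or serious delay."""
    delay = (released - deadline).days
    if delay <= 0:
        return "timely delivery"
    if delay <= small_delay_limit_days(cycle_days):
        return "small delay"
    return "serious delay"

# Example: envelope released 15 days after an annual deadline.
print(timeliness_category(date(2024, 4, 15), date(2024, 3, 31)))  # small delay
```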

Data Quality

The evaluation categories are based on the results of (automated) data quality checks, which inspect the format and completeness as well as the internal and external consistency of the delivery. Typically, the checking rules first verify the presence of all mandatory values (completeness check), followed by internal consistency checks such as uniqueness of primary keys, references between optional data elements (conditional tests) and checks for referential integrity between tables, or between GIS data and attribute data. A further option for checking internal consistency is to test incoming data against earlier deliveries under the same reporting obligation, e.g. in the form of outlier checks. Finally, external consistency checks can be executed, where delivered data are compared against external reference data, e.g. code lists and nomenclatures provided by Eurostat or other international data providers.
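The check sequence described above can be illustrated with a toy pipeline. All field names, the sample code list and the message texts are hypothetical; they do not reflect actual Reportnet checking rules.

```python
# Toy illustration of the described check sequence: completeness, then
# internal consistency (unique primary keys), then external consistency
# against a code list. All names and values are hypothetical.

VALID_COUNTRY_CODES = {"FR", "DE", "SE"}  # stand-in for an external nomenclature

def check_delivery(records):
    messages = []
    seen_keys = set()
    for rec in records:
        # 1. Completeness: mandatory values must be present.
        if not rec.get("station_id"):
            messages.append(("ERROR", "missing mandatory station_id"))
            continue
        # 2. Internal consistency: primary keys must be unique.
        if rec["station_id"] in seen_keys:
            messages.append(("ERROR", f"duplicate key {rec['station_id']}"))
        seen_keys.add(rec["station_id"])
        # 3. External consistency: country code must exist in the code list.
        if rec.get("country") not in VALID_COUNTRY_CODES:
            messages.append(("WARNING", f"unknown country code {rec.get('country')}"))
    return messages or [("OK", "all checks passed")]

print(check_delivery([{"station_id": "S1", "country": "FR"}]))
```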

  • Basic test failed: Generation of BLOCKER or ERROR messages during the basic test. In terms of Reportnet AutomaticQA routines, the basic test is typically the XML schema validation, unless a data flow specific "mandatory check" (or similar) exists.
  • Basic test passed: Only OK or INFO messages have been generated during the basic test.
  • All tests passed: Only OK or INFO messages have been generated in the additional tests, which are usually defined as "complementary validation", "conditional checks", "cross-table checks", or similar.
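Based on the message levels defined below, the three data quality categories can be derived along these lines. This is a sketch only: the source defines failure via BLOCKER/ERROR messages, and treating WARNING or SKIPPED messages as non-failing for the basic test (but as blocking the "all tests passed" category) is an assumption.

```python
# Sketch of deriving the data quality category from check message levels.
# The handling of WARNING/SKIPPED messages is an assumption (see comments).
FAILING = {"BLOCKER", "ERROR"}
PASSING = {"OK", "INFO"}

def quality_category(basic_levels, additional_levels):
    # "Basic test failed": BLOCKER or ERROR messages during the basic test.
    if FAILING & set(basic_levels):
        return "basic test failed"
    # "All tests passed": only OK or INFO messages in the additional tests.
    # Assumption: any other level (e.g. WARNING) drops the delivery back
    # to "basic test passed".
    if set(additional_levels) <= PASSING:
        return "all tests passed"
    return "basic test passed"

print(quality_category(["OK"], ["OK", "INFO"]))  # all tests passed
```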

Error levels in Reportnet's automatic checks (existing practice)

  • BLOCKER: Blocker messages indicate that the detected error will prevent data submission (envelope release is not possible).
  • ERROR: Error messages indicate issues that clearly need corrective action by the data reporter.
  • WARNING: Warning messages indicate issues that may be errors. Data reporters are expected to double-check the relevant records.
  • OK: Data check has been executed without generating any warning or error messages.
  • INFO: Informative message. Neutral or statistical feedback about the delivery, e.g. number of species reported.
  • SKIPPED: Data check has been executed, but there was "Nothing found to check", typically because of missing optional data elements.
  • UNKNOWN: Script execution failed, e.g. due to missing reference data or unresponsive third party web service.