Producing statistically reliable, useable, and meaningful national-scale environmental data has long been acknowledged as a challenge in the NZ context, particularly for the Land Domain. The first two years of the IDA programme therefore focused on key gaps in the system.
This critical component of the IDA project brought together existing, heterogeneous spatial data from multiple sources relating to soil quality, land use, and species occupancy to produce a suite of higher-value information products. The research involved investigating techniques for data harmonisation and integration, creating related data processing pipelines, and implementing data validation tools and mechanisms to report on data quality and data provenance. Technologies for integrating distributed data were also investigated.
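As a rough illustration of what harmonisation, validation, and provenance reporting can look like in a pipeline, the following minimal Python sketch maps records from two sources onto a common schema, checks a value range, and logs each processing step. The field names, source names, and pH range check are hypothetical and chosen only for illustration; they are not taken from the IDA pipeline itself.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record in a hypothetical common schema, with a provenance trail."""
    site_id: str
    ph: float
    provenance: list = field(default_factory=list)


def harmonise(raw, source, field_map):
    """Rename source-specific fields to the common schema and note the origin."""
    values = {common: raw[original] for common, original in field_map.items()}
    return Record(
        site_id=str(values["site_id"]),
        ph=float(values["ph"]),
        provenance=[f"source={source}", "step=harmonise"],
    )


def validate(record):
    """Return a list of data quality issues (empty if none were found)."""
    issues = []
    if not 0.0 <= record.ph <= 14.0:
        issues.append(f"ph out of range: {record.ph}")
    record.provenance.append("step=validate")
    return issues


# Two hypothetical sources that name the same attributes differently.
council_row = {"SiteCode": "A12", "soil_pH": "5.8"}
survey_row = {"id": 77, "pH": 15.2}

a = harmonise(council_row, "council", {"site_id": "SiteCode", "ph": "soil_pH"})
b = harmonise(survey_row, "survey", {"site_id": "id", "ph": "pH"})

for rec in (a, b):
    print(rec.site_id, validate(rec), rec.provenance)
```

Keeping the provenance trail on the record itself means a data quality report can state both what was found and which steps the record passed through.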
The work identified issues with how data are currently collected (e.g. scale, coverage, and frequency of collection), data quality, how data are managed, and data ownership and related data access problems. These factors significantly affect the automation of data harmonisation and integration in the three domains considered in the research programme.
The findings from this component of the IDA programme informed the contribution by IDA staff to an Our Land and Water National Science Challenge think piece, A Data Ecosystem for Land and Water Data to Achieve the Challenge Mission (2016).
Commissioned Report for the Our Land and Water NSC -- A Data Ecosystem for Land and Water Data to Achieve the Challenge Mission
Land use is difficult to map as there is no single dataset that can easily describe how we use the land. We’ve developed ways to record the workflow for the creation of a land use map.
Our team contributed to national environmental reporting on soil quality by providing data management and monitoring protocols to improve national consistency. Workflows and scripting have been re-used for integration and validation of soil data as part of the process for importing legacy soil data into the NSDR.
Using the IDA data pipeline, key data that were not previously available extended species coverage for the development of species occupancy indicators.