The Innovative Data Analysis research programme: an overview and progress to date
The Innovative Data Analysis (IDA) programme is an MBIE-funded research project led by Landcare Research that runs for 4 years (2014–2018). The overall goal is to develop better infrastructure and tools to access and make use of existing data to support environmental reporting and decision-making.
The programme works in partnership with a wide range of stakeholders including regional and central government, and aligns with key initiatives such as the State of the Environment (SOE) reporting, Environmental Monitoring and Reporting (EMaR) and the National Science Challenges. There are three main test applications based on indicator domains: soil health, land use, and species occupancy. A key aspect of the project is developing techniques to characterise the provenance, quality, and uncertainties for each data source and recording workflows to enable an auditable process behind any reporting product.
In the first 2 years we have focused on integrating, harmonising, and federating key land resource and biodiversity datasets in a standardised, statistically robust and transparent way. Highlights include:
Federating datasets on soil
To support the development of soil quality indicators, data from the legacy ‘500 Soils’ database were evaluated to establish whether they could be loaded into Landcare Research’s new National Soils Data Repository (NSDR). The feasibility study showed that the NSDR is capable of storing the data, but only after various initial data cleansing steps. It will therefore be important to define national standards and procedures for registering sites and collecting data, so that monitoring datasets are recorded in a consistently organised way.
The results of a linked project, the Open Geospatial Consortium (OGC) Soil Interoperability Experiment (IE), have recently been published as an approved OGC Engineering Report. This 6-month experiment successfully reconciled multiple existing soil data exchange models into a single draft standard, which was then implemented as a set of internationally interoperable prototype data services and demonstration clients. Through its technical control of this project, Landcare Research was able to ensure that the IE work could subsequently be adapted for soil quality data.
Federating datasets on land use
Reliable and up-to-date land use (LU) information is important for the increased use of spatial modelling, for analysing and reporting trends, and for the development of fair and consistent land use policies. Several independent LU classifications of varying quality have been developed for New Zealand, all of which tend to draw on the same public and commercial datasets. However, classification methodologies are only occasionally documented, which can make it difficult to regenerate a given classification when the underlying source datasets are updated.
The IDA programme produced a NZ Land Use Classification Regenerator based on spatial models and recorded workflows. The emphasis was on practical LU classifications, particularly those orientated towards rural and agricultural uses, for example the land use classification for soil monitoring and the Land Use of New Zealand (LUNZ) classification. We have successfully reproduced these classifications with a repeatable workflow using ArcGIS. We are now developing a platform-independent tool, ‘pyLUC’, that provides a framework for defining and constructing LU changes that are automatically well-documented and easily reproducible. To enforce the use of well-documented and versioned input data, pyLUC can currently interact with datasets available on the LRIS portal. Further development is planned to generate reports automatically for each classification and to add usability features such as a graphical user interface.
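The idea of a classification that documents itself can be illustrated with a minimal Python sketch. This is not pyLUC's actual interface, which is not described here: the names InputDataset, Rule, classify and run_workflow, the example rules, and the dataset names and versions are all hypothetical. The sketch assumes a classification is an ordered list of described rules applied to land parcels, and that the workflow record consists of the versioned inputs, the rule descriptions, and a fingerprint of both.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class InputDataset:
    """A versioned input layer (names and versions below are hypothetical)."""
    name: str
    version: str


@dataclass
class Rule:
    """One classification rule: output label, plain-language description, test."""
    label: str
    description: str
    predicate: Callable[[dict], bool]


def classify(parcel: dict, rules: List[Rule], default: str = "unclassified") -> str:
    """Return the label of the first rule that matches the parcel."""
    for rule in rules:
        if rule.predicate(parcel):
            return rule.label
    return default


def run_workflow(parcels: List[dict], rules: List[Rule],
                 inputs: List[InputDataset]) -> Tuple[Dict[int, str], dict]:
    """Classify every parcel and return the results with a workflow record."""
    results = {p["id"]: classify(p, rules) for p in parcels}
    record = {
        "inputs": [{"name": d.name, "version": d.version} for d in inputs],
        "rules": [{"label": r.label, "description": r.description} for r in rules],
    }
    # The fingerprint changes whenever an input version or a rule description
    # changes, so two runs can be compared at a glance.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["fingerprint"] = hashlib.sha256(payload).hexdigest()
    return results, record


# Hypothetical rules, applied in order (first match wins).
rules = [
    Rule("dairy", "pasture cover on a dairy farm",
         lambda p: p["cover"] == "pasture" and p["farm_type"] == "dairy"),
    Rule("plantation forestry", "exotic forest cover",
         lambda p: p["cover"] == "exotic forest"),
    Rule("drystock", "any remaining pasture",
         lambda p: p["cover"] == "pasture"),
]
inputs = [InputDataset("land-cover", "v4.1"), InputDataset("farm-types", "2015")]
parcels = [
    {"id": 1, "cover": "pasture", "farm_type": "dairy"},
    {"id": 2, "cover": "pasture", "farm_type": "sheep and beef"},
    {"id": 3, "cover": "exotic forest", "farm_type": None},
    {"id": 4, "cover": "urban", "farm_type": None},
]

results, record = run_workflow(parcels, rules, inputs)
print(results)
print(json.dumps(record, indent=2))
```

Because the record is built from the same objects that drive the classification, the documentation cannot drift from what was actually run, and rerunning the workflow after an input dataset is updated yields a new fingerprint alongside the regenerated classification.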
An open source linked data registry system was also deployed to test its suitability for the publication and management of land use classification systems. This registry software uses semantic web standards and technology to store, describe, organise, search, and publish classification data on the web. Further work will use semantic web tools to infer equivalence between classes. Ultimately these strands of research will be brought together, giving pyLUC access to the registry and providing standardised definitions for the automatically generated reports.
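As a rough illustration of what inferring equivalence between classes can involve, the Python sketch below groups classes from different schemes using only asserted pairwise matches. It does not use the registry software or any semantic web library; it assumes the matches behave like a symmetric and transitive relation (as SKOS exactMatch does), and the scheme and class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def equivalence_groups(matches: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group classes linked, directly or indirectly, by asserted matches."""
    parent: Dict[str, str] = {}

    def find(term: str) -> str:
        # Follow parent links to the group's representative, shortening the path.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return sorted(sorted(group) for group in groups.values())


def are_equivalent(groups: List[List[str]], a: str, b: str) -> bool:
    """True if both classes fall in the same inferred group."""
    return any(a in group and b in group for group in groups)


# Hypothetical asserted matches between three classification schemes.
matches = [
    ("schemeA:Dairy", "schemeB:DairyFarming"),
    ("schemeB:DairyFarming", "schemeC:Dairying"),
    ("schemeA:Forestry", "schemeC:PlantationForest"),
]

groups = equivalence_groups(matches)
print(groups)
# schemeA:Dairy and schemeC:Dairying were never matched directly,
# but are inferred to be equivalent through schemeB:DairyFarming.
print(are_equivalent(groups, "schemeA:Dairy", "schemeC:Dairying"))
```

A real registry would typically express such matches as linked data and leave the inference to a reasoner; the grouping logic here only shows why a small number of asserted matches can be enough to relate classes across several schemes.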
The final 2 years of this programme will focus on investigating different approaches to analysing, modelling, and visualising indicators.
ANNE-GAELLE AUSSEIL, DAVID MEDYCKYJ-SCOTT, ALISTAIR RITCHIE, ANDREW MANDERSON, BEN JOLLY, JERRY COOPER – LANDCARE RESEARCH