How to map INSPIRE Annex III to local PostgreSQL schemas Jump to heading

Mapping complex INSPIRE Annex III GML application schemas into production PostgreSQL/PostGIS environments requires deterministic handling of XML inheritance, coordinate precision decay, and schema drift. Government tech teams and Python ETL engineers frequently encounter silent data truncation, broken foreign key references, and CRS misalignment when flattening themes like Hydrography, Transport Networks, and Land Cover. Aligning ingestion pipelines with established Geospatial Schema Architecture & Standards Mapping practices eliminates downstream validation bottlenecks. The following reference provides a strict, implementation-ready workflow for schema generation, CRS enforcement, and automated drift routing.

1. Flatten GML Inheritance & Generate Deterministic DDL Jump to heading

INSPIRE Annex III relies on gml:AbstractFeature inheritance and appSchema type extensions. Direct ogr2ogr ingestion often fragments tables or drops nullable constraints. Pre-parse the XSD to extract a flat column registry, then generate PostgreSQL DDL with explicit guards.

  • Strip xsi:nil="true" attributes during XML parsing.
  • Map nilReason values to a dedicated _audit_nil_reason column.
  • Enforce NOT NULL on primary identifiers and lifecycle timestamps.
  • Apply CHECK constraints to prevent temporal paradoxes.
sql
-- Deterministic base table for Annex III Transport Network
CREATE TABLE IF NOT EXISTS inspire_annex3_transport_network (
    gml_id TEXT PRIMARY KEY,
    inspire_id TEXT UNIQUE NOT NULL,
    begin_lifespan_version TIMESTAMPTZ NOT NULL,
    end_lifespan_version TIMESTAMPTZ,
    geometry GEOMETRY(MULTILINESTRING, 25832) NOT NULL,
    feature_type VARCHAR(50) NOT NULL,
    _audit_nil_reason TEXT,
    CONSTRAINT chk_lifespan CHECK (begin_lifespan_version <= COALESCE(end_lifespan_version, CURRENT_TIMESTAMP))
);

Handle missing fields by creating a staging schema that mirrors the target table but allows NULL across all columns except gml_id. Run an INSERT ... SELECT with explicit COALESCE fallbacks before promoting data to production. This prevents pipeline aborts when upstream GML publishers omit optional attributes.

2. Enforce CRS Locking & Coordinate Precision Jump to heading

Annex III mandates ETRS89 derivatives. Local deployments frequently default to EPSG:4326, causing metric distortion in spatial joins. Lock the target schema to the correct projected SRID and enforce decimal precision at ingestion.

  • Transport Networks: ≤ 0.01 m tolerance
  • Hydrography: ≤ 0.10 m tolerance
  • Land Cover: ≤ 1.00 m tolerance
python
import psycopg2
from psycopg2 import sql

PRECISION_THRESHOLDS = {
    "inspire_annex3_transport_network": 0.01,
    "inspire_annex3_hydrography": 0.10,
    "inspire_annex3_land_cover": 1.00,
}

def enforce_crs_and_precision(conn, table_name: str, target_srid: int = 25832):
    threshold = PRECISION_THRESHOLDS.get(table_name, 0.1)
    stmt = sql.SQL(
        "ALTER TABLE {table} "
        "ALTER COLUMN geometry SET DATA TYPE GEOMETRY(MULTILINESTRING, {srid}) "
        "USING ST_Transform(ST_ReducePrecision(geometry, {tol}), {srid})"
    ).format(
        table=sql.Identifier(table_name),
        srid=sql.Literal(target_srid),
        tol=sql.Literal(threshold),
    )
    with conn.cursor() as cur:
        cur.execute(stmt)
    conn.commit()

Address CRS mismatches by validating the source GML srsName attribute before transformation. If the source lacks an explicit CRS, default to EPSG:25832 but log a warning to the audit table. Use ST_IsValid and ST_MakeValid to repair self-intersecting geometries before applying precision rounding. Consult the official PostGIS ST_ReducePrecision documentation for tolerance behavior across different geometry types.

3. Automate Schema Drift Detection & CI Validation Jump to heading

Schema drift occurs when upstream data publishers modify XSD definitions without versioning. Embed automated drift checks into your CI/CD pipeline to catch mismatches before deployment.

  • Compare column counts and data types between staging and production.
  • Validate geometry type consistency (MULTILINESTRING vs LINESTRING).
  • Fail CI builds if mandatory INSPIRE attributes are absent.
python
# ci_drift_check.py
import psycopg2
import sys

def validate_schema_drift(conn, table_name: str, expected_columns: list):
    with conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type 
            FROM information_schema.columns 
            WHERE table_name = %s AND table_schema = 'public'
            ORDER BY ordinal_position;
        """, (table_name,))
        actual = {row[0]: row[1] for row in cur.fetchall()}
        
        missing = set(expected_columns) - set(actual.keys())
        if missing:
            print(f"DRIFT DETECTED: Missing columns in {table_name}: {missing}")
            sys.exit(1)
        print("Schema validation passed.")

Route CI failures to a metadata fallback queue. Store rejected records in a schema_drift_quarantine table with the original GML payload, timestamp, and error trace. This preserves data lineage while unblocking downstream analytics. Align quarantine routing with INSPIRE Directive Schema Compliance reporting standards to maintain audit trails during national submissions.

4. Bulk Ingestion & Audit Routing Jump to heading

Use psycopg2.extras.execute_values or COPY for high-throughput ingestion. Map nilReason explicitly to avoid psycopg2 type coercion errors.

  • Pre-validate XML payloads with lxml against the target XSD.
  • Batch inserts in chunks of 5,000 records to manage transaction logs.
  • Commit only after ST_IsValid returns true for all geometries in the batch.
python
from psycopg2.extras import execute_values

def bulk_insert_features(conn, table_name: str, records: list):
    # records format: list of tuples matching table column order
    with conn.cursor() as cur:
        execute_values(
            cur,
            f"INSERT INTO {table_name} (gml_id, inspire_id, begin_lifespan_version, geometry, _audit_nil_reason) VALUES %s ON CONFLICT (gml_id) DO NOTHING",
            records,
            page_size=5000
        )
    conn.commit()

Implement a fallback routing mechanism for malformed GML. When lxml validation fails, serialize the payload to a JSONB audit table and trigger an alert to the data stewardship queue. This prevents silent truncation and ensures full compliance traceability. Refer to the official GDAL GML Driver documentation for advanced XSD parsing configurations and namespace handling.