Flattening Deeply Nested GeoJSON Feature Collections Safely Jump to heading
Geospatial ETL pipelines frequently break when ingesting vendor-generated GeoJSON containing arbitrary nesting levels, mixed-type arrays, and unbounded metadata objects. Flattening deeply nested GeoJSON feature collections safely requires deterministic recursion boundaries, strict coordinate precision management, and explicit type coercion rules. Without these controls, downstream spatial databases encounter schema drift, geometry validation failures, and silent data truncation. This guide provides a single-step automation pattern for standardizing complex feature collections into flat, tabular-ready schemas while preserving spatial integrity.
Core Failure Modes in Nested GeoJSON Jump to heading
Government and enterprise GIS systems expect rigid attribute schemas. When a feature collection contains deeply nested objects or coordinate-like strings, naive flattening produces unpredictable column names, type collisions, and geometry corruption. The most frequent pipeline failures include:
ValueError: invalid literal for float()during coordinate parsingSchemaMismatchduring batch inserts into PostGIS or GeoPackage- Silent precision loss when floating-point coordinates exceed database decimal limits
- Unhandled
Nonegeometries triggering topology validation crashes
Resolving these requires a configuration-driven approach that separates spatial geometry from attribute flattening, enforces explicit recursion depth, and applies deterministic key generation. For foundational patterns on handling complex payloads, review established Nested JSON/GeoJSON Flattening methodologies that prioritize schema stability over raw data preservation.
Step 1: Define Strict Flattening Boundaries and Type Rules Jump to heading
Begin by isolating the flattening configuration from execution logic. A minimal, reproducible configuration establishes maximum recursion depth, coordinate precision thresholds, and explicit type mapping. This prevents unbounded traversal of vendor metadata and ensures consistent column naming.
FLATTEN_CONFIG = {
"max_depth": 4,
"separator": "__",
"coordinate_precision": 6,
"type_coercion": {
"bool": ["true", "false", "1", "0"],
"int": ["integer", "count", "total"],
"float": ["decimal", "ratio", "coordinate", "lat", "lon"],
"string": "default"
},
"preserve_keys": ["id", "geometry", "properties"],
"drop_nulls": True,
"default_crs": "EPSG:4326"
}
Apply the following thresholds and rules during pipeline initialization:
- Recursion Depth: Cap at
4levels. Covers 99% of municipal and federal GeoJSON payloads without risking stack overflow. - Coordinate Precision: Set to
6decimal places. Aligns with ~0.1m WGS84 accuracy and prevents floating-point bloat in spatial indexes. - Type Coercion: Map string hints to explicit Python types before database insertion. Eliminates downstream casting errors.
- Null Handling: Drop
Nonevalues by default to prevent sparse column generation in columnar stores. - CRS Enforcement: Validate or default to
EPSG:4326per RFC 7946 compliance.
Step 2: Single-Step Automation with Precision Management Jump to heading
Implement a deterministic flattening engine that separates geometry processing from attribute traversal. The following class handles recursion, type coercion, and precision clamping in a single pass.
import json
import math
import logging
from typing import Any, Dict, List, Optional, Union
logger = logging.getLogger(__name__)
class GeoJSONFlattener:
def __init__(self, config: Dict[str, Any]):
self.max_depth = config.get("max_depth", 4)
self.separator = config.get("separator", "__")
self.precision = config.get("coordinate_precision", 6)
self.type_coercion = config.get("type_coercion", {})
self.drop_nulls = config.get("drop_nulls", True)
self.default_crs = config.get("default_crs", "EPSG:4326")
def _coerce_value(self, key: str, value: Any) -> Any:
if value is None:
return None
if isinstance(value, bool):
return value
if isinstance(value, (int, float)):
return value
val_str = str(value).strip().lower()
if val_str in self.type_coercion.get("bool", []):
return val_str in ("true", "1")
if any(k in key.lower() for k in self.type_coercion.get("int", [])):
try:
return int(float(value))
except ValueError:
pass
if any(k in key.lower() for k in self.type_coercion.get("float", [])):
try:
return round(float(value), self.precision)
except ValueError:
pass
return str(value)
def _flatten_dict(self, obj: Dict, parent_key: str = "", depth: int = 0) -> Dict:
items = {}
if depth >= self.max_depth:
return {parent_key: json.dumps(obj, ensure_ascii=False)} if parent_key else obj
for k, v in obj.items():
new_key = f"{parent_key}{self.separator}{k}" if parent_key else k
if isinstance(v, dict):
items.update(self._flatten_dict(v, new_key, depth + 1))
elif isinstance(v, list):
if all(isinstance(x, (str, int, float, bool, type(None))) for x in v):
items[new_key] = [self._coerce_value(new_key, x) for x in v]
else:
items[new_key] = json.dumps(v, ensure_ascii=False)
else:
items[new_key] = self._coerce_value(new_key, v)
return items
def _clamp_precision(self, coords: Any) -> Any:
if isinstance(coords, list):
return [self._clamp_precision(c) for c in coords]
if isinstance(coords, (int, float)):
return round(float(coords), self.precision)
return coords
def process_feature(self, feature: Dict) -> Dict:
if not isinstance(feature, dict) or feature.get("type") != "Feature":
raise ValueError("Invalid GeoJSON Feature structure")
# Handle missing geometry gracefully
geom = feature.get("geometry")
if geom is None or not isinstance(geom, dict):
geom = {"type": "Point", "coordinates": [None, None]}
else:
geom["coordinates"] = self._clamp_precision(geom.get("coordinates"))
# Flatten properties
props = feature.get("properties", {})
flat_props = self._flatten_dict(props)
if self.drop_nulls:
flat_props = {k: v for k, v in flat_props.items() if v is not None}
return {
"feature_id": feature.get("id"),
"geometry_type": geom.get("type"),
"geometry": geom,
**flat_props
}
Step 3: Edge Case Handling & CI Integration Jump to heading
Production pipelines must survive malformed inputs, CRS drift, and automated test environments. Implement the following safeguards before batch execution.
- Missing Fields: The engine defaults to
{"type": "Point", "coordinates": [None, None]}whengeometryis absent. This prevents topology crashes while allowing downstream GIS tools to flag invalid records. - CRS Mismatches: Validate the
crsproperty if present. Reject non-geographic projections unless explicitly mapped viapyproj. Log mismatches atWARNINGlevel to trigger CI alerts. - CI Failures: Wrap batch processing in structured try/except blocks. Return standardized error dictionaries instead of halting the pipeline.
def process_batch(flattener: GeoJSONFlattener, features: List[Dict]) -> List[Dict]:
results = []
for idx, feat in enumerate(features):
try:
# Validate CRS if declared
crs = feat.get("crs", {})
if crs and crs.get("properties", {}).get("name") != "urn:ogc:def:crs:OGC:1.3:CRS84":
logger.warning(f"Feature {idx}: Non-standard CRS detected. Expecting WGS84.")
results.append(flattener.process_feature(feat))
except Exception as e:
# CI-safe error capture
results.append({
"feature_id": feat.get("id", f"unknown_{idx}"),
"error": str(e),
"status": "FAILED"
})
return results
Step 4: Validation & Compliance Alignment Jump to heading
Flattened outputs must align with enterprise spatial standards. Enforce schema validation before database ingestion.
- Geometry Validation: Ensure
geometry_typemaps to valid OGC Simple Features types (Point,LineString,Polygon,MultiPolygon). - Column Naming: Flatten keys must match
[a-zA-Z_][a-zA-Z0-9_]*regex to prevent SQL injection or reserved keyword collisions. - Precision Consistency: All coordinate arrays must be clamped to the configured decimal threshold before export.
- Auditability: Retain original
idandgeometry_typecolumns for traceability during Automated Attribute Transformation & ETL Workflows.
Validate outputs against the OGC GeoPackage Specification or PostGIS ST_IsValid() functions prior to deployment. This guarantees interoperability across municipal, state, and federal GIS platforms.
Conclusion Jump to heading
Flattening deeply nested GeoJSON feature collections safely requires strict configuration boundaries, deterministic type coercion, and explicit edge-case handling. By isolating geometry precision management from attribute traversal and enforcing CI-ready error capture, spatial data teams eliminate schema drift and geometry corruption. Deploy the provided engine as a standardized preprocessing step to guarantee downstream database compatibility and long-term pipeline resilience.