Geospatial Data Pipeline

Expert in processing, optimizing, and visualizing geospatial data at scale.

When to Use

✅ Use for:

Drone imagery processing and annotation
GPS track analysis and visualization
Location-based search (find nearby X)
Map tile generation for web/mobile
Coordinate system transformations
Geofencing and spatial queries
GeoJSON optimization for web

❌ NOT for:

Simple address validation (use address APIs)
Basic distance calculations (use Haversine formula)
Static map embeds (use Mapbox Static API)
Geocoding (use Nominatim or Google Geocoding API)

Technology Selection

Database: PostGIS vs MongoDB Geospatial

Feature PostGIS MongoDB

Spatial indexes GiST, SP-GiST 2dsphere

Query language SQL + spatial functions Aggregation pipeline

Geometry types 20+ (full OGC support) Basic (Point, Line, Polygon)

Coordinate systems 6000+ via EPSG WGS84 only

Performance (10M points) <100ms <200ms

Best for Complex spatial analysis Document-centric apps

Timeline:

2005: PostGIS 1.0 released
2012: MongoDB adds geospatial indexes
2020: PostGIS 3.0 with improved performance
2024: PostGIS remains gold standard for GIS workloads

Common Anti-Patterns

Anti-Pattern 1: Storing Coordinates as Strings

Novice thinking: "I'll just store lat/lon as text, it's simple"

Problem: Can't use spatial indexes, queries are slow, no validation.

Wrong approach:

// ❌ String storage, no spatial features interface Location { id: string; name: string; latitude: string; // "37.7749" longitude: string; // "-122.4194" }

// Linear scan for "nearby" queries async function findNearby(lat: string, lon: string): Promise<Location[]> { const all = await db.locations.findAll();

return all.filter(loc => { const distance = calculateDistance( parseFloat(lat), parseFloat(lon), parseFloat(loc.latitude), parseFloat(loc.longitude) ); return distance < 5000; // 5km }); }

Why wrong: O(N) linear scan, no spatial index, string parsing overhead.

Correct approach:

// ✅ PostGIS GEOGRAPHY type with spatial index CREATE TABLE locations ( id SERIAL PRIMARY KEY, name VARCHAR(255), location GEOGRAPHY(POINT, 4326) -- WGS84 coordinates );

-- Spatial index (GiST) CREATE INDEX idx_locations_geography ON locations USING GIST(location);

-- TypeScript query async function findNearby(lat: number, lon: number, radiusMeters: number): Promise<Location[]> { const query = SELECT id, name, ST_AsGeoJSON(location) as geojson FROM locations WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography, $3 ) ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography LIMIT 100 ;

return db.query(query, [lon, lat, radiusMeters]); // <10ms with index }

Timeline context:

2000s: Stored lat/lon as FLOAT columns, did math in app code
2010s: PostGIS adoption, spatial indexes
2024: GEOGRAPHY type handles Earth curvature automatically

Anti-Pattern 2: Not Using Spatial Indexes

Problem: Proximity queries do full table scans.

Wrong approach:

-- ❌ No index, sequential scan CREATE TABLE drone_images ( id SERIAL PRIMARY KEY, image_url VARCHAR(255), location GEOGRAPHY(POINT, 4326) );

-- This query scans ALL rows SELECT * FROM drone_images WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, 1000 -- 1km );

EXPLAIN output: Seq Scan on drone_images (cost=0.00..1234.56 rows=1 width=123)

Correct approach:

-- ✅ GiST index for spatial queries CREATE INDEX idx_drone_images_location ON drone_images USING GIST(location);

-- Same query, now uses index SELECT * FROM drone_images WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, 1000 );

EXPLAIN output: Bitmap Index Scan on idx_drone_images_location (cost=4.30..78.30 rows=50 width=123)

Performance impact: 10M points, 5km radius query

Without index: 3.2 seconds (full scan)
With GiST index: 12ms (99.6% faster)

Anti-Pattern 3: Mixing Coordinate Systems

Novice thinking: "Coordinates are just numbers, I can mix them"

Problem: Incorrect distances, misaligned map features.

Wrong approach:

// ❌ Mixing EPSG:4326 (WGS84) and EPSG:3857 (Web Mercator) const userLocation = { lat: 37.7749, // WGS84 lon: -122.4194 };

const droneImage = { x: -13634876, // Web Mercator (EPSG:3857) y: 4545684 };

// Comparing apples to oranges! const distance = Math.sqrt( Math.pow(userLocation.lon - droneImage.x, 2) + Math.pow(userLocation.lat - droneImage.y, 2) );

Result: Wildly incorrect distance (millions of "units").

Correct approach:

-- ✅ Transform to common coordinate system SELECT ST_Distance( ST_Transform( ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326), -- WGS84 3857 -- Transform to Web Mercator ), ST_SetSRID(ST_MakePoint(-13634876, 4545684), 3857) -- Already Web Mercator ) AS distance_meters;

Or better: Always store in one system (WGS84), transform on display only.

Timeline:

2005: Web Mercator (EPSG:3857) introduced by Google Maps
2010: Confusion peaks as apps mix WGS84 data with Web Mercator tiles
2024: Best practice: Store WGS84, transform to 3857 only for tile rendering

Anti-Pattern 4: Loading Huge GeoJSON Files

Problem: 50MB GeoJSON file crashes browser.

Wrong approach:

// ❌ Load entire file into memory const geoJson = await fetch('/drone-survey-data.geojson').then(r => r.json());

// 50MB of GeoJSON = browser freeze map.addSource('drone-data', { type: 'geojson', data: geoJson // All 10,000 polygons loaded at once });

Correct approach 1: Vector tiles (pre-chunked)

// ✅ Serve as vector tiles (MBTiles or PMTiles) map.addSource('drone-data', { type: 'vector', tiles: ['https://api.example.com/tiles/{z}/{x}/{y}.pbf'], minzoom: 10, maxzoom: 18 });

// Browser only loads visible tiles

Correct approach 2: GeoJSON simplification + chunking

Simplify geometry (reduce points)

npm install -g @mapbox/geojson-precision geojson-precision -p 5 input.geojson output.geojson

Split into tiles

npm install -g geojson-vt

Generate tiles programmatically (see scripts/tile_generator.ts)

Correct approach 3: Server-side filtering

// ✅ Only fetch visible bounds async function fetchVisibleFeatures(bounds: Bounds): Promise<GeoJSON> { const response = await fetch( /api/features?bbox=${bounds.west},${bounds.south},${bounds.east},${bounds.north} ); return response.json(); }

map.on('moveend', async () => { const bounds = map.getBounds(); const geojson = await fetchVisibleFeatures(bounds); map.getSource('dynamic-data').setData(geojson); });

Anti-Pattern 5: Euclidean Distance on Spherical Earth

Novice thinking: "Distance is just Pythagorean theorem"

Problem: Incorrect at scale, worse near poles.

Wrong approach:

// ❌ Flat Earth distance (wrong!) function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number { const dx = lon2 - lon1; const dy = lat2 - lat1;

return Math.sqrt(dx * dx + dy * dy) * 111.32; // 111.32 km/degree (WRONG) }

// Example: San Francisco to New York const distance = distanceKm(37.7749, -122.4194, 40.7128, -74.0060); // Returns: ~55 km (WRONG! Actual: ~4,130 km)

Why wrong: Earth is a sphere, not a flat plane.

Correct approach 1: Haversine formula (great circle distance)

// ✅ Haversine formula (spherical Earth) function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number { const R = 6371; // Earth radius in km

const dLat = toRadians(lat2 - lat1); const dLon = toRadians(lon2 - lon1);

const a = Math.sin(dLat / 2) * Math.sin(dLat / 2) + Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) * Math.sin(dLon / 2);

const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));

return R * c; }

// San Francisco to New York const distance = haversineKm(37.7749, -122.4194, 40.7128, -74.0060); // Returns: ~4,130 km ✅

Correct approach 2: PostGIS (handles curvature automatically)

-- ✅ PostGIS ST_Distance with GEOGRAPHY SELECT ST_Distance( ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography ) / 1000 AS distance_km; -- Returns: 4130.137 km ✅

Accuracy comparison:

Method SF to NYC Error

Euclidean (flat) 55 km 98.7% wrong

Haversine (sphere) 4,130 km ✅ Correct

PostGIS (ellipsoid) 4,135 km Most accurate

Production Checklist

□ PostGIS extension installed and spatial indexes created □ All coordinates stored in consistent SRID (recommend: 4326) □ GeoJSON files optimized (<1MB) or served as vector tiles □ Coordinate transformations use ST_Transform, not manual math □ Distance calculations use ST_Distance with GEOGRAPHY type □ Bounding box queries use ST_MakeEnvelope + ST_Intersects □ Large geometries chunked (not >100KB per feature) □ Map tiles pre-generated for common zoom levels □ CORS configured for tile servers □ Rate limiting on geocoding/reverse geocoding endpoints

When to Use vs Avoid

Scenario Appropriate?

Drone imagery annotation and search ✅ Yes - process survey data

GPS track visualization ✅ Yes - optimize paths

Find nearest coffee shops ✅ Yes - spatial queries

Jurisdiction boundary lookups ✅ Yes - point-in-polygon

Simple address autocomplete ❌ No - use Mapbox/Google

Embed static map on page ❌ No - use Static API

Geocode single address ❌ No - use geocoding API

References

/references/coordinate-systems.md
EPSG codes, transformations, Web Mercator vs WGS84
/references/postgis-guide.md
PostGIS setup, spatial indexes, common queries
/references/geojson-optimization.md
Simplification, chunking, vector tiles

Scripts

scripts/geospatial_processor.ts
Process drone imagery, GPS tracks, GeoJSON validation
scripts/tile_generator.ts
Generate vector tiles (MBTiles/PMTiles) from GeoJSON

geospatial-data-pipeline

Safety Notice

Copy this and send it to your AI assistant to learn

Simplify geometry (reduce points)

Split into tiles

Generate tiles programmatically (see scripts/tile_generator.ts)

Source Transparency

Related Skills

geospatial-data-pipeline

video-processing-editing

interior-design-expert

tech-entrepreneur-coach-adhd