S2 Geometry Style Guide
Overview
S2 is Google's library for spherical geometry and spatial indexing, used internally for Google Maps, Google Earth, and countless geo-aware services. It solves the fundamental problem: how do you efficiently index and query locations on a sphere?
The library was developed at Google and open-sourced in 2017. It powers proximity search, geofencing, spatial joins, and geographic sharding at planetary scale.
Core Philosophy
"The Earth is not flat. Your spatial index shouldn't pretend it is."
"A good spatial index turns geometric queries into range queries on integers."
"Locality in space should mean locality in your index."
S2's insight: project the sphere onto a cube, fill each face with a space-filling Hilbert curve, and encode positions as 64-bit integers. Nearby points get nearby integers. Geometric queries become range scans.
Design Principles
Spherical First: No projection distortion. Work on the actual sphere.
Hierarchical Decomposition: 30 levels from ~85km² cells down to ~1cm² cells.
Locality Preservation: Hilbert curves ensure nearby points have nearby cell IDs.
Integer Encoding: Any cell is a single 64-bit integer. Hierarchy is in the bits.
Exact Predicates: Robust geometric operations that don't fail on edge cases.
When Writing Code
Always
-
Think in cells, not coordinates
-
Choose the right cell level for your precision needs
-
Use region coverings for irregular shapes
-
Leverage the hierarchical nature for multi-resolution queries
-
Remember: cell ID ordering preserves spatial locality
Never
-
Use lat/lng bounding boxes for spherical queries (they distort near poles)
-
Ignore the antimeridian (longitude ±180°)
-
Assume cells are square (they're quadrilaterals on a sphere)
-
Store raw coordinates when you could store cell IDs
-
Forget that level 30 is ~1cm², level 12 is ~3km²
Prefer
-
Cell IDs over lat/lng pairs for storage and indexing
-
Region coverings over point-in-polygon for containment
-
S2 distance calculations over Haversine (more accurate)
-
Hierarchical queries (coarse to fine) for performance
-
Cell unions for representing complex regions
The S2 Cell Hierarchy
Level Cell Size (edge) Use Case
─────────────────────────────────────────────────────
0 ~7,842 km Continental regions
4 ~490 km Country-scale
8 ~31 km Metro area
12 ~1.9 km Neighborhood
14 ~477 m City block
16 ~119 m Building cluster
18 ~30 m Building footprint
20 ~7.4 m Room-scale
24 ~0.46 m Sub-meter precision
30 ~0.7 cm Maximum precision
Code Patterns
Basic Cell Operations
import s2sphere # Python wrapper
Point to cell at specific level
lat, lng = 37.7749, -122.4194 # San Francisco point = s2sphere.LatLng.from_degrees(lat, lng) cell_id = s2sphere.CellId.from_lat_lng(point)
Get cell at level 16 (~119m cells)
cell_level_16 = cell_id.parent(16) print(f"Cell ID: {cell_level_16.id()}") # 64-bit integer
Cell properties
cell = s2sphere.Cell(cell_level_16) print(f"Level: {cell.level()}") print(f"Area: {cell.exact_area()} steradians")
Get neighbors
neighbors = [cell_level_16.get_edge_neighbors()]
Parent/child traversal
parent = cell_level_16.parent() # Level 15 children = [cell_level_16.child(i) for i in range(4)] # 4 children
The 64-bit Cell ID Encoding
The cell ID encodes the entire hierarchy in its bits
Bit layout (simplified):
- 3 bits: face (0-5, which cube face)
- 2 bits per level: quadrant within parent (0-3)
- 1 bit: sentinel marking the end
This means:
- Parent cell ID is a prefix of child cell ID
- Range queries on cell IDs = spatial range queries
- Sorting by cell ID clusters nearby cells together
cell_id = s2sphere.CellId.from_lat_lng( s2sphere.LatLng.from_degrees(37.7749, -122.4194) )
The magic: all descendants share a prefix
parent = cell_id.parent(12) range_min = parent.range_min() # Smallest descendant range_max = parent.range_max() # Largest descendant
"Find all points in this region" becomes:
SELECT * FROM locations
WHERE cell_id >= range_min AND cell_id <= range_max
Region Covering Algorithm
The killer feature: approximate any region with a set of cells
at varying levels, optimizing for minimal cells
from s2sphere import RegionCoverer, LatLngRect, LatLng
Define a region (e.g., bounding rectangle)
rect = LatLngRect( LatLng.from_degrees(37.7, -122.5), # SW corner LatLng.from_degrees(37.8, -122.4) # NE corner )
Configure the coverer
coverer = RegionCoverer() coverer.min_level = 8 # Don't go coarser than level 8 coverer.max_level = 16 # Don't go finer than level 16 coverer.max_cells = 20 # Use at most 20 cells
Get the covering
covering = coverer.get_covering(rect)
Result: a set of cells at different levels that
tightly approximate the region
for cell_id in covering: print(f"Level {cell_id.level()}: {cell_id.id()}")
These cell IDs can be used for:
- Geofencing (is point in any of these cells?)
- Spatial joins (do cell sets overlap?)
- Sharding (route requests to shard owning the cell)
Proximity Search Pattern
"Find all restaurants within 500m of me"
def find_nearby(center_lat, center_lng, radius_meters, max_results=100): """ S2 approach to proximity search: 1. Create a cap (spherical circle) around the point 2. Get a covering of that cap 3. Query the index for all covered cells 4. Post-filter by exact distance """ from s2sphere import Cap, LatLng, CellId, RegionCoverer import math
# Earth radius in meters
EARTH_RADIUS = 6371000
# Create center point
center = LatLng.from_degrees(center_lat, center_lng)
# Create a spherical cap (circle on sphere)
# Cap is defined by axis (center) and chord angle
angle = radius_meters / EARTH_RADIUS # radians
cap = Cap.from_axis_angle(
center.to_point(),
s1.Angle.from_radians(angle)
)
# Get covering cells
coverer = RegionCoverer()
coverer.max_cells = 8 # Balance: fewer cells = more false positives
covering = coverer.get_covering(cap)
# Query database for each cell range
candidates = []
for cell_id in covering:
# This is the key insight: spatial query → range query
range_min = cell_id.range_min().id()
range_max = cell_id.range_max().id()
# SELECT * FROM places WHERE cell_id BETWEEN range_min AND range_max
candidates.extend(db.query_range(range_min, range_max))
# Post-filter by exact distance (covering may include some outside radius)
results = []
for place in candidates:
dist = calculate_distance(center_lat, center_lng, place.lat, place.lng)
if dist <= radius_meters:
results.append((place, dist))
# Sort by distance, return top N
results.sort(key=lambda x: x[1])
return results[:max_results]
Geofencing with Cell Unions
"Is this user inside our delivery zone?"
class DeliveryZone: def init(self, polygon_coords, name): """ Pre-compute a cell covering for the delivery zone. Containment checks become cell ID lookups. """ self.name = name
# Build S2 polygon from coordinates
points = [LatLng.from_degrees(lat, lng) for lat, lng in polygon_coords]
loop = S2Loop(points)
polygon = S2Polygon(loop)
# Cover the polygon with cells
coverer = RegionCoverer()
coverer.max_cells = 100 # More cells = tighter fit
self.covering = coverer.get_covering(polygon)
# Store as sorted list for binary search
self.cell_ids = sorted([c.id() for c in self.covering])
def contains(self, lat, lng):
"""
Fast containment check:
1. Get cell ID for point
2. Check if any ancestor cell is in our covering
"""
point_cell = CellId.from_lat_lng(LatLng.from_degrees(lat, lng))
# Check this cell and all its ancestors
cell = point_cell
while cell.is_valid():
# Binary search in our covering
if self._cell_in_covering(cell.id()):
return True
cell = cell.parent()
return False
def _cell_in_covering(self, cell_id):
# Binary search for cell_id in sorted covering
import bisect
idx = bisect.bisect_left(self.cell_ids, cell_id)
return idx < len(self.cell_ids) and self.cell_ids[idx] == cell_id
Geographic Sharding
Partition data across shards by geography
class GeoShardRouter: """ Route requests to shards based on S2 cell ownership. Each shard owns a contiguous range of cell IDs. """
def __init__(self, num_shards, replication_factor=3):
self.num_shards = num_shards
# Divide the full cell ID space among shards
# Cell IDs range from 0 to 2^64-1
# Hilbert curve ensures geographic locality within ranges
self.shard_boundaries = []
step = (2**64) // num_shards
for i in range(num_shards):
self.shard_boundaries.append(i * step)
def get_shard(self, lat, lng, level=16):
"""Get the primary shard for a location."""
cell_id = CellId.from_lat_lng(
LatLng.from_degrees(lat, lng)
).parent(level).id()
# Binary search for owning shard
import bisect
shard = bisect.bisect_right(self.shard_boundaries, cell_id) - 1
return shard
def get_shards_for_region(self, covering):
"""
For a region query, return all shards that might have data.
This is why coverings are powerful: bounded shard fan-out.
"""
shards = set()
for cell_id in covering:
# Add shards for entire cell range
min_shard = self.get_shard_by_id(cell_id.range_min().id())
max_shard = self.get_shard_by_id(cell_id.range_max().id())
for s in range(min_shard, max_shard + 1):
shards.add(s)
return shards
Hilbert Curve Intuition
Why Hilbert curves? They preserve locality better than alternatives.
Z-order (Morton) curve: Hilbert curve: ┌───┬───┐ ┌───┬───┐ │ 0 │ 1 │ │ 0 │ 1 │ ├───┼───┤ ├───┼───┤ │ 2 │ 3 │ │ 3 │ 2 │ ← Notice: 2 and 3 are adjacent └───┴───┘ └───┴───┘
Z-order: cells 1 and 2 are sequential but not adjacent spatially. Hilbert: sequential cells are always spatially adjacent.
This matters because:
- Range queries on cell IDs return spatially coherent results
- Cache locality is preserved
- Fewer disk seeks for spatial scans
Mental Model
Think of S2 as giving every location on Earth a sort key where:
-
Nearby locations have similar sort keys (Hilbert property)
-
Containment is prefix matching (hierarchy property)
-
Resolution is adjustable (level selection)
-
Regions are cell sets (covering algorithm)
When you need to answer "what's near X?" or "what's inside Y?", you're really asking about ranges and sets of these sort keys.
Common Mistakes
Wrong Level Selection
BAD: Using level 30 (1cm) for city-scale queries
cell = CellId.from_lat_lng(point).parent(30) # Way too precise
GOOD: Match level to your use case
Ride-sharing pickup: level 16-18 (~30-120m)
Neighborhood search: level 12-14 (~500m-2km)
City-scale analytics: level 8-10 (~10-40km)
cell = CellId.from_lat_lng(point).parent(16)
Ignoring Covering Size
BAD: Unlimited cells = slow queries
coverer.max_cells = 10000 # Will fan out to too many ranges
GOOD: Balance precision vs. query cost
coverer.max_cells = 8 # For proximity search coverer.max_cells = 50 # For geofencing coverer.max_cells = 200 # For precise region analytics
Forgetting Post-Filtering
BAD: Assuming covering is exact
results = query_covering(covering) # May include points outside region
GOOD: Always post-filter for exactness
results = query_covering(covering) results = [r for r in results if region.contains(r.point)]
S2 Debugging Questions
When your spatial queries aren't working:
-
What level am I using? Is it appropriate for my precision needs?
-
How many cells in my covering? Too few = false positives. Too many = slow.
-
Am I handling the antimeridian? Regions crossing ±180° need special care.
-
Am I post-filtering? Coverings are approximate by design.
-
Is my data indexed at the right level? Query level should match index level.
References
-
S2 Geometry Library — Official documentation
-
S2 Cells Overview — Cell hierarchy details
-
Google Open Source Blog: Announcing S2
-
Hilbert Curves and S2
-
S2 at Foursquare — Production usage patterns
-
Uber H3 — Alternative (hexagonal) for comparison