ClickHouse’s H3 spatial indexing lets you query geographic data incredibly fast, but the real magic is that it doesn’t just give you nearest neighbors; it gives you hierarchical nearest neighbors.
Let’s see it in action. Imagine we have a table of sensor readings with latitude and longitude.
```sql
CREATE TABLE sensor_readings
(
    timestamp   DateTime,
    latitude    Float64,
    longitude   Float64,
    temperature Float32,
    h3_index    UInt64  -- geoToH3 returns UInt64; populated below
)
ENGINE = MergeTree()
ORDER BY timestamp;
```
First, we need to populate the h3_index column. H3 functions such as geoToH3 are built into ClickHouse, so you can compute the index during insertion or backfill it afterwards with a one-off mutation. For this example, let's assume we're backfilling existing rows. We'll use H3 resolution 10, whose cells average about 0.015 km², which makes each hexagon roughly 130-150 meters across.
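```sql
-- h3_index already exists (see CREATE TABLE), so no ADD COLUMN is needed.
-- ClickHouse backfills with a mutation, ALTER TABLE ... UPDATE, and the
-- WHERE clause is mandatory (WHERE 1 matches every row).
-- geoToH3's argument order has differed between versions; long-standing
-- docs list (longitude, latitude, resolution). Check your version's docs.
ALTER TABLE sensor_readings
    UPDATE h3_index = geoToH3(longitude, latitude, 10)
    WHERE 1;
```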
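If you control ingestion, the "during insertion" route avoids the backfill entirely. Below is a minimal sketch using a MATERIALIZED column, which ClickHouse computes on every INSERT. The table name sensor_readings_h3 and the sample row are hypothetical. The geoToH3 argument order (longitude first) is an assumption you should check against your ClickHouse version.

```sql
-- Hypothetical variant of sensor_readings that fills h3_index on INSERT.
CREATE TABLE sensor_readings_h3
(
    timestamp   DateTime,
    latitude    Float64,
    longitude   Float64,
    temperature Float32,
    -- Computed from the row at insert time; omitted from SELECT * by default.
    -- Assumes geoToH3(lon, lat, res); verify the order for your version.
    h3_index    UInt64 MATERIALIZED geoToH3(longitude, latitude, 10)
)
ENGINE = MergeTree()
ORDER BY timestamp;

-- Only the source columns are supplied; h3_index is derived (illustrative row).
INSERT INTO sensor_readings_h3 (timestamp, latitude, longitude, temperature)
VALUES (now(), 34.0522, -118.2437, 21.5);

-- MATERIALIZED columns must be selected by name.
SELECT h3_index, h3GetResolution(h3_index) AS res
FROM sensor_readings_h3;
```

An existing table can be migrated the same way by changing the column to MATERIALIZED and then materializing it for old parts, though the backfill mutation above is simpler for a one-off.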
Now, let’s say we want to find all sensor readings within a 5km radius of a specific point (latitude 34.0522, longitude -118.2437).
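```sql
SELECT
    timestamp,
    latitude,
    longitude,
    temperature
FROM sensor_readings
-- h3kRing returns every cell within k grid steps of the center cell.
-- At resolution 10 (cells ~130-150 m across), k = 45 reaches roughly 5 km.
-- geoToH3 argument order varies by version; here longitude comes first.
WHERE h3_index IN (
    SELECT arrayJoin(h3kRing(geoToH3(-118.2437, 34.0522, 10), 45))
);
```
This query is fast because h3kRing builds the set of cells once, and ClickHouse then only has to check each row's h3_index against that set. The k value is a number of grid steps, not a distance, so its meaning depends on the resolution. At resolution 10, k = 3 reaches only a few hundred meters, and a 5 km radius needs k in the mid-40s, which is about 6,200 cells for k = 45. Two caveats apply. First, the ring is a hexagon-shaped approximation of a circle, so it includes some points beyond 5 km, especially near its corners. Second, because the table is ordered by timestamp, the filter still has to read every row. Putting h3_index in the sorting key is what would let ClickHouse skip data that cannot match.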
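When the answer has to be an exact 5 km circle rather than a hexagonal approximation, one option is to use a k-ring only as a cheap prefilter at a coarser resolution and then check the true distance. The sketch below assumes that resolution-7 parents are coarse enough. Those cells are roughly 2.5 km across, so a k = 4 ring reaches well beyond 5 km in every direction. It also assumes greatCircleDistance, which takes longitude before latitude and returns meters. The longitude-first geoToH3 order is likewise an assumption to check for your version.

```sql
SELECT
    timestamp,
    latitude,
    longitude,
    temperature
FROM sensor_readings
-- Coarse prefilter: the 61 res-7 cells within 4 steps of the center's cell.
-- Assumes geoToH3(lon, lat, res) in your ClickHouse version.
WHERE has(
        h3kRing(geoToH3(-118.2437, 34.0522, 7), 4),
        h3ToParent(h3_index, 7))
  -- Exact check: great-circle distance in meters (lon1, lat1, lon2, lat2).
  AND greatCircleDistance(longitude, latitude, -118.2437, 34.0522) <= 5000;
```

The trade-off is in the prefilter's resolution. A coarser ring means fewer cells to build and check, but more false positives for the distance check to discard.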
The real power comes from the hierarchical nature. H3 organizes cells into resolutions, and every cell at a finer resolution has exactly one parent at each coarser resolution. This means you can easily perform aggregation and spatial joins across different levels of detail.
Consider aggregating average temperature by H3 cell at resolution 8.
```sql
SELECT
    h3ToParent(h3_index, 8) AS h3_res8_index,  -- res-8 ancestor of each res-10 cell
    avg(temperature) AS avg_temperature
FROM sensor_readings
GROUP BY h3_res8_index;
```
This query efficiently groups readings by larger H3 hexagons (resolution 8) and calculates the average temperature within each. You can join this aggregated data with other datasets or perform further spatial analysis.
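Because each resolution-8 cell has exactly one resolution-6 parent, pre-aggregated results can be rolled up further without rescanning the raw readings. Below is a minimal sketch, in which the CTE name res8 and the choice of resolution 6 are illustrative. It carries sums and counts rather than averages, since an average of averages would weight sparse cells the same as dense ones.

```sql
WITH res8 AS
(
    -- First level: res-8 cells with enough state to re-aggregate exactly
    SELECT
        h3ToParent(h3_index, 8) AS h3_res8_index,
        sum(temperature) AS temp_sum,
        count() AS n
    FROM sensor_readings
    GROUP BY h3_res8_index
)
SELECT
    -- The res-6 parent of a res-8 cell is the res-6 parent of its res-10 children
    h3ToParent(h3_res8_index, 6) AS h3_res6_index,
    sum(temp_sum) / sum(n) AS avg_temperature,
    sum(n) AS readings
FROM res8
GROUP BY h3_res6_index;
```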
The mental model you need is one of nested hexagons. Each H3 index identifies a specific cell on the Earth's surface. Almost all cells are hexagons, with 12 pentagons at each resolution. Every cell at a finer resolution (e.g., resolution 12) has exactly one parent at each coarser resolution (e.g., resolution 11), and each hexagon has seven children at the next resolution. The nesting is approximate rather than exact: a hexagon cannot be tiled perfectly by smaller hexagons, so a child can extend slightly past its parent's boundary. The parent-child relationship is still the key, because it lets you move between levels of detail with a cheap h3ToParent call. Keep in mind what h3kRing does, too. It is not a radius search: it returns every cell within k grid steps of a center cell, a hexagon-shaped patch that only approximates a circle.
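For a hexagonal cell away from the 12 pentagons, two counts follow directly from this structure and are handy for sizing queries. Ring i around a cell contains 6i cells, and every resolution step multiplies the number of descendants by 7:

$$
\left|\mathrm{kRing}(c, k)\right| = 1 + \sum_{i=1}^{k} 6i = 1 + 3k(k+1),
\qquad
\#\{\text{descendants of } c \text{ at resolution } r + n\} = 7^{n}
$$

For example, k = 3 gives 37 cells and k = 45 gives 6,211, while a resolution-8 hexagon has 7² = 49 resolution-10 descendants.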
The one thing most people don't realize is that the k parameter in h3kRing isn't a distance in meters but a number of grid steps (hexagons) away from the center. The geographic distance a k-ring covers therefore depends heavily on the H3 resolution you're using. At resolution 5, where cells are many kilometers across, k = 1 already reaches more than 10 km. At resolution 10, even k = 5 stays well under a kilometer. You need to know your resolution to translate k into a geographic radius accurately.
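To translate k into meters for a given resolution, it helps to tabulate it. The sketch below uses ClickHouse's h3EdgeLengthM, the average edge length per resolution; the exact figures depend on the bundled H3 library version. It also relies on plain hexagon geometry, where e is the edge length. A k-ring's corner cells sit about √3·k·e from the center, and the cells at the middle of its flat sides sit about 1.5·k·e away. Real cells vary in size across the globe, so treat the output as estimates.

```sql
-- Sketch: approximate reach of a k-ring per resolution (estimates only).
-- e = average edge length; corners ~ sqrt(3) * k * e, flat sides ~ 1.5 * k * e.
SELECT
    res,
    round(h3EdgeLengthM(res)) AS avg_edge_m,
    round(1.5 * 5 * h3EdgeLengthM(res)) AS k5_flat_side_m,
    round(sqrt(3) * 5 * h3EdgeLengthM(res)) AS k5_corner_m,
    -- Smallest k whose flat sides reach 5 km
    ceil(5000 / (1.5 * h3EdgeLengthM(res))) AS k_for_5km
FROM (SELECT toUInt8(arrayJoin(range(5, 12))) AS res)
ORDER BY res;
```

The k_for_5km column shows why the resolution choice matters. A k that is tiny at resolution 5 grows into the dozens by resolution 10.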
Understanding these H3 functions and their resolution-dependent behavior is crucial for accurate spatial querying and analysis in ClickHouse.