Elasticsearch can find documents based on how close they are to a specific point, but it’s not just about distance; it’s about how that distance is calculated and how you can tune it for performance.
Let’s say you’re building a real estate app and want to show properties within 5 kilometers of a user’s current location.
GET /properties/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "pin.location": {
            "lat": 40.7128,
            "lon": -74.0060
          }
        }
      }
    }
  }
}
This query tells Elasticsearch: "Give me all properties (match_all) that are within 5 kilometers of latitude 40.7128, longitude -74.0060." The pin.location field is where your property’s coordinates are stored, and it must be mapped as a geo_point type in your index.
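For reference, a mapping that satisfies this requirement could look like the sketch below. The index name and the pin.location field come from the query above; the type field with a keyword subfield is an assumption made here for illustration, not something the original setup is known to define.

```console
# Assumed mapping: pin.location as geo_point, plus a hypothetical
# "type" text field with a "keyword" subfield for exact matching.
PUT /properties
{
  "mappings": {
    "properties": {
      "pin": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      },
      "type": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

# Index one sample property (made-up document) at the search coordinates.
PUT /properties/_doc/1
{
  "type": "apartment",
  "pin": {
    "location": { "lat": 40.7128, "lon": -74.0060 }
  }
}
```

Without an explicit geo_point mapping, an object with lat and lon keys would be indexed as two plain numeric fields, and geo queries against it would not work.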
The geo_distance clause is efficient here because it runs in filter context. Filter clauses don’t contribute to the relevance score, and Elasticsearch can cache frequently used ones, which makes them ideal for yes/no criteria like "is this within range?" Internally, geo_point fields are indexed in a BKD tree (a block k-d tree provided by Lucene) rather than as geohash strings. That structure lets Elasticsearch skip whole regions that lie entirely outside the radius and check the exact distance only for points near the edge.
You can specify the distance in several units: km (kilometers), m (meters), cm (centimeters), mm (millimeters), mi (miles), yd (yards), ft (feet), in (inches), and NM or nmi (nautical miles). If you omit the unit, the value is interpreted as meters.
But what if you don’t want to specify a single point? You can also search for points within a bounding box defined by two corner coordinates.
GET /properties/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "pin.location": {
            "top_left": {
              "lat": 40.73,
              "lon": -74.01
            },
            "bottom_right": {
              "lat": 40.70,
              "lon": -73.99
            }
          }
        }
      }
    }
  }
}
This query finds all properties whose pin.location falls within the rectangle defined by the top_left and bottom_right corners. A bounding box is a simpler geometric test than a radius and can be cheaper to evaluate, but it doesn’t guarantee a uniform distance from a central point: the corners of the box are farther from its center than the midpoints of its sides.
The real power comes when you combine these with other queries. For example, you might want to find apartments (a type field) within 10km of a point of interest such as a park.
GET /properties/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "type.keyword": "apartment"
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "10km",
          "pin.location": {
            "lat": 40.7128,
            "lon": -74.0060
          }
        }
      }
    }
  }
}
Here, term on type.keyword matches documents where the type field is exactly "apartment," and geo_distance ensures they are within the specified range. A document has to satisfy both the must clause and the filter clause to be returned; the difference is that only must contributes to the relevance score.
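If scoring on the property type doesn’t matter, both conditions can sit in filter context. The following is a sketch of that variant, reusing the index and field names from the query above:

```console
# Variant of the query above: both conditions are pure yes/no checks,
# so both are placed in filter context and no scoring is done.
GET /properties/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type.keyword": "apartment" } },
        {
          "geo_distance": {
            "distance": "10km",
            "pin.location": { "lat": 40.7128, "lon": -74.0060 }
          }
        }
      ]
    }
  }
}
```

With only filter clauses, every hit comes back with the same score, so this form fits best when the results are ordered some other way, for example by an explicit sort.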
A common gotcha is precision at the edge of the search area. Elasticsearch quantizes geo_point coordinates when it indexes them. The loss is tiny (on the order of a centimeter) and usually irrelevant, but a document sitting almost exactly on your radius can fall on either side of it. Precision tuning matters more for the legacy prefix-tree geo_shape mapping, which exposed parameters such as precision and distance_error_pct to trade accuracy for index size; those are an advanced optimization and have since been deprecated.
The geo_distance query has a distance_type parameter, which defaults to "arc". This means it calculates the shortest distance on the surface of a sphere (like Earth). If you set it to "plane", it uses a simpler Euclidean distance calculation, which is faster but less accurate for larger distances or near the poles. For most applications, "arc" is the correct choice.
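As a sketch, switching the calculation is a one-line change to the first query in this article. The index, field, and coordinates are the same as before; only distance_type is new:

```console
# Same 5km search, but using the faster, less accurate "plane" calculation.
GET /properties/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "distance_type": "plane",
          "pin.location": { "lat": 40.7128, "lon": -74.0060 }
        }
      }
    }
  }
}
```

For a city-scale radius like this one, the two settings should return nearly the same documents; the gap grows with distance and latitude.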
When a radius or a rectangle isn’t expressive enough, consider the geo_shape query. It supports the spatial relations intersects, within, contains, and disjoint, and lets you query against polygons, lines, and multipoints, not just single points. This is crucial for finding data within a specific administrative boundary or near a river.
The real "aha!" moment is realizing that Elasticsearch doesn’t search the raw latitude and longitude values you sent. The original values stay in _source, but at index time each point is also encoded into a compact numeric form and placed in a spatial index (a BKD tree in current versions). Queries run against that index, which is why spatial lookups stay fast even on large datasets.
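A minimal sketch of such a query is shown below. It assumes a hypothetical footprint field mapped as geo_shape on the same index, and the polygon is a made-up rectangle around lower Manhattan, not a real boundary. Note that GeoJSON coordinates are written as [lon, lat] and that the ring must close on its starting point.

```console
# "footprint" is a hypothetical geo_shape field; the polygon is illustrative.
GET /properties/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "footprint": {
            "shape": {
              "type": "Polygon",
              "coordinates": [
                [
                  [-74.02, 40.70],
                  [-73.98, 40.70],
                  [-73.98, 40.74],
                  [-74.02, 40.74],
                  [-74.02, 40.70]
                ]
              ]
            },
            "relation": "intersects"
          }
        }
      }
    }
  }
}
```

Changing relation to within would keep only documents whose shape lies entirely inside the polygon, while intersects also returns shapes that merely overlap it.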
The next step is exploring how to sort results by distance, which uses the built-in _geo_distance sort option rather than a script.
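As a preview, a distance sort might look like the following sketch. It reuses the index, field, and coordinates from the earlier queries; the order and unit values are illustrative choices.

```console
# Return properties within 5km, nearest first.
GET /properties/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "pin.location": { "lat": 40.7128, "lon": -74.0060 }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "pin.location": { "lat": 40.7128, "lon": -74.0060 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}
```

Each hit then carries its computed distance, in the requested unit, in the sort array of the response.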