NoSQL Geospatial with MongoDB
A quick intro to Vector Processing using native MongoDB capabilities
MongoDB is a popular NoSQL database. “NoSQL” here means it does not use the traditional relational table structure we may be used to, so we do not write flavors of SQL to query the data. MongoDB stores data in a JSON-like structure: a “table” in MongoDB is called a Collection, and the “rows” in that table are Documents. This popular database even comes with some geospatial capabilities, which we’ll touch on in this article.
For my fellow experienced geospatial enthusiasts who are accustomed to spatial databases like PostGIS, it is worth noting that MongoDB’s geospatial capabilities are relatively limited in comparison. Specifically, its geospatial querying abilities are restricted to vector data. However, I encourage you to read on and look for places where this technology could work to your advantage. I found that the NoSQL style makes it easy to get development started, and it is especially beneficial if you have a lightweight geospatial application with minimal complexity. I’d consider it in situations where setting up a relational database like PostGIS may be overkill.
Why MongoDB?
MongoDB offers several advantages that make it an excellent choice for GIS applications. Its flexible document-based data model allows for seamless handling of complex geospatial data structures, while its distributed architecture ensures scalability and high availability. Additionally, MongoDB’s powerful indexing capabilities and robust query language make it a versatile tool for efficient geospatial querying.
GIS Data formats
MongoDB uses the BSON data structure, a binary format closely modeled on JSON, so storing vector spatial data is straightforward. Arguably the best format to use with MongoDB is GeoJSON, which covers all the vector types: points, lines, and polygons. It can also be as simple as storing a legacy coordinate pair per record, made up of a longitude value followed by a latitude value. Note that while MongoDB can store raster data, it has no built-in functionality for geospatial querying of rasters, so this article focuses solely on vector datasets.
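To make the two storage options concrete, here is a minimal sketch of what the documents might look like as Python dictionaries before they are inserted. The helper names and the sample point name are my own placeholders, not anything MongoDB defines; the only parts that come from GeoJSON itself are the "type" and "coordinates" keys and the longitude-first ordering.

```python
def make_point_document(name, longitude, latitude):
    """Return a document whose `location` field is a GeoJSON Point."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {
        "name": name,
        # GeoJSON lists longitude first, then latitude
        "location": {"type": "Point", "coordinates": [longitude, latitude]},
    }


def make_legacy_document(name, longitude, latitude):
    """Return the same record using a legacy coordinate pair instead of GeoJSON."""
    return {"name": name, "location": [longitude, latitude]}


geojson_doc = make_point_document("Sample point", -97.7431, 30.2672)
legacy_doc = make_legacy_document("Sample point", -97.7431, 30.2672)
print(geojson_doc)
print(legacy_doc)
```

The range checks are there mostly to catch swapped latitude and longitude, which is the mistake I see most often when people move to GeoJSON.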
Geospatial Indexing
MongoDB’s geospatial indexing enables the storage and retrieval of geospatial data in an optimized manner. The database supports two types of geospatial indexes: 2d indexes for flat (planar) models and 2dsphere indexes for spherical Earth models. These indexes back the common geometric operations, such as point-in-polygon containment, intersection tests, and distance queries.
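As a rough sketch of how an index like this could be created from Python: PyMongo's create_index accepts a list of (field, index type) pairs, and the index type strings are "2dsphere" and "2d". The helper names below are hypothetical, and the field name "location" is an assumption about how your documents are shaped.

```python
def geo_index_spec(field="location", spherical=True):
    """Build the key specification for a geospatial index on `field`."""
    index_type = "2dsphere" if spherical else "2d"
    return [(field, index_type)]


def ensure_geo_index(collection, field="location", spherical=True):
    """Create the index on a PyMongo collection and return the index name."""
    return collection.create_index(geo_index_spec(field, spherical))


print(geo_index_spec())

# Usage against a real server (not executed here):
# from pymongo import MongoClient
# collection = MongoClient("mongodb://localhost:30000")["johns_db"]["points_collection"]
# ensure_geo_index(collection)
```

For GeoJSON data on a real-world scale, the 2dsphere variant is usually the one you want; I would only reach for 2d when working with legacy coordinate pairs on a flat plane.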
IMO, keep MongoDB in the back of your mind if your overall goal is simple spatial capability with easy, flexible data storage.
Geospatial Queries
MongoDB’s query language, combined with geospatial indexing, empowers developers to perform geospatial queries with ease. Spatial operators like $geoWithin, $geoIntersects, and $near allow for precise filtering and retrieval of data based on proximity, containment, or intersection with our data. These queries can be further combined with traditional MongoDB queries to incorporate attribute-based filters, so you really get a flexible toolbox.
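Here is a small sketch of what containment and intersection filters might look like when built in Python. The function names, the "category" attribute, and the bounding box around downtown Austin are all made up for illustration; the operator and $geometry layout follow the standard MongoDB query shape.

```python
def within_polygon_query(ring, attribute_filter=None, field="location"):
    """Build a $geoWithin filter for one polygon ring of (lon, lat) positions."""
    ring = [list(position) for position in ring]
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three positions")
    # GeoJSON rings must be closed: the last position repeats the first
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    query = {
        field: {
            "$geoWithin": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }
    # Attribute filters sit next to the spatial filter in the same query document
    if attribute_filter:
        query.update(attribute_filter)
    return query


def intersects_line_query(line, field="location"):
    """Build a $geoIntersects filter for a LineString of (lon, lat) positions."""
    coordinates = [list(position) for position in line]
    return {
        field: {
            "$geoIntersects": {
                "$geometry": {"type": "LineString", "coordinates": coordinates}
            }
        }
    }


box = [(-97.75, 30.26), (-97.73, 30.26), (-97.73, 30.28), (-97.75, 30.28)]
cafes_in_box = within_polygon_query(box, {"category": "cafe"})
crossing_route = intersects_line_query([(-97.75, 30.26), (-97.73, 30.28)])
print(cafes_in_box)
print(crossing_route)
```

Either dictionary can be passed straight to collection.find(), the same way as any other MongoDB filter.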
Example: finding the 5 points closest to my latitude and longitude
from pymongo import MongoClient

# Connect to the MongoDB server
client = MongoClient('mongodb://localhost:30000')

# Access the database and collection
db = client['johns_db']
collection = db['points_collection']

# $near needs a geospatial index on the queried field (no-op if it already exists)
collection.create_index([("location", "2dsphere")])

# User-defined latitude and longitude
latitude = 30.2672
longitude = -97.7431

# Create the query to find the nearest points (GeoJSON order is longitude, latitude)
query = {
    "location": {
        "$near": {
            "$geometry": {
                "type": "Point",
                "coordinates": [longitude, latitude]
            }
        }
    }
}

# Return only the name and location fields
projection = {
    "name": 1,
    "location": 1
}

# Execute the query and limit the results to the five nearest points
results = collection.find(query, projection).limit(5)

# Print the results
print("Five nearest points:")
for point in results:
    print(f"Name: {point['name']}, Coordinates: {point['location']['coordinates']}")
This query uses the $near operator in MongoDB, which finds documents based on proximity to a specified point. Because $near returns documents sorted from nearest to farthest, limiting the results to five retrieves the five points nearest to my defined latitude and longitude, along with the attributes I specify in the projection.
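If you also want to cap how far away a result can be, $near accepts $maxDistance and $minDistance (in meters when used with a GeoJSON point). The helper below is a hypothetical sketch of how that filter could be assembled; the 1000 meter radius is just an example value.

```python
def near_query(longitude, latitude, max_distance_m=None, min_distance_m=None,
               field="location"):
    """Build a $near filter around a GeoJSON point, with optional distance bounds."""
    near = {
        "$geometry": {"type": "Point", "coordinates": [longitude, latitude]}
    }
    # Distance bounds sit beside $geometry inside the $near document
    if max_distance_m is not None:
        near["$maxDistance"] = max_distance_m
    if min_distance_m is not None:
        near["$minDistance"] = min_distance_m
    return {field: {"$near": near}}


within_1km = near_query(-97.7431, 30.2672, max_distance_m=1000)
print(within_1km)

# Usage with the collection from the example above (not executed here):
# results = collection.find(within_1km, {"name": 1, "location": 1}).limit(5)
```

With the radius in place, the query returns at most five points, and fewer if there are not five within one kilometer.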
Geospatial Aggregations
MongoDB also offers a data transformation method called Aggregation Pipelines, and for my data engineers out there, it’s incredibly powerful. Currently one geospatial aggregation stage exists in MongoDB: $geoNear. It enables geospatial querying and proximity-based analysis within an aggregation pipeline. By incorporating $geoNear in an aggregation pipeline, developers gain the ability to combine geospatial querying with other powerful aggregation stages, unlocking advanced data transformations and analysis.
Following the example using $near, we’ll perform a $geoNear aggregation. (Note: this is too basic a use case for $geoNear, so consider aggregation pipelines when you want to string together transformation and query capabilities.)
from pymongo import MongoClient

# Connect to the MongoDB server
client = MongoClient('mongodb://localhost:30000')

# Access the database and collection
db = client['johns_db']
collection = db['points_collection']

# User-defined latitude and longitude
latitude = 30.2672
longitude = -97.7431

# Create the aggregation pipeline ($geoNear must be the first stage
# and needs a geospatial index on the collection)
pipeline = [
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [longitude, latitude]
            },
            "distanceField": "distance",
            "spherical": True
        }
    },
    # MongoDB 4.2 removed the "limit" option of $geoNear; use a $limit stage instead
    {"$limit": 5}
]

# Execute the aggregation pipeline
results = collection.aggregate(pipeline)

# Print the results (distance is in meters for a GeoJSON point)
print("Five nearest points:")
for point in results:
    print(f"Name: {point['name']}, Coordinates: {point['location']['coordinates']}, Distance: {point['distance']}")
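To give a feel for where $geoNear starts to earn its keep, here is a sketch of a pipeline that chains it with $group and $sort to summarize nearby points by an attribute. The "category" field and the function name are assumptions of mine; swap in whatever attribute your documents actually carry.

```python
def nearby_summary_pipeline(longitude, latitude, max_distance_m,
                            group_field="category"):
    """Build a pipeline that counts nearby points per group and averages distance."""
    return [
        # Stage 1: proximity search; writes each document's distance in meters
        {
            "$geoNear": {
                "near": {"type": "Point", "coordinates": [longitude, latitude]},
                "distanceField": "distance",
                "maxDistance": max_distance_m,
                "spherical": True,
            }
        },
        # Stage 2: summarize the matches per value of the grouping field
        {
            "$group": {
                "_id": f"${group_field}",
                "count": {"$sum": 1},
                "avg_distance_m": {"$avg": "$distance"},
            }
        },
        # Stage 3: closest groups first
        {"$sort": {"avg_distance_m": 1}},
    ]


pipeline = nearby_summary_pipeline(-97.7431, 30.2672, max_distance_m=2000)
print(pipeline)

# Usage with the collection from the example above (not executed here):
# for row in collection.aggregate(pipeline):
#     print(row["_id"], row["count"], row["avg_distance_m"])
```

This is the kind of query-plus-transformation chain that a plain $near filter cannot express on its own.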
Credit where it’s due: