Caching your Nominatim requests from geopy in DynamoDB to avoid timeouts

TL;DR: get the source code for this from my GitHub gist and use it in your projects:

GitHub Gist

The Problem

Nominatim is a free way to turn address strings into latitude and longitude points for mapping.  This is really useful if you want to draw addresses on a map.  If you are like me, you will just look up the example code and run with it until you realize something: there are terms of use, and if you query the same address multiple times you will get temporarily blocked.  The terms of use state that you should cache the results.

The Plan

Here I’m going to show you how to do that with DynamoDB.  DynamoDB is a NoSQL database offered by AWS.  It is simple to set up and use, and it is free for low-volume use.  It is a great alternative to MongoDB and I plan on learning it and using it in future projects.  Before you get started you need to set up your local environment for using the AWS API.  Here is a link for how to do it: AWS API Quickstart.  Once you have that working, I can walk you through the rest.

Create the Table

The first thing we are going to do is create a table.  For my hash primary key I am specifying a string attribute I’m calling ‘query’.  This string is how I will look up results in the future, as it is a representation of the address string that I will send to Nominatim; I explain how it is built further down.  You only have to specify the attributes that are also keys here, as the database is schema-less.
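A minimal sketch of this step, assuming boto3's resource API, might look like the following.  The table name `geocode_cache` and the on-demand billing mode are assumptions made for illustration and do not come from the gist.

```python
TABLE_NAME = "geocode_cache"  # hypothetical table name


def table_definition(table_name=TABLE_NAME):
    """Build the keyword arguments for create_table."""
    return {
        "TableName": table_name,
        # 'query' is the hash (partition) key used to look items up.
        "KeySchema": [{"AttributeName": "query", "KeyType": "HASH"}],
        # Only key attributes are declared; "S" means string.
        "AttributeDefinitions": [
            {"AttributeName": "query", "AttributeType": "S"}
        ],
        # Assumption: on-demand billing, so no capacity numbers are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_cache_table(dynamodb=None):
    """Create the table and block until it is ready to use."""
    if dynamodb is None:
        # Imported lazily so the definition above can be inspected
        # without boto3 or AWS credentials being available.
        import boto3
        dynamodb = boto3.resource("dynamodb")
    table = dynamodb.create_table(**table_definition())
    table.wait_until_exists()
    return table
```

Calling `create_cache_table()` once is enough; afterwards the table can be opened with `boto3.resource("dynamodb").Table("geocode_cache")`.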

Pushing a record to the database

To push a record to the database you use the function put_item.  It has one important parameter, Item, that you set to the dictionary that you want to push to the database.  It will create a new item or overwrite any existing item where the key matches.
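As a small sketch of that call, assuming `table` is a boto3 Table resource (the helper name `store_item` is hypothetical):

```python
def store_item(table, item):
    """Create the item, or replace an existing one with the same key."""
    # `table` is assumed to be a boto3 Table resource, for example
    # boto3.resource("dynamodb").Table("geocode_cache").
    return table.put_item(Item=item)
```

Writing two items with the same ‘query’ value leaves only the second one in the table, which is exactly what a cache refresh needs.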

One thing I want to point out is that the only non-integer number type the database accepts is decimal.Decimal.  That is not the normal floating point type in Python (float) and is not what our geolocator returns for latitude and longitude.  There is a second complication: if you initialize a decimal.Decimal directly from a float, sometimes you will get an exception about storing an inexact decimal.  This nastiness results from the way floats are stored in binary and is part of why decimal.Decimal exists.  To get around this I convert my float to a string and then initialize the decimal.Decimal with that string.  That will never cause it to be Inexact.  The downside is that you may lose a bit of precision doing this.  This is not a concern here, as it is enough precision for what is returned by geocode.
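The conversion can be sketched with the standard library alone.  The `STRICT` context below is only a stand-in for the strict number handling in boto3 (assumed here to be 38 digits of precision with inexact results trapped), and the helper names are hypothetical.

```python
from decimal import Context, Decimal, Inexact, Rounded

# Stand-in for the strict context used when numbers are serialized
# (assumption: 38 significant digits, inexact results raise).
STRICT = Context(prec=38, traps=[Inexact, Rounded])


def to_decimal(value):
    """Convert a float to Decimal via its short string form."""
    return Decimal(str(value))


def is_storable(number):
    """Return True if the Decimal fits the strict context unchanged."""
    try:
        STRICT.create_decimal(number)
        return True
    except (Inexact, Rounded):
        return False
```

Here `Decimal(0.1)` carries the full binary expansion of the float and is rejected by `is_storable`, while `to_decimal(0.1)` is simply `Decimal("0.1")` and passes.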

You will notice that I’m doing some weird stuff to turn the address string into the query.  This is something you might modify for your application, but I did this because I am using this query as a filename in one of my projects and I hate spaces or commas in filenames.  Other than that, this is rather simple.
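The exact transformation lives in the gist; one possible version, written here purely as an assumption, drops commas and turns runs of whitespace into underscores (the name `address_to_query` is hypothetical):

```python
import re


def address_to_query(address):
    """Turn an address into a key with no spaces or commas."""
    # Commas become spaces first so "Ave, Washington" and
    # "Ave , Washington" collapse to the same key.
    cleaned = address.replace(",", " ").strip()
    # Every run of whitespace becomes a single underscore.
    return re.sub(r"\s+", "_", cleaned)
```

For example, `address_to_query("1600 Pennsylvania Ave, Washington, DC")` gives `"1600_Pennsylvania_Ave_Washington_DC"`.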

Getting a record from the database

Here I show how I retrieve a record from the database.  When the item comes back from the database, its numbers are decimal.Decimal objects, so I serialize it with json.dumps and read it back with json.loads to get a dictionary of plain Python values.  There is one other thing, DecimalEncoder.  It is passed to json.dumps to convert the decimal.Decimal objects back into either floats or ints, depending on whether they have a fractional part.
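A sketch of the retrieval step under the same assumptions (`table` is a boto3 Table resource; `get_record` is a hypothetical helper name, and this `DecimalEncoder` is one common way to write such an encoder rather than necessarily the one in the gist):

```python
import decimal
import json


class DecimalEncoder(json.JSONEncoder):
    """Serialize Decimal as int when whole, float otherwise."""

    def default(self, o):
        if isinstance(o, decimal.Decimal):
            # o % 1 is non-zero only when there is a fractional part.
            return float(o) if o % 1 else int(o)
        return super().default(o)


def get_record(table, query):
    """Return the cached item as a plain dict, or None if absent."""
    response = table.get_item(Key={"query": query})
    item = response.get("Item")
    if item is None:
        return None
    # Round-trip through JSON to replace Decimal with float/int.
    return json.loads(json.dumps(item, cls=DecimalEncoder))
```

When the key is not in the table, the response has no "Item" entry, so the function returns None and the caller knows it has to go to Nominatim.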

Putting it all together

Here we put it all together with two functions, get_address and get_query, which are the real external interface to this short library.  These functions check whether the query string is already in the database and, if it is, return the stored result.  If not, they send the request to Nominatim, store the result in the database, and give you back your result.
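The overall flow can be sketched as a single function.  This is not the gist's get_address or get_query; `cached_geocode`, `address_to_query`, and the item field names are hypothetical, and the table and geocoder are passed in as parameters so the logic can be tried without AWS or network access.

```python
import re
from decimal import Decimal


def address_to_query(address):
    """Hypothetical key format: no commas, underscores for spaces."""
    return re.sub(r"\s+", "_", address.replace(",", " ").strip())


def cached_geocode(address, table, geocode):
    """Return (latitude, longitude), using the table as a cache.

    `table` is assumed to be a boto3 Table resource and `geocode`
    a callable such as geopy's Nominatim(...).geocode.
    """
    query = address_to_query(address)

    # 1. Try the cache first.
    cached = table.get_item(Key={"query": query}).get("Item")
    if cached is not None:
        return float(cached["latitude"]), float(cached["longitude"])

    # 2. Cache miss: ask Nominatim.
    location = geocode(address)
    if location is None:  # geopy returns None when nothing matches
        return None

    # 3. Store the result, converting floats via str for DynamoDB.
    table.put_item(Item={
        "query": query,
        "address": location.address,
        "latitude": Decimal(str(location.latitude)),
        "longitude": Decimal(str(location.longitude)),
    })
    return location.latitude, location.longitude


# Possible wiring (not executed here; names are assumptions):
#   import boto3
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="my-geocode-cache")
#   table = boto3.resource("dynamodb").Table("geocode_cache")
#   print(cached_geocode("10 Downing St, London", table, geolocator.geocode))
```

With this shape, a repeated lookup of the same address never reaches Nominatim, which is the behavior the terms of use ask for.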

 

And that is it folks.  Check out the gist, copy and paste it into your project and solve this annoying problem for good.

 
