Purpose

The location service will be used to get a universal location ID for any given location string. It can be used to look up places by name, postcode or zip code.
Geocoding

Geocoding will take any string and return an object containing that location's longitude and latitude (as well as some other location-specific information). Accepted strings include zip codes and postcodes as well as full addresses or simply place names.
Reverse Geocoding

Reverse geocoding is the process of taking a longitude and latitude and providing a human-readable location.

Implementation

Our location service will be powered by the OSM Planet file. We will build and maintain our own version of this file, and an additional API will sit in front of it. For the purposes of this service you should assume a working version of this file and its accompanying APIs already exist.
Longitude & Latitude

Unless explicitly stated otherwise, locations are always stored long, lat (easting before northing). Where possible they should be stored in appropriate geographic types (such as a Point in Postgres); at the very least they should be stored as doubles.
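
As a minimal sketch of the long, lat convention, the value type below keeps longitude first and rejects out-of-range values, which catches some accidentally swapped pairs. The class name Point and the WKT helper are illustrative assumptions, not part of this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical value type: longitude first, then latitude."""

    long: float  # easting, valid range -180..180
    lat: float   # northing, valid range -90..90

    def __post_init__(self) -> None:
        if not -180.0 <= self.long <= 180.0:
            raise ValueError(f"longitude out of range: {self.long}")
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")

    def as_wkt(self) -> str:
        # WKT also writes x (longitude) before y (latitude).
        return f"POINT({self.long} {self.lat})"
```

Point(-0.1276, 51.5072) is accepted, while a swapped pair such as Point(51.5072, -120.0) fails the latitude check.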
Nominatim

This is the Open Street Map API that interacts with their planet file; it is what we’ll use to perform place searches. Searches will come into our API, be passed along to Nominatim, and the results will be formatted and returned. Every time we retrieve a place from the Nominatim API we will process and store a copy of its details in our local database.

If we already have the osm_id in our local database, we should not store a new copy but update the stored one. When a new location is added to the local database we should emit the event location.created; when a location is updated we should emit location.updated. Both events carry the hash of the location that was created or updated.
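
The store-or-update rule above could be sketched as follows. The in-memory dict standing in for the local database, the emit callback, the make_hash helper and the payload key "hash" are all assumptions made for illustration.

```python
from typing import Callable, Dict


def upsert_location(
    store: Dict[str, dict],
    place: dict,
    emit: Callable[[str, dict], None],
    make_hash: Callable[[dict], str],
) -> str:
    """Store or update a place keyed by osm_id and emit the matching event."""
    osm_id = str(place["osm_id"])
    existing = store.get(osm_id)
    if existing is None:
        # New location: assign a hash once and announce the creation.
        record = {**place, "hash": make_hash(place)}
        store[osm_id] = record
        emit("location.created", {"hash": record["hash"]})
    else:
        # Known osm_id: refresh the details but keep the original hash.
        original_hash = existing["hash"]
        existing.update(place)
        existing["hash"] = original_hash
        record = existing
        emit("location.updated", {"hash": record["hash"]})
    return record["hash"]
```

Calling it twice with the same osm_id leaves a single stored record and produces one location.created followed by one location.updated.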

Providers

Locations will be stored in our local database with a provider. Providers can be one of the following: OSM, LATLONG

Each location will store a provider_id with the job details, the value of this id should be determined by its associated provider:

OSM provider locations will use the osm_id value returned in the response to the location search from the Open Street Map API.
LATLONG provider locations will use a concatenation of the lat & long values for a returned valid location.

By default we should always attempt to set a location with a provider of OSM, and should only use another provider if we are unable to retrieve the osm_id value for the location.
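
A small sketch of the provider fallback described above. The result keys osm_id, lat and lon are assumed to match the provider's reply, and the comma used to join lat and long is an assumption, since the spec does not define the concatenation format.

```python
from typing import Optional, Tuple


def resolve_provider(result: dict) -> Optional[Tuple[str, str]]:
    """Return (provider, provider_id), preferring OSM over LATLONG."""
    osm_id = result.get("osm_id")
    if osm_id is not None and str(osm_id) != "":
        return ("OSM", str(osm_id))
    lat, lon = result.get("lat"), result.get("lon")
    if lat is not None and lon is not None:
        # Assumption: lat first, then long, joined with a comma.
        return ("LATLONG", f"{lat},{lon}")
    # Neither an osm_id nor a coordinate pair: nothing valid to store.
    return None
```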

Query Types

When passing queries to the /search endpoint of Nominatim, there are a few different query types.

?q=<string> a free-text search that converts a place into a location object
?postalcode=<postcode> a search for a given postcode (or zip code)
?city=<city> a search for a specific city

All endpoints can also be passed a limit= parameter. country= should also be added where possible to help make results more accurate. You should also pass extratags=1 (the Nominatim flag that returns extra tag data) so the Additional Data section below can fetch the information it needs.
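
The query types above could be turned into a request like this. It is a sketch under assumptions: the kind names, the base URL and format=jsonv2 are not in this spec, and the extra-data flag is written as extratags=1, the name used in Nominatim's own documentation.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Maps our (hypothetical) query kinds to the /search parameter each one uses.
QUERY_KEYS = {"free_text": "q", "postcode": "postalcode", "city": "city"}


def build_search_params(
    kind: str,
    value: str,
    limit: Optional[int] = None,
    country: Optional[str] = None,
) -> Dict[str, str]:
    if kind not in QUERY_KEYS:
        raise ValueError(f"unknown query kind: {kind}")
    # format=jsonv2 is an assumption; extratags=1 asks for the extra tag data.
    params = {QUERY_KEYS[kind]: value, "format": "jsonv2", "extratags": "1"}
    if limit is not None:
        params["limit"] = str(limit)
    if country:
        # Named country= as in the text above; upstream Nominatim also
        # documents countrycodes= for restricting results.
        params["country"] = country
    return params


def build_search_url(base_url: str, params: Dict[str, str]) -> str:
    return f"{base_url}/search?{urlencode(params)}"
```

For example, build_search_params("postcode", "LS1 4AP", limit=5, country="gb") produces the parameters for a postcode search limited to five results.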

Additional Data

When saving a location retrieved from Nominatim, we should also check for some additional fields in the reply and save them as metadata against the location.

Population, should be saved with the key population and the given value.
Importance, should be saved with the key importance and the given value.
Capital, should be saved with the key capital_city and a value of 1
OSM Type, if an OSM type doesn't match one of ours (place, city, county, country, town, postcode, state) then we should add a meta field called provider_type, holding the original OSM type, to the metadata for that location.
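
The four rules above could be collected in one helper. Where each value lives in the reply is an assumption here: this sketch reads population and capital from an extratags object, importance and type from the top level, and treats capital == "yes" as the capital marker.

```python
OUR_TYPES = {"place", "city", "county", "country", "town", "postcode", "state"}


def extract_meta(result: dict) -> dict:
    """Build the metadata dict for one provider result."""
    meta = {}
    extra = result.get("extratags") or {}  # may be missing or null
    if "population" in extra:
        meta["population"] = extra["population"]
    if result.get("importance") is not None:
        meta["importance"] = result["importance"]
    if extra.get("capital") == "yes":
        meta["capital_city"] = 1
    osm_type = result.get("type")
    if osm_type and osm_type not in OUR_TYPES:
        # Keep the provider's own type when it is not one of ours.
        meta["provider_type"] = osm_type
    return meta
```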

⚡ The full docs for Nominatim can be found here https://nominatim.org/release-docs/

Meilisearch

We will maintain indexes in Meilisearch to be used for suggesting locations as a user types, as well as for some other context-based searching. The two indexes we’ll have are:

locations which will contain a JSON version of our normal location object. (The centroid value should be encoded as a geo_point when saving to the index.)
postcodes which will contain only location entries that are marked with type postcode, and will store only hash, postcode and country in the index.

A command line tool should be written to clear out either or both indexes and replace all the values with the current values in our local table. When an index is refreshed we should emit the event location.indexes.updated with the name of the index updated.
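
When the refresh tool rebuilds the indexes, the documents for each one could be shaped as below. This is a sketch with assumed field names (hash, type, postcode, country, and a centroid held as a long, lat pair). Meilisearch reads coordinates from a reserved _geo field with lat and lng keys; mapping the geo_point mentioned above onto that field is an assumption.

```python
from typing import Iterable, List


def to_locations_doc(location: dict) -> dict:
    """Document for the locations index: the location plus a _geo field."""
    doc = {k: v for k, v in location.items() if k != "centroid"}
    long, lat = location["centroid"]  # stored long, lat
    doc["_geo"] = {"lat": lat, "lng": long}
    return doc


def to_postcodes_docs(locations: Iterable[dict]) -> List[dict]:
    """Documents for the postcodes index: postcode entries, three fields only."""
    return [
        {"hash": loc["hash"], "postcode": loc["postcode"], "country": loc["country"]}
        for loc in locations
        if loc.get("type") == "postcode"
    ]
```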

Features

These are the actual endpoints we’ll expose to our consumers.

Place Search

This is our most generic search. It will be on /search. You pass it a query ?q= and it performs an open search on the provider. You should also be able to pass it a limit= to limit the results, and an optional &country= value to limit results to a known country.

All places returned should be added to our local DB (if they haven't been already). If a type doesn't match one of our types, default to place as the value and store the original as the OSM Type meta field above.

Autosuggest

Autosuggest is an attempt at suggesting place names. These requests will not pass to our provider and will instead perform a lookup directly on our Meilisearch instance. We should return the results from Meilisearch verbatim. The endpoint for this search will be /suggest?q=

An optional value of &country= should be passable. When given, it should apply a filter on the Meilisearch index to limit results to the given country.

Postcode Lookup

This endpoint will only look up postcodes. It will be on /postcode, it will also receive a query ?q=, and it will search for postcodes only, using our postcodes index on Meilisearch. It should return the results verbatim.

An optional value of &country= should be passable. When given, it should apply a filter on the Meilisearch index to limit results to the given country.
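
Both /suggest and /postcode send the same kind of request to Meilisearch, so the search body could be built once. This is a sketch: it assumes the indexed documents carry a country attribute that has been configured as filterable, and it rejects quotes and backslashes instead of escaping them.

```python
from typing import Optional


def build_meili_search(q: str, country: Optional[str] = None, limit: Optional[int] = None) -> dict:
    """Body for a Meilisearch search request on either index."""
    body: dict = {"q": q}
    if limit is not None:
        body["limit"] = limit
    if country:
        if '"' in country or "\\" in country:
            raise ValueError("country must not contain quotes or backslashes")
        # Filter expression on the (assumed) filterable attribute country.
        body["filter"] = f'country = "{country}"'
    return body
```

The same body works against the locations index for /suggest and the postcodes index for /postcode; the response is passed back untouched.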

Reverse

The reverse endpoint will receive a longitude and latitude and look them up to convert them to a location. It will receive long=<double> and lat=<double>. It may also receive a country= to limit results to a given country.
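
A sketch of how the reverse parameters could be passed on. Nominatim's /reverse endpoint names its coordinates lat and lon, so our long is renamed. How country= is enforced is not specified; checking address.country_code on the reply is one possible approach and is purely an assumption, as is format=jsonv2.

```python
from typing import Optional


def build_reverse_params(long: float, lat: float) -> dict:
    """Map our long/lat inputs onto the provider's lat/lon parameters."""
    return {"lat": lat, "lon": long, "format": "jsonv2"}


def matches_country(result: dict, country: Optional[str]) -> bool:
    """True when no country was requested or the result is inside it."""
    if not country:
        return True
    code = (result.get("address") or {}).get("country_code", "")
    return code.lower() == country.lower()
```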

Fetch

This endpoint will receive a location hash and return that location. It will be on the endpoint /location/{location_hash}

Batch

The batch endpoint will take a file hash then fetch and parse that file into locations. Upon receiving the file hash, it will create a batch job in our batch jobs table with the status scheduled and return the hash of that batch job. There will be an additional endpoint to fetch the status of a batch job and a further endpoint to fetch the results from a batch job. The results endpoint should be paginated and limited to 500 records per page.

Once a batch job has been added to the batch jobs table, it should also dispatch a job to our queue. That job should fetch the file from our files service, update the status of the batch job to processing, and then add another job to our queue for each row in the file, so that every line in a CSV or JSON file we fetch gets its own job. When each row job has parsed its row, it should update the completed or failed values for the batch job. It should also add an entry to the job_record table with the id of the location and the id of the batch job. When all the jobs are complete we should update the status of the batch job in our database to complete.
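
The fan-out and completion rule above can be illustrated with an in-memory model. The BatchJob fields, the total counter and the enqueue callback are assumptions; a real implementation would update the counters atomically in the database, since row jobs run concurrently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class BatchJob:
    hash: str
    file_hash: str
    status: str = "scheduled"
    total: int = 0      # assumption: row count, used to detect completion
    completed: int = 0
    failed: int = 0
    # Stand-in for the job_record table: (location id, batch job id).
    job_records: List[Tuple[str, str]] = field(default_factory=list)


def start_batch(job: BatchJob, rows: List[str], enqueue: Callable[[str], None]) -> None:
    """Mark the batch as processing and queue one job per row."""
    job.status = "processing"
    job.total = len(rows)
    for row in rows:
        enqueue(row)
    if job.total == 0:
        job.status = "complete"  # nothing to wait for


def finish_row(job: BatchJob, location_id: Optional[str]) -> None:
    """Record one row outcome; location_id is None when the row failed."""
    if location_id is None:
        job.failed += 1
    else:
        job.completed += 1
        job.job_records.append((location_id, job.hash))
    if job.completed + job.failed == job.total:
        job.status = "complete"
```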

The endpoint to fetch results for a batch job should only return results when the status of the batch job is complete.
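
The results rule and the 500-record page limit could be sketched like this. The exception name and the response envelope (page, per_page, total, data) are illustrative assumptions; how an incomplete job maps to an HTTP response is left open.

```python
from typing import List

PAGE_SIZE = 500  # page limit stated in the Batch section


class BatchNotComplete(Exception):
    """Raised when results are requested before the batch job is complete."""


def fetch_results_page(status: str, records: List, page: int = 1) -> dict:
    if status != "complete":
        raise BatchNotComplete(f"batch job is {status}")
    if page < 1:
        raise ValueError("page must be 1 or greater")
    start = (page - 1) * PAGE_SIZE
    return {
        "page": page,
        "per_page": PAGE_SIZE,
        "total": len(records),
        "data": records[start:start + PAGE_SIZE],
    }
```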

The batch flow should trigger the following events. All events should also have a timestamp.

location.batch.scheduled when a batch job is created, with the batch job hash and the file hash in the payload.
location.batch.started when the batch job starts, with the batch job hash and file hash in the payload.
location.batch.tick when a row job parses a location, with the batch job hash and the location_hash of the location it created (or fetched).
location.batch.error when a row fails to parse, with the batch job hash and a reason as the payload. The reason should just be row failed.
location.batch.completed when the batch has finished processing, with the batch job hash and file hash in the payload.
location.batch.failed if a job totally fails, with the batch job hash and file hash in the payload.
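
The batch events described above differ only in which fields they carry, so one builder could cover them all. The payload key names (batch_job_hash, file_hash, location_hash, reason, timestamp), the envelope and the ISO 8601 timestamp format are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

# Events whose payload carries the batch job hash and the file hash.
FILE_EVENTS = {
    "location.batch.scheduled",
    "location.batch.started",
    "location.batch.completed",
    "location.batch.failed",
}


def batch_event(
    name: str,
    batch_job_hash: str,
    *,
    file_hash: Optional[str] = None,
    location_hash: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    payload = {"batch_job_hash": batch_job_hash, "timestamp": stamp}
    if name in FILE_EVENTS:
        if file_hash is None:
            raise ValueError(f"{name} requires file_hash")
        payload["file_hash"] = file_hash
    elif name == "location.batch.tick":
        if location_hash is None:
            raise ValueError("location.batch.tick requires location_hash")
        payload["location_hash"] = location_hash
    elif name == "location.batch.error":
        payload["reason"] = "row failed"  # fixed reason, per the spec
    else:
        raise ValueError(f"unknown batch event: {name}")
    return {"event": name, "payload": payload}
```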

We will only parse files in CSV and JSON format. The CSV should contain a single column, with no header, with the values to look up. The JSON should be a JSON array with one value per entry of the array. If a job can’t be processed you should update the status of the batch job to failed.
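
Reading the two accepted formats could look like the sketch below. The function name and the fmt argument are assumptions, as is the choice to reject CSV rows with more than one column instead of silently keeping the first value.

```python
import csv
import io
import json
from typing import List


def parse_lookup_values(content: str, fmt: str) -> List[str]:
    """Return the values to look up from a CSV or JSON file body."""
    if fmt == "csv":
        values = []
        for row in csv.reader(io.StringIO(content, newline="")):
            if not row or not row[0].strip():
                continue  # skip blank lines
            if len(row) != 1:
                raise ValueError(f"expected a single column, got {len(row)}")
            values.append(row[0].strip())
        return values
    if fmt == "json":
        data = json.loads(content)
        if not isinstance(data, list):
            raise ValueError("JSON file must contain an array")
        return [str(value) for value in data]
    raise ValueError(f"unsupported format: {fmt}")
```

Any ValueError raised here is the kind of failure that would move the batch job to failed.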

There should be a command line tool that can also receive the hash of a file and will perform the same actions as the endpoint (without fetching the results).

There should be a command line tool to show batch jobs; we should be able to filter by status.

There should be a command line tool to cancel a batch job.

⚠️ The Batch Job / Event queue for this service should be different (with a different connection!) to the one used to emit events to our events service. You should assume it will have access to its own local RabbitMQ instance.

Location Search Event

Anytime we perform a lookup for a location (except when fetching by location hash) we should emit the location.search event. This event will be a little more in depth than other events in our architecture. It should have the following universal payload.

{
    "query": "string", # The query used for the lookup
    "hit": "bool", # Was the location already in our database.
    "hash": "string", # Optional. Location hash if found.
    "type": "string", # One of `place` `postcode` `autosuggest` `reverse`
    "results": "int" # Count of the number of results found
}

Whether a search goes straight to Meilisearch or to our provider, the results value should be the number of results returned.

When a search goes to Meilisearch we should consider empty result sets a miss and set the hit value to false. Everywhere else, if we have to create the location in our local DB, we should consider that a miss and set hit to false.

If the search is a reverse lookup the query should be the longitude and latitude used.

When autosuggesting, each lookup should trigger its own event.
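
The payload rules for location.search could be sketched as follows. The helper names are hypothetical, and the "long,lat" string used as the reverse query is an assumed format, since the spec only says the query should be the longitude and latitude used.

```python
from typing import List, Optional

SEARCH_TYPES = {"place", "postcode", "autosuggest", "reverse"}


def search_event_payload(
    query: str,
    search_type: str,
    results: int,
    hit: bool,
    location_hash: Optional[str] = None,
) -> dict:
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    payload = {"query": query, "hit": hit, "type": search_type, "results": results}
    if location_hash is not None:
        payload["hash"] = location_hash  # optional, only when a location was found
    return payload


def meilisearch_payload(query: str, search_type: str, hits: List) -> dict:
    # For Meilisearch lookups an empty result set counts as a miss.
    return search_event_payload(query, search_type, len(hits), hit=len(hits) > 0)


def reverse_query(long: float, lat: float) -> str:
    # Assumed format: longitude first, matching the storage convention.
    return f"{long},{lat}"
```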

Locations Specification