Today’s guest post is by Robert Murray, one of our developer team at Ordnance Survey. Last year he used OS OpenData to map some of the tweets Hampshire Constabulary sent during Operation Fortress.
Hampshire Constabulary has been running an operation to combat drug-related crime in Southampton called Operation Fortress, and posted tweets relating to it with the hashtag #OpFortress. It was an effective method of showing progress and engaging with the public. The tweets sometimes gave advice, asked for help or reported operation updates such as arrests, raids or crimes – often with the location at which the event took place.
I thought it would be interesting to see whether you could geolocate these events or crimes solely from the tweets. You could also get this data from the police crime data API at data.police.uk, but I wanted to try pulling it from Twitter instead. If the approach works for this project, you could geolocate tweets from any feed, just from the text, down to street level in Britain.
Processing the tweets
The location data I looked for in each tweet was in the form of road and place names, but I didn’t know how it would be structured, whether any abbreviations were used or even whether the spelling was correct. So I ran a fuzzy text search over the unstructured tweet text, matching it against a known list of road names and place names.
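To give a feel for what fuzzy matching means here, the following is a minimal sketch using Python's standard `difflib` module. It is not the algorithm used in the project, which the post does not name. The road list, the `closest_road` helper and the `cutoff` value are all assumptions made for illustration.

```python
import difflib

# Hypothetical sample of road names; a real list would come from OS Locator.
ROAD_NAMES = ["SHIRLEY ROAD", "PORTSWOOD ROAD", "ST MARYS ROAD", "ABOVE BAR STREET"]


def closest_road(candidate, names=ROAD_NAMES, cutoff=0.8):
    """Return the closest known name to `candidate`, or None if nothing is similar enough."""
    # get_close_matches ranks names by a similarity ratio between 0 and 1
    # and drops anything scoring below `cutoff`.
    matches = difflib.get_close_matches(candidate.upper(), names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this sketch, a misspelling such as "Shirly Road" or an abbreviation such as "Portswood Rd" still resolves to the full road name, while unrelated text returns `None`. The cutoff is a trade-off: set it lower and more typos are tolerated, but more false matches creep in.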
Although I was doing this in my own time, I of course decided to use Ordnance Survey Open Data products as my source data, including OS Locator which contains a gazetteer of road names and 1:50 000 Scale Gazetteer which has a list of place and feature names.
The datasets are available through the Open Data license, so I downloaded them and loaded them into a database ready for searching. Having read a few of the tweets and noticed that they generally contain names of roads at which events have occurred, I primarily used the OS Locator product to find tweets that contain road names, falling back to the gazetteer if not previously matched.
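The roads-first, gazetteer-second lookup could be sketched roughly as below. This uses an in-memory SQLite database purely for illustration; the post does not say which database was used, and the table names, columns and coordinate values here are invented placeholders rather than real OS data.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roads (name TEXT, easting INTEGER, northing INTEGER)")
    conn.execute("CREATE TABLE places (name TEXT, easting INTEGER, northing INTEGER)")
    # Placeholder coordinates, not real OS values.
    conn.execute("INSERT INTO roads VALUES (?, ?, ?)", ("SHIRLEY ROAD", 440000, 113000))
    conn.execute("INSERT INTO places VALUES (?, ?, ?)", ("PORTSWOOD", 442000, 114000))
    return conn


def locate(conn, name):
    """Look a name up in the road table first, then fall back to the place table."""
    for table in ("roads", "places"):
        row = conn.execute(
            f"SELECT easting, northing FROM {table} WHERE name = ?", (name.upper(),)
        ).fetchone()
        if row:
            return table, row
    return None
```

Returning the table name alongside the coordinates makes it possible to tell a street-level match from a coarser place-name match later on.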
The process of matching tweets against the OS OpenData products involved sifting through the words in each tweet, testing whether they resembled something in the database of road and place names, and then extracting the location data. After finding all of the #OpFortress tweets from @HantsPolice, I was able to write some code to do all this, and I found some standard libraries to access Twitter data, so the process can be run again and again, capturing new tweets and data.
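One way to picture the sifting step is to slide a window of one to three words along the tweet and fuzzy-match each window against the known names. The sketch below does that with Python's standard library. It is an assumed approach, not the project's actual code, and the name list, window size and cutoff are illustrative.

```python
import difflib
import re

# Hypothetical sample of known names; a real run would query the database instead.
KNOWN_NAMES = ["SHIRLEY ROAD", "ST MARYS ROAD", "ABOVE BAR STREET"]


def word_windows(text, max_words=3):
    """Yield every run of 1..max_words consecutive words, longest runs first."""
    words = re.findall(r"[A-Za-z']+", text.upper())
    for size in range(max_words, 0, -1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def extract_locations(tweet, names=KNOWN_NAMES, cutoff=0.85):
    """Return the known names that some window of the tweet closely resembles."""
    found = []
    for phrase in word_windows(tweet):
        match = difflib.get_close_matches(phrase, names, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found
```

For example, `extract_locations("Two arrests in Shirley Rd this morning #OpFortress")` picks out "SHIRLEY ROAD" from the two-word window "SHIRLEY RD", while a tweet with no recognisable name yields an empty list.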
Visualising the results
The output from the matching process can be formatted into something that can be parsed by OpenLayers and projected onto a map. The results are overlaid on top of the OS OpenSpace backdrop mapping service. Take a look at the map and let me know what you think. You can comment on the blog below.
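For anyone curious about the formatting step, here is a rough sketch that turns matched tweets into GeoJSON, one of the formats OpenLayers can read. The post does not say which format was actually used, so treat GeoJSON as an assumption, along with the `to_geojson` helper and the property names. The coordinates are left as British National Grid eastings and northings, so the map would need to be told about that projection or the points reprojected first.

```python
import json


def to_geojson(matches):
    """Build a GeoJSON FeatureCollection string from (tweet, name, easting, northing) tuples."""
    features = [
        {
            "type": "Feature",
            # Coordinates stay in British National Grid (easting, northing) in this sketch.
            "geometry": {"type": "Point", "coordinates": [easting, northing]},
            "properties": {"tweet": text, "matched_name": name},
        }
        for text, name, easting, northing in matches
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```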