OSM data hacking for fun and profit - Part 1
There are many visulizations online that rely on maps, and there are many mapping solutions available. For simple large area maps such as those for countries/states, d3 with topojson/geojson will generally suffice. For other applications requiring more detailed street level mapping, OSM (Open Street Map) is pretty much the only game in town.
In this series of articles, we will be looking at how to extract features from OSM and displaying them on a map with Leaflet. In particular, we will be looking at extracting railways around San Francisco and the associated stations.
There are several ways of getting the exact data we want (especially since the Planet data is over 617GB uncompressed). If you just want a porton of a city, then the export tab on the main OSM website will probably do the trick. In my (limited) experience, the export tab is not very reliable, and frequently you cannot get the complete data needed (the zoom level often gets too large and the site gives you an “area too large” error).
A better way of getting city level data is to use Metro Extract. It is a service run by Mapzen that gives us a weekly extract of all major metropolitan areas in various formats for quick and painless download. For a big area like California, metro level download is your best bet, since the OSM XML bzipped file for the bay area is ~165MB (or about 2.5G uncompressed), and a similar file for California is ~900MB (which gets to a point of being unwieldy).
The next level up from city is state/country/region, which can all be easily downloaded from the Geofabrik-maintained distribution. They have a nice interface which tells you how large the files are before you download.
To see an updated list of OSM data sources, please refer to the Download Data section of OSM Wiki.
Before moving on to data loading and processing. It is good to talk about the things you will need in order to get started. For the purposes of this article, we will be using Postgresql + Postgis. (It is also very useful to install pgrouting extension for Postgresql as well).
For data loading, you will also need to install osm2postgresql (optionally osm2pgrouting)
On a Mac, it should be as simple as running the following commands (using Homebrew):
Optionally, I also recommend a good graphical SQL client. I personally use Postico, which is an execellent native Mac client that does the basics very well (executing SQL queries and seeing the results without lag). The demo version should work well until you need the full version features.
Your experience may vary. Google is probably your best bet in finding solutions to any issues that may arise.
Once you have gotten the data you want for the project you are doing, let’s see how we go about loading the data into Postgresql.
First, we need to create the database that we will be using for this guide, and then go into the database:
Then we will run a few commands to install the extensions we will be using in this database:
At this point, the databases should be all set up and ready to go.
Next up, we need to get some data from the afromentioned sources. Note, if you are using a mac,
.pbf (compresssed) format support is not compiled into the tool at the time of the writing. You will want to download either
.osm or where available
Once you have the
.osm file and ready to go, it is a simple matter of running
After it finishes running (~500s on a late 2013 MBP), you should get the following set of tables:
Now that we have our data in a easily queryable format, it is time to have some fun and see what we can get out of it.
One of the things I was interested in was how to get a map of public transportation.
To get started, I generally first head to the official osm map, and click around in the relevant area. In this case, I am after a station on the BART line (since it is easier to search for stations than a stretch of track). After looking around, I settled on Powell Street station:
Let’s see how we would go about finding this particular station in the database. If we execute the following query:
We should get the following results:
On a side note, if you tried to
select *, then you will see that the OSM table is in fact quite sparse. There are a lot of fields that are
null and only a few are filled in. This is one of the reasons why a graphical SQL client is useful.
Another reason why the official osm map is helpful is because it gives you all the attributes that are set for this particular object. It gives you a sense of what can be queried if you were trying to figure out how to query for something.
Referring to the table above, we see that for the actual BART station, there is one useful attribute
railway. As a result, if we query for
where railway = 'station', we can get a list of all the station that is in the
OSM table structure
At this point, it is useful to know a bit about how the data in OSM is organized. The core tables that we will need to know are the following:
Mostly line features, we will be extracting rail lines from this table
As the name suggests, this one contains features that can be described by a point. However, because some stations are bigger and/or more detailed than other stations (aka not fully described by a point), just selecting from this table is not enough as bigger stations will be in
This table contains shape features, such as buildings, larger stations, etc.
This table is not used for our purposes. As far as I can tell, it contains all the roads in the region selected.
If you will notice, we did not discuss other tables such as
planet_osm_rels. According to the wiki, these tables are intermediate tables used in the construction of the tables listed above. From my experimentation, it seems that they do contain the same data, albeit in a more concise format (less sparse). It’s fun to take a look if you have time.
To Be Continued…
That’s it for today! Join me next week for part 2, where we will talk about how to extract the data into GeoJSON format and put it on a map.