Some hypotheses that may help to measure how complete OpenStreetMap is

25 06 2008

The hypotheses below will hopefully help in measuring the completeness of OpenStreetMap. If anyone has any other ideas or comments, please do let me know.

  • A complete map will have all of the roads in an area, but we cannot obtain statistics on the amount of road in every area we may wish to test in OpenStreetMap, so an accurate way to predict the length of road in any area would be useful. My hypothesis is that the length of road in a given area depends on population density. That is to say, I expect that urban areas will have less road per person than rural areas.
  • For areas where OSM has aerial imagery, I would like to compare the complexity (i.e. file size) of the Yahoo! JPEG and the corresponding OSM tiles. The hypothesis is that areas with very small aerial JPEG files (because they are simply one colour, like the sea or vast expanses of desert) will have few if any entries on OSM, whereas areas with large file sizes (cities) will have a high density of nodes and ways in OSM and therefore a large tile size. I do not have the technical knowledge to test this, so any help would be great.
  • Another hypothesis is that more complete areas of OSM will have a higher level of edit activity. If no-one has ever edited an area, then it is unlikely that the map is complete there. Of course, there may simply be nothing there to map, so this test could be used in conjunction with the Yahoo! imagery test described above. If we could produce some sort of heat map showing which areas are edited most frequently, and monitor it over time, this could show us some interesting trends.
  • This is an attempt to solve the problem of missing roads. I would think it unlikely that there would be a road which is completely cut off from others, or that there would be an entire settlement of roads not connected to the rest of the country’s road network (as in the Madeira example shown below). The hypothesis is that every road is connected to at least one other road of equal or higher classification. So if there are roads in OSM that are not, then maybe there are missing roads. This testing may require a lot of calculations and may not return that many missing roads. If someone can think of a way to do it simply, then I would very much like that input.
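As a starting point for the last hypothesis, here is a minimal sketch of how the connectivity test might look. It assumes the ways have already been extracted from OSM into a plain dictionary, that two ways count as connected when they share a node, and that the ranking of highway values in `HIGHWAY_RANK` is a reasonable ordering; the function name `suspicious_ways` and the sample data are made up for illustration.

```python
from collections import defaultdict

# Assumed ordering of OSM highway values, most important first.
# Values not listed here are treated as the lowest classification.
HIGHWAY_RANK = {
    "motorway": 0, "trunk": 1, "primary": 2, "secondary": 3,
    "tertiary": 4, "unclassified": 5, "residential": 6,
}


def rank_of(highway):
    """Return the rank of a highway value (lower means more important)."""
    return HIGHWAY_RANK.get(highway, len(HIGHWAY_RANK))


def suspicious_ways(ways):
    """Return ids of ways with no neighbour of equal or higher classification.

    ways: dict mapping way id -> (highway value, list of node ids).
    Two ways are neighbours when they share at least one node id.
    """
    # Index which ways pass through each node.
    ways_at_node = defaultdict(set)
    for way_id, (_, nodes) in ways.items():
        for node in nodes:
            ways_at_node[node].add(way_id)

    flagged = []
    for way_id, (highway, nodes) in ways.items():
        neighbours = set()
        for node in nodes:
            neighbours |= ways_at_node[node]
        neighbours.discard(way_id)
        # Keep the way only if some neighbour is of equal or higher class.
        own_rank = rank_of(highway)
        if not any(rank_of(ways[n][0]) <= own_rank for n in neighbours):
            flagged.append(way_id)
    return sorted(flagged)


# Hypothetical sample data: way 3 is cut off from everything else.
example_ways = {
    1: ("primary", [10, 11, 12]),
    2: ("residential", [12, 13]),
    3: ("residential", [20, 21]),
    4: ("primary", [9, 10]),
}
print(suspicious_ways(example_ways))
```

Each way is only compared with the ways it touches, so the cost grows with the number of nodes rather than with the number of pairs of roads. A flagged way is only a hint that something may be missing nearby: the most important road in a region, for instance, can be flagged simply because the data extract stops at the region boundary.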

The Stages of Completeness

25 06 2008

First we must think about what it is important for a map to have; a complete map will obviously contain all of those important features. The features necessary may differ from user to user, though: tourists may be interested in the location of landmarks, whereas those travelling into work every day need accurate road maps with all turn restrictions, road names, etc. In general I think named, accurate roads are the most important feature, so a lot of my analysis will be to do with the length of roads present in OpenStreetMap. That is not to say that POIs such as restaurants, post boxes, etc. are not important; it is just that for these to be placed well we need a completely mapped road network. With this in mind I have developed a way in which we can follow the progress of the completeness of an area on OSM, using a stage system.

  • Preliminary stage – GPS tracks gathered or area has Yahoo! imagery.

  • Stage 1 – Nodes and ways mapped onto OSM using GPS track or aerial imagery.

  • Stage 2 – All roads named and roughly categorised.

  • Stage 3 – Map good enough for satellite navigation. All one-way streets and restrictions tagged, along with accurate street categorisation.

  • Stage 4 – All POIs (e.g. post boxes, bus stops, pubs, restaurants, supermarkets etc.) tagged.

These stages may or may not occur in sequential order, and each stage can be quoted as complete in terms of percentages. For example, we might say that London is 100% complete for the Preliminary stage and stages 1 and 2, but only 60% stage 3 complete and 20% stage 4 complete. The hard question is how we accurately measure these percentages. It is easy for a human to tell that the map of London below is more complete than that of Madeira, with its limited amount of roads and dead ends, but it's a lot harder for a computer.

Central London, a complete map
Madeira, an incomplete map
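
To make the percentage idea a little more concrete, here is a rough sketch of how the stage 1 and stage 2 figures might be computed for one area. It assumes we already have an estimate of the true road length in the area (`expected_km`, which is exactly the number the road length hypothesis is meant to supply) and that each mapped road comes as a small record with a length and its tags. The function name `stage_percentages` and the record layout are invented for this example.

```python
def stage_percentages(roads, expected_km):
    """Estimate stage 1 and stage 2 completeness for one area, in percent.

    roads: list of dicts with "length_km" (float) and "tags" (dict of OSM tags).
    expected_km: assumed estimate of the true road length in the area.
    """
    if expected_km <= 0:
        raise ValueError("expected_km must be positive")

    # Stage 1: how much of the expected road length has been mapped at all.
    mapped_km = sum(road["length_km"] for road in roads)

    # Stage 2: how much of it also carries a name and a highway category.
    named_km = sum(
        road["length_km"]
        for road in roads
        if road["tags"].get("name") and road["tags"].get("highway")
    )

    # Cap at 100% in case the estimate of expected_km is too low.
    return {
        "stage1": min(100.0, 100.0 * mapped_km / expected_km),
        "stage2": min(100.0, 100.0 * named_km / expected_km),
    }


# Hypothetical area: 4 km mapped out of an expected 5 km, 3 km of it named.
example_roads = [
    {"length_km": 2.0, "tags": {"highway": "primary", "name": "High Street"}},
    {"length_km": 1.0, "tags": {"highway": "residential", "name": "Mill Lane"}},
    {"length_km": 1.0, "tags": {"highway": "residential"}},
]
print(stage_percentages(example_roads, expected_km=5.0))
```

Stages 3 and 4 are harder to express this way, because there is no obvious figure for how many turn restrictions or POIs an area ought to have.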