Refining urban area highlighting: the previous method didn’t work around the World!

18 07 2008

So the previous method, with the lovely flowchart and everything, proved to be great for London, but when I tried exactly the same method on New York it took far too much urban area away, leaving us with a rather dull image when it should be very bright to indicate the NYC metropolis. Take a look at the difference between New York and London! So I essentially went back to the drawing board and thought, “okay, what are the main features of a LandSat image?” Answer: urban areas, rural areas and water! That means I need to distinguish between urban and rural, then distinguish between urban and water, and then somehow combine the two. That’s exactly what I did.

The method is very similar to before, except I’m now using both hue and saturation from the 742 false colour image instead of simply hue. The pretty flowchart for the process is shown below, and for those of you who thought the last one was a mess, wait ’til you see this one!

The method essentially works because hue and saturation have different properties. There is a great difference between the hue values for urban and rural areas, so hue is used to distinguish between rural and urban. Here is the hue component of the 742 image with a colour threshold to highlight urban areas (notice that water is also heavily highlighted). Now we apply a colour threshold to the saturation layer and take advantage of the great difference in value between land and sea here. So now we have two layers, one highlighting urban and sea and one highlighting urban and rural, and we need to pick out the urban. This is done by using a clever layer mode in GIMP called ‘darken only’. This compares the pixel values in the two layers (hue and saturation) and displays the lower of the two, so anything that is dark in either layer stays dark; in this way water and rural areas are removed from the image. Result!
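For anyone who prefers code to flowcharts, here is a rough Python sketch of the same idea. It is not the GIMP workflow itself: the threshold windows are made-up numbers, the function names are my own, and I am assuming the 742 composite is available as rows of (r, g, b) tuples.

```python
import colorsys

# Hypothetical threshold windows on a 0-255 scale; the real values have to be
# found by eye for each scene, just like in GIMP.
HUE_RANGE = (190, 220)
SAT_RANGE = (80, 180)


def threshold(value, low, high):
    """Binary colour threshold: white (255) inside the window, black (0) outside."""
    return 255 if low <= value <= high else 0


def urban_mask(pixels, hue_range=HUE_RANGE, sat_range=SAT_RANGE):
    """Build an urban mask from rows of (r, g, b) tuples (0-255) of a 742 composite."""
    mask = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue_layer = threshold(int(h * 255), *hue_range)
            sat_layer = threshold(int(s * 255), *sat_range)
            # 'Darken only' keeps the lower of the two values, so a pixel is
            # white only if it passed both thresholds.
            out_row.append(min(hue_layer, sat_layer))
        mask.append(out_row)
    return mask
```

Because both layers are pure black and white after thresholding, taking the per-pixel minimum behaves like a logical AND of the two masks.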

Here are the resulting finished images for London and New York.

You can compare the previous method and the new one here:

Old London
Old New York
New London
New New York

By the way if anyone is really interested in this stuff and would like more guidance or some high resolution images just get in touch.





Highlighting urban areas in LandSat

17 07 2008

The hypothesis I’m working on at the moment is that there should be more OpenStreetMap data (i.e. more nodes, more ways, etc.) in urban areas. To find out where these urban areas are I’m using freely available LandSat 7 data downloaded from http://www.landcover.org/data/landsat/. The data comes in the form of 8 different greyscale images corresponding to 8 different spectral bands ranging from visible blue light (0.45-0.52 µm) to thermal IR (10.40-12.5 µm). Using bands 1, 2 and 3, corresponding to blue, green and red, a ‘true’ colour image of an area can be built. However, this is not the best combination for highlighting urban areas; it turns out that the best combination is 7, 4, 2 for red, green and blue. Head over to my page on using LandSat imagery in GIMP to find a tutorial on all this.
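If you would rather see the band combination as code than in GIMP, a minimal Python sketch looks like this. I am assuming each band has already been read into a list of rows of 0-255 grey values and that all three bands have the same dimensions; the function name is just for illustration.

```python
def false_colour_742(band7, band4, band2):
    """Combine three greyscale bands into rows of (r, g, b) tuples.

    Band 7 goes to red, band 4 to green and band 2 to blue.
    """
    if not (len(band7) == len(band4) == len(band2)):
        raise ValueError("bands must have the same number of rows")
    image = []
    for row7, row4, row2 in zip(band7, band4, band2):
        if not (len(row7) == len(row4) == len(row2)):
            raise ValueError("bands must have the same number of columns")
        image.append(list(zip(row7, row4, row2)))
    return image
```

Swapping the arguments for bands 3, 2 and 1 would give the ‘true’ colour image instead.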

What I’ve been working on over the past few days is how to use the false colour image I’ve produced using bands 7, 4 and 2 to highlight the urban areas. Below is the 742 false colour image, and we can see that urban areas appear quite brown and purple. The aim is to extract that information and make everything else invisible. The problem is that some of the sea to the East is a similar sort of colour to the cities.

I decided the best way to extract the urban areas from the image above would be to decompose the image into hue, saturation and value. The hue is the most interesting component, giving a good contrast between rural and urban areas, but as we can see there is still little difference in colour between some sea areas and some cities, which could prove a problem in extracting just urban areas.

The next step is to apply a colour threshold to this image to try to pick out only urban areas and black out everything else. After a great deal of playing around with filters, the urban threshold here appears to be within light levels 196-212. After applying this filter the image below is obtained.

As we can see, the urban areas are nicely highlighted in white and everything else is black. Now the aim is to compare the brightness of every pixel in this image with OSM data. We know the coordinates of the image above and we know that each pixel is 30m x 30m, so this should be easy enough using a simple bounding box query of the OSM database. Let’s hope that there is a relationship between OSM data and the brightness of the pixels above. I’ll be back with any results when I have them, and then hopefully I’ll be rolling this out across the world.
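The pixel-to-bounding-box step could be sketched in Python roughly like this. I am assuming the image is north-up and that its top-left corner is known in projected metres (easting and northing, e.g. UTM); converting the result to longitude and latitude for an actual OSM query is left out, and the function name is hypothetical.

```python
PIXEL_SIZE_M = 30.0  # LandSat pixel size mentioned above


def pixel_bounds(col, row, origin_easting, origin_northing, size=PIXEL_SIZE_M):
    """Return (min_e, min_n, max_e, max_n) in metres for one pixel.

    col and row count from 0 at the top-left corner of the image;
    northing decreases as the row number increases.
    """
    min_e = origin_easting + col * size
    max_e = min_e + size
    max_n = origin_northing - row * size
    min_n = max_n - size
    return (min_e, min_n, max_e, max_n)
```

Each pixel of the thresholded image would then give one box whose OSM contents can be counted and compared with that pixel’s brightness.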

Click on the image below for a nice flowchart of the whole process.





Have we hit the gem that will improve OpenStreetMap’s completeness?

15 07 2008

OpenStreetBugs (http://openstreetbugs.appspot.com) is a wonderful application created by Xavier in Rennes, France, which allows us to tag areas in OpenStreetMap which are incorrect, incoherent or incomplete. It’s an idea I’ve been talking about since my project began; little did I know that this little gem was already around, created about a month ago as can be seen from Xavier’s blog. If we all start to use this app I’m sure it will prove invaluable in our quest to make OSM complete.

Users can annotate OSM in pretty much any way they want to, and others can then come along and discuss the notes. You can even open Potlatch centered on a node, or download a GPX dump of the visible nodes to load in JOSM (or in your GPS). With recent updates you can even obtain an RSS geo-feed for an area that you choose and monitor changes.

If OSB does become popular then it can become the one-stop shop for inaccuracies in OSM. People will be able to check how many flags there are in a particular area and from that get an idea of how complete the map is in that area. It will also act as a guide for avid mappers who don’t know where to start next: just pick the area with lots of incomplete tags. Moreover, we can encourage people to participate in the OpenStreetMap project just by visiting OSB and tagging everything they know is incomplete. That way people who feel a little bit frightened by the whole concept of going out mapping can still help improve the map. Perfect!





New project: Let’s use LandSat

10 07 2008

Not sure about the licensing requirements for my beautiful Qgis model (which shows roughly which areas of the UK are complete and which incomplete), so I don’t think I can publish the full results from that yet. Essentially, though, I have found that there is a strong correlation between the length of road in a particular area and the population in that area; it’s good enough to accurately predict the length of road there should be in an area and compare that to OSM road length. But enough about that project for now; hopefully I will publish all my findings soon.

So instead it’s onwards and upwards to a new project, which was inspired by the view of the Osmarender layer on OpenStreetMap, shown below. It is clear to see that there are vast areas in Asia and South America where there is no OpenStreetMap data. The question is whether there is actually nothing there or whether OSM is just missing cities, roads, etc. I plan to find out using open source aerial imagery.

The plan is to use LandSat and other forms of freely available imagery to work out where there should be cities and roads and where there shouldn’t be, then take this information and compare it to OSM. Easier said than done, I’m sure, but it wouldn’t be a project if it wasn’t challenging. So below is a LandSat image of London and the area to the North, to which I have applied a Yellow Contrast Gradient Map using GIMP. As you can see, it emphasises cities and rural areas quite well, and I’m sure this is the starting point for accurately predicting where there should be OSM data.





Qgis Problem Solved

7 07 2008

I’ve moved on to using the models I generated with Local Authority DfT and ONS statistics to predict which areas of the country are complete and incomplete. I used the model on Lower Layer Super Output Areas (LLSOAs), which have an average population of 1500 and variable areas. Using Qgis I wanted to create a ratio of OSM road length in an area divided by the length of road the model predicts. In this way areas that have a value of 1 are complete, while values under 1 show that there are not as many roads in OSM as I would predict and the area is therefore incomplete. Unfortunately there are a few areas with values larger than 1, indicating that the model is under-predicting the road length. After extracting OSM road length for every boundary using Qgis, I used the OpenOffice spreadsheet to apply the model to every boundary. The problem came in re-importing the data to the shapefile for use in displaying heat maps in Qgis. The only way my colleague could find involved a serious amount of hacking and command line stuff, which I am not very fluent in. Luckily, with a bit of a search, I found this solution.

All of the attribute data for the shapefile (i.e. all the data apart from the coordinates) is contained in a dBASE (.dbf) database file. Now if you attempt to open this file as a database file in OpenOffice, i.e. by right-clicking on it and choosing to open it with OpenOffice Base, then it will load up in the OpenOffice spreadsheet in the correct file format. Select Unicode (UTF-8) as the character set in the pop-up window and you’re good to go. You can manipulate the data in whatever way you like; just be sure to save it under the same file name in the same file format. Then when you import the shapefile to Qgis it will contain all of your new attributes and you can make some fancy new heat maps, as I intend to do.
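The new attribute itself is just the ratio described earlier in this post. As a rough Python sketch of what the spreadsheet step does, assuming each boundary is a dictionary and using made-up field names (osm_km, predicted_km, ratio) rather than anything from my actual shapefile:

```python
def add_completeness_ratio(rows, osm_field="osm_km", predicted_field="predicted_km"):
    """Add a 'ratio' entry (OSM road length / predicted road length) to each row.

    A ratio of 1 means complete, below 1 means roads are missing from OSM,
    and above 1 means the model is under-predicting. Rows with no predicted
    road length get None instead of a ratio.
    """
    for row in rows:
        predicted = row[predicted_field]
        row["ratio"] = row[osm_field] / predicted if predicted > 0 else None
    return rows
```

For example, a boundary with 45 km of road in OSM and 60 km predicted gets a ratio of 0.75.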

Pretty soon I hope to have some nice image outputs from Qgis which will show which areas of the UK are complete and which incomplete. Keep checking back for that and ask as many questions as you can muster; I’m always interested to hear from readers.





3D graphs rock! – a more technical way to help with OSM completeness

26 06 2008

So I’ve been working with ONS and DfT stats for the past few days and come pretty close to insanity with the statistics before realising that the simplest models give the nicest results.

Have a look at these 3D graphs I’ve been working with. They show us nice and simply that the amount of road in a boundary depends on the land area of that boundary and on the population within the boundary. Kinda straightforward and what you might expect, but it’s good to get some concrete results.
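To give a flavour of what a ‘simple model’ with these two inputs can look like, here is a Python sketch of a least-squares fit. The linear form (road length ≈ a × population + b × area, with no constant term) is an assumption made purely for illustration, not necessarily the model behind the graphs, and the function name is my own.

```python
def fit_road_model(population, area, road_length):
    """Least-squares fit of road_length ~ a * population + b * area.

    Solves the 2x2 normal equations directly and returns (a, b).
    """
    spp = sum(p * p for p in population)
    saa = sum(x * x for x in area)
    spa = sum(p * x for p, x in zip(population, area))
    spy = sum(p * y for p, y in zip(population, road_length))
    say = sum(x * y for x, y in zip(area, road_length))
    det = spp * saa - spa * spa
    if det == 0:
        raise ValueError("population and area are collinear; cannot fit")
    a = (spy * saa - say * spa) / det
    b = (say * spp - spy * spa) / det
    return a, b
```

Once a and b are known, the predicted road length for any boundary is simply a × population + b × area.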

Now onwards and upwards: I plan to use these results to predict how many roads there should be within any boundary on OpenStreetMap and then compare that result to how many there actually are on OSM. If anyone has any simpler ideas on measuring completeness, then let me know!





Some hypotheses that may help to measure how complete OpenStreetMap is

25 06 2008

These hypotheses below will hopefully help in measuring the completeness of OpenStreetMap. If anyone has any other ideas or comments please do let me know.

  • A complete map will have all of the roads in an area, but we cannot obtain stats on the amount of road in every area we may wish to test in OpenStreetMap. So an accurate way to predict the length of road in any area would be useful. My hypothesis is that the length of road in a given area is dependent upon population density. That is to say I expect that urban areas will have less road per person than rural areas.
  • For areas where OSM has aerial imagery I would like to compare the complexity (i.e. file size) of the Yahoo! jpeg and the corresponding OSM tiles. The hypothesis is that areas with very small aerial jpeg files (because they are simply one colour like the sea or vast expanses of desert) will have few if any entries on OSM, whereas areas with large file sizes (cities) will have a large density of nodes and ways in OSM and therefore large tile size. I do not have the technical knowledge to test this so any help would be great.
  • Another hypothesis is that more complete areas of OSM will have a higher level of edit activity. If no-one has ever edited an area then it may be unlikely that the map is complete there. Obviously, however, there may just be nothing there, so this test could be used in conjunction with the Yahoo! imagery test stated before. If we could produce some sort of heat map showing which areas are edited most frequently and monitor it over time, this could certainly show us some interesting trends.
  • This is an attempt to solve the problem of missing roads. I would think it unlikely that there would be a road which is completely cut off from others, or that there would be an entire settlement of roads not connected to the rest of the country’s road network (as in the Madeira example shown below again). The hypothesis is that every road is connected to at least one other road of equal or higher classification. So if there are roads in OSM that are not, then maybe there are missing roads. This testing may require a lot of calculations and may not return that many missing roads. If someone can think of a way to do it simply then I would very much like that input.
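As a starting point for the last hypothesis, here is a naive Python sketch of the connectivity test. I am assuming two roads count as connected when they share a node, and the ranking of highway classes is a simplified one made up for this example. It compares every road with every other road, so it would be far too slow for a whole country without some kind of spatial index.

```python
# Simplified, hypothetical ranking of highway classes (higher = more important).
RANK = {
    "motorway": 5,
    "trunk": 4,
    "primary": 3,
    "secondary": 2,
    "tertiary": 1,
    "residential": 0,
}


def suspicious_roads(roads):
    """Return ids of roads not joined to any road of equal or higher class.

    roads maps a road id to a (highway class, set of node ids) pair.
    """
    flagged = []
    for road_id, (road_class, nodes) in roads.items():
        connected = False
        for other_id, (other_class, other_nodes) in roads.items():
            if other_id == road_id:
                continue
            shares_node = bool(nodes & other_nodes)
            if shares_node and RANK[other_class] >= RANK[road_class]:
                connected = True
                break
        if not connected:
            flagged.append(road_id)
    return flagged
```

A flagged road is only a hint that something nearby may be missing; it could equally be a genuine dead end or a tagging quirk.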