I should have made a post about this a while ago, but I didn’t want a half complete post, and the scope of my project kept expanding!
Part 1: Scraping
I found two huge repositories of old digitised maps of Australia, many of which are in the public domain: the National Library of Australia, and the Parish Maps from the Department of Lands NSW. Unfortunately neither offers a nicely documented RESTful API for the images and metadata. My first step was to extract as much information as I could and convert it into an intermediate format. Most of my code and documentation for doing this is at https://github.com/andrewharvey/govscrape in those two respective folders. Unfortunately it’s not as easy as running one command from my repo to download and parse all the data. My goal was to get the data to my machine, not write a robust system that anyone could run to get a clone of the nla and pmap repositories.
Part 2: Georeferencing
It would be great if I could push out an easy to use API for the data I collected from the scrape stage, but I don’t have the resources (let me know if you are willing to help out with server resources to host these old public domain maps). Even without a nice interface to the data, I could still play around with it and see what use I could make of it. I dabbled in using these maps as a source of data for OpenStreetMap. I only got through a few of the maps before putting this on hold, as I figured it would be easier (especially for others) to do this if they were georeferenced. I tried out both http://warper.geothings.net/ and QuantumGIS, but both lagged far too much. So I rolled my own solution, which was just a bunch of scripts that used Inkscape and a hacked libchamplain demo as the GUI. The code and documentation for this is at https://github.com/andrewharvey/georeferencing-scripts.
The georeferencing data that I have made so far (it’s a big task!) is at https://github.com/andrewharvey/georeferencing-data.
Part 3: Sharing
From the data and code from the last step, I’m able to push out these old maps in several formats. I used gdalwarp to convert the maps into Transverse Mercator (well, actually I don’t really know what projection the originals are in, but this seems to work). From here I can use gdal2tiles.py to push out an OSM slippy map like tile directory, I can push out a KML GroundOverlay, or you could probably use a WMS server to push it out through WMS. I really wanted to leave it open. (Along the way I finally understood the difference between OSM slippy map tile names and the OSGeo TMS: take note that gdal2tiles.py produces TMS format tiles, which differ from the OSM style in that the y axis goes bottom to top, see http://groups.google.com/group/maptiler/browse_thread/thread/aa89fc726b8f7261/8bdc39d7829cc80c.)
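The difference between the two tile naming schemes comes down to flipping the row index within each zoom level. A minimal sketch of that conversion (the function name is my own for illustration, it is not part of gdal2tiles.py):

```python
def flip_tile_y(zoom: int, y: int) -> int:
    """Convert a tile row between TMS (y counts up from the bottom)
    and OSM slippy map numbering (y counts down from the top).

    There are 2**zoom rows at a given zoom level, so the same formula
    works in both directions; x and zoom are unchanged.
    """
    return (2 ** zoom - 1) - y


def tms_to_osm(zoom: int, x: int, y_tms: int) -> tuple:
    """Return the OSM-style (zoom, x, y) address of a TMS tile."""
    return zoom, x, flip_tile_y(zoom, y_tms)
```

So a tile that gdal2tiles.py writes as `3/5/2.png` would be requested as `3/5/5.png` by a viewer expecting OSM numbering. Renaming the files or flipping the row in the viewer both work; which is easier depends on the setup.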
I would post a Google Earth one too, but it’s too much effort to get a free background in there for the screenshot. I’m not convinced that this display of the data is user friendly. Having control of the transparency of the overlay is a must. Maybe one day, someone will crop out all the non-map parts of the parish maps so we can get a single whole of NSW parish map slippy map.
I suppose now I need to focus on the infrastructure. It should be really easy for a user to browse the available maps and view them either as KML or as an OpenLayers overlay. I should also plug this into the metadata I scraped and have stored in CSV like files.
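For the KML case, a GroundOverlay with adjustable transparency could be generated along these lines. This is only a sketch: the function name, the example URL and the bounding box values are placeholders I made up, and it assumes the warped image is already hosted somewhere and its extent in latitude/longitude is known.

```python
from xml.sax.saxutils import escape


def ground_overlay_kml(name, image_href, north, south, east, west,
                       rotation=0.0, opacity=1.0):
    """Build a KML document containing a single GroundOverlay.

    opacity is 0.0 (invisible) to 1.0 (opaque). KML colours are written
    as aabbggrr in hex, so the opacity becomes the leading alpha byte.
    """
    alpha = round(max(0.0, min(1.0, opacity)) * 255)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <GroundOverlay>
    <name>{escape(name)}</name>
    <color>{alpha:02x}ffffff</color>
    <Icon><href>{escape(image_href)}</href></Icon>
    <LatLonBox>
      <north>{north}</north>
      <south>{south}</south>
      <east>{east}</east>
      <west>{west}</west>
      <rotation>{rotation}</rotation>
    </LatLonBox>
  </GroundOverlay>
</kml>
"""


if __name__ == "__main__":
    # Hypothetical map and extent, purely for illustration.
    print(ground_overlay_kml("Example parish map",
                             "http://example.org/maps/example.png",
                             north=-33.80, south=-33.90,
                             east=151.30, west=151.10, opacity=1.0))
```

Google Earth also lets the user drag a transparency slider for an overlay, so the `color` element mostly matters for viewers that don’t offer that.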
The problem I have with distribution right now is that many of the maps need warping, and that means I need to host the warped image somewhere. Some could probably be georeferenced from their source image using just translate, scale and rotate, and hence should be able to use the source image from the government server to serve the georeferenced imagery. But the workflow I’ve set up so far relies on using gdalwarp, and hence on having access to the warped image.
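For the maps that only need translate, scale and rotate, the whole georeference fits in six numbers, so nothing but those numbers would need hosting. A minimal sketch, assuming the coefficient order GDAL uses for its geotransform and an image whose rows run downwards; the function names and example values are mine, and this is not part of my current scripts:

```python
import math


def similarity_geotransform(origin_x, origin_y, pixel_width, pixel_height,
                            rotation_deg=0.0):
    """Six affine coefficients for translate + scale + rotate.

    origin_x/origin_y is the map coordinate of the image's top-left
    corner, pixel_width/pixel_height are map units per pixel, and
    rotation_deg is counter-clockwise. With rotation_deg=0 this is the
    usual north-up case (x0, w, 0, y0, 0, -h).
    """
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (origin_x, pixel_width * cos_t, pixel_height * sin_t,
            origin_y, pixel_width * sin_t, -pixel_height * cos_t)


def pixel_to_geo(gt, col, row):
    """Map an image pixel (col, row) to map coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


if __name__ == "__main__":
    # Hypothetical values: 2 map units per pixel, no rotation.
    gt = similarity_geotransform(300000.0, 6250000.0, 2.0, 2.0)
    print(pixel_to_geo(gt, 10, 20))
```

A viewer that can apply this transform to the source image itself could then fetch the scan directly from the government server.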
I made this image to help me understand a slippy map tile naming system for maps that don’t always point North (it turns out nothing changes in terms of the tile numbering, you just have to apply additional view space transformations). This is how Nearmap does it, and when I get around to putting up http://osm.kyblsoft.cz/3dmapa/ like tiles of Sydney I will endeavor to use the same system (in other words you put your code to view the other views in your map viewer application, rather than just change the tile numbering so that you can use existing code for all the views). It doesn’t really change anything here if we have square or non-square tiles, the tile numbers and true coordinates don’t change because of this. Keep in mind that all the points of tile z/y/x for any of these views will be at the same geographic location, the views just have a different view space translation.
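To make that concrete, here is a minimal sketch of the idea. The first function is the standard OSM slippy map formula for finding the tile containing a point; the second is a hypothetical helper of my own (not Nearmap’s code) showing how a viewer might place tiles for a view rotated in 90 degree steps while leaving the tile numbers alone.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Standard slippy map tile (x, y) containing a lon/lat point.

    The result depends only on the geographic location and zoom,
    never on which way the view is facing.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on or outside the edge of the Mercator square.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def screen_offset(dx, dy, quarter_turns):
    """Rotate a tile's offset from the centre tile into view space.

    dx, dy are in tile units with y pointing down the screen; each
    quarter turn rotates the layout 90 degrees clockwise on screen.
    """
    for _ in range(quarter_turns % 4):
        dx, dy = -dy, dx
    return dx, dy
```

With this arrangement the tile to the east of the centre is still requested by the same z/x/y address in every view; only where the viewer draws it (to the right, below, to the left or above) changes.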
A little while back I sent the RTA an email to try to clarify the copyright license of some of their data, such as this data feed, http://livetraffic.rta.nsw.gov.au/data/traffic-cam.json, so I could determine if I could use it. The link to the copyright license is broken. This is the response I got,
The RTA supports and encourages open access information, and is determined to grow its range of traffic resources to provide developers with access to live updates and traveller information feeds.
Request for licensing agreements are assessed against the following considerations:
- Consumer benefit
- Legal constraints
- Road safety
- Technical capacity
- Availability of data
If you would like to apply for a license agreement for the RTA’s Live Traffic content, please submit a proposal for your service and how you would like to use the RTA’s content, including the above considerations. Your proposal will then be assessed by the RTA.
What a load of garbage. If you truly “support and encourage open access information” then you would release these data feeds under a public domain like license. Saying “we will make a case by case decision once you tell us your proposal” is just going to hinder the innovative use of that data. People will create their own datasets independent of you, which of course could go either way: the people’s dataset could be of better or worse quality than yours. Either way, you would save people some work if you helped out by releasing your data under a free and open license. You are a government department, not a company.
I’ve taken some time to look at the NearMap licenses (Community License, Free Commercial License) more closely. Here is some of my commentary of them. I’ll use the terminology used in the licences so please excuse me for using their language and jargon. When I say free I mean free as in “free software”, “free culture” and “free as in freedom”. Also this is just my interpretation from reading them, I am not a lawyer. Hopefully I’ve interpreted them as NearMap intended.
The licenses create a distinction between a PhotoMap or modified PhotoMap, and a work made using information derived or observed from the PhotoMaps. So they separate out works like vector information about roads derived or observed from their PhotoMap from other derived works (what they call modified works) like someone photoshopping fake roads into the imagery. My preconceptions of the Australian Copyright Act were that both of these things are “derived works” which are subject to the copyright of the original work, but despite this I really like the separation made in NearMap’s licenses here. I do think there is a difference and whatever their motives, they are essentially saying “information is free” so you must distribute any information you observe under the CC BY-SA license. Of course if information and facts were really free then the copyright law would say so (and you wouldn’t have to be in doubt due to the Telstra phonebook cases) and you wouldn’t need to add your defence of this freedom via the SA clause… to quote Nina Paley,
“ShareAlike is an imperfect solution to copyright restrictions, as it imposes one restriction of its own: a restriction against imposing any further restrictions. It’s an attempt to use copyright against itself. As long as we live in a world wherein everything is copyrighted by default, I will use ShareAlike or some other Copyleft equivalent to attempt to maintain a “copyright-free zone” around my works. In a better world, there would be no automatic copyright and thus no need for me to use any license at all. Should that Utopia come about, I will remove all licenses from all my work. Meanwhile I attempt to limit other peoples’ freedom to limit other peoples’ freedom.”
Another important observation that I previously overlooked is the fact that,
“You will own all Derived Works that you create. However, you may only distribute Derived Works to others on the terms of a Creative Commons Attribution Share Alike (CC-BY-SA) licence (and you may use any version of that licence you wish, whether localised for a particular country or not). For example, you may Use the Licensed PhotoMaps or Modified PhotoMaps to obtain information which you can then use, under the Creative Commons Attribution Share Alike (CC-BY-SA) licence, to populate or update community street mapping projects.”
Previously I had thought that any derived works that came from NearMap PhotoMaps used in OpenStreetMap needed to be attributed to NearMap. I guess I just incorrectly thought that all the information was CC BY-SA licensed by NearMap, but that is not the case. These works actually need to be attributed to the user who observed that information and turned it into a work by, for example, adding it into the OSM database. The works do not need to be attributed in any way to NearMap.1 This also means that any copyright that arises from any creativity in deciding what to trace, and any copyright that arises from the tracing being a derivative work can be treated as CC BY-SA licensed by that person or user. That person is the copyright holder but they are only allowed to distribute the work under a CC BY-SA license. This is a good thing! I’m glad that NearMap have not chosen to change the wording to make it compatible with the public-domain-like OpenStreetMap contributor terms, as (unless of course one has some other license from NearMap) it guarantees that this information remains free2.
The discussion and use of Yahoo imagery as a source of tracing for OpenStreetMap was before my time, but from http://wiki.openstreetmap.org/wiki/Yahoo it seems that strong legal foundations are lacking. In this respect I feel much safer tracing from NearMap. I know that my contributions can be licensed under the free CC BY-SA license and nothing can be done to unfree these works. Whereas the legality of Yahoo imagery is, at least from my reading, very questionable and potentially a huge problem in the future.
Another reason I took a closer look at the licenses was to make sure that the works I posted earlier which were modified PhotoMaps complied with all licensing requirements, both the NearMap and OpenStreetMap requirements. I have come to the conclusion that unfortunately I am most likely unable to satisfy both the Mapnik share-alike requirement and the NearMap share-alike requirement in coexistence.
The NearMap “free community licence” gives me the “right to use, copy, modify and distribute our PhotoMaps”. The distribution to others clause says that I must give NearMap attribution for the distribution of any original or modified PhotoMaps, however the license also says “You may sublicense your rights to the Licensed PhotoMaps, Modified PhotoMaps or APIs to others on the same terms as this licence or our free commercial licence.” I interpret this to mean that if I modify a PhotoMap I need to release it under the “NearMap free community license”, that is, it is share-alike.
On the other hand though, I also used the default OpenStreetMap Mapnik-style map images. My understanding is that like the data used to create these maps, the actual map images are copyrighted by all the OpenStreetMap contributors and released under the CC BY-SA license. The share-alike means that any derivative works (like overlaying NearMap terrain maps) must be released under a CC BY-SA compatible license, so you cannot impose non-commercial or non-government restrictions on it. However, although the NearMap free community licence plus the NearMap free commercial license almost allow anyone to use or modify the work, they don’t meet CC BY-SA because they exclude government and exclude commercial use made in a “competing manner” and use that is “material to their business”. This leads me to believe that I cannot legally distribute any work that is a mash-up of OSM data/maps and NearMap PhotoMaps. Unless, of course, it is only the default Mapnik tiles that are CC BY-SA, and anyone can copyright map images made from OSM data. NearMap uses OSM data to create Mapnik tiles using their own map style, so I assume that it is only the OSM data that is CC BY-SA and someone is free to make a non-free map using their own style from this data. Then they would own the copyright to that map and hence you would be free to combine this with NearMap’s PhotoMaps and release the product under their free community license. This could also explain why NearMap can overlay their transparent tiles based on OSM data over their non CC BY-SA imagery.
It is a shame, but I can totally understand NearMap restricting use of their PhotoMaps in a specific field of endeavor, namely the government. The government is central to their current revenue stream, without it they probably could not produce the volume of work they currently do under their almost free, community license. It is almost CC BY-SA, except they exclude three fields of endeavor, “Competing Manner”, “Material to their business” and “Government Entities that use our PhotoMaps for their own governmental purposes”. The first two exclusions make the PhotoMaps near CC BY-NC-SA, but the last clause means they cannot be compatible with any of the Creative Commons licenses.
Let me use the example case of distribution of original NearMap PhotoMaps. For instance, say I download a bunch of imagery tiles and distribute them through BitTorrent. The key question here is whether I need to enforce that this distribution is to non-government entities only. If I am only allowed to distribute it to non-government entities, I cannot do that, so the freedoms that the license grants are not as broad as I thought. If, on the other hand, I can distribute the works to government entities along with the free community license as a LICENSE file, but leave the responsibility and liability on the government to not use the works I make available, then this would be much better. Hopefully the latter is the case. This was almost touched on here, but which party the liability lies on was not mentioned.
This is why I hate reading all this legal jargon, every word is important but has different interpretations. Code on the other hand has just one interpretation, and that is defined in the compiler… Anyway, at first I thought this termination clause meant NearMap could terminate the license grant at any time, however I missed the words “if the other party breaches this licence”. I view this to mean that NearMap cannot terminate the license grant unless you breach the license. But even if such a case arose, derived information is safe. So NearMap can do nothing to prevent the CC BY-SA distribution of derived information. Although it appears all the other parts of the license grant can be subject to this clause.
1 However I still think that one should attribute NearMap regardless. In the OSM case, attribution using the source tag should be done for other reasons as well, such as letting people know where the data came from, which gives some clue as to the quality of the data.
2 Without turning this into a copyleft v permissive debate, I think that CC BY-SA is the most common license in use (there is a whole OSM database under it), so to avoid incompatibilities with other works under different share-alike licenses, CC BY-SA is probably the best choice at the moment. Some people say CC BY-SA is not for data, this is true, but that doesn’t stop the Australian government releasing data under the license, but more to the point, CC BY-SA has a clause that allows derivative works to also be licensed under a “Creative Commons Compatible License” that is listed at http://creativecommons.org/compatiblelicenses. One would hope that if someone creates an attribution share-alike license suited for data that would protect that data in jurisdictions that copyright data, and leave it free to use without restriction in jurisdictions that don’t copyright data (so don’t try to force attribution share-alike in these jurisdictions with contracts), Creative Commons could add this as a Creative Commons Compatible License, and all works like the current CC BY-SA OSM database could be relicensed under that data attribution share-alike license.