Letter to the ABS Re Census DVD License

June 24, 2012 3 comments
Categories: Uncategorized

Recent code projects

May 14, 2012 1 comment

Since taking a break from OSM/FOSM[1] I’ve found myself with more time to work on other projects, as the recent jump in my activity on GitHub[2] shows.

I’m pleased with my code to load the ABS ASGS into PostgreSQL/PostGIS[3], which will be a critical component of any further work I do with ABS data. At the moment I’m writing scripts to load ABS 2011 Census data into PostgreSQL[4] (but I haven’t pushed them yet). I’m using the sample DataPacks provided by the ABS, which means I should have the loader done by June 21, when the actual data is released. With very little extra work and time I should then have the real data loaded not long after its release.
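To give a feel for what such a loader has to do, here is a minimal sketch in Python that turns the header row of a CSV file into a PostgreSQL CREATE TABLE statement. This is not the code in abs2pgsql. The table name, the column names and the assumption that the first column is a text region code followed by integer counts are all made up for illustration.

```python
import csv
import io


def quote_ident(name):
    """Quote a name as a PostgreSQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table_name, csv_text):
    """Build a CREATE TABLE statement from the header row of a CSV file.

    Assumption (hypothetical layout): the first column is a region code
    stored as text, and every other column is an integer count.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = []
    for position, name in enumerate(header):
        if position == 0:
            columns.append(f"    {quote_ident(name)} text PRIMARY KEY")
        else:
            columns.append(f"    {quote_ident(name)} integer")
    body = ",\n".join(columns)
    return f"CREATE TABLE {quote_ident(table_name)} (\n{body}\n);"


# Hypothetical sample data, not taken from a real DataPack.
sample = "region_id,Total_Persons_Males,Total_Persons_Females\nPOA2000,10,12\n"
print(create_table_sql("b01_sample", sample))
```

Once the table exists, the rows themselves could be bulk loaded with PostgreSQL’s COPY command. A real loader would also want to link the region code to the matching ASGS geometry table.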

I think http://www.hackdays.com/knowwhereyoulive/postcodes/view/2000 is an excellent site, yet the creators say “The biggest hurdle we faced was the format of the data available.”[5]. I hope that my asgs2pgsql and abs2pgsql (census loader) tools will help future developers build great front-ends without having to worry about getting access to the data in a nice format.

I’ve been playing around with the GeoJSON tile capabilities of TileStache (with Polymaps as the client) and look forward to using this in building a great front-end like the knowwhereyoulive site. I’ve put that on hold while I focus on the data.
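For anyone unfamiliar with tiled GeoJSON: the client requests tiles by zoom/column/row, and each tile carries the features that fall within that tile’s bounding box. The following is a rough sketch of the standard Web Mercator tile arithmetic, not code from TileStache or Polymaps; the function name is my own.

```python
import math


def tile_bounds(zoom, x, y):
    """Return (west, south, east, north) in degrees for an XYZ map tile.

    Uses the usual Web Mercator tiling scheme, where tile (0, 0) is the
    top-left tile and there are 2**zoom tiles along each axis.
    """
    n = 2 ** zoom

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    # Rows count downwards, so the tile's north edge is row y, south is y + 1.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# The single zoom 0 tile covers the whole Web Mercator world.
print(tile_bounds(0, 0, 0))
```

A tile server can use a bounding box like this to select only the geometries that intersect the requested tile, which keeps each GeoJSON response small.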

I’ve also ordered the first DataPack DVD[6] from the ABS. I’ve been assured by the ABS that the DVD contents are licensed CC-BY and the license will be present on the DVD. Assuming all this holds true I should be able to release the DVDs as torrents.

I plan to attend the Sydney http://www.govhack.org/ event; it is just a pity that it happens before the 2011 Census data is released. I’m not convinced that these hack days are for me (I prefer to take as much time as needed to get quality rather than rushing things out the door, and borrowing a laptop and setting up a live bootable Debian USB with my environment and wifi drivers takes more time than the actual event!), but at least they get people talking about open data.

I’ve also produced some nice interactive rainfall intensity graphs from the BOM radar images[7]. I wish I could show a demo of this, but the radar images aren’t available under a free and open license.

I also have some other data loaders [8], [9] which I plan to create nice web front ends for, but I’ll only get to that once I get my 2011 Census loader in better shape.

Finally I took a sidestep to clean up my NSW Parish Map scraper[10]. The code is much cleaner now compared to the mess it was before. I’m still short of a georeferenced parish map base map mosaic, but the unclear licensing of the maps puts this at the bottom of my priority list.

[1] https://andrewharvey4.wordpress.com/2012/04/04/the-sand-storm/
[2] https://github.com/andrewharvey/
[3] https://github.com/andrewharvey/asgs2pgsql
[4] https://github.com/andrewharvey/abs2pgsql
[5] http://www.hackdays.com/knowwhereyoulive/about/
[6] http://www.abs.gov.au/websitedbs/censushome.nsf/home/datapacksdetails?opendocument&navpos=250
[7] https://github.com/andrewharvey/bom-rainfall-radar-intensity-stats
[8] https://github.com/andrewharvey/aec2pgsql
[9] https://github.com/andrewharvey/ausgrid-data2pgsql
[10] https://github.com/andrewharvey/nsw-parishmaps-scraper

Categories: Uncategorized

The Australian Bureau of Statistics… open data but closed documentation for that data?

April 16, 2012

Update: The HTML equivalent of the PDF is available at http://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/2901.0Main%20Features12011. Unlike the PDF, the HTML version is licensed under the Creative Commons Attribution 2.5 Australia license (confirmed via email). I prefer the HTML version; I just hadn’t noticed it earlier. Thanks to the Client Support Officer from the ABS who got back to me with this information.

Just a day after I posted my ASGS to PostgreSQL loader and plans to integrate with the ABS data, I’ve already hit resistance.

One piece of documentation the ABS released for the 2011 Census data at http://www.abs.gov.au/ausstats/abs@.nsf/Lookup/2901.0Main%20Features802011/$FILE/2011%20Census%20Dictionary%2027102011.pdf (ABS Catalogue No. 2901.0) contains the full copyright notice without any reference to it being licensed under a free and open license such as the Creative Commons Attribution license. This notice is an “unless otherwise noted” exception to the CC license on the ABS website.

I made an enquiry to the ABS about this and was told “Written permission is required from the ABS if you wish reproduce this publication or extracts from it.”

I’m disappointed that the ABS refuse to release their documentation for the 2011 Census data under a free license such as a Creative Commons license. It doesn’t make sense to me that they would release the data under an open license but not release the documentation under the same or an equivalent license. This only hinders the availability of the documentation and leaves data consumers to guess at or overlook details of the data, making them more likely to misinterpret it.

Categories: Uncategorized

Loading the ASGS into PostgreSQL in preparation for the ABS 2011 census data release

April 15, 2012

Over the extended Easter period I found myself with some extra free time. The result is https://github.com/andrewharvey/asgs2pgsql, a bunch of scripts for loading the ASGS into PostgreSQL/PostGIS, and a database dump of the final product.

The ASGS is the geospatial fabric for the ABS 2011 Census data. My idea was to put in place a stable PostgreSQL schema for the ASGS and put together a well defined process for loading data into that schema.

As a small example of using the data I wrote some carto/qgis stylesheets for the various ASGS structures. Source code is at https://github.com/andrewharvey/asgs-stylesheets with a live example at https://tianjara.net/leaflet.html#map=asgs-2011-mb which shows the ASGS Mesh Blocks coloured by the landuse assigned to that mesh block.

With this building block now in place, when the actual census data starts to be released in June 2012 I will hopefully be able to load it into a relational data model with references to ASGS geometries all in PostgreSQL (and PostGIS).

I’m not sure whether I should waste time scraping the data from the ABS website or go straight to the DVD.

If the $100 is really just for the cost of the DVD+admin surely the ABS can put the entire DVD contents on its webserver, all under the Creative Commons Attribution license. If I do purchase the DVD I sure as hell would want to ensure it notes that its contents are CC-BY licensed.

I’m also interested in whether the census data will be available as datacubes.

Categories: Uncategorized

The sand storm

April 4, 2012

The OSM license change seems to be hitting a climax. OSMF are meant to be culling the database and shifting to their new license about now.

NearMap are continuing down a different path than they had been previously, seemingly making it harder (and perhaps dropping the clause altogether..?) to derive data and publish it under the CC-BY-SA license.

Bing imagery traced data in OSM/FOSM is marching ahead.

I can’t seem to keep up with it and stay on my two feet any more.

I’m this close to putting my head in the sand and letting it all blow over and seeing how my sand castle of work I’ve put into OSM and FOSM stands after the storm.

Perhaps I should shift gears and focus on something completely different until then.

Categories: Uncategorized

Malicious attack or just being paranoid?

March 10, 2012

So as of now when I download the document at http://www.commbank.com.au/personal/international/travel-money-card/default.aspx using,

wget --save-headers -U 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322)' --server-response 'http://www.commbank.com.au/personal/international/travel-money-card/default.aspx'

from both of the IP addresses that the hostname resolves to (140.168.75.39, 140.168.129.72), I get a link within that page to https://www.commbank.prepaidcardsupport.com/cbacustomer/html/LoginFrameTravel.html

Looks weird. CommBank linking to http://www.commbank.prepaidcardsupport.com? At first I thought I was being man-in-the-middled, so I tried retrieving this document from various vantage points on the Internet, with the same results. So either it wasn’t a MITM attack, or the MITM was happening at a point common to those vantage points (i.e. the bank’s network, or the Telstra network upstream of the bank’s network).
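What makes the link look suspicious is that “commbank” appears only as a subdomain label of prepaidcardsupport.com, not as part of commbank.com.au. As a rough illustration of how such links could be spotted automatically, here is a small Python sketch that lists every link on a page whose host is outside a trusted domain. The sample HTML is made up for the example (only the login URL is the real one from above), and the function and class names are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href, src and action attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.urls.append(value)


def off_domain_links(html, page_url, trusted_domain):
    """Return absolute URLs whose host is not trusted_domain or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for raw in collector.urls:
        absolute = urljoin(page_url, raw)
        host = urlparse(absolute).hostname
        if host is None:
            continue  # e.g. mailto: or javascript: links
        if host != trusted_domain and not host.endswith("." + trusted_domain):
            flagged.append(absolute)
    return flagged


# Hypothetical page fragment for illustration.
sample_html = (
    '<a href="/personal/">Personal</a>'
    '<a href="https://www.commbank.prepaidcardsupport.com/cbacustomer/html/LoginFrameTravel.html">Log in</a>'
)
page = "http://www.commbank.com.au/personal/international/travel-money-card/default.aspx"
for url in off_domain_links(sample_html, page, "commbank.com.au"):
    print(url)
```

A check like this only tells you that a link leaves the bank’s domain. It can’t tell you whether the third party is legitimate.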

So maybe this is legit? I checked the whois record for prepaidcardsupport.com, but it is registered by proxy (not a good sign) and its HTTPS certificate isn’t trusted by the default Iceweasel install (again not a good sign).

Anyway, this reinforced for me a big problem with sites that think it is okay to serve most of their pages over plain HTTP and switch to HTTPS only for the parts of the site where you log in. This opens you up to man-in-the-middle attacks against the plain HTTP pages: an attacker can rewrite the links that should switch you to HTTPS so that the login areas are also served over plain HTTP, allowing further man-in-the-middle attacks. (Of course, this ignores the issue that the current implementation of PKI using CAs isn’t terribly secure anyway.)

Categories: Uncategorized

community + git = value

March 4, 2012

This is why open source development and open collaboration in a community is great:

  1. Someone posts a question about a problem they have: http://groups.google.com/group/mapnik/browse_thread/thread/85ede4787e2dc32b/b87dc582e2cf1035
  2. I see this question, find it interesting and have a go at writing a solution. I release this freely and openly to anyone on GitHub under a free software license (CC0): https://gist.github.com/1675606/eb39d06c948bae471fee902a3cb688f28cefc9da
  3. Original poster gets back to me thanking me and finding the solution I wrote useful.
  4. Someone else comes along and forks my code https://gist.github.com/1953554, adding some cool extra functionality and building on my work to make something new and useful.
  5. We continue to build on the solution collaboratively https://gist.github.com/1675606/e8bfe1525478ada610ebc7f4d14eb433ed2866b1

None of this would have been possible without a platform to communicate openly and freely inside a community (1); free licensing and open sourcing of solutions, which allows others to legally build upon others’ work (2); and git and GitHub, a program and platform that let one publish derivative works that are visible to the original author without needing permission from or interaction with the original author (4, 5).

Albeit small, it is extremely rewarding to see this unfold upon my own work.

Categories: Uncategorized