I’ve managed to do a couple things all in one here. I’ve made use of some Geoscience Australia Creative Commons licensed material, in a nice little program with a web API, and I’ve aggregated some data from the myschool scraper and parser. Putting them all together gives some nice images like this.
The program for generating these images takes an SVG template file with placeholder markers and fills in the values from the CGI parameters. The API is fairly simple, so you should be able to work out how to use it from the example in the README file. Here are the files I used to make the graphs (and the SVG versions, as WordPress.com won’t let me upload them here).
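To give a rough idea of the template-filling approach, here is a minimal sketch in Python. It is not the code from the repository: the template, the placeholder names (`$nsw`, `$vic`) and the `render_svg` function are all made up for illustration, and the real program's placeholder syntax may well differ.

```python
import html
from string import Template
from urllib.parse import parse_qs

# Hypothetical template: $nsw and $vic are placeholder markers for fill colours.
SVG_TEMPLATE = Template(
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">'
    '<rect id="nsw" x="0" width="100" height="100" fill="$nsw"/>'
    '<rect id="vic" x="100" width="100" height="100" fill="$vic"/>'
    '</svg>'
)

def render_svg(query_string, default="#cccccc"):
    """Fill the template's placeholders from a CGI-style query string."""
    params = parse_qs(query_string)
    # parse_qs returns a list per key; keep the first value and escape it
    # so a parameter cannot break out of the attribute it is placed in.
    values = {key: html.escape(vals[0], quote=True) for key, vals in params.items()}
    # Any placeholder without a matching parameter falls back to the default.
    for name in ("nsw", "vic"):
        values.setdefault(name, default)
    return SVG_TEMPLATE.substitute(values)
```

Calling `render_svg("nsw=%23ff0000")` would colour the first rectangle red and leave the second at the default grey.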
P.S. This gets cut off when viewing it from the default web interface of this blog; use print preview or, even better, look at the RSS feed to see the cut-off parts. Also, I tried to ensure the accuracy of the data, but I cannot be 100% sure that there are no bugs. In fact there are discrepancies between the averages I get from my scrape of myschool and the averages provided in the report on the NPLAN website. The numbers I get seem to be consistent with the report (i.e. the state rankings are mostly the same), but they are not exactly the same as those reported. Then again, I would be very surprised if all the numbers I got matched the report exactly. I mainly did this to use the map/graph code I wrote, so if you really care about how certain state averages compare in these tests, look at the reports on the NPLAN website.
The lighter the colour the higher the number.
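A colour scale like this can be sketched in a few lines of Python. This is only an illustration of "lighter means higher": it uses a plain grey ramp, and the function name `shade` and its arguments are my own invention, not what the map code actually does.

```python
def shade(value, lowest, highest):
    """Map a value to a grey hex colour: higher values give lighter shades."""
    if highest == lowest:
        fraction = 1.0  # only one distinct value, so use the lightest shade
    else:
        fraction = (value - lowest) / (highest - lowest)
    # Clamp so out-of-range values still produce a valid colour.
    fraction = min(1.0, max(0.0, fraction))
    level = round(255 * fraction)
    return "#{0:02x}{0:02x}{0:02x}".format(level)
```

The lowest value comes out black (`#000000`) and the highest white (`#ffffff`); swapping in a coloured ramp only means changing the last line.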
Following up from my previous post, I have made improvements to the code, and I now have all the NPLAN data too. There are also some data files so you don’t need to run the scraper and parser, which hopefully makes the data more usable for a wider range of people. Now that I have the NPLAN data you can compare schools in terms of their test results (I assume the numbers are averages). I was going to put some tables in the repository that mash together some of the data in the database, but I’ve had to research a silly NSW law first. I’m not exactly sure what I can publish and what the implications of that would be (so best make your own league tables and possibly publish them if you want). The NSW law says,
A person must not, in a newspaper or other document that is publicly available in this State: (a) publish any ranking or other comparison of particular schools according to school results, or, (b) identify a school as being in a percentile of less than 90 per cent in relation to school results.
The folks at the Sydney Morning Herald seem to think that “Published online the same tables infringe no law; printed on these pages they are illegal.” This is not how I interpret the law. Publishing online means that the document is available for access from NSW. However I am confident I can get around this by not hosting anything myself and not hosting in Australia. For this I rely on the great services provided by wordpress.com (Automattic, Inc.) and/or github.com (GitHub, Inc.). Hopefully these US companies wouldn’t cave in to any threats from the Australian government.
This section of the law carries a maximum of 50 penalty units, which is currently a fine of $5500. That is a large enough sum for me to take extra care. This is why I’m still not sure whether I should put lists such as schools ordered by certain NPLAN results in the github repository.
By the way, this censoring and damaging law raises the same questions and problems (problems for those who wish to avoid criminal or civil charges) about legal jurisdiction over the internet. The classic example is the “yahoo! nazi paraphernalia” debacle.
Footnote: This SQL query should give you an ordered list of schools based on the 2009 year 9 NPLAN results (but I guess if you can load the database dump you can probably write your own queries…).
SELECT s.name, n.score, sub.state FROM nplan n, school s, (SELECT distinct pcode, state FROM suburb) sub WHERE n.school = s.myschool_url AND s.postcode = sub.pcode AND n.year = 2009 AND n.grade = 9 AND n.area = 'numeracy' ORDER BY n.score DESC;
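If you want to try the query without loading the full dump, here is a small sketch that runs it against a toy in-memory database from Python. The table and column names come from the query itself, but everything else is an assumption: the column types, the extra `name` column on `suburb`, and all of the rows are invented, and SQLite is used only because it ships with Python (the real dump may target a different database).

```python
import sqlite3

QUERY = """
SELECT s.name, n.score, sub.state
FROM nplan n, school s,
     (SELECT DISTINCT pcode, state FROM suburb) sub
WHERE n.school = s.myschool_url
  AND s.postcode = sub.pcode
  AND n.year = 2009 AND n.grade = 9 AND n.area = 'numeracy'
ORDER BY n.score DESC;
"""

def build_toy_db():
    """Create an in-memory database with invented rows (not real results)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE school (myschool_url TEXT, name TEXT, postcode TEXT);
        CREATE TABLE suburb (pcode TEXT, state TEXT, name TEXT);
        CREATE TABLE nplan (school TEXT, year INTEGER, grade INTEGER,
                            area TEXT, score REAL);
        INSERT INTO school VALUES ('url-a', 'School A', '0001'),
                                  ('url-b', 'School B', '0002');
        -- Two suburbs share postcode 0001; DISTINCT stops School A
        -- from appearing twice in the result.
        INSERT INTO suburb VALUES ('0001', 'NSW', 'Alpha'),
                                  ('0001', 'NSW', 'Beta'),
                                  ('0002', 'VIC', 'Gamma');
        INSERT INTO nplan VALUES ('url-a', 2009, 9, 'numeracy', 580.0),
                                 ('url-b', 2009, 9, 'numeracy', 610.0),
                                 ('url-a', 2009, 9, 'reading', 600.0),
                                 ('url-b', 2008, 9, 'numeracy', 500.0);
    """)
    return conn

def ranked_schools(conn):
    """Run the footnote query and return (name, score, state) rows."""
    return conn.execute(QUERY).fetchall()
```

The subquery with `DISTINCT` is there because one postcode can cover several suburbs; without it each school would be repeated once per suburb in its postcode.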
After overcoming a few problems I managed to write a scraper for the myschool.edu.au data. Unfortunately they chose to put the data in HTML, so the scraping process may have introduced some unknown errors into my data. I publish (see bottom) the scraped data as I believe that, per the IceTV v Nine Network [2009] HCA 14 case, any data that my scraper produces as output from the HTML input is not subject to the copyright of the original HTML content (this also means that I cannot publish the HTML pages), and, per the Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCA 44 case, that the raw data that is scraped is not subject to copyright.
I wish I could bzip2 up all those HTML pages and give them to you just to save your download, because the myschool.edu.au site doesn’t compress their pages when I tell them I accept gzip over HTTP, so it took up almost 2GB of quota to download all the HTML pages, oh well.
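For anyone writing their own scraper, this is roughly what asking for gzip looks like in Python. It is a sketch, not my scraper: the function names are made up, and it only helps when the server actually honours the `Accept-Encoding` header, which myschool.edu.au did not.

```python
import gzip
import urllib.request

def decode_body(body, content_encoding):
    """Return the decompressed body if the server said it used gzip."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    # The server ignored the request for compression and sent the page as-is.
    return body

def fetch(url):
    """Request a page while advertising gzip support."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Repetitive HTML tables compress very well, which is why the missing compression hurt so much here.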
Some preliminary statistics from the data.
- There are a total of 9316 (or 9279 after I ran a newer scraper at a later date) schools. Of these,
  - 1538 are Secondary (of which 30% are non-government and 70% are government)
  - 1407 are Combined (of which 68% are non-government and 32% are government)
  - 6054 are Primary (of which 23% are non-government and 77% are government)
  - 317 are Special (of which 15% are non-government and 85% are government)
  - 6451 are Government (69%),
  - 2865 are Non-government (31%)
- These 9316 schools contain a total of 3 366 351 students of which,
  - 1 745 224 are male (51%)
  - 1 651 127 are female (49%)
- The most schools in 1 postcode is 40, which are all in the postcode 2480.
- The average student attendance rate is 92.007%
  - 91.870% for Government, 92.335% for Non-government
  - 89.205% for Secondary, 92.982% for Primary, 90.675% for Combined, 89.170% for Special.
- There are a total of 265 960 teaching staff (full time equivalent of 241 408) and 124 117 non-teaching staff (full time equivalent of 86 511.9).
I could report a lot of stats like the ones above; all you need is a basic knowledge of SQL. But as much as I enjoy working out these stats, I find graphs and graphics much more intuitive, so that is up next. Because the data has so many dimensions you can make all kinds of graphs, so what would be best is a system that draws graphics dynamically and lets the user decide what is graphed. That takes more work, so it is on the todo list.
I’ve also looked into doing some heatmaps using the geographical location of the schools. I could have used Google Maps, or I could use OpenStreetMap and libchamplain. Both have pros and cons… But for now I used Google Maps because their API is simple and I’ve always wanted to experiment with it. The downside is I’m not sure about the copyright of their maps and subsequently any derivative works. This image is just a test showing a dot for each school in the system, but it’s very easy to change the colour, size and opacity of the dots based on features of the school.
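As an example of the kind of SQL involved, here is a sketch that computes per-type counts, government share and average attendance. The `school` table layout and its column names (`type`, `sector`, `attendance`) are assumptions made for this example, the four rows are invented, and the average is a simple unweighted mean over schools, which may not be how the figures above were worked out.

```python
import sqlite3

def build_sample():
    """In-memory table with a hypothetical layout and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE school (name TEXT, type TEXT, sector TEXT, attendance REAL)"
    )
    conn.executemany(
        "INSERT INTO school VALUES (?, ?, ?, ?)",
        [
            ("A", "Primary", "Government", 93.0),
            ("B", "Primary", "Non-government", 92.0),
            ("C", "Secondary", "Government", 89.0),
            ("D", "Primary", "Government", 94.0),
        ],
    )
    return conn

def summarise(conn):
    """Per school type: count, percent government, mean attendance."""
    return conn.execute("""
        SELECT type,
               COUNT(*) AS schools,
               -- a comparison evaluates to 0 or 1 in SQLite, so SUM counts matches
               ROUND(100.0 * SUM(sector = 'Government') / COUNT(*)) AS pct_government,
               ROUND(AVG(attendance), 3) AS avg_attendance
        FROM school
        GROUP BY type
        ORDER BY type
    """).fetchall()
```

Changing the `GROUP BY` column (to `sector`, say) gives the other breakdowns in the list above.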
Another test (some markers will be missing or in the wrong place, like the ones in NZ!),
Source code? http://github.com/andrewharvey/myschool
Don’t want to scrape and parse but want the raw data in a usable form? http://github.com/andrewharvey/myschool/tree/master/data_exports/
Extra thought: Currently the code uses Google’s API for getting the geolocation of the school. I could use OpenStreetMap for this also, however it would take more investigation to determine what tools exist. At the moment all I know is I have an .osm file of Australia, but schools aren’t just one dot, they are a polygon, so unless I find some other tools (which probably exist), I would probably need to just use one of the points in the polygon.
Or I could use the Geographic Names Register for NSW, but that is just for NSW… http://www.gnb.nsw.gov.au/__gnb/gnr.zip
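On the polygon problem: rather than picking one of the points on the outline, one option would be to take the polygon's centroid. The sketch below uses the standard shoelace formula and treats longitude/latitude as flat x/y coordinates, which I am assuming is good enough for something the size of a school. The function name is made up and none of this is in the repository.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area, accumulated edge by edge
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the outline
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate outline (a line or a single point): use the vertex mean.
        return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    return (cx / (3 * area2), cy / (3 * area2))
```

It does not matter whether the outline repeats its first point at the end (as closed ways in .osm files do), since the extra zero-length edge contributes nothing to the sums. Note that for an oddly shaped (concave) site the centroid can fall outside the polygon.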
Here is a letter I wrote to the Board of Studies (email@example.com).
I’m not exactly sure who I should address this to, so I hope you can pass it along to the relevant person.
I am writing to ask that the Office of the Board of Studies considers licensing their syllabi and examination materials under an open content license (such as Creative Commons, GNU Free Documentation License or another open content license). Currently the Board’s course syllabi, HSC and SC examinations and Notes from the Marking Centre are licensed in a way that prevents redistribution and derivative works. The current status of the copyright licenses hinders students’ and teachers’ ability to use the syllabi and examination materials for study through sharing and collaboration of content.
For example, it is my understanding that students, teachers and anyone else cannot take all the syllabus “dot points” and annotate them with their own content, and republish this for the benefit of others. Similarly the current licence prevents use of syllabus extracts such as “dot points” for collaborative works using modern web tools (such as wikis).
Please note that I have published this letter on the internet (https://andrewharvey4.wordpress.com/2009/03/14/a-letter-to-the-board-of-studies-nsw). If you agree to any reply to this letter being posted online (with credit of course), please let me know; otherwise I will not publish any reply.
Thank you for your time,
(Past HSC student, Currently University Student)
This is old news but I’ve been meaning to write at least something about it. When I did my HSC back in 2007 I found that there were no comprehensive notes for my subjects that suited me. That’s no surprise to me; in fact I think most people would find that they too have not found a set of notes that already exists and suits them perfectly. So I wrote my own. I used as many and as varied sources as I could find and merged them into a set of notes that I could understand and reread if I ever forgot.
Initially I released them as “all rights reserved”, with a disclaimer allowing reproduction for non-commercial use. Since then I’ve licensed them under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia License.
One thing I did when I was studying and writing my notes was to build upon the works of others. However this was difficult with the current copyright laws and the licensing of most of the material I used. So I had to change it enough to be new works that were not derivatives. Hopefully under this licensing of my notes I will save some others the trouble and allow them to take works that already exist, such as my notes, and change them or take extracts or add to them to produce and publish a set of study notes that suits their needs, without needing to break the law and risk legal threats.
The license is one thing, but it’s still hard to add to my notes to make your own derivative version if I only supplied a PDF version. So to fix this problem I’ve released the source document (Microsoft Word, sorry but this is what I started it in) so that anyone can easily build upon my work.
Out of all the rights that the Copyright Act 1968 (Cth) grants me, there are at least two that I think all copyright owners should not waive (most of the time). These are two moral rights from the Copyright Act 1968 (Cth) (part IX, division 2-3): the right of attribution of authorship, and the right not to have authorship of a work falsely attributed. I agree these should definitely be part of the act, and I’m glad they were recently added.
I was only just made aware of the Government’s Draft Consultation Paper on “Digital Economy Future Directions” recently. The first consultation topic is “Open Access to Public Sector Information”. At least they have expressed interest. So I went over to see what EFA had drafted for their submission,
“The Commonwealth should endorse a default set of licensing conditions for intellectual property which it owns that foster re-use of information. The standard licences provided by the Creative Commons project provide an example of how this can be done in a manner which is both (relatively) simple and clear. Standardising these licenses across government not only makes clear that a liberal attitude towards intellectual property re-use is encouraged, it also lowers transaction costs incurred by consumers of the information in understanding the licensing conditions.
The Commonwealth is not a business – it should not be producing information which does not have an intrinsic public benefit, and so there is no imperative to recoup the cost of production of the information (although recouping the marginal costs of sharing the information, which will almost always be very low, may be justifiable). Allowing Australian companies and individuals to further develop intellectual property produced in the public sector can help to stimulate innovation in Australia’s digital economy.”
—Electronic Frontiers Australia. http://wiki.efa.org.au/doku.php?id=digital_economy:2009-digital_economy_future_directions_consultation&rev=1233789400, which is licensed under the Creative Commons Attribution-ShareAlike 2.5 (Australia) licence.
I could not agree more. I particularly agree with having a set of (or even just one) appropriately named government licenses. This would simplify things greatly both for the government and for the consumers of the material licensed under those licenses’ terms.
I can’t say I completely agree with the whole of the consultation paper, but at least they are looking in the right direction for open access to public sector information. Let’s hope they go along the lines of EFA’s suggestions (as per the wiki). I’m particularly concerned about their plans for ISP filtering, but that’s another story.
The consultation paper also talks about so-called “media literacy”, which it defines as follows: “Media literacy is a step beyond digital literacy and refers to the ability to critically consume, comprehend and create media in all its modern forms…Media literacy equips school children with the skills to effectively research online … and gives people the capabilities to create their own diverse content and contribute to online communities such as forums and social networking sites”.
I have my own interpretation of “media literacy”. It’s hard to explain, but I think it’s something you can only get better at by experience. The paper says that “media literacy equips school children with the skills to effectively research online”, but this notion conflicts with the systems that are currently in place in NSW. A public school student in NSW using the Internet at their school will never be able to effectively research online. This is because the DET filters the Internet so vigorously that you can no longer research, and when you can find some relevant information you are only getting one side or opinion because the other side is likely blocked (e.g. blogger.com & wordpress.com are blocked). The other contradiction is that NSW public school students, at least, will find it extremely difficult to “create their own diverse content and contribute to online communities such as forums and social networking sites”, simply because most forums and social networking sites out there are (or were when I was at school) blocked (MySpace, Facebook, YouTube, along with many other similar sites are all blocked). What makes this worse is that the DET does not publish a list of blocked web sites; there goes accountability and transparency. So the federal government needs to work with the state governments, and then the state governments need to work with school systems such as the DET.
The paper states, “The Digital Education Revolution, a major part of the Australian Government’s Education Revolution, is a vital step in developing the digital literacy of Australian students.” If I’ve interpreted that right, they are heading in the right direction; they just need to get the DET on the same side.
EFA’s draft submission on their wiki emphasises that current Australian copyright law is stifling innovation, something that I very much agree with. Hopefully the government will not ignore the EFA’s submission.
In the past week (more like a month now) or so I’ve had a few requests asking me how I got access to my exam scripts (i.e. my exam responses) and how they (having just completed their HSC) could access theirs. In light of this I thought I would explain why I think exam scripts should be accessible to the student.
About a year ago I made a request for my HSC examination scripts under the Freedom of Information Act 1989 (NSW). The process for submitting a FOI request is documented by the Board here. I was granted copies of these documents [my exam scripts]. In the past people have requested things such as raw marks; I requested those too, but that request was denied. You should note that the Board may or may not grant access to these documents in the future.
Now to why I think students should have access to their scripts, which is mainly because it makes the whole process more transparent (even US President Obama is pressing this with his recent FOIA memo). There should be nothing to hide; students should be able to check what they wrote in the exam. They should be able to publish this along with how their response was marked so that it can be scrutinised and studied by future students. I’m not convinced that this is the best study approach in the long term, but that is no excuse for disallowing access to scripts. It would also be great if students could find out how their responses were marked on a question by question basis.
However I can see reasons why the Board would not want to release exam scripts. It takes time and money. Even if the process is automated it still costs money and some time. For this reason I would accept the Board charging a reasonable fee for giving you your scripts.
The Board of Studies is doing the right thing here; they did allow my FOI request, so I cannot argue that they are hiding them. Kudos to them for this. I hope two things happen now: that more people become aware that they can get their scripts, and that the Board continues to allow these requests.