The EIF (European Internet Foundation) hosted a dinner debate on Open Data, also known as Public Sector Information (PSI), at the European Parliament, Brussels, on Tuesday 24th January 2012. Attendees came from a broad range of commerce (including Microsoft, Facebook and Google), education, NGOs, national and regional government departments, and the European Commission and European Parliament. The event was hosted by MEP and EIF governor Marietje Schaake, and led by three speakers: Marcus Dapp, who drove an innovative open data project for the City of Munich; Richard Swetenham, who heads the Access to Information unit of DG Information Society at the European Commission; and Willem van Valkenburg, who is director of Delft University’s OpenCourseWare project. A link to the EIF event site and speaker podcast is here and we have added further reference links at the bottom of this article.
By Andrew Griffin, 22.2.2012, www.policybloggersnetwork.com
Summary
Such an “obviously” good idea
Just about everyone agrees open data is a good idea, but some public bodies are resistant to change, or perhaps to the release of data whose quality would be put in question if viewed by a wider audience. When we talked to a group of delegates after the dinner, it was pointed out that opening up data would mean that mistakes would be spotted more quickly. Half the group thought this a good thing, half thought it bad! This, it seems, is the main barrier to the opening up of public data.
But need evidence to persuade reluctant public bodies
Public sector attendees’ main plea was for examples and case studies that they could take back to persuade their departments to release data. The private sector, especially the start-up space, is well aware of apps and businesses that have sprung up on the back of publicly available data, but it seems these have not been collated anywhere for public servants to peruse. PolicyBloggers has done some digging and come up with a starter-resource for anyone needing examples, at the end of this article.
Copyright probably a non-issue except in education
The issue of copyright was often raised during the evening, but it seems this worry is misplaced. Data, as opposed to human-created content, is not copyrightable. Time is wasted issuing data under a Creative Commons license that has no standing in law. Departments can charge a reasonable fee for provision of data, especially if its collection into publishable form adds to their costs, or if it is requested at a faster pace than needed by the department. In education, there is an opportunity to open up publicly funded research. Existing research publications fund just 2% of their content and often take copyright of their articles in exchange for publishing them in a prestigious journal. Coinciding with the event, we see academics beginning to boycott some research publications.
Open data, or Public Sector Information (PSI)
Open Data refers to the release of public sector data, often called Public Sector Information (PSI), by public departments or publicly funded bodies. Open data means any data collected by, or whose creation is funded by, public institutions. Examples mentioned during the debate include the release of weather information in the Netherlands, spurring mobile apps that tell you if it is safe to go shopping without your raincoat; data on Netherlands’ schools mashed up with a social network, allowing parents to compare notes; land registry data in the UK and Spain; live bus and metro information in London and New York City; and the creation of “open data” cities such as Munich. It could also mean the opening up of research data from universities that is funded by public money, or of digital art collections, or of aggregated demographic, financial or tax data held by public institutions.
What the European Commission is doing
The re-use of PSI has become a high priority for the EU, having garnered little attention at first. As Richard Swetenham noted, “open data is so obviously a good idea, how can you be against it?” Swetenham outlined the Commission’s work to date. The November 2003 PSI directive outlined fair commercial reuse policies. Some public data is by nature a monopoly, so this has to be commercialised in a fair, non-discriminatory way. The directive worked well, but not well enough, and so a new proposal to amend and reinforce it is being worked on. The scope of the directive is being extended to cultural heritage including libraries, museums and archives. Publicly funded academic research will also be addressed.
Open Data in Munich
Marcus Dapp, whose background included research on open source software, led the first city open data project in Germany. He identified pride at being the first city to open up as being one of the drivers behind the decision. The project was a success but had some interesting learning points. He emphasised the need to reach out to the community on what it wanted well before the project started. Talking to us afterwards, he noted that the release of individual house sale price information, which is now freely available online in the UK, would not be tolerated by German citizens. His main issues with the process were the measurement of success and the difficulty in persuading public sector management to release data. Tangible and also intangible (social return on investment, SROI) measures should be proposed up front. One group of citizens demanded the city’s financial accounts in digital form. The City pointed out that it published an online 500-page PDF of its accounts, but this wasn’t acceptable to the group and in the end the high-level accounts data were published.
Opening up Education
Willem van Valkenburg made the point that much more could be addressed in the directive on the subject of education. Each narrow sub-sector of academic research has its preferred publications, and the publishers make use of the exclusivity and reputation enhancement of being cited to charge for inclusion and, surprisingly, retain copyright on published articles. He estimates that only 2% of research publication content is actually funded by those publications, with the bulk of the cost coming from public purses. In an online age this seems ripe for disruption. True, publications have rigorous selection processes, but even these are driven by groups of often unpaid academics, also funded by the public. A few days before the event, mathematician Timothy Gowers sparked a heated debate in the press by blogging about his boycott of one academic publisher. But there is a less tangible angle to open data in education, and that is the opening up of coursework materials. OpenCourseWare at Delft University has 20,000 courses freely available and often used outside the EU. An example is Delft’s water management course, used in Indonesia and Africa, which in turn fed back local case studies that now enrich the course content. Stanford University in the US opened up an Artificial Intelligence course and saw 160,000 students sign up. There are drawbacks: course texts are copyright protected and cannot be changed, and there is a lack of direct tutor contact. However, in the US, OpenStudy set up a feedback site and found that 70% of posted student questions were answered within 5 minutes. Willem felt that European open course work is lagging behind the US and Asia.
Barriers and issues
Richard Swetenham opened the debate on this question by noting that the behaviour of “Data Huggers” had already been described by Andrew Stott, the UK Cabinet Office’s Director of Digital Engagement. His 11 data hugging excuses were not listed at the event but for reference are:
- It’s held separately by n different organisations and we can’t join it up
- It will make people angry and scared without helping them
- It is technically impossible
- We do not own the data
- The data is just too large to be published and used
- Our website cannot hold files this large
- We know the data is wrong
- We know the data is wrong, and people will tell us when it’s wrong
- We know the data is wrong, and we will waste valuable resources inputting the corrections people send us
- People will draw superficial conclusions from the data without understanding the wider picture
- People will construct league tables from it
- It will generate more Freedom of Information requests
- It might be combined with other data to identify individuals/sensitive information
- It will cost too much to put it into a standard format
- Our IT suppliers will charge us a fortune to do an ad hoc extract