Open government data is used in initiatives aiming to improve intra-governmental workflows, engage citizens in the design of new digital products and services, and inform policy design.
The following interview was conducted as part of a year-long research project into the attitudes of Australian public servants towards open data initiatives, undertaken by Dale Leorke, postdoctoral researcher at Tampere University, Finland; and Suneel Jethani, Engagement Manager and Research Fellow at the Victorian State Government’s Department of Premier and Cabinet Digital, Design and Innovation Branch, which manages the Data.Vic portal. The research was conducted in partnership with the University of Melbourne.
What follows is an edited version of a conversation between Suneel Jethani and Will McIntosh, Open Data Team Leader at the City of Melbourne, a local government municipality responsible for Melbourne’s central business district and inner suburbs. You can read the full transcript here.
Suneel Jethani: The first thing that I want to ask is about your background. How did you end up working in open data?
Will McIntosh: I studied a double degree, IT and Information Systems at Deakin University. I majored in 3D. That gave me a lot of coding skills and an understanding of how information flowed across organisations or into a game or a physics environment… Over the next seven years I started to become interested in emerging areas of government. Something that kept coming up around that time, from about 2009 onwards, was open data, and that’s also when we started hearing about things like GovHack and other hackathon events.
I then took a job at City of Melbourne as a 3D GIS specialist. City of Melbourne had a strong Open Data Program, so I started contributing to that, and then they made open data a dedicated team and I got the team leader job. It’s been great trying to use my coding skills to figure out how data flows across organisations or trying to understand what’s going to be of benefit for the community.
SJ: Can you tell us how you’ve seen the catalogue of open datasets grow, evolve and change?
WM: In 2014, City of Melbourne were thinking: “What’s the uniquely Melbourne data?” People in Melbourne, for whatever reason, love trees, so tree data was, I think, our most used dataset, which is probably common across councils. Also unique to Melbourne was pedestrian counts. Not all councils have got that kind of information. The third one was our Census of Land Use and Employment. That’s basically a survey going out to each individual building across the city across a two-year period to understand the demographics, the uses of each building or each property, and attributing that back in data. That is exciting because it’s a deeper view of what the city’s got inside its buildings.
I think there were just under 100 datasets at the time. It was a bit of a, “Let’s get data out there. Let’s introduce employees to our Open Data Program.” It probably embedded a bit of an open data culture in City of Melbourne. We have a very detailed release process, which ultimately ends up getting feedback from experts in our marketing, governance, risk, legal, IT, and branch managers. So, it’s quite well established.
But for the Open Data Program to be a success, it’s important at that initial stage to make sure the risks are minimised, because that allows the program to grow. I think those datasets we originally put out are now maturing. They’re getting more advanced in terms of the analysis you can do, because it’s now not just a snapshot of the city, it’s a snapshot across time… Now, we’re at just over 200 datasets.
SJ: That’s an amazing program of work. Where would you like to see it go from here, say by 2022?
WM: I just got goose bumps because it could go in some exciting directions, both internally, in terms of how the data culture can change within an organisation, and then also externally, how we harness the power of local talent and how government can use new students coming through with some modern skills.
If I’m thinking about 2022, there would hopefully be some sort of open innovation ecosystem where we release datasets that are cross-council, across local government areas and across authorities. So releasing information that gives understanding across those boundaries.
But then also put out an open marketplace where we’re saying, “We need these challenges solved.” We as government may not have all resources to be able to solve challenges ourselves, but we want to call out to our local universities, our local coding academies, and say, “If you want to work together on this with us… students can use real-world data, solve real-world problems. They can add attractive work to their portfolio and become highly attractive for roles in organisations.”
SJ: What do you think the barriers to realising that vision are?
WM: When people tell us, “This team might be fearful of releasing this information because it isn’t perfect,” we ask: is it more work for them to release this information because, in a way, they have to extract it manually?
Hopefully we can automate it and make their work easier. They’re already strapped for time. Is there a way we can actually talk to that team, address the barrier of efficiencies to say, “If we do release this information, if we do have complete metadata that answers 99% of questions that people might have, and that we can actually put that out with a call to action for solving a challenge or meeting a need,” that’s where I think we can start to address those barriers of teams that previously haven’t shared their data publicly.
There are some barriers of fear, but these can be addressed by clear communication and developing a good data culture and by demonstrating and celebrating the benefits of getting this right.
There’s a balance here between putting out data that isn’t perfect and putting out low-quality data. I haven’t seen a perfect dataset yet. Most data have intricacies that eventually you get to know about over time. But it doesn’t have to be perfect to be useful. All data has a value in it that can be shared at scale by being open.
If you put out information and you know it’s of a reasonable quality, this helps you to see what the community can do with it. I’ve found in the past that when you put out data, someone can find a reading which might be wrong.
I’ve seen this one example in which there was a visualisation done of a tree dataset across the council. The 3D visualisation was beautiful. They were zooming across, and there was this tree which was just taller than the Eiffel Tower. I thought it was quite funny because it was so obviously wrong. But it meant that they contacted us and said, “Dear council, I’ve been using your dataset. I believe there might be a mistake.” More eyes on datasets mean that we can actually find those mistakes, and actually fix them back at the source.
If you’ve crafted that process and that data governance to set that up, it means that even though your data wasn’t perfect to start off with, it starts to get better over time. This is the alternative to waiting until it’s perfect before we put it out. The trade-off is that you have to take that hit and the potential brand risk to a council of saying, “There was a mistake,” and this requires leadership from an organisation to say, “We believe in Open Data. We believe that we should be sharing this data with the community because the benefits are of such a high level, that yes, there might be a mistake, but we’re going to still believe in that, regardless of whether a mistake is found.”
Relatively speaking, open data is still in its infancy and it needs strong leadership to champion it.
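Illustration (not part of the interview): the tree "taller than the Eiffel Tower" is the kind of record a simple plausibility check could flag before, or shortly after, release. The following is a minimal Python sketch under stated assumptions: the `TreeRecord` fields and the 120-metre limit are hypothetical choices for the example, not the City of Melbourne's schema or process.

```python
from dataclasses import dataclass

# Assumption: an upper bound for a plausible tree height, in metres.
# The tallest known living trees are a little under 120 m, so anything
# above this limit is very likely a data-entry or unit error.
MAX_PLAUSIBLE_HEIGHT_M = 120.0


@dataclass
class TreeRecord:
    tree_id: str     # hypothetical identifier field
    height_m: float  # hypothetical height field, in metres


def flag_implausible_heights(records, max_height_m=MAX_PLAUSIBLE_HEIGHT_M):
    """Return the records whose height is non-positive or above the limit."""
    return [r for r in records if r.height_m <= 0 or r.height_m > max_height_m]


if __name__ == "__main__":
    sample = [
        TreeRecord("T1", 12.5),
        TreeRecord("T2", 340.0),  # taller than the Eiffel Tower (about 330 m)
        TreeRecord("T3", -1.0),
    ]
    for record in flag_implausible_heights(sample):
        print(record.tree_id, record.height_m)
```

A check like this only surfaces candidates for review; as described above, the correction itself still has to happen at the source.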
SJ: Over the last year, Google have put out their Dataset Search tool. The Commonwealth Scientific and Industrial Research Organisation have built MAGDA for the Australian federal government. What are some of the things that you think are important for an open data portal to have? What’s your wish list?
WM: I’m a bit different in terms of how I see portals. We’re a Socrata site. Over the last few years we’ve seen a real maturing of different platforms. A lot of them offer incredible functionality.
We are still seeing a big push for the real-time information, as well as big data. We have our parking sensor data which is in the order of 240 million rows, which is not a lot, but how do we put that in a portal that people can query and access and have a robust API to actually pull out that information when they need to? I think our platform is great, but I think this is where we want to see that kind of growth area being addressed, of being able to handle that type of information.
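Illustration (not part of the interview): one common way an API exposes a table of this size is in pages rather than as a single download. The sketch below only builds the request URLs and makes no network calls. It assumes a Socrata-style endpoint with `$limit`, `$offset` and `$order` query parameters; the domain, dataset identifier and page size are placeholders, and a given portal may behave differently.

```python
from urllib.parse import urlencode


def page_urls(domain, dataset_id, total_rows, page_size=50_000):
    """Yield one request URL per page of a large tabular dataset.

    `domain` and `dataset_id` are placeholders supplied by the caller.
    """
    base = f"https://{domain}/resource/{dataset_id}.json"
    for offset in range(0, total_rows, page_size):
        # A stable sort order keeps pages from overlapping between requests.
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        yield base + "?" + urlencode(params, safe="$:")


if __name__ == "__main__":
    # Hypothetical domain and dataset identifier, for illustration only.
    for url in page_urls("data.example.org", "abcd-1234", total_rows=120_000):
        print(url)
```

At 50,000 rows per page, a table of 240 million rows would take 4,800 requests, which is why filtering on the server side matters as much as paging.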
When we think about high-value datasets like true orthographic aerial imagery which has just been released and is at a 10-centimetre accuracy, we need to have a platform which can support this file type and size. Portals still have room to grow in handling non-tabular, geospatial information in particular, so that they can support pulling data into live maps and still perform well.
Where I think my thoughts differ from others is I really don’t mind which data platform is chosen just as long as they do make it open. I think we’ve seen a good supply of data being put on data.gov.au [the national open data portal]. The data is updated in real time and can change in the background if the schema remains consistent, and people’s apps are still going to work well. I think that excites our user base.
Ultimately, the portals are there for our users to use. We want them to have the best experience, and their primary need is access to the data.
I think what we’re going to see is potentially a data aggregation market where different portals can together feed a portal which says, “I can group those themes together, either through machine learning or some carefully curated search terms.” So app developers who want to pull all tree points onto a map can pull a vegetation layer and foliage information that crosses council boundaries.
I think that this is possible even though councils typically have different data schemas that might be using standards or not.
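Illustration (not part of the interview): aggregating tree data from councils with different schemas could, in its simplest form, come down to a hand-curated field mapping per source. The sketch below assumes two hypothetical councils and invented field names; it does not describe any real portal's schema, and a production aggregator would also need to handle units, coordinate systems and missing values.

```python
# Hypothetical per-council field mappings onto one shared schema.
FIELD_MAPS = {
    "council_a": {"species": "common_name", "lat": "latitude", "lon": "longitude"},
    "council_b": {"species": "Species", "lat": "y", "lon": "x"},
}


def harmonise(council, row):
    """Translate one source row into the shared schema."""
    mapping = FIELD_MAPS[council]
    return {
        "council": council,
        "species": row[mapping["species"]],
        "lat": float(row[mapping["lat"]]),
        "lon": float(row[mapping["lon"]]),
    }


def merge_tree_layers(sources):
    """Combine rows from several councils into one list in the shared schema."""
    return [
        harmonise(council, row)
        for council, rows in sources.items()
        for row in rows
    ]


if __name__ == "__main__":
    merged = merge_tree_layers({
        "council_a": [{"common_name": "River Red Gum", "latitude": "-37.81", "longitude": "144.96"}],
        "council_b": [{"Species": "Elm", "y": -37.80, "x": 144.99}],
    })
    for tree in merged:
        print(tree)
```

Keeping the mappings as data rather than code means a new council can be added without touching the merge logic.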
SJ: Thank you very much, Will, this has been a great discussion.
WM: Thank you.
(Picture credit: Unsplash)