Open Government Data, Complexity vs. Usefulness

On the flight yesterday I was thinking through the various models that we are seeing emerge for open government data. As with most computer systems, it helps to start from a model: one that weighs how we might make raw data available against the flexibility of what we can then do with that data.

For the sake of simplicity I looked at two different examples: one from the United Nations, and a second based upon a proposal that has recently been put forward by Tim Berners-Lee for the United Kingdom. While both are extremely valuable implementations, it is clear that they present very different options for government.

The United Nations data can be found at http://data.un.org, and at time of writing represents twenty-two data sets and over sixty million records. The data can be downloaded as a series of CSV files that can be quickly read into a spreadsheet, database or other tool of the user's choosing. Making data available in this form takes less time and effort from the agencies involved, yet still makes it possible to build a wide variety of applications using the information available.
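As a rough sketch of how little tooling the CSV route demands, the following Python reads a small inline sample using only the standard library. The column names and figures are invented for illustration and are not taken from data.un.org.

```python
import csv
import io

# Hypothetical sample; real published files will have different columns and values.
SAMPLE_CSV = """country,year,value
Aland,2007,10
Aland,2008,12
Borduria,2008,7
"""

def total_by_country(csv_text):
    """Sum the 'value' column per country from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["country"]] = totals.get(row["country"], 0) + int(row["value"])
    return totals

print(total_by_country(SAMPLE_CSV))
```

The point is not the aggregation itself but that a consumer needs nothing beyond a CSV parser to start working with the data.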

The proposal put forward by Dr. Berners-Lee is a little more complex in nature, suggesting that government data should be mapped through RDF and semantically tagged as it is published. In the long run this would represent significant value as individuals and organizations started to understand and analyze the data, but it would also require a much larger project across government before the data could be published in the first place.
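To give a feel for the extra step involved, here is a minimal sketch that turns CSV rows into RDF statements written as N-Triples. The `http://example.org/stats/` namespace, the column names and the sample row are all assumptions made for illustration; they are not part of the proposal itself.

```python
import csv
import io

# Hypothetical namespace and sample data, for illustration only.
BASE = "http://example.org/stats/"

SAMPLE_CSV = """country,year,value
Aland,2008,12
"""

def row_to_triples(row, index):
    """Emit one N-Triples line per column, all sharing one subject URI."""
    subject = f"<{BASE}observation/{index}>"
    # Values are written as plain string literals; a fuller version would
    # escape special characters and attach datatypes.
    return [
        f'{subject} <{BASE}{column}> "{value}" .'
        for column, value in row.items()
    ]

def csv_to_ntriples(csv_text):
    """Convert every row of the CSV text into N-Triples lines."""
    lines = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        lines.extend(row_to_triples(row, index))
    return lines

for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```

The mechanical conversion is the easy part. The larger effort lies in agreeing on shared vocabularies and identifiers so that a predicate such as `country` means the same thing across agencies, which this sketch sidesteps by inventing its own.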

The overall model might help us evaluate options for implementation plans, end-user tools, timelines and milestones, and the value that the project would deliver.

In essence it might look a little like the graphic below. Of course, it would be a lot more useful with a significantly larger set of projects and case studies mapped against it.

[Figure: open data complexity model]

As with the early stages of eGovernment, I suspect we will see governments take a more pragmatic approach in the short term, implementing systems that fall on the left-hand side of this graph, while holding aspirations and a vision that map firmly to the right-hand side.


5 Responses to Open Government Data, Complexity vs. Usefulness

  1. Interesting ideas Oliver.

    The graphic may be misleading – is there a value represented on the Y axis? If not, then what you are saying is that publishing government data can take place on a continuum between

    Simple (short timeframe, useful, many tools available)

    to

    Complex (long timeframe, very useful, few tools available).

    Which seems pretty intuitive.

    However, the low-end (CSV) publishing model misses the real point of 2.0, which is that information is not an end product, but a step in a bigger information supply chain. If all we achieve with open data is allowing people to move on from data scraping, it will have been an underwhelming revolution, relative to the potential.

  2. oliver says:

    Yes, you’re right, when I look at the graphic it has a level of clarity commensurate with my 30 hour flight from NZ to India. :) The value should certainly be on the Y-axis.

    A couple more comments:

    1/ The model would make a lot more sense with several other projects mapped against it; the two extremes that are currently there will only represent a small handful of projects around the world – the rest will sit between the two.

    2/ For some government entities, publishing as CSV isn’t quite as weak as it might sound. There needs to be some thought around issues like available budget, how apps will be built, which groups might use the data etc. For developing countries, or for countries with a very devolved governance structure, very basic raw data may (only may) be the right answer. It certainly isn’t a revolution, but it is a step beyond openness as we know it today.

    Maybe I’ll spend some time on the way home refining this a little… moving that troubling Y-axis and positioning a few more of the projects that we’re involved in around the world.

  3. Interesting post. Seems to me that cost is a factor as well (obviously). Would the costs of the drive towards complexity map favorably to the value? I’m assuming the costs would also reach well beyond the processing tools. People issues would seem to be a big factor as well.

  4. Lorenzo Madrid says:

    Very interesting indeed. This is a move beyond GOV 2.0 but it does not seem possible to provide fully semantic capabilities. Shall we call it GOV 2.5? :)

  5. álvaro says:

    Looking back at the History of the web I miss something… plain old HTML! :)

    I would impose a single requirement, a design principle for the data, so to speak:

    - it must be searchable with the tools we use every day. And that means Google. And that would require that the documents are plain old HTML. Which is the easiest and cheapest format to deploy.

    With the data searchable and findable with everyday tools, we would find that information usable, and we would find new uses for it that we might not be aware of now. Mashups will emerge…

    How do we know if some info/data is valid for me if I have to download it, process it, then search inside an Excel document… That is not pretty :) And it does not lead us to making lots of use of the data, which is what we need as a society.

    Once Google or your favourite search engine can index the info, make it machine-usable. Not person-machine-usable as CSV implies. Give me an API. Give me XML, even if it is not perfectly composed RDF.

    Tons of APIs built on custom XML are working nowadays and provide great functionality. The cost of making RDF is too high. Let people produce the cheapest XML and then loosely couple it…

    :)
