A tale of two datasets


We’re a social enterprise with the word “data” in our name. We are obsessed with the concept of data maturity. The fact that you are reading this post suggests to me that you already have more interest in datasets than the average internet user.

Even so, I’m a little worried that this post might be a bit geeky even for you.

But stick with me and I think you’ll find it really quite interesting.

We are delighted to be part of the Mapio Cymru project. Mapio Cymru translates into English as “Mapping Wales” and it is a project all about encouraging more maps in Welsh and more mapping in the Welsh language.

OpenStreetMap

Our first dataset is OpenStreetMap: a free and open mapping dataset of the world. It is managed by a community and anyone in the world can access the data and use it without having to pay a fee or ask permission.

Mapio Cymru publishes a map of Wales in Welsh which you can view at openstreetmap.cymru. 

Every day the map is redrawn to take account of changes that volunteers may have made to the map. (Actually, it’s slightly more complicated than that, as this post explains.)

As the map is redrawn, we label features (like roads and rivers) based on name information volunteers have added. Our approach is very strict, which means that the name behind each label must be explicitly recorded as the name used in Welsh. If we can’t be sure, we don’t include it. This makes the map a bit austere, and in some places a bit empty.

Like this map of Llanfair-ym-Muallt (Builth Wells in English). Many of the roads aren’t labelled. For example, there is a long road called, in English, Hospital Road. In Welsh it is also called Hospital Road (rather than Ffordd Ysbyty, which would be a direct translation). So the volunteer who added the name didn’t see the need to add it twice, but when we draw the map we can’t be sure, so we leave it off.

We’d like to add more labels to the map and we have two main approaches to that:

  • we’re asking volunteers who edit the map to be more explicit about what things are called in Welsh

  • we’re looking for additional sources of labelling data.

That’s what brings us to the second dataset.

Wikidata

Wikidata is a completely different dataset, though it is also maintained by a global community. In essence it aims to provide a unique identifier for every notable thing in existence. It’s an amazing project with an astonishing range of possible uses, for example as a dataset of public art in Herefordshire.

The National Library of Wales makes great use of Wikidata to make its collections more accessible to the public. Volunteers work with Wales’ National Wikimedian to ensure that all manner of records are available there. This provides us with a real opportunity.

Many records in Wikidata (whether added by library volunteers or not) refer to physical locations. When they do, they often include the name in both English and Welsh.

So this year we worked with the National Library to change the way our map is redrawn every day to include Wikidata. It now works like this:

  • if an OpenStreetMap volunteer has included the name in Welsh for a feature we’ll use that to label the object

  • if not, we’ll see if a volunteer has added a reference to the Wikidata entry for that feature

  • if there is a reference, we’ll look up the Welsh name in that Wikidata entry and use it.
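For the geekier readers, the three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Mapio Cymru’s actual code. It assumes each feature carries OpenStreetMap’s usual tags, `name:cy` for the Welsh name and `wikidata` for the linked Wikidata item ID, and it uses a small hypothetical dictionary (with a made-up item ID and name) in place of a live Wikidata lookup. Note that a plain `name` tag on its own is never used, which is why Hospital Road stays unlabelled.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for Wikidata: maps an item ID to its labels,
# keyed by language code ("cy" is Welsh). A real pipeline would query
# Wikidata itself rather than use a local dictionary like this.
WikidataLabels = Mapping[str, Mapping[str, str]]


def welsh_label(tags: Mapping[str, str], wikidata: WikidataLabels) -> Optional[str]:
    """Return a Welsh label for an OSM feature, or None if we can't be sure."""
    # Step 1: an explicit Welsh name added by an OpenStreetMap volunteer wins.
    if "name:cy" in tags:
        return tags["name:cy"]
    # Step 2: otherwise, see whether the feature links to a Wikidata entry.
    item_id = tags.get("wikidata")
    if item_id is None:
        return None  # No Welsh name and no link: leave the feature unlabelled.
    # Step 3: use the Welsh label from that Wikidata entry, if it has one.
    return wikidata.get(item_id, {}).get("cy")


# Hypothetical example data: "Q0000001" and its names are invented.
labels = {"Q0000001": {"en": "Example Bridge", "cy": "Pont Enghraifft"}}

print(welsh_label({"name": "Hospital Road", "name:cy": "Hospital Road"}, labels))  # Hospital Road
print(welsh_label({"name": "Example Bridge", "wikidata": "Q0000001"}, labels))     # Pont Enghraifft
print(welsh_label({"name": "Hospital Road"}, labels))                              # None
```

The order matters: a volunteer’s explicit Welsh name always takes priority, and Wikidata is only consulted as a fallback.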

This has added thousands of new labels to our map directly. We also had help from a student on a summer placement, who added many hundreds of new links to Wikidata into the OpenStreetMap dataset.

You can read more about this project over on the National Library of Wales website.

Why should you care?

Amazing. You’ve stayed with me long past the point when my family and friends would have wandered off to see what was on Netflix.

Obviously the main point of this project is to improve our map. But I think I’ve learned a few things beyond how to draw a better map in Welsh.

Three things I’ve learned:

1. Working with public datasets enables unforeseen benefits

OpenStreetMap started as a way of providing a free alternative to Ordnance Survey data in the UK. It has since developed into a huge global resource with uses in humanitarian response, shipping, analysis and countless other areas.

When we decided to create a Welsh-language map, almost all of the work had been done. Those previous efforts hadn’t envisaged a Welsh-language version, but they had enabled it.

The more we can work with datasets like OpenStreetMap, the better we can make them and the more opportunities we can create for others that we haven’t even considered.

For example, Jean-Baptiste Robertson built this application that allows you to browse a map of Wales in Welsh and read Wikipedia articles about places you can see on the map. This makes use of our map and the links to Wikidata that have been added to OpenStreetMap but in a very different way to how we had imagined it.

2. Data is cultural

The same dataset will be interrogated in different ways by different people based on their lived experiences, their interests and their culture. 

Different people would also capture data about different aspects of the same environment, shaped by those same factors. The act of capturing data and creating datasets carries with it the influence of our culture.

This is not bad. What would be bad would be to ignore this fact, to assume that data is in some way culturally neutral. 

The more diversity we can bring to data, to the design of datasets, to the collection and analysis of data and to the application of insights, the better.

3. Democratising maps is powerful

What excites me most about OpenStreetMap is that it places mapping in the hands of any community that wants to map things. It’s not the only technology to do this: open source GIS packages like QGIS and increasingly easy-to-access online mapping tools all move us in the same direction.

We inherit mapping from people who wanted to move armies about efficiently and effectively or navigate oceans safely and speedily. Of course both of these remain important uses of mapping. But they are no longer the only (or perhaps even the primary) uses of maps.

What we have now is the ability to use mapping for our own purposes. To navigate by bicycle, to make sure you’re never too far from a public loo or to organise an 1821 pub crawl. 

And I feel we are just at the beginning of seeing what people and communities will do when they are given control over their own maps. 

And finally

We’re very grateful to the Welsh Government, which has supported Mapio Cymru over several years. We have several really exciting developments planned for mapping in Wales over the coming year, so watch this space.

And if you are as enthused about maps as I am, please do get in touch.

Ben Proctor is Data and Digital Innovation Director at Data Orchard.
