In my previous post about the end of expertise, I made the point that the root concerns of conspiracy theorists are valid: the suspicion and mistrust of powerful players are justified. I also argued that if governments and corporations are not going to make the actual shift, then we need to make it. Everyday people must make that shift.
Here I want to add some comments about the role of structured data in making it easier to access relevant evidence, or supporting insights, and hence, maintaining accountability.
This post got a bit long, so I will add some thoughts about how to go about a strong, action-based, engaging strategy that is also about connecting people, in the next post.
Structured data
Structured data is less about the data itself, your photo or how much it rained, and more about how that data fits together with everything else. Who took similar pictures, what other pictures did you take, who took pictures in that same location? How much did it rain in that location? Did it flood there? Did that make the sewers overflow? Did that cause disease outbreaks downstream? Can your picture give information on that?
Resource Description Framework (RDF)
In other words, it’s about the metadata: the data that gives information about something. For example, the date a photo was taken, who took it, and where. The way I understand what is written in the wiki page, RDF is about connectedness. An RDF “unit”, or triple, has three components: the two things that are being connected, and the third thing that does the connecting (in RDF terms, the predicate). Using the photo as an example, the photo is the subject, the one thing being connected. It has an entry with the actual photo. Then it has a connection, some piece of code or text, that says the photo is connected to another object. The person who took the photo is an object here, and that photo-connection-photographer link is a triple. An entry with the information about where it was taken is a new object, so photo-connection-location is a separate triple. The person-connection-location is yet another triple.
So this linking shows the connections between the photo and who took it and where. But looking at the person who took the photo, considering that as the subject this time, one could see all the nodes – all the photos this person took. Or, one can look at all the locations where this person did … anything.
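To make the idea of triples a little more concrete, here is a minimal sketch in Python. It is not real RDF tooling, just plain tuples, and every identifier in it (photo:001, person:alice, takenBy, takenAt and so on) is made up for illustration. It shows how the same small set of triples can be read from the photo's side or from the person's side.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing we are describing
    predicate: str  # the "connection" between the two things
    obj: str        # the thing it is connected to


# Hypothetical identifiers, invented for this illustration only.
triples = [
    Triple("photo:001", "takenBy", "person:alice"),
    Triple("photo:001", "takenAt", "place:riverside"),
    Triple("photo:002", "takenBy", "person:alice"),
    Triple("photo:002", "takenAt", "place:harbour"),
    Triple("photo:003", "takenBy", "person:ben"),
    Triple("photo:003", "takenAt", "place:riverside"),
]


def photos_by(person):
    """All photos connected to this person through 'takenBy'."""
    return sorted(
        t.subject for t in triples
        if t.predicate == "takenBy" and t.obj == person
    )


def places_of(person):
    """All locations reached by following photo -> place links."""
    photos = set(photos_by(person))
    return sorted({
        t.obj for t in triples
        if t.predicate == "takenAt" and t.subject in photos
    })


print(photos_by("person:alice"))
print(places_of("person:alice"))
```

Nothing in the data says "Alice was at the harbour" directly. That fact falls out of following two connections, which is the kind of insight linking is meant to make easy.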
At this point we’re probably familiar with things like this. For example, you can see all the photos you uploaded to Instagram, and possibly you can find photos of the same location. But you can’t connect those subjects to objects outside of Instagram; it is a “walled garden”. And if Instagram decides to take those images away from you, you can’t do anything to stop it.
My interest here is to connect data about urban resources and gain insight from that. How much rain falls, and how much of that turns to stormwater runoff? How often does this turn into flooding? What happens to the sewage works, or wastewater treatment, generally, and during storms? All that needs to be connected in a structured way, and be available across the internet, so that other people in other places can share what happens to their sewage works, and then we can make connections about how this is perhaps not an isolated problem. The metadata that is embedded in your photos can also help me, and it definitely helps the companies that own the walled garden platforms. For them, however, it helps with advertising, and they don’t care about your privacy. For me, and others like me, aka everyone, it helps to make sense of our world and helps us work to make our lives better in a digital ecosystem, but it can only do this when it is open.
Any move towards structured, open data is progress. The idea of 5 star data, like the hotel star rating system, is to guide people to take steps to make their content more accessible, starting with the easiest, and moving to more sophisticated data management. The website provides these guidelines:
★ Make it open: Make your stuff available on the Web (whatever format) under an open license. This can be on your own website, or on a wiki page, for example.
★★ Make data available as structured data: a spreadsheet or a text file of numbers (a comma separated values, or .csv, file) instead of a scanned image. A photo or image can become structured when its metadata is added, which means more information about what is in the image, where it was taken, why it was taken, what elements the image contains, and so on.
★★★ Make your data available in a non-proprietary open format. A comma separated values (.csv) text file will always be readable: you can open it in any program, or even in a web page. But if Microsoft decides that only Microsoft users can access their Excel spreadsheets, your data is lost.
★★★★ One weblink per dataset. Each image in your Instagram feed has its own web identifier, or Uniform Resource Identifier (URI). In the same way, the open web also has these links serving as unique addresses for your objects. Using my water flows as an example, each dataset of the rainfall of a specific place would have its own web address, or URI.
★★★★★ Link your data to other data to provide context. Continuing with the water example, the dataset of rainfall in my village would then be connected, through that Resource Description Framework (RDF) arc, to datasets on flooding, and also to how well sewage was treated for the same time period. And then that can connect to the water quality in the lakes and oceans close by. So it becomes much, much easier to make the connections and provide evidence of how seemingly independent events affect our health and safety.
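As a rough illustration of the last three stars together, here is a small Python sketch. Everything in it is assumed for the example: the two example.org URIs, the "relatedTo" link name, and the rainfall and flooding numbers are all invented. The data is held in plain .csv text (three stars), each dataset has its own address (four stars), and a link between the two addresses lets us line up rainfall with flooding (five stars).

```python
import csv
import io

# Hypothetical URIs, one per dataset. Made up for this sketch.
RAINFALL_URI = "https://example.org/data/village/rainfall"
FLOODING_URI = "https://example.org/data/village/flooding"

# Invented numbers, stored as plain comma separated text.
datasets = {
    RAINFALL_URI: """date,rain_mm
2024-01-01,2.0
2024-01-02,48.5
2024-01-03,61.0
""",
    FLOODING_URI: """date,flooded
2024-01-01,no
2024-01-02,no
2024-01-03,yes
""",
}

# The link between the datasets, written as a subject-connection-object
# triple. "relatedTo" is an illustrative name, not a standard term.
links = [(RAINFALL_URI, "relatedTo", FLOODING_URI)]


def load(uri):
    """Read one dataset and index its rows by date."""
    reader = csv.DictReader(io.StringIO(datasets[uri]))
    return {row["date"]: row for row in reader}


def rain_on_flood_days():
    """Follow each link and report the rainfall on days that flooded."""
    result = {}
    for subject, _connection, obj in links:
        rain = load(subject)
        flood = load(obj)
        for date, row in flood.items():
            if row["flooded"] == "yes" and date in rain:
                result[date] = float(rain[date]["rain_mm"])
    return result


print(rain_on_flood_days())
```

In a real setting the text would be fetched from each web address rather than kept in a dictionary, and sewage treatment or lake water quality datasets could be linked in the same way.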
Open, structured, connected data can over time build a story of what happens in our world. That story has depth and complexity, and can’t, by design, be hijacked and twisted by people in power.
So, where does this data live? Where do things on the internet live? Basically, on computers: special computers that mostly do only one thing, which is to store data. These are called servers. The people owning these servers, combined with the walled gardens that control how you upload your content to the internet, are basically starting to control the internet. They can control what information you see; they control our stories.
To take back our stories, we need to take back the critical bits of our data.
We need local servers, storing the locally relevant data: critical bits of the internet relevant to a local place that cannot be censored, or lost, or corrupted. From the Solid project website:
Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone’s Pod, they control which people and applications can access it.
Interestingly, Tim Berners-Lee, the man who is credited with inventing the World Wide Web, the web of information many of us now simply call the internet, designed this Solid approach and created the 5 star data idea. I trust him on this: to me, this seems like a good next iteration of the internet. And so this structured data, stored on local servers, is what I call the metaverse. Visualising it in a 3D or virtual reality context is a bonus, a nice-to-have.
The next challenge, and what I want to formally research now, is the human factor. How do people engage with data? Why would people want to get involved with this? What challenges would we face?