I’ve been curious about how to move an online collection from HTML into more of a Linked Data model. When I first started looking at it, schema.org was the Place to Be. Lately, though, it seems like Wikidata is where all the Cool Kids hang out.
So, I thought I would learn a bit about SPARQL, a query language for RDF-type semantic databases like Wikidata. RDF (Resource Description Framework) is a structure based on triples, variously described as subject-predicate-object, thing-relation-thing, or another analogy. SPARQL lets you write queries on the pile of Wikidata by asking, for example, for all the objects with the same predicate.
I took a couple of workshops at conferences about all this, and it left me both daunted and bowed.
But now I’m ready to figure it out.
So, I found a really good little video explaining the SPARQL language by Navino Evans, one of the founders of Histropedia, who used an example of the women who graduated from the University of Edinburgh. It goes through how to make stuff appear (using SELECT), how to define what you want in your table (using WHERE), how to use labels, and some of the different visualisation tools you can use.
It was really easy to follow along, so I did — substituting University of Toronto for his institution.
How to define your SPARQL query
On the first line, use SELECT to define the columns you want to see in your table. You will define these in the next section. For example:
SELECT ?person ?personLabel ?birthPlaceLabel ?coordinates ?birthDate ?deathDate ?image
This says show the person’s name, place of birth and its latitude and longitude, their dates of birth and death (I am a librarian) and their image.
In the WHERE section, you indicate the subject (in this case, ?person; this is about a person), the property, which is the predicate of the triple (wdt:P27, country of citizenship), and the value (wd:Q16, Canada). That is, give me a list of Canadians. End each statement with a period. For example:
WHERE { ?person wdt:P27 wd:Q16 . }
So, that would be a pretty huge result set!
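While experimenting, one way to keep a result set like that manageable is to cap it. Here is a minimal sketch that joins the SELECT and WHERE pieces into one complete query. The LIMIT clause and the value 10 are my own additions for illustration; they are not from the video.

```sparql
# Sketch only: list a handful of Canadian citizens.
SELECT ?person
WHERE {
  ?person wdt:P27 wd:Q16 .   # country of citizenship (P27) is Canada (Q16)
}
LIMIT 10                     # assumed cap, chosen arbitrarily
```

Without the Label service, the ?person column will only show entity identifiers.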
It also shows you how to use a service, in this case the Label service. This puts the name onto the identifier. For example, the Canadian author Gail Bowen is entity Q1491217. Without the Label service, she would be referred to only as Q1491217 in your table. Labels are fun!
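To see the Label service on its own, here is a small sketch that asks for just that one entity and its English label. The VALUES clause and the variable name ?author are my own choices for illustration. The convention is that adding Label to a variable name (?authorLabel) asks the service to fill in the name.

```sparql
# Sketch only: look up the label for a single known entity.
SELECT ?author ?authorLabel
WHERE {
  VALUES ?author { wd:Q1491217 }   # the entity identified above as Gail Bowen
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```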
Here is the code I ended up running with (pun not really intended) in the Wikidata Query Service. The text after the # is a comment, explaining the code. It will run that way in the WQS, but not in some of the other ways of displaying your results.
SELECT ?person ?personLabel ?birthPlaceLabel ?coordinates ?birthDate ?deathDate ?image
WHERE {
  ?person wdt:P27 wd:Q16 . #country of citizenship (P27) is Canada (Q16)
  ?person wdt:P69 wd:Q180865 . #educated at (P69) UofT (Q180865)
  ?person wdt:P21 wd:Q6581072 . #sex or gender (P21) is female (Q6581072)
  ?person wdt:P19 ?birthPlace . #place of birth (P19) is named ?birthPlace, add ?birthPlace to query, with Label to call the Label service
  ?birthPlace wdt:P625 ?coordinates . #co-ordinate location (P625) is named ?coordinates, returns latlong
  ?person wdt:P569 ?birthDate . #date of birth (P569) is named ?birthDate
  OPTIONAL {?person wdt:P570 ?deathDate .} #date of death (P570) is named ?deathDate
  OPTIONAL {?person wdt:P18 ?image .} #image (P18) is called ?image
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } #label service, adds label to query results (which are just numbers), add ?personLabel to query
}
This says, basically: give me a list of people of Canadian citizenship, educated at the University of Toronto, who are female. Show me where they are born, and the latitude and longitude of that place. Give me their death dates and image, if available. Oh, and please use the names of the things I am looking for, not just their ID numbers.
So, if you take the OPTIONAL wrapper off those two fields, they become mandatory. That means, if they aren’t dead or if no picture of them exists, they don’t make the list. For example, this result set currently has 289 results.
- If the image is not optional, but death date is, there now are 67 results.
- If the death date is not optional, but image is, there now are 98 results.
- If neither is optional, there are only 15 results. For readability’s sake, this is the one pictured in the graphic interpretations, below.
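For reference, here is a sketch of the strictest variant, where neither field is optional. It is the same query with the two OPTIONAL wrappers removed, so a person only appears if she has both a death date and an image recorded. The counts will drift as Wikidata is edited.

```sparql
SELECT ?person ?personLabel ?birthPlaceLabel ?coordinates ?birthDate ?deathDate ?image
WHERE {
  ?person wdt:P27 wd:Q16 .              # country of citizenship is Canada
  ?person wdt:P69 wd:Q180865 .          # educated at UofT
  ?person wdt:P21 wd:Q6581072 .         # sex or gender is female
  ?person wdt:P19 ?birthPlace .         # place of birth
  ?birthPlace wdt:P625 ?coordinates .   # co-ordinate location of the birthplace
  ?person wdt:P569 ?birthDate .         # date of birth
  ?person wdt:P570 ?deathDate .         # date of death, now mandatory
  ?person wdt:P18 ?image .              # image, now mandatory
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Putting OPTIONAL { } back around either of the last two patterns gives the intermediate variants in the list above.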
I like this example, because you really do run up against the invisibility of the female gender in the Wikiverse (and, let’s face it, in the Universe). Imagine if I were looking for aboriginal women!
This visualisation is a table. By clicking on the little eye (top left, under the code), you have the option for a number of other ways of seeing your data, including map and timeline. To run the code, press the little blue arrow on the left, above the table and below the code.
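As far as I can tell, the Wikidata Query Service also lets you pick the starting view from inside the query, using a special comment on the first line. This is a sketch on that assumption, trimmed to the columns a map needs. The #defaultView line is my addition and was not part of the query I ran.

```sparql
#defaultView:Map
SELECT ?person ?personLabel ?birthPlaceLabel ?coordinates
WHERE {
  ?person wdt:P27 wd:Q16 .              # country of citizenship is Canada
  ?person wdt:P69 wd:Q180865 .          # educated at UofT
  ?person wdt:P21 wd:Q6581072 .         # sex or gender is female
  ?person wdt:P19 ?birthPlace .         # place of birth
  ?birthPlace wdt:P625 ?coordinates .   # the map view plots this column
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Since that first line is only a comment, other tools that accept SPARQL will probably ignore it.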
Here is a timeline of the women who went to UofT, who have already died, who have Wikidata records, and who have images uploaded. (NOTE: You can change this by adding data to the dataset. Anyone can do it. Surely, more than 15 women of note graduated from UofT, lived, made a valuable contribution, and died. The eldest was born in 1868, for goodness’ sake.)
This is a timeline visualisation using Histropedia’s Wikidata viewer. By clicking the little eye-shaped icon on the right, below the code in the SPARQL query window, you can see a number of other options, including map (if you select the P625, coordinate location, field in your query).
Here is a grid view of the same query.
So, daunted I should not have been, really. I must address my fear of acronyms (FOA) and plough forward undaunted in the future. This was NHAA (not hard at all).
Next, I need to learn how to contribute to Wikidata. I did make a foray a year or two ago, using a Wikidata game, but it didn’t give me enough information and I mis-attributed a couple of pieces of data. They didn’t slap me down with as much glee and (un)intentional violence as editors on the Wikipedia site, but it still left me feeling a bit back-footed.
Because I have a slower learning curve, I need to understand 100% before contributing so that I am not attacked. These projects are not exactly transparent to someone who doesn’t already understand the ecosystems (or who, like me, has been away from it for a couple of decades). Don’t bite the newbies, dude. It’s a thing.
Wikidata is very much more welcoming than Wikipedia, and less ambiguous, to boot. Be not afraid.