Here we describe the process of building the interactive SWIB20 participants map, created by a query to Wikidata. The map was intended to help SWIB20 participants make contacts in the virtual conference space. However, to comply with the GDPR we wanted to avoid publishing personal details, so we chose to publish a map of the institutions with which the participants are affiliated. (Obvious downside: the 9 unaffiliated participants could not be represented on the map.)
We suppose that the method can be applied to other conferences and other use cases - e.g., the downloaders of scientific software or the institutions subscribed to an academic journal. Therefore, we describe the process in some detail.
We started with a list of institution names (with country code and city, but without person IDs), extracted and transformed from our ConfTool registration system and saved in CSV format. Country names were normalized; cities were not (they were used only as context information).
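The extraction step could look roughly like the following minimal sketch. The column names (`person_id`, `organisation`, `country`, `city`) and the upper-casing of the country code are assumptions for illustration only; the actual ConfTool export and our normalization may differ.

```python
import csv
import io


def extract_institutions(registration_csv: str) -> list[dict]:
    """Return distinct country/institution rows, dropping all person data.

    Column names are hypothetical, not ConfTool's actual export schema.
    """
    seen = set()
    rows = []
    for record in csv.DictReader(io.StringIO(registration_csv)):
        country = record["country"].strip().upper()  # assumed normalization
        name = record["organisation"].strip()
        key = (country, name)
        if key in seen:
            continue  # keep each country/institution pair only once
        seen.add(key)
        # The city is kept as context only; the first one seen wins.
        rows.append({"country": country, "institution": name,
                     "city": record["city"].strip()})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the rows as CSV text for import into OpenRefine."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["country", "institution", "city"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the person identifier is never copied into the output rows, the resulting file contains institution data only.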
We created an OpenRefine project and reconciled the institution name column with Wikidata items of type Q43229 (organization, and all its subtypes). We included the country column (-> P17, country) as a relevant other detail, and let OpenRefine “Auto-match candidates with high confidence”. Of our original set of 335 country/institution entries, 193 were automatically matched via the Wikidata reconciliation service. At the end of the conference, 400 institutions were identified and put on the map (data set).
We went through all un-matched entries and either
a) selected one of the suggested items, or
b) looked up and tweaked the name string in Wikidata or in Google until we found a matching Wikipedia page, opened the linked Wikidata item from there, and inserted the QID in OpenRefine, or
c) created a new Wikidata item (if the institution seemed notable), or
d) attached “not yet determined” (Q59496158) where no Wikidata item (yet) exists, or
e) attached “undefined value” (Q7883029) where no institution had been given.
The results were exported from OpenRefine into a .tsv file (settings).
Again via a script, we loaded the ConfTool participants data, built a lookup table from all available OpenRefine results (country/name string -> WD item QID), aggregated participant counts per QID, and loaded that data into a custom SPARQL endpoint, which is accessible from the Wikidata Query Service. As in step 1, a .csv file was produced for all (new) institution name strings that were not yet mapped to Wikidata. (An additional remark: if no approved custom SPARQL endpoint is available, it is feasible to generate a static query with all data in its VALUES clause.)
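The lookup, aggregation and static VALUES variant can be sketched as follows. This is a minimal illustration, not our actual script: the field names (`country`, `institution`, `qid`) and the variable names `?item` and `?participants` are assumptions.

```python
from collections import Counter


def build_lookup(reconciled_rows):
    """Map (country, institution name) -> QID from OpenRefine export rows.

    Rows without a QID are skipped. Field names are hypothetical.
    """
    return {(r["country"], r["institution"]): r["qid"]
            for r in reconciled_rows if r.get("qid")}


def aggregate(participants, lookup):
    """Count participants per QID and collect name strings not yet mapped."""
    counts = Counter()
    unmapped = set()
    for p in participants:
        key = (p["country"], p["institution"])
        qid = lookup.get(key)
        if qid is None:
            unmapped.add(key)  # goes into the next reconciliation round
        else:
            counts[qid] += 1
    return counts, sorted(unmapped)


def values_clause(counts):
    """Render the counts as a SPARQL VALUES block for a static query."""
    lines = [f"  (wd:{qid} {n})" for qid, n in sorted(counts.items())]
    return "VALUES (?item ?participants) {\n" + "\n".join(lines) + "\n}"
```

The list of unmapped pairs returned by `aggregate` corresponds to the .csv file that feeds the next reconciliation round, and the output of `values_clause` can be pasted into a query when no custom endpoint is at hand.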
During the preparation of the conference, more and more participants registered, which required multiple loops: use the .csv file of step 5 and iterate again, starting at step 2. (Since I found no straightforward way to update an existing OpenRefine project with extended data, I created a new project with new input and output files for every iteration.)
Finally, to display the map we could run a federated query on WDQS. It fetches the institution items from the custom endpoint and enriches them from Wikidata with name, logo and image of the institution (if present), as well as with geographic coordinates, obtained directly or indirectly as follows:
a) item has “coordinate location” (P625) itself, or
b) item has “headquarters location” item with coordinates (P159/P625), or
c) item has “located in the administrative territorial entity” item with coordinates (P131/P625), or
d) item has “country” item with coordinates (P17/P625).
Applying this method, only one institution item could not be located on the map.
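The fallback order for coordinates can be expressed with OPTIONAL patterns and COALESCE. The sketch below builds such a query as a string. It is an illustration under assumptions, not the query we actually ran: it passes the items in a VALUES clause instead of fetching them from the custom endpoint via federation, and the variable names are made up. It relies on the `wd:` and `wdt:` prefixes that WDQS predefines.

```python
# Property paths in fallback order; variable names are hypothetical.
FALLBACK_PATHS = [
    ("own", "wdt:P625"),              # a) coordinate location on the item itself
    ("hq", "wdt:P159/wdt:P625"),      # b) headquarters location
    ("admin", "wdt:P131/wdt:P625"),   # c) administrative territorial entity
    ("country", "wdt:P17/wdt:P625"),  # d) country
]


def build_coordinate_query(qids):
    """Return a SPARQL query that picks the first available coordinate."""
    values = " ".join(f"wd:{q}" for q in qids)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item {path} ?{var} . }}" for var, path in FALLBACK_PATHS
    )
    coalesce = ", ".join(f"?{var}" for var, _ in FALLBACK_PATHS)
    return (
        "SELECT ?item ?coord WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        f"  BIND(COALESCE({coalesce}) AS ?coord)\n"
        "}"
    )
```

COALESCE returns the first bound variable, so an item with its own coordinates never falls back to the coarser country location. Items for which none of the four paths yields a value come back with `?coord` unbound.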
The way to improve the map was to improve the data about the items in Wikidata - which also helps all future Wikidata users.
For a few institutions, new items were created:
- Burundi Association of Librarians, Archivists and Documentalists
- FAO representation in Kenya
- Aurora Information Technology
- Istituto di Informatica Giuridica e Sistemi Giudiziari
For another 14 institutions, mostly private companies, no items were created due to notability concerns. Everything else already had an item in Wikidata!
Improvement of existing items
In order to improve the display on the map, we enhanced selected items in Wikidata in various ways:
- Add English label
- Add type (instance of)
- Add headquarters location
- Add image and/or logo
We also hope that participants of the conference took the opportunity to make their institution “look better”, for example by adding an image of it to the Wikidata knowledge base.
Putting Wikidata into use for a completely custom purpose thus created incentives for improving “the sum of all human knowledge” step by tiny step.