skos-history: New method for change tracking applied to STW Thesaurus for Economics

“What’s new?” and “What has changed?” are questions users of Knowledge Organization Systems (KOS), such as thesauri or classifications, ask when a new version is published. All the more so when a thesaurus that has existed since the 1990s has been completely revised, subject area by subject area. After four intermediate versions published in as many consecutive years, ZBW's STW Thesaurus for Economics has recently been re-launched in version 9.0. In total, 777 descriptors have been added; 1,052 (of about 6,000) have been deprecated, and the vast majority of these have been merged into others. More subtle changes include modified preferred labels, or merges and splits of existing concepts.

Since STW was first published on the web in 2009, we have gone to great lengths to make changes traceable: no concept and no web page has been deleted, and everything from prior versions is still available. Following a presentation at DC-2013 in Lisbon, I started the skos-history project, which aims to exploit published SKOS files of different versions for change tracking. A first beta implementation of Linked-Data-based change reports went live with STW 8.14, making use of SPARQL "live queries" (as described in a prior post). With the publication of STW 9.0, full reports of the changes are available. How do they work?
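The basic idea of comparing two published versions can be illustrated with a minimal sketch. It is not the skos-history implementation: the concept IDs and labels are invented, and each version is reduced to a plain mapping from concept ID to preferred label. Because STW keeps deprecated concepts in the data, a real comparison would look at deprecation status rather than at missing IDs.

```python
# Hypothetical miniature of two thesaurus versions: concept ID -> preferred label.
# IDs and labels are invented for illustration and are not taken from STW.
old_version = {"c1": "Bookkeeping", "c2": "Trade policy", "c3": "Barter"}
new_version = {"c1": "Accounting", "c2": "Trade policy", "c4": "Crowdfunding"}


def diff_versions(old, new):
    """Compare two versions and report added, dropped and relabeled concepts."""
    added = sorted(new.keys() - old.keys())      # IDs only in the new version
    dropped = sorted(old.keys() - new.keys())    # IDs only in the old version
    relabeled = {
        concept: (old[concept], new[concept])
        for concept in sorted(old.keys() & new.keys())
        if old[concept] != new[concept]          # same ID, different label
    }
    return added, dropped, relabeled


added, dropped, relabeled = diff_versions(old_version, new_version)
print("added:", added)
print("dropped:", dropped)
print("relabeled:", relabeled)
```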


Publishing SPARQL queries live

SPARQL queries are a great way to explore Linked Data sets - be it our STW with its links to other vocabularies, the papers of our repository EconStor, or persons or institutions in economics as authority data. ZBW has therefore offered public endpoints for a long time. Yet it is often not easy to figure out the right queries: the classes and properties used in the data sets are unknown, and the overall structure requires some exploration. Therefore, we have started collecting queries that are in use at ZBW in our new SPARQL Lab, where they can serve others as examples of how to work with our datasets.

A major challenge was to publish queries in a way that allows not only their execution, but also their modification by users. The first approach to this was pre-filled HTML forms. Yet that couples the query code with that of the HTML page, and with a hard-coded endpoint address. It does not scale to multiple queries on a diversity of endpoints, and it is difficult to test and to keep in sync with changes in the data sets. Besides, offering a simple text area without any editing support makes it quite hard for users to adapt a query to their needs.
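To illustrate the decoupling, here is a minimal sketch that keeps the query text separate from the endpoint address and combines the two only when a request URL is built. It relies on the SPARQL 1.1 Protocol convention of passing the query in a `query` URL parameter; the endpoint addresses and the sample query are assumptions made up for this example.

```python
from urllib.parse import urlencode

# The query lives on its own, independent of any HTML page or endpoint.
QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE { ?concept skos:prefLabel ?label }
LIMIT 10
"""


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query as a URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint addresses, not the actual ZBW endpoints.
endpoints = ["https://example.org/stw/query", "https://example.org/econstor/query"]
urls = [query_url(endpoint, QUERY) for endpoint in endpoints]

for url in urls:
    print(url)
```

The same query text can thus be tested, versioned and pointed at several endpoints without touching any page markup.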

And then came YASGUI, an "IDE" for SPARQL queries. Accompanied by the YASQE and YASR libraries, it offers a completely client-side, customizable, JavaScript-based editing and execution environment. Particular highlights from the libraries' descriptions include:

Other editions of this work: An experiment with OCLC's LOD work identifiers

Large library collections, and even more so portals or discovery systems aggregating data from diverse sources, face the problem of duplicate content. Wouldn't it be nice if every edition of a work could be collected behind one entry in a result set?

The WorldCat catalogue, provided by OCLC, holds more than 320 million bibliographic records. Since early 2014, OCLC has shared its 197 million work descriptions as Linked Open Data: "A Work is a high-level description of a resource, containing information such as author, name, descriptions, subjects etc., common to all editions of the work. ... In the case of a WorldCat Work description, it also contains [Linked Data] links to individual, oclc numbered, editions already shared in WorldCat." The works and editions are marked up with schema.org semantic markup, in particular using schema:exampleOfWork/schema:workExample for the relation from edition to work and vice versa. These properties have been added recently to the schema.org spec, as suggested by the W3C Schema Bib Extend Community Group.

ZBW contributes to WorldCat, and has 1.2 million OCLC numbers attached to its bibliographic records. So it seemed interesting to find out how many of these editions link to works, and furthermore to other editions of the very same work.
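The lookup from one edition to the other editions of the same work can be sketched as follows. This is an illustration under assumptions, not the code used in the experiment: the identifiers are invented, and the list of pairs merely stands in for schema:exampleOfWork statements that would be fetched from the linked data.

```python
from collections import defaultdict

# Hypothetical (edition, work) pairs standing in for schema:exampleOfWork links.
example_of_work = [
    ("edition/1", "work/A"),
    ("edition/2", "work/A"),
    ("edition/3", "work/B"),
]


def other_editions(pairs, edition):
    """Return all editions that share a work with the given edition."""
    editions_by_work = defaultdict(set)
    works_by_edition = defaultdict(set)
    for ed, work in pairs:
        editions_by_work[work].add(ed)
        works_by_edition[ed].add(work)

    found = set()
    for work in works_by_edition.get(edition, ()):
        found |= editions_by_work[work]
    found.discard(edition)  # the edition itself is not "another" edition
    return sorted(found)


print(other_editions(example_of_work, "edition/1"))
print(other_editions(example_of_work, "edition/3"))
```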

Link out to DBpedia with a new Web Taxonomy module

ZBW Labs now uses DBpedia resources as tags/categories for articles and projects. The new Web Taxonomy plugin for DBpedia Drupal module (developed at ZBW) integrates DBpedia labels, stemming from Wikipedia page titles, into the authoring process via a convenient autocomplete plugin. On the term page (example), further information about a keyword can be obtained by following a link to the DBpedia resource. At the same time, this connects ZBW Labs to the Linked Open Data Cloud.

The plugin is the first one released for Drupal Web Taxonomy, which makes LOD resources and web services easily available for site builders. Plugins for further taxonomies are to be released within our Economics Taxonomies for Drupal project.

Extending econ-ws Web Services with JSON-LD and Other RDF Output Formats

From the beginning, our econ-ws (terminology) web services for economics have produced tabular output, very much like the results of a SQL query. Not a surprise - they are based on SPARQL, and use the well-defined table-shaped SPARQL 1.1 query results formats in JSON and XML, which can be easily transformed to HTML. But there are services whose results do not really fit this pattern, because they are inherently tree-shaped. This is especially true for the /combined1 and the /mappings services. For the former, see our prior blog post; an example of the latter may be given here: The mappings of the descriptor International trade policy are shown (in HTML) as:

concept | prefLabel | relation | targetPrefLabel | targetConcept | target
<> | "International trade policy" @en | <> | "International trade policies" @en | <> | <>
<> | "International trade policy" @en | <> | "Commercial policy" @en | <> | <>

That's far from perfect - the "concept" and "prefLabel" entries of the source concept(s) of the mappings are identical over multiple rows.
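How such repeated rows collapse into a tree can be shown with a small sketch. It produces plain nested JSON rather than JSON-LD and is not the econ-ws implementation; the `ex:` identifiers and relation names are placeholders invented for this example, while the labels are those of the two rows above.

```python
import json

# Tabular rows as a SPARQL SELECT would return them; the "ex:" values are
# hypothetical placeholders, only the labels come from the table above.
rows = [
    {"concept": "ex:source1", "prefLabel": "International trade policy",
     "relation": "ex:relation1", "targetConcept": "ex:target1",
     "targetPrefLabel": "International trade policies"},
    {"concept": "ex:source1", "prefLabel": "International trade policy",
     "relation": "ex:relation2", "targetConcept": "ex:target2",
     "targetPrefLabel": "Commercial policy"},
]


def nest_mappings(rows):
    """Group rows by source concept so its ID and label appear only once."""
    nested = {}
    for row in rows:
        entry = nested.setdefault(row["concept"], {
            "concept": row["concept"],
            "prefLabel": row["prefLabel"],
            "mappings": [],
        })
        entry["mappings"].append({
            "relation": row["relation"],
            "targetConcept": row["targetConcept"],
            "targetPrefLabel": row["targetPrefLabel"],
        })
    return list(nested.values())


tree = nest_mappings(rows)
print(json.dumps(tree, indent=2))
```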

Thesaurus-augmented Search with Jena Text

How can we get the most out of a thesaurus to support user searches? Taking advantage of SKOS thesauri published on the web, their mappings and the latest Semantic Web tools, we can support users both with synonyms for their original search terms (e.g. "accountancy" for "bookkeeping") and with suggestions for neighboring concepts.
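The two kinds of support, synonyms and neighboring concepts, can be sketched with a toy in-memory thesaurus. This does not use Jena Text; the data structure, the concept IDs and the "auditing" neighbor are assumptions made up for illustration, and only the bookkeeping/accountancy pair comes from the text above.

```python
# Hypothetical toy thesaurus: each concept has labels (preferred label first)
# and a list of related concept IDs.
thesaurus = {
    "c1": {"labels": ["bookkeeping", "accountancy"], "related": ["c2"]},
    "c2": {"labels": ["auditing"], "related": ["c1"]},
}


def expand(term, thesaurus):
    """Return (synonyms, suggestions) for a search term."""
    term = term.lower()
    synonyms, suggestions = [], []
    for concept in thesaurus.values():
        if term in concept["labels"]:
            # Other labels of the same concept serve as synonyms.
            synonyms += [label for label in concept["labels"] if label != term]
            # The first label of each related concept is offered as a suggestion.
            suggestions += [thesaurus[r]["labels"][0] for r in concept["related"]]
    return synonyms, suggestions


print(expand("bookkeeping", thesaurus))
```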

ZBW Labs as Linked Open Data

As a laboratory for new, Linked Open Data based publishing technologies, we now develop the ZBW Labs web site as a Semantic Web Application. The pages are enriched with RDFa, making use of Dublin Core, DOAP (Description of a Project) and other vocabularies. The vocabulary, which is also applied through RDFa, should support search engine visibility.

With this new version we aim at a playground to test new possibilities in electronic publishing and linking data on the web. At the same time, it facilitates editorial contributions from project members about recent developments and allows comments and other forms of participation by web users.

As the site is based on Drupal 7, RDFa is "built-in" (in the CMS core) and is easily set up by configuration on a field level. Enhancements are made through the RDFx and SPARQL Views modules. A lot of other ready-made components in Drupal (most noteworthy the Views and the new Entity Reference modules) make it easy to provide and interlink the data items on the site. The current version of the Zen theme enables HTML5 and the use of RDFa 1.1, and permits a responsive design for smartphones and tablets.

