Wikidata

The PM20 commodities/wares archive: part 4 of the data donation to Wikidata

After the digitized material of the persons, countries/subjects and companies archives of the 20th Century Press Archives was made available via Wikidata, the last part, from the wares archive, has now been added.

This wares archive is about products and commodities. Founded in 1908 at the Hamburg "Kolonialinstitut" (colonial institute) as part of the larger press archives, it was maintained by the Hamburg Institute of International Economics (HWWA) until 1998. It is now part of the cultural heritage, which ZBW has decided to make freely available to the largest possible extent, as part of its Open Access and Open Science policy. While the digitized pages are provided reliably under stable URIs on https://pm20.zbw.eu, the metadata has been donated to Wikidata.

For each ware (e.g., coal), there was a folder - or a series of folders - about the ware or commodity in general, its cultivation, extraction or production, trade, industry, and utilization. Separate folders were created for each country for which the ware was important. For some important wares, such as coal, that amounted to thousands of documents in the general section as well as for traditional producing countries like the UK, but also for more ephemeral deposits like those in the Philippines. In total, almost 37,000 press articles about coal production and consumption in the first half of the last century are accessible online.

The coverage of the archives (overview) extends to quite specialized sectors, such as amber or cotton machines. Sadly, only a small part of the commodities/wares archives is freely available on the web. The labour-intensive preparation of the folders was unavoidable due to intellectual property law, but could only be achieved for one ninth of the documents. The rest of this material up to 1946, and another time slice with 600,000 pages until 1960, can be accessed as digitized microfilms on the ZBW premises (film overview 1908-1946, 1947-1960, systematic structure). Additionally, 15,000 microfiches cover the full time range of the archives until 1998.

Integration of the metadata into Wikidata

For the country category structure of the archive we used existing Wikidata items, as in the countries/subjects archive. Most of the commodities and wares categories were also already present as items, and we matched and linked them via OpenRefine. Only a handful of special, artificial categories (like Axe, hatchet, hammer) had to be created.

We then built an item in Wikidata for each folder of the archive, defined by a commodity/ware and a country category, and linking to the corresponding folder in the press archives (e.g., Coal : United States of America). For the general, non-country-specific folders, the commodities/wares category was combined with the item for "world", as in Banana : World.

The diversity of the archive's topics in Wikidata shows up in a colorful picture (live query), providing an entry point into the archive.

first results from Wikidata query on PM20 wares
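The naming pattern of the folder items described above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `folder_label` is made up for illustration, and the fallback to "World" for general folders simply mirrors the two examples given in the text, not the actual scripts used for the data donation.

```python
from typing import Optional


def folder_label(ware: str, country: Optional[str] = None) -> str:
    """Compose a folder label from a ware and an optional country category.

    Hypothetical helper: general, non-country-specific folders are
    combined with "World", as in the "Banana : World" example.
    """
    return f"{ware} : {country if country else 'World'}"


if __name__ == "__main__":
    print(folder_label("Coal", "United States of America"))
    print(folder_label("Banana"))
```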

In total, 2891 items representing PM20 ware/category folders were created. With this last archive, the integration of the 20th Century Press Archives' metadata into Wikidata is complete. Every folder of the archives is represented in Wikidata and links to digitized press clippings and other material about its topic. How these Wikidata items can be used in queries and applications will be the subject of another ZBW Labs blog entry.

Integrating the PM20 companies archive: part 3 of the data donation to Wikidata

ZBW inherited a large trove of historical company information - annual reports, newspaper clippings and other material about more than 40,000 companies and other organizations around the world. Parts of it, in particular all material about German and British entities until 1949, are freely available online in the companies section (list by country) of the 20th Century Press Archives. More digitized folders with material about companies in and outside of Europe up to 1960 are accessible only on ZBW premises, due to intellectual property rights.

As part of its support for Open Science, ZBW has made all metadata of the 20th Century Press Archives available under a CC0 license. In order to make the folders more easily accessible for business history research as well as for the general public, we have added links for every single folder to Wikidata. In addition, the metadata about companies and organizations, such as inception date or links to board members, has been added to the large amount of company data already available in Wikidata. This continues the PM20 data donation of ZBW to Wikidata, as described earlier for the persons archives and the countries/subjects archives. The activities were carried out - with notable help from volunteers - and documented in the WikiProject 20th Century Press Archives.

The mapping process to Wikidata items

Many of the PM20 company and organization folders deal with entities that already have items in Wikidata. If GND identifiers were assigned to these items, we directly created links to PM20 companies with the same ID, and were done. Matching and linking to Wikidata items without the help of a unique identifier, however, proved challenging. Unlike person names, company names change frequently, or are spelled differently at different times or in different languages. Not uncommonly, the entities themselves change through mergers and acquisitions, and may or may not have been represented by a new folder in PM20, or by a different item in Wikidata. Subsidiaries may be subsumed under the parent organization, or be separate entities. While it is relatively easy to split items in Wikidata, in the folders with printed newspaper clippings and reports it meant digging through sometimes hundreds of pages to single out a company retrospectively. So early decisions about the cutting and delimitation of folders often stuck for the following decades. All of that made it more difficult not only to obtain matches at all, but also to decide whether the same entity is indeed covered.
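The easy case, linking via a shared GND identifier, can be sketched in a few lines of Python. This is an illustration under assumptions, not the procedure actually used: the record layout (dictionaries with `id`, `qid` and `gnd` keys), the function name and all sample values are hypothetical placeholders. Folders without a GND match are collected separately, since those are the ones that need the harder name-based matching described above.

```python
def match_by_gnd(pm20_folders, wikidata_items):
    """Pair PM20 folders with Wikidata items that carry the same GND ID.

    Returns (matched, unmatched): matched is a list of
    (folder id, Wikidata item id) tuples, unmatched a list of folder ids.
    """
    # Index the Wikidata items by GND ID, skipping items without one
    qid_by_gnd = {
        item["gnd"]: item["qid"] for item in wikidata_items if item.get("gnd")
    }
    matched, unmatched = [], []
    for folder in pm20_folders:
        qid = qid_by_gnd.get(folder.get("gnd"))
        if qid:
            matched.append((folder["id"], qid))
        else:
            unmatched.append(folder["id"])
    return matched, unmatched


if __name__ == "__main__":
    # Placeholder data, not real identifiers
    folders = [
        {"id": "folder-1", "gnd": "gnd-a"},
        {"id": "folder-2", "gnd": None},
        {"id": "folder-3", "gnd": "gnd-c"},
    ]
    items = [
        {"qid": "item-1", "gnd": "gnd-a"},
        {"qid": "item-2", "gnd": "gnd-b"},
    ]
    print(match_by_gnd(folders, items))
```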

Data donation to Wikidata, part 2: country/subject dossiers of the 20th Century Press Archives

The world's largest public newspaper clippings archive comprises lots of material of great interest, particularly for authors and readers in the Wikiverse. ZBW has digitized the material from the first half of the last century, and has put all available metadata under a CC0 license. Moreover, we are donating that data to Wikidata, by adding or enhancing items and providing ways to access the dossiers (called "folders") and clippings easily from there.

Challenges of modelling a complex faceted classification in Wikidata

That had been done for the persons' archive in 2019 - see our prior blog post. For persons, we could just link from existing or a few newly created person items to the biographical folders of the archive. The countries/subjects archives provided a different challenge: The folders there were organized by countries (or continents, or cities in a few cases, or other geopolitical categories), and within the country, by an extended subject category system (available also as SKOS). To put it differently: Each folder was defined by a geo and a subject facet - a method widely used in general purpose press archives, because it allowed a comprehensible and, supported by a signature system, unambiguous sequential shelf order, indispensable for quick access to the printed material.
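The idea of a folder defined by two facets, with a signature that yields an unambiguous sequential order, can be sketched as a small Python data structure. The class name, the signature format and the sample notations below are invented for illustration and do not reproduce the actual PM20 signature system; plain string comparison is also a simplification of whatever ordering rules the archive used.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Folder:
    """A folder identified by a geo facet and a subject facet.

    order=True compares folders field by field (geo first, then subject),
    which gives a sequential order for any set of distinct facet pairs.
    """
    geo: str
    subject: str

    @property
    def signature(self) -> str:
        # Hypothetical signature format: both notations joined by a space
        return f"{self.geo} {self.subject}"


if __name__ == "__main__":
    # Made-up notations, not real PM20 signatures
    folders = [Folder("B", "n2"), Folder("A", "n2"), Folder("A", "a1")]
    for folder in sorted(folders):
        print(folder.signature)
```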

Wikidata as authority linking hub: Connecting RePEc and GND researcher identifiers

In the EconBiz portal for publications in economics, we have data from different sources. In some of these sources, most notably ZBW's "ECONIS" bibliographical database, authors are disambiguated by identifiers of the Integrated Authority File (GND) - in total more than 470,000. Data stemming from "Research Papers in Economics" (RePEc) contains another identifier: RePEc authors can register themselves in the RePEc Author Service (RAS), and claim their papers. This data is used for various rankings of authors and, indirectly, of institutions in economics, which provides a big incentive for authors - about 50,000 have signed into RAS - to keep both their article claims and personal data up-to-date. While GND is well known and linked to many other authorities, RAS had no links to any other researcher identifier system. Thus, until recently, the author identifiers were disconnected, which made it impossible to display all publications of an author on a portal page.

To overcome that limitation, colleagues at ZBW have matched a good 3,000 authors with RAS and GND IDs by their publications (see details here). Making that pre-existing mapping maintainable and extensible, however, would have meant setting up some custom editing interface, would have required storage and operating resources, and could not easily have been made publicly accessible. In a previous article, we described the opportunities offered by Wikidata. Now we have made use of them.

Economists in Wikidata: Opportunities of Authority Linking

Wikidata is a large database, which connects all of the roughly 300 Wikipedia projects. Besides interlinking all Wikipedia pages in different languages about a specific item - e.g., a person - it also connects to more than 1000 different sources of authority information.

The linking is achieved by an "authority control" class of Wikidata properties. The values of these properties are identifiers, which unambiguously identify the Wikidata item in external, web-accessible databases. The property definitions include a URI pattern (called "formatter URL"). When the identifier value is inserted into the URI pattern, the resulting URI can be used to look up the authority entry. The resulting URI may point to a Linked Data resource - as is the case with the GND ID property. This, on the one hand, provides a light-weight and robust mechanism to create links in the web of data. On the other hand, these links can be exploited by every application which is driven by one of the authorities to provide additional data: links to Wikipedia pages in multiple languages, images, life data, nationality and affiliations of the respective persons, and much more.
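How an identifier value is inserted into a formatter URL can be shown with a short Python sketch. It assumes the Wikidata convention of marking the insertion point with `$1`; the function name is made up, and both the GND pattern and the sample identifier below are illustrative assumptions that should be checked against the property page before being relied on.

```python
def expand_formatter_url(formatter_url: str, identifier: str) -> str:
    """Insert an identifier value into a formatter URL pattern.

    Assumes the placeholder "$1" marks where the identifier goes.
    """
    if "$1" not in formatter_url:
        raise ValueError("formatter URL contains no $1 placeholder")
    return formatter_url.replace("$1", identifier)


# Assumed formatter URL for the GND ID property (verify on the property page)
GND_FORMATTER_URL = "https://d-nb.info/gnd/$1"

if __name__ == "__main__":
    # "118540238" is only a sample identifier value for illustration
    print(expand_formatter_url(GND_FORMATTER_URL, "118540238"))
```

Because the pattern is stored with the property rather than with each item, a change of the external URL scheme only needs to be made in one place.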

Bina Agarwal - SQID screenshot

Wikidata item for the Indian Economist Bina Agarwal, visualized via the SQID browser
