The world's largest public newspaper clippings archive comprises a wealth of material of great interest, particularly for authors and readers in the Wikiverse. ZBW has digitized the material from the first half of the last century and has put all available metadata under a CC0 license. Moreover, we are donating that data to Wikidata by adding or enhancing items and by providing easy access from there to the dossiers (called "folders") and clippings.
Here we describe the process of building the interactive SWIB20 participants map, which is generated by a query to Wikidata. The map was intended to help SWIB20 participants make contacts in the virtual conference space. However, in compliance with the GDPR, we wanted to avoid publishing personal details, so we chose to publish a map of the institutions with which the participants are affiliated. (Obvious downside: the 9 unaffiliated participants could not be represented on the map.)
We suppose that the method can be applied to other conferences and other use cases - e.g., the downloaders of scientific software or the institutions subscribed to an academic journal. Therefore, we describe the process in some detail.
by Franz Osorio, Timo Borst
- With the rise of social networks, platforms and their use also by academics and research communities, the term 'metrics' itself has gained a broader meaning: while traditional citation indexes only track citations of literature published in (other) journals, 'mentions', 'reads' and 'tweets', albeit less formal, have become indicators and measures for (scientific) impact.
- Altmetrics has influenced research performance, evaluation and measurement, which formerly had been exclusively associated with traditional bibliometrics. Scientists are becoming aware of alternative publishing channels and both the option and need of 'self-advertising' their output.
- Academic libraries in particular are forced to manage their journal subscriptions and holdings in the light of increasing scientific output on the one hand, and stagnating budgets on the other. While editorial products from the publishing industry are exposed to a globally competitive market requiring a 'brand' strategy, altmetrics may serve as additional, scattered indicators of scientific awareness and value.
Against this background, we took the opportunity to collect, process and display some impact or signal data with respect to literature in economics from different sources, such as 'traditional' citation databases, journal rankings, and community platforms or altmetrics indicators:
- CitEc. The long-standing citation service maintained by the RePEc community provided a dump of both working papers (as part of series) and journal articles, the latter with significant information on classic impact measures such as impact factor (2 and 5 years) and h-index.
- Rankings of journals in economics, including Scimago Journal Rank (SJR) and two German journal rankings that are regularly released and updated (VHB Jourqual, Handelsblatt Ranking).
- Usage data from Altmetric.com that we collected for those articles that could be identified via their Digital Object Identifier.
- Usage data from the scientific community platform and reference manager Mendeley.com, in particular the number of saves or bookmarks on an individual paper.
Results and evaluation
Outlook and future work
Back in 2015, the ZBW Leibniz Information Center for Economics (ZBW) teamed up with the Göttingen State and University Library (SUB), the Service Center of the Göttingen Library Federation (VZG) and GESIS Leibniz Institute for the Social Sciences in the *metrics project funded by the German Research Foundation (DFG). The aim of the project was: “… to develop a deeper understanding of *metrics, especially in terms of their general significance and their perception amongst stakeholders.” (*metrics project about).
In the practical part of the project, the following DSpace-based repositories of the project partners participated as data sources for online publications and – in the case of EconStor – also as implementer for the presentation of the social media signals:
- EconStor - a subject repository for economics and business studies run by the ZBW, currently (Aug. 2019) containing roughly 180,000 downloadable files,
- GoeScholar - the Publication Server of the Georg-August-Universität Göttingen run by the SUB Göttingen, offering approximately 11,000 publicly browsable items so far,
- SSOAR - the “Social Science Open Access Repository” maintained by GESIS, currently containing about 53,000 publicly available items.
In the work package “Technology analysis for the collection and provision of *metrics”, the project analysed the *metrics technologies and services available at the time.
As stated by [Wilsdon 2017], current suppliers of altmetrics “remain too narrow (mainly considering research products with DOIs)”, which makes it difficult to acquire *metrics data for repositories like EconStor, whose main content is working papers. Up to now it has been unusual – at least in the social sciences and economics – to create DOIs for this kind of document. Only the resulting final article published in a journal receives a DOI.
Based on the findings of this work package, a test implementation of the *metrics crawler was built. The crawler was actively deployed at the VZG from early 2018 to spring 2019. To aggregate the *metrics data, the crawler was fed with persistent identifiers and metadata from the aforementioned repositories.
At this stage of the project, the project partners still expected that the persistent identifiers used by the repositories (e.g. handles, URNs, …), or their local URL counterparts, could be harnessed to easily identify social media mentions of their documents, e.g. for EconStor:
- handle: “hdl:10419/…”
- handle.net resolver URL: “http(s)://hdl.handle.net/10419/…”
- EconStor landing page URL with handle: “http(s)://www.econstor.eu/handle/10419/…”
- EconStor bitstream (PDF) URL with handle: “http(s)://www.econstor.eu/bitstream/10419/…”
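The identifier forms listed above can be generated mechanically for each document and then searched for in the text of a social media post. The following is a minimal sketch of that idea, not the project's crawler: the function names are hypothetical, the handle suffix `12345` is an invented placeholder, and plain pattern matching is assumed to be sufficient.

```python
import re

# EconStor handle prefix, taken from the identifier patterns listed above.
HANDLE_PREFIX = "10419"


def identifier_variants(suffix: str) -> list[str]:
    """Return the identifier and URL forms under which one document may be mentioned.

    `suffix` is the local part of the handle (hypothetical values in this sketch).
    """
    handle = f"{HANDLE_PREFIX}/{suffix}"
    variants = [f"hdl:{handle}"]
    for scheme in ("http", "https"):
        variants.append(f"{scheme}://hdl.handle.net/{handle}")
        variants.append(f"{scheme}://www.econstor.eu/handle/{handle}")
        variants.append(f"{scheme}://www.econstor.eu/bitstream/{handle}")
    return variants


def mentions_document(text: str, suffix: str) -> bool:
    """Check whether `text` contains any known form of the document's identifier."""
    for variant in identifier_variants(suffix):
        # The negative lookahead prevents 10419/12345 from matching 10419/123456.
        pattern = re.escape(variant) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


if __name__ == "__main__":
    post = "Worth reading: https://www.econstor.eu/bitstream/10419/12345/1/paper.pdf"
    print(mentions_document(post, "12345"))
```

A sketch like this only finds mentions that spell out one of the known forms; shortened or redirected links would need to be resolved first.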
ZBW is donating a large open dataset from the 20th Century Press Archives to Wikidata, in order to make it more accessible to various scientific disciplines such as contemporary, economic and business history, media and information science, to journalists, teachers, students, and the general public.
The 20th Century Press Archives (PM20) is a large public newspaper clippings archive, extracted from more than 1500 different sources published in Germany and all over the world, covering roughly a full century (1908-2005). The clippings are organized in thematic folders about persons, companies and institutions, general subjects, and wares. During a project originally funded by the German Research Foundation (DFG), the material up to 1960 has been digitized. 25,000 folders with more than two million pages up to 1949 are freely accessible online. The fine-grained thematic access and the public nature of the archives make it, to the best of our knowledge, unique across the world (more information on Wikipedia) and an essential research data resource for some of the disciplines mentioned above.
The data donation means more than ZBW assigning a CC0 license to all PM20 metadata, which makes it compatible with Wikidata. (Due to intellectual property rights, only the metadata can be licensed by ZBW - all legal rights on the press articles themselves remain with their original creators.) The donation also includes investing a substantial amount of working time (two years, as planned) devoted to the integration of this data into Wikidata. Here we want to share our experiences regarding the integration of the persons archive metadata.
On the 27th and 28th of October, the kick-off for the "Kultur-Hackathon" Coding da Vinci is held in Mainz, Germany, organized this time by GLAM institutions from the Rhein-Main area: "For five weeks, devoted fans of culture and hacking alike will prototype, code and design to make open cultural data come alive." New software applications are enabled by free and open data.
For the first time, ZBW is among the data providers. It contributes the person and company dossiers of the 20th Century Press Archive. For about a hundred years, the predecessor organizations of ZBW in Kiel and Hamburg collected press clippings, business reports and other material about a wide range of political, economic and social topics, about persons, organizations, wares, events and general subjects. During a project funded by the German Research Foundation (DFG), the documents published up to 1948 (about 5.7 million pages) were digitized and made publicly accessible with the corresponding metadata, until recently solely in the "Pressemappe 20. Jahrhundert" (PM20) web application. Additionally, the dossiers - for example about Mahatma Gandhi or the Hamburg-Bremer Afrika Linie - can be loaded into a web viewer.
As a first step to open up this unique source of data for various communities, ZBW has decided to put the complete PM20 metadata* under a CC-Zero license, which allows free reuse in all contexts. For our Coding da Vinci contribution, we have prepared all person and company dossiers which already contain documents. The dossiers are interlinked with each other. Controlled vocabularies (for, e.g., "country", or "field of activity") provide multi-dimensional access to the data. Most of the persons and a good share of the organizations were linked to GND identifiers. As a starter, we mapped dossiers to Wikidata according to existing GND IDs. That makes it possible to run queries for PM20 dossiers completely on Wikidata, making use of all the good stuff there. An example query shows the birth places of PM20 economists on a map, enriched with images from Wikimedia Commons. The initial mapping was much extended by fantastic semi-automatic and manual mapping efforts by the Wikidata community. So currently more than 80 % of the dossiers about - often rather prominent - PM20 persons are not only linked to Wikidata, but also connected to Wikipedia pages. That offers great opportunities for mash-ups with further data sources, and we are looking forward to what the "Coding da Vinci" crowd may make out of these opportunities.
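To illustrate what a query for PM20 dossiers "completely on Wikidata" can look like, here is a minimal sketch that assembles a SPARQL query for the birth places of economists with a PM20 folder. It is not the example query referred to above. It assumes that the PM20 folder ID is Wikidata property P4293 and that economists are identified by occupation Q188094; the function names are hypothetical.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumptions of this sketch: P4293 = PM20 folder ID, Q188094 = economist.
PM20_FOLDER_ID = "P4293"
ECONOMIST = "Q188094"


def build_birthplace_query(folder_property: str = PM20_FOLDER_ID,
                           occupation: str = ECONOMIST) -> str:
    """Assemble a SPARQL query for birth places of persons with a PM20 folder."""
    lines = [
        "#defaultView:Map",  # hint for the Wikidata Query Service GUI to draw a map
        "SELECT ?person ?personLabel ?birthPlaceLabel ?coord ?image WHERE {",
        f"  ?person wdt:{folder_property} ?pm20Id ;",
        f"          wdt:P106 wd:{occupation} ;",   # occupation
        "          wdt:P19 ?birthPlace .",         # place of birth
        "  ?birthPlace wdt:P625 ?coord .",         # coordinate location
        "  OPTIONAL { ?person wdt:P18 ?image . }",  # image from Wikimedia Commons
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
    ]
    return "\n".join(lines)


def query_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(build_birthplace_query())
```

Pasting the printed query into the Wikidata Query Service should render the results as a map; `query_url` is only needed for programmatic access.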
Technically, the data has been converted from an internal intermediate format to still quite experimental RDF and loaded into a SPARQL endpoint. There it was enriched with data from Wikidata and extracted with a construct query. We have decided to transform it to JSON-LD for publication (following practices recommended by our hbz colleagues). So developers can use the data as "plain old JSON", with the plethora of web tools available for this, while linked data enthusiasts can utilize sophisticated Semantic Web tools by applying the provided JSON-LD context. In order to make the dataset discoverable and reusable for future research, we published it persistently at zenodo.org. With it, we provide examples and data documentation. A GitHub repository gives you additional code examples and a way to address issues and suggestions.
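The two ways of consuming the data can be sketched as follows. The record, its field names and the schema.org mappings are invented for illustration and are not taken from the published dataset; the term expansion is deliberately simplified and does not replace a real JSON-LD processor.

```python
import json

# Hypothetical JSON-LD record; keys, IRIs and values are invented for this sketch.
RECORD = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "birthDate": {"@id": "http://schema.org/birthDate"}
  },
  "@id": "http://example.org/person/1",
  "name": "Example Person",
  "birthDate": "1900-01-01"
}
""")


def expand_terms(record: dict) -> dict:
    """Map the short keys of a record to full IRIs using its inline @context.

    Simplified: handles only flat term definitions (a string or {"@id": ...}).
    """
    context = record.get("@context", {})
    expanded = {}
    for key, value in record.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @context and @id are skipped
        definition = context.get(key, key)
        iri = definition["@id"] if isinstance(definition, dict) else definition
        expanded[iri] = value
    return expanded


if __name__ == "__main__":
    # "Plain old JSON": just read the field by its short name.
    print(RECORD["name"])
    # Linked data view: the same value, keyed by an unambiguous IRI.
    print(expand_terms(RECORD))
```

The point of the design is that the first access path needs no knowledge of RDF at all, while the context keeps the door open for tools that interpret the same file as linked data.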
* For the scanned documents, the legal regulations apply - ZBW cannot assign licenses here.
In the EconBiz portal for publications in economics, we have data from different sources. In some of these sources, most notably ZBW's "ECONIS" bibliographical database, authors are disambiguated by identifiers of the Integrated Authority File (GND) - in total more than 470,000. Data stemming from "Research papers in Economics" (RePEc) contains another identifier: RePEc authors can register themselves in the RePEc Author Service (RAS), and claim their papers. This data is used for various rankings of authors and, indirectly, of institutions in economics, which provides a big incentive for authors - about 50,000 have registered with RAS - to keep both their article claims and personal data up-to-date. While GND is well known and linked to many other authorities, RAS had no links to any other researcher identifier system. Thus, until recently, the author identifiers were disconnected, which made it impossible to display all publications of an author on a portal page.
To overcome that limitation, colleagues at ZBW have matched a good 3,000 authors with RAS and GND IDs by their publications (see details here). Making that pre-existing mapping maintainable and extensible, however, would have meant setting up a custom editing interface and would have required storage and operating resources, and the mapping could not easily have been made publicly accessible. In a previous article, we described the opportunities offered by Wikidata. Now we made use of it.
The Journal of Economic Literature Classification Scheme (JEL) was created and is maintained by the American Economic Association. The AEA provides this widely used resource freely for scholarly purposes. Thanks to André Davids (KU Leuven), who has translated the originally English-only labels of the classification to French, Spanish and German, we provide a multi-lingual version of JEL. Its latest version (as of 2017-01) is published in the formats RDFa and RDF download files. These formats and translations are provided "as is" and are not authorized by AEA. In order to make changes in JEL more easily traceable, we have created lists of inserted and removed JEL classes in the context of the skos-history project.
Wikidata is a large database, which connects all of the roughly 300 Wikipedia projects. Besides interlinking all Wikipedia pages in different languages about a specific item (e.g., a person), it also connects to more than 1000 different sources of authority information.
The linking is achieved by an "authority control" class of Wikidata properties. The values of these properties are identifiers, which unambiguously identify the Wikidata item in external, web-accessible databases. The property definitions include a URI pattern (called "formatter URL"). When the identifier value is inserted into the URI pattern, the resulting URI can be used to look up the authority entry. The resulting URI may point to a Linked Data resource, as is the case with the GND ID property. This, on the one hand, provides a light-weight and robust mechanism to create links in the web of data. On the other hand, these links can be exploited by every application which is driven by one of the authorities to provide additional data: links to Wikipedia pages in multiple languages, images, life data, nationality and affiliations of the corresponding persons, and much more.
Wikidata item for the Indian economist Bina Agarwal, visualized via the SQID browser
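The formatter URL mechanism described above amounts to a simple string substitution. The sketch below assumes that the placeholder in the pattern is `$1` and that the GND ID property uses the pattern `https://d-nb.info/gnd/$1`; the identifier value is an invented placeholder, not a real GND ID, and the function name is hypothetical.

```python
# Assumed formatter URL of the GND ID property; "$1" marks where the value goes.
GND_FORMATTER_URL = "https://d-nb.info/gnd/$1"


def resolve_identifier(formatter_url: str, identifier: str) -> str:
    """Insert an identifier value into a formatter URL pattern."""
    if "$1" not in formatter_url:
        raise ValueError("formatter URL lacks the $1 placeholder")
    return formatter_url.replace("$1", identifier)


if __name__ == "__main__":
    # "123456789" is a made-up value used only to show the substitution.
    print(resolve_identifier(GND_FORMATTER_URL, "123456789"))
```

Because the pattern lives in the property definition and not in each item, a change of the target URL scheme only has to be made in one place.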
Authors: Timo Borst, Konstantin Ott
In recent years, repositories for managing research data have emerged, which are supposed to help researchers to upload, describe, distribute and share their data. To promote and foster the distribution of research data in the light of paradigms like Open Science and Open Access, these repositories are normally implemented and hosted as stand-alone applications, meaning that they offer a web interface for manually uploading the data, and a presentation interface for browsing, searching and accessing the data. Sometimes, the first component (the interface for uploading the data) is substituted or complemented by a submission interface from another application. For example, in Dataverse or in CKAN, data is submitted from remote third-party applications by means of data deposit APIs. Regardless of how the upload of data is organized and eventually embedded into a publishing framework (data either as a supplement of a journal article, or as a stand-alone research output subject to review and release as part of a ‘data journal’), it definitely means that this data is supposed to be made publicly available, which is often reflected by policies and guidelines for data deposit.