Integrating altmetrics into a subject repository - EconStor as a use case

2019-11-21 by Wolfgang Riese

Back in 2015, the ZBW Leibniz Information Center for Economics (ZBW) teamed up with the Göttingen State and University Library (SUB), the Service Center of the Göttingen library federation (VZG) and the GESIS Leibniz Institute for the Social Sciences in the *metrics project, funded by the German Research Foundation (DFG). The aim of the project was “… to develop a deeper understanding of *metrics, especially in terms of their general significance and their perception amongst stakeholders.” (*metrics project about).

In the practical part of the project, the following DSpace-based repositories of the project partners served as data sources for online publications and, in the case of EconStor, also as the implementer of the presentation of the social media signals:

EconStor - a subject repository for economics and business studies run by the ZBW, currently (Aug. 2019) containing roughly 180,000 downloadable files,
GoeScholar - the Publication Server of the Georg-August-Universität Göttingen run by the SUB Göttingen, offering approximately 11,000 publicly browsable items so far,
SSOAR - the “Social Science Open Access Repository” maintained by GESIS, currently containing about 53,000 publicly available items.

In the work package “Technology analysis for the collection and provision of *metrics”, the currently available *metrics technologies and services were analyzed.

As stated by [Wilsdon 2017], current suppliers of altmetrics “remain too narrow (mainly considering research products with DOIs)”. This makes it difficult to acquire *metrics data for repositories like EconStor, whose main content is working papers: so far it is unusual, at least in the social sciences and economics, to create DOIs for this kind of document. Only the resulting final article published in a journal receives a DOI.

Based on the findings of this work package, a test implementation of the *metrics crawler was built. The crawler was deployed at the VZG from early 2018 to spring 2019. To aggregate the *metrics data, the crawler was fed with persistent identifiers and metadata from the aforementioned repositories.

At this stage of the project, the project partners still expected that the persistent identifiers used by the repositories (e.g. handles, URNs, …), or their local URL counterparts, could be harnessed to easily identify social media mentions of their documents, e.g. for EconStor:

handle: “hdl:10419/…”
handle.net resolver URL: “http(s)://hdl.handle.net/10419/…”
EconStor landing page URL with handle: “http(s)://www.econstor.eu/handle/10419/…”
EconStor bitstream (PDF) URL with handle: “http(s)://www.econstor.eu/bitstream/10419/…”
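The four forms above all carry the same handle, so a matching step would first have to reduce them to one canonical value. The following is a minimal sketch of such a normalization, not the code used by the *metrics crawler. The function name `extract_handle`, the regular expression, the assumption that the handle suffix is purely numeric, and the bitstream path in the usage example are all illustrative.

```python
import re
from typing import Optional

# Illustrative pattern covering the four EconStor identifier forms listed above.
# Assumption: the handle suffix after "10419/" is purely numeric.
HANDLE_FORMS = re.compile(
    r"^(?:hdl:"
    r"|https?://hdl\.handle\.net/"
    r"|https?://www\.econstor\.eu/handle/"
    r"|https?://www\.econstor\.eu/bitstream/)"
    r"(10419/\d+)"
)


def extract_handle(identifier: str) -> Optional[str]:
    """Return the bare handle (e.g. "10419/144535") or None if no form matches."""
    match = HANDLE_FORMS.match(identifier.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # The file name in the bitstream URL is a made-up placeholder.
    for form in (
        "hdl:10419/144535",
        "http://hdl.handle.net/10419/144535",
        "https://www.econstor.eu/handle/10419/144535",
        "https://www.econstor.eu/bitstream/10419/144535/1/example.pdf",
    ):
        print(form, "->", extract_handle(form))
```

With a helper like this, a mention found under any of the four forms could be attributed to the same repository item, provided the social media service exposes the link text at all.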

Feeding the crawler in this way resulted in two datasets: one for publications identified by DOIs (doi:10.xxxx/yyyyy) or the respective metadata from Crossref, and one for documents identified by the repository URLs (https://www.econstor.eu/handle/10419/xxxx) or the item metadata stored in the repository.

During the first part of the project, several social media platforms were identified as possible data sources for the implementation phase by means of interviews and online surveys. For the resulting ranking, see the Social Media Registry. Additional research examined which social media platforms are relevant to researchers at different stages of their career, and whether and how they use them (see [Lemke 2018], [Lemke 2019] and [Mehrazar 2018]). This list of possible sources for social media citations or mentions was then further reduced to the following six social media platforms, which offer free and openly available online APIs:

Facebook
Mendeley
Reddit
Twitter
Wikipedia
YouTube

Of particular interest to the EconStor team were the social media services Mendeley and Twitter, as those had been found to be among the “Top 3 most used altmetric sources …” for Economic and Business Studies (EBS) journals “… - with Mendeley being the most complete platform for EBS journals” [Nuredini 2016].

*metrics integration in EconStor

In early 2019, the EconStor team finally received a MySQL dump of the data collected by the *metrics crawler. In consultations between the project partners, and based on the aforementioned research, it became clear that only the data collected from Mendeley, Twitter and Wikipedia were suitable for embedding into EconStor. The VZG also made clear that it had been nearly impossible to use handles or the respective local URLs to extract social media mentions from the free-of-charge APIs of the different social media services. Instead, ISBNs were used for Wikipedia, and the title and author(s) as provided in the repository’s metadata were used for Mendeley. Only the search via the Twitter API used the handle URLs.

The *metrics crawler used two datasets to identify works from EconStor. The first was a dataset of 15,703 DOIs (~10% of the EconStor content back then), sometimes representing other manifestations of the documents stored in EconStor (e.g. pre- or postprint versions of an article), together with their respective metadata from the Crossref DOI registry. The second was a dataset of 153,807 EconStor documents identified by the handle/URL and metadata stored in the repository itself. This second dataset also included the documents related to the publications identified by the DOI set.

The following table (Table 1) shows the results of the *metrics crawler for items in EconStor. It displays one row for each service and the used identifier set. Each row also shows the time period during which the crawler harvested the service and how many unique items per identifier set were found during that period.

social media service (set)	harvested from	harvested until	unique EconStor items mentioned
Mendeley (DOI)	2018-08-06	2019-01-11	7,800
Mendeley (URL)	2019-01-10	2019-01-11	24,953
Twitter (DOI)	2018-02-13 (date of first captured tweet 2018-02-03)	2019-01-11 (date of last captured tweet 2019-01-10)	418
Twitter (URL)	2018-12-14 (date of first captured tweet 2018-12-05)	2019-01-11 (date of last captured tweet 2019-01-09)	32
Wikipedia (DOI)	2018-10-05	2019-01-11	93
Wikipedia (URL)	2019-01-11	2019-01-11	100

Table 1: Unique EconStor Items found per identifier set and social media service

The following table (Table 2) shows how many of the EconStor items were found with identifiers from both sets. Only for Mendeley do the two sets overlap significantly. This shows that it is desirable for a service such as EconStor to expand the captured coverage of its items in social media by using identifiers other than just DOIs.

social media site	unique items identified by both DOI and URL
Mendeley	4,323
Twitter	0
Wikipedia	2

Table 2: Overlap in found identifiers
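To make the gain from the second identifier set concrete, the counts from Table 1 and Table 2 can be combined by inclusion-exclusion. The sketch below assumes that the overlap column in Table 2 counts the same unique EconStor items that appear in both rows of Table 1. The function names are illustrative.

```python
# Counts copied from Table 1 (DOI set, URL set) and Table 2 (found via both).
COUNTS = {
    "Mendeley": (7800, 24953, 4323),
    "Twitter": (418, 32, 0),
    "Wikipedia": (93, 100, 2),
}


def combined_items(doi_found: int, url_found: int, both: int) -> int:
    """Inclusion-exclusion: items found by at least one identifier set."""
    return doi_found + url_found - both


def url_only_items(url_found: int, both: int) -> int:
    """Items that a crawl restricted to the DOI set would have missed."""
    return url_found - both


if __name__ == "__main__":
    for service, (doi_found, url_found, both) in COUNTS.items():
        total = combined_items(doi_found, url_found, both)
        missed = url_only_items(url_found, both)
        print(f"{service}: {total} items in total, {missed} found only via URL/metadata")
```

For Mendeley this gives 7,800 + 24,953 - 4,323 = 28,430 unique items, of which 20,630 would not have been found with DOIs alone. Note that the harvesting periods in Table 1 differ per row, so the figures are only roughly comparable.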

As a result of the project, the landing pages of EconStor items that were mentioned on Mendeley, Twitter or Wikipedia during the data gathering period now show, for the time being, a listing of “Social Media Mentions”. This is in addition to the already existing cites and citations, based on the RePEc - CitEc service, and the download statistics, which are displayed on separate pages.

Image 1: “EconStor item landing page”

The back end on the EconStor server is implemented as a small RESTful web service, written in Java, that returns JSON-formatted data (see Figure 1). Given a list of identifiers (DOIs/handles), it returns, per specified EconStor item, the sum of mentions for Mendeley, Twitter and Wikipedia in the database, as well as the links to the counted tweets and Wikipedia articles. For Wikipedia, the results are also grouped by the language of the Wikipedia in which the mention was found.

{
    "_metrics": {
        "sum_mendeley": 0,
        "sum_twitter": 3,
        "sum_wikipedia": 0
    },
    "identifier": "10419/144535",
    "identifiertype": "HANDLE",
    "repository": "EconStor",
    "tweetData": {
        "1075481976793116673": {
            "created_at": "Wed Dec 19 20:04:19 +0000 2018",
            "description": "Economist Wettbewerb Regulierung Monopole Economics @DICEHHU @HHU_de VWL Antitrust Düsseldorf Quakenbrück Berlin FC St. Pauli",
            "id_str": "1075481976793116673",
            "name": "Justus Haucap",
            "screen_name": "haucap"
        },
        "1075484066949025793": {
            "created_at": "Wed Dec 19 20:12:37 +0000 2018",
            "description": "Twitterkanal des Wirtschaftsdienst - Zeitschrift für Wirtschaftspolitik, hrsg.  von @ZBW_news; RT ≠ Zustimmung; Impressum: https://t.co/X0gevZb9lR",
            "id_str": "1075484066949025793",
            "name": "Wirtschaftsdienst",
            "screen_name": "Zeitschrift_WD"
        },
        "1075486159772504065": {
            "created_at": "Wed Dec 19 20:20:56 +0000 2018",
            "description": "Professor for International Economics at HTW Berlin - University of Applied Sciences; Senior Policy Fellow at the European Council on Foreign Relations",
            "id_str": "1075486159772504065",
            "name": "Sebastian Dullien",
            "screen_name": "SDullien"
        }
    },
    "twitterids": [
        "1075486159772504065",
        "1075484066949025793",
        "1075481976793116673"
    ],
    "wikipediaQuerys": {}
}

Figure 1: “Example JSON returned by the web service - Twitter mentions”
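A client of this web service has to turn such a response into counts and backlinks. The following Python sketch illustrates one way to do that. It is not the Java servlet used by EconStor. The field names are taken from Figure 1, while the function names and the tweet link pattern (`https://twitter.com/<screen_name>/status/<id>`) are assumptions, since the source does not say how the links are built.

```python
import json

# Trimmed copy of the response in Figure 1 (only the fields used below).
SAMPLE = """
{
    "_metrics": {"sum_mendeley": 0, "sum_twitter": 3, "sum_wikipedia": 0},
    "identifier": "10419/144535",
    "identifiertype": "HANDLE",
    "tweetData": {
        "1075481976793116673": {"screen_name": "haucap"},
        "1075484066949025793": {"screen_name": "Zeitschrift_WD"},
        "1075486159772504065": {"screen_name": "SDullien"}
    },
    "twitterids": [
        "1075486159772504065",
        "1075484066949025793",
        "1075481976793116673"
    ],
    "wikipediaQuerys": {}
}
"""


def mention_counts(response: dict) -> dict:
    """Map each service to its number of mentions, defaulting to 0."""
    metrics = response.get("_metrics", {})
    return {
        "mendeley": metrics.get("sum_mendeley", 0),
        "twitter": metrics.get("sum_twitter", 0),
        "wikipedia": metrics.get("sum_wikipedia", 0),
    }


def tweet_links(response: dict) -> list:
    """Build one backlink per tweet id, in the order given by "twitterids"."""
    tweets = response.get("tweetData", {})
    links = []
    for tweet_id in response.get("twitterids", []):
        screen_name = tweets.get(tweet_id, {}).get("screen_name")
        if screen_name:
            # Assumed link pattern, not confirmed by the source.
            links.append(f"https://twitter.com/{screen_name}/status/{tweet_id}")
    return links


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    print(mention_counts(data))
    for link in tweet_links(data):
        print(link)
```

Using `.get()` with defaults keeps the sketch tolerant of items for which a service returned nothing, such as the empty "wikipediaQuerys" object in Figure 1.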

Image 2: “Mendeley and Twitter mentions”

During the creation of the landing page of an EconStor item (see Image 1), a Java servlet queries the web service and, if any social media mentions are found, renders the result into the web page. For each of the three social media platforms the sum of the mentions is displayed, and for Twitter and Wikipedia backlinks to the mentioning tweets/articles are also provided as a drop-down list below the number of mentions (see Image 2). For Wikipedia, the backlinks are also grouped by the language of the Wikipedia articles in which the ISBN of the corresponding work was found.

Conclusion

While they are an interesting addition to the existing download statistics and the citations by RePEc/CitEc that are already integrated into EconStor, the gathered “social media mentions” currently offer only limited additional value to the EconStor landing pages. One reason might be that only a fraction of all documents in EconStor are covered. Another reason, according to [Lemke 2019], might be that there is currently a great reluctance among economists and social scientists to use social media services, as their use is perceived as: “unsuitable for academic discourse; … to cost much time; … separating personal from professional matters is bothersome; … increases the efforts necessary to handle information overload.”

Theoretically, the prospect of a tool for the measurement of the scientific uptake, with a quicker response time than classical bibliometrics, could be very rewarding, especially for a repository like EconStor with its many preprints (e.g. working papers) provided in open access.

As [Thelwall 2013] has stated: “In response, some publishers have turned to altmetrics, which are counts of citations or mentions in specific social web services because they can appear more rapidly than citations. For example, it would be reasonable to expect a typical article to be most tweeted on its publication day and most blogged within a month of publication.” and “Social media mentions, being available immediately after publication—and even before publication in the case of preprints…”.

But precisely these preprints, which come without a DOI, remain hard to identify correctly and therefore hard to count as social media mentions. The *metrics crawler has not changed this, since it uses title and author metadata to search in Mendeley, which does not guarantee a correct identification, and ISBNs to search in Wikipedia.

However, a quick check revealed that, at the time of writing this article (Aug. 2019), at least Wikipedia offers a handle search. A quick search for EconStor handles in the English Wikipedia now returns a list of 184 pages with mentions of “hdl:10419/”, and the German Wikipedia returns 13. These are still very small numbers (Aug. 22nd, 2019: currently 179,557 full texts are available in EconStor).

https://en.wikipedia.org/w/api.php?action=query&list=search&srlimit=500&srsearch=%22hdl:10419%2F%22&srwhat=text&srprop&srinfo=totalhits&srenablerewrites=0&format=json

Search via the API in the English Wikipedia
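The same query can be issued for other language editions by swapping the host name. The sketch below rebuilds the request with the parameters from the URL above and reads the `totalhits` field from the JSON response. The function names and the User-Agent string are illustrative, and `fetch_total_hits` needs network access, so results will differ from the numbers quoted for August 2019.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(language: str, phrase: str = "hdl:10419/") -> str:
    """Build the full-text search URL for one Wikipedia language edition."""
    params = {
        "action": "query",
        "list": "search",
        "srlimit": 500,
        "srsearch": f'"{phrase}"',  # quoted, so the phrase is matched literally
        "srwhat": "text",
        "srprop": "",               # no per-result properties needed
        "srinfo": "totalhits",
        "srenablerewrites": 0,
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_total_hits(response_text: str) -> int:
    """Extract the number of matching pages from the API response."""
    return json.loads(response_text)["query"]["searchinfo"]["totalhits"]


def fetch_total_hits(language: str) -> int:
    """Run the search against the live API (requires network access)."""
    request = Request(
        build_search_url(language),
        # Placeholder User-Agent; replace with real contact details.
        headers={"User-Agent": "handle-search-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_total_hits(response.read().decode("utf-8"))


if __name__ == "__main__":
    for language in ("en", "de"):
        print(language, fetch_total_hits(language))
```

Run regularly, a query like this could track Wikipedia mentions of EconStor handles without depending on ISBNs.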

Another problem is that, at the time of this writing, the *metrics crawler is not operated continuously. Our analysis is therefore based on a data dump of social media mentions from spring 2018 to early 2019.

One of the major benefits of altmetrics is that they can be obtained much faster and are more recent than classical citation-based metrics. Integrating a static, steadily aging dataset into the EconStor landing pages therefore loses value over time. Hence, we are looking for more recent and regular updates of social media data that could serve as a ‘real-time’ basis for monitoring social media usage in economics.

As a consequence, we are currently looking for:
a) an institution to commit itself to run the *metrics crawler and
b) a more active social media usage in the sciences of Economics and Business Studies.

References

[Lemke 2018] Lemke, Steffen; Mehrazar, Maryam; Mazarakis, Athanasios; Peters, Isabella (2018): Are There Different Types of Online Research Impact?, In: Building & Sustaining an Ethical Future with Emerging Technology. Proceedings of the 81st Annual Meeting, Vancouver, Canada, 10–14 November 2018, ISBN 978-0-578-41425-6, Association for Information Science and Technology (ASIS&T), Silver Spring, pp. 282-289 http://hdl.handle.net/11108/394

[Lemke 2019] Lemke, Steffen; Mehrazar, Maryam; Mazarakis, Athanasios; Peters, Isabella (2019): “When You Use Social Media You Are Not Working”: Barriers for the Use of Metrics in Social Sciences, Frontiers in Research Metrics and Analytics, ISSN 2504-0537, Vol. 3, Iss. [Article] 39, pp. 1-18, http://dx.doi.org/10.3389/frma.2018.00039

[Mehrazar 2018] Maryam Mehrazar, Christoph Carl Kling, Steffen Lemke, Athanasios Mazarakis, and Isabella Peters (2018): Can We Count on Social Media Metrics? First Insights into the Active Scholarly Use of Social Media, WebSci ’18: 10th ACM Conference on Web Science, May 27–30, 2018, Amsterdam, Netherlands. ACM, New York, NY, USA, Article 4, 5 pages, https://doi.org/10.1145/3201064.3201101

[Metrics 2019] Einbindung von *metrics in EconStor, https://metrics-project.net/downloads/2019-03-28-EconStor-metrics-Abschluss-WS-SUB-G%C3%B6.pptx

[Nuredini 2016] Nuredini, Kaltrina; Peters, Isabella (2016): Enriching the knowledge of altmetrics studies by exploring social media metrics for Economic and Business Studies journals, Proceedings of the 21st International Conference on Science and Technology Indicators (STI Conference 2016), València (Spain), September 14-16, 2016, http://hdl.handle.net/11108/261

[OR2019] Relevance and Challenges of Altmetrics for Repositories - answers from the *metrics project. https://www.conftool.net/or2019/index.php/Paper-P7A-424Orth%2CWeiland_b.pdf?page=downloadPaper&filename=Paper-P7A-424Orth%2CWeiland_b.pdf&form_id=424&form_index=2&form_version=final

[Social Media Registry] Social Media Registry - Current Status of Social Media Plattforms and *metrics, https://docs.google.com/spreadsheets/d/10OALs5kxtmML4Naf1ShXh0cTmONE8q9EFhTzmgPINv4/edit?usp=sharing

[Thelwall 2013] Thelwall M, Haustein S, Larivière V, Sugimoto CR (2013): Do Altmetrics Work? Twitter and Ten Other Social Web Services. PLoS ONE 8(5): e64841. http://dx.doi.org/10.1371/journal.pone.0064841

[Wilsdon 2017] Wilsdon, James et al. (2017): Next-generation metrics: Responsible metrics and evaluation for open science. Report of the European Commission Expert Group on Altmetrics, ISBN 978-92-79-66130-3, http://dx.doi.org/10.2777/337729
