Building an Infrastructure for Cost Data Transparency
The lack of cost transparency in scholarly publishing poses significant challenges for institutions and researchers. Our project, openCost, aims to fix that.
The transition to open access (OA) has increased the visibility of scholarly work, but understanding and managing publication costs remains complex. Shifting from subscription-based to fee-based or institution-financed models fundamentally changes processes, financial flows, and the roles of stakeholders. Additionally, the lack of cost transparency from publishers poses significant challenges for institutions and researchers, who often face hidden or unclear fees, making budgeting and financial planning more complicated. Our project, openCost, addresses this issue head-on.
The transition to OA has been characterized by a wide range of business models and workflows as well as an often complex and incomplete documentation of publication costs. As a result, institutions and researchers regularly have to deal with hidden or unclear fees, making budget planning a challenge. In addition, although many institutions already track publication costs in some form, this tracking is neither uniform nor complete. Clearly, sharing cost data can help maintain the principles of fairness and cost transparency in scholarly publishing.
In speaking with librarians and repository providers, our team learned that these organizations are willing to share cost data, but, given the magnitude of information involved, they need help achieving the necessary degree of automation.
This is where openCost comes into the picture.
Initiated by three project partners, namely Bielefeld University Library, Deutsches Elektronen-Synchrotron (DESY) in Hamburg, and the University Library of Regensburg, openCost has developed a standardized metadata schema (openCost, 2023) to record, retrieve, and map a scientific institution’s publication costs. The project partners record their publication cost data in their repositories and make it available for free exchange in the openCost format, enabling automated harvesting by aggregators like OpenAPC (OpenAPC, n.d.-a). For this harvesting process, the openCost project team recommends the well-established Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) interface (Open Archives Initiative, n.d.), which allows any interested party, from global aggregators to local evaluation tools, to freely aggregate cost data from the primary sources, and allows institutions to easily correct and update data. Because OAI-PMH harvesting is incremental, it also scales well, even for large numbers of records. Additionally, we’ve expanded the Electronic Journals Library (EZB), one of the most comprehensive bibliographic databases of scientific electronic journals (Electronic Journals Library, 2022), to include information regarding publication costs and their funding. This enables participating institutions to use the EZB as a central platform for communicating OA information to researchers.
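To illustrate why incremental harvesting scales, the following is a minimal sketch of how a harvester might page through an OAI-PMH endpoint. The verb and argument names (ListRecords, metadataPrefix, from, resumptionToken) come from the OAI-PMH specification; the base URL, the metadata prefix "opencost", and the sample response are assumptions made up for this example and do not describe any partner repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Incremental harvest: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


# Hypothetical response page (no network access needed for this sketch).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE_URL = "https://repo.example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE_URL, "opencost", from_date="2024-08-01")
record_ids, next_token = parse_page(SAMPLE_PAGE)
next_url = list_records_url(BASE_URL, "opencost", resumption_token=next_token)
```

A harvester would repeat the request with each returned token until the response carries none, and would store the date of its last run to use as the next `from` value.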
By enabling efficient data exchange and harvesting by aggregators, the openCost approach facilitates the sharing of cost information not only within institutions, but also at the national and international levels, making price increases transparent and improving the negotiating position of libraries and consortia.
A central aspect of the openCost project is community involvement. In order to cover as many use cases as possible and to identify missing elements or inaccuracies, we invited international experts on publication costs into the discussion via a 2022 workshop, “openCost: the road to publication cost transparency,” at which we identified problems related to cost monitoring and determined solutions. The results (Schweighofer & Wagner, 2023) laid the foundation for further project work and the development of the metadata schema. We successively published our project results in our GitHub repository (to which community members can contribute), gathered feedback, and adapted the schema accordingly (openCost, 2023).
openCost provides a metadata block for publication costs that can be embedded in other metadata schemas defined by an institution. The openCost schema focuses only on costs. The mandatory elements were designed to be simple enough to prevent major hurdles during implementation, yet complex enough to cover as many use cases as possible. We did not want to reproduce or repeat data from other well-established schemas (for example, bibliographic description). To be usable as a standalone exchange format, the schema deploys unique persistent identifiers (PIDs) to identify the cost-bearing entity (for example, for a journal article, a DOI) (Schweighofer, 2024).
Our aim is to record and exchange not only OA fees, such as Article Processing Charges (APCs), but all the payments that an institution makes to publishers, including color or page charges, submission fees or costs from cofinancing, and the costs of memberships or transformative agreements (German Science, 2022). In addition, the standardized, machine-readable format enables the easy exchange of cost data worldwide.
Initially, openCost focused on journal publications, with our first metadata schema covering various cost items, including processing fees, for fee-based, individual articles (for example, articles in gold open access journals funded by APCs). Next, we developed a schema that includes costs from contracts such as so-called transformative agreements, abstracting and mapping special payment modalities, including, for example, those negotiated by DEAL in Germany (DEAL Konsortium, 2022). Individual publications, such as an article that is paid for centrally under a non-APC payment model, can be linked to contract elements of the schema to model relationships.
In addition to mapping OA costs for journal articles and costs associated with transformative agreements, we wanted to ensure that the schemas could be easily adapted for subscription fees. To this end, we’ve made the development of the relevant schemas and their documentation available in a GitHub repository (openCost, 2023).
The openCost metadata schema provides standardized vocabularies and a standardized structure, enabling the exchange of cost data and improving cost transparency in scientific publishing. Solid and transparent data on payments is essential if stakeholders are to monitor and manage the open access transformation, avoid ever increasing costs, and gain a comprehensive overview of financial flows and cost trends.
As previously mentioned, cost data provided by institutions in the openCost format can be harvested by aggregators like OpenAPC. Established in 2014 at Bielefeld University Library, OpenAPC collects datasets on fees paid for OA publishing and disseminates them on GitHub under an open database license (OpenAPC, 2024a), enabling others to reuse the data and perform their own analyses and calculations. The initiative aggregates APCs, Book Processing Charges (BPCs), and data on articles published under transformative agreements. The overall aims of OpenAPC are to increase cost transparency in the area of OA publishing and to enable cost comparisons between institutions. In addition, OpenAPC tracks the development of costs over time.
Data is provided voluntarily by 432 institutions worldwide, including universities and other higher education institutions (HEIs), funders, and national consortia. This extensive participation shows that institutions recognize the need for financial transparency if the OA transformation is to be professionally managed. In the APC and BPC datasets, OpenAPC currently provides the following cost data:
[Figure 1: OpenAPC cost data as of 08/13/2024]
All data aggregated by OpenAPC is stored in the project’s own repository on GitHub (OpenAPC, 2024a). In addition, OpenAPC runs an OLAP server (OpenAPC, n.d.-b) that serves as a backend for treemap visualizations (OpenAPC, n.d.-c) and also provides a public application programming interface (API) to automatically query the OpenAPC data.
An autonomous service, OpenAPC is nevertheless closely linked to openCost; as part of the openCost project the OpenAPC team is currently modifying and extending the OpenAPC infrastructure to standardize workflows and increase their degree of automation.
To date, most data is submitted to OpenAPC as CSV files via email or GitHub. A few institutions that already store their publication costs in their institutional repositories have made it possible for OpenAPC to harvest this data via OAI-PMH, initially using a basic data exchange format. Switching to the openCost schema standardizes and unifies this process and provides the means for broad reuse. As a proof of concept, OpenAPC is already harvesting APC cost data from the repositories of the project partners and of those institutions of the JOIN² consortium (a collaborative repository solution consisting of eight institutions, of which DESY is part; JOIN², 2023) that have implemented the openCost standard. OpenAPC acts as a service provider for openCost, with the project partners contributing data.
OpenAPC initially focused on collecting and analyzing OA charges. However, switching to the openCost schema has made it possible for OpenAPC to record additional publishing costs alongside OA fees (OpenAPC, 2024b).
The Electronic Journals Library (EZB) offers a fast, structured, and unified interface through which users can access full-text articles from scholarly journals. As of August 2024, it comprises 117,026 titles from all areas of research, 29,049 of which are only available online. In addition, it lists 145,619 journals available through aggregators and contains 81,217 journals that are accessible free of charge. Participating libraries can also provide access through EZB to journals they subscribe to.
As part of openCost, the EZB team expanded the user interface to display expected publication costs at the journal level and to provide researchers with information on funding options. EZB now includes data fields with information from the Directory of Open Access Journals (DOAJ) (2024), allowing users to assess journal quality, publishing procedures, and funding opportunities. Cost information from OpenAPC has also been integrated into the user interface, so that, provided OpenAPC holds enough data, EZB will display the average costs of a journal both globally and on an institution-specific basis. This enables a quick cost comparison at the journal level and gives researchers a user-friendly way to weigh different publication options.
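As a rough illustration of the journal-level comparison described above, the sketch below computes a global average and institution-specific averages per journal. The column names (institution, period, euro, doi, journal_full_title) follow the OpenAPC CSV data; the rows themselves are invented, and the `min_records` threshold is only an assumption standing in for "enough data", not the rule EZB actually applies.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows using OpenAPC-style column names.
SAMPLE_CSV = """institution,period,euro,doi,journal_full_title
Uni A,2023,2000.00,10.1234/a1,Journal of Examples
Uni B,2023,2400.00,10.1234/a2,Journal of Examples
Uni A,2024,2200.00,10.1234/a3,Journal of Examples
Uni B,2024,1500.00,10.1234/b1,Example Letters
"""


def average_costs(rows, min_records=2):
    """Return (global averages per journal, averages per (journal, institution)).

    Journals with fewer than min_records payments are skipped
    (hypothetical threshold for "enough data").
    """
    by_journal = defaultdict(list)
    by_journal_inst = defaultdict(list)
    for row in rows:
        euro = float(row["euro"])
        journal = row["journal_full_title"]
        by_journal[journal].append(euro)
        by_journal_inst[(journal, row["institution"])].append(euro)
    global_avg = {j: mean(v) for j, v in by_journal.items() if len(v) >= min_records}
    inst_avg = {k: mean(v) for k, v in by_journal_inst.items() if k[0] in global_avg}
    return global_avg, inst_avg


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
global_avg, inst_avg = average_costs(rows)
print(global_avg)  # {'Journal of Examples': 2200.0}
print(inst_avg[("Journal of Examples", "Uni A")])  # 2100.0
```

"Example Letters" is omitted from the result because the sample holds only one payment for it, which mirrors the idea of showing averages only when sufficient data is available.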
In addition, EZB administrators can now manually enter information about options to fund publication costs—for example, contact details, specific restrictions, upper limits for assumed publication costs, or the procedure for submitting invoices. For a fine-grained presentation of the cost coverage conditions, EZB journals can also be assigned categories, either automatically or manually. Each journal category (for instance, Open Access, Mirror Journal, Diamond Open Access Journal, etc.) is a unique keyword and can be used as an additional filter criterion.
In addition to providing further detail for researchers considering different publication options, these features help library staff to report information on their institution’s cost coverage in a straightforward way and to maintain it with little effort.
An essential consideration: where should data on publication costs be stored to ensure comprehensive and accessible information management?
Most bibliographic, technical, legal, and administrative metadata are already stored in institutional repositories. These repositories are also well-suited for consolidating cost data—including fees, funding details, and information about transformative agreements—that are currently scattered and incomplete across systems. Centralizing this data at the level of individual publications allows for flexible metadata combinations and supports in-depth statistical analyses.
Unlike library systems, repositories offer well-connected interfaces that facilitate efficient data exchange and meet funder requirements for long-term storage and open accessibility. Storing cost data in repositories ensures greater transparency and usability while addressing the current lack of detailed publication information in administrative systems.
Our project partners have successfully added their cost data to their institutional repositories and make it available for exchange in the openCost format. Furthermore, they have incorporated cost information from OpenAPC to show, for instance, who paid for a publication if it was not the institution itself. Details on costs and paying institutions are now accessible on each publication’s detail pages.
Institutions have pursued two prototypical approaches: the University Library of Regensburg uses data from OpenAPC to enrich the metadata of its publications directly within its repository, which makes it possible to determine how much and how often one institution benefits from others (see Figure 4, which shows that the University of Regensburg benefited from the TU Munich, which paid for the publication).
DESY and Bielefeld University Library, by contrast, use the DOI of a publication to query the OpenAPC OLAP server, enabling immediate updates whenever OpenAPC data is refreshed, without altering the repository’s internal format (OpenAPC, 2023a). With this approach, however, evaluating financial interactions among institutions requires external tools to access the OpenAPC dataset.
Both approaches leverage the OAI harvesting favored by openCost to provide frequent and prompt updates to OpenAPC data.
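The kind of evaluation mentioned above, namely how often and how much one institution benefits from payments made by others, can be sketched as a simple match on DOIs. This is an illustrative outline only, not the partners' implementation: the function name, the institution names, and the sample rows are hypothetical, while the field names (doi, institution, euro) follow the OpenAPC CSV data.

```python
def external_payments(own_dois, own_institution, openapc_rows):
    """Summarize payments that other institutions made for our publications.

    Returns {paying institution: (number of publications, total in euro)}.
    """
    own = {doi.lower() for doi in own_dois}  # DOIs are case-insensitive
    summary = {}
    for row in openapc_rows:
        payer = row["institution"]
        if row["doi"].lower() in own and payer != own_institution:
            count, total = summary.get(payer, (0, 0.0))
            summary[payer] = (count + 1, total + float(row["euro"]))
    return summary


# Hypothetical data: DOIs held in our repository and OpenAPC-style rows.
repository_dois = ["10.1234/x1", "10.1234/X2", "10.1234/x3"]
openapc_rows = [
    {"doi": "10.1234/x1", "institution": "Institution A", "euro": "1800.00"},
    {"doi": "10.1234/x2", "institution": "Institution B", "euro": "2100.00"},
    {"doi": "10.1234/x3", "institution": "Institution B", "euro": "900.00"},
    {"doi": "10.1234/y9", "institution": "Institution B", "euro": "500.00"},
]

benefits = external_payments(repository_dois, "Institution A", openapc_rows)
print(benefits)  # {'Institution B': (2, 3000.0)}
```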
By creating a technical infrastructure that allows users to share publication costs via standardized interfaces, openCost has taken an important step toward addressing the complexity of cost documentation in scientific publishing.
In the future, we plan to develop a metadata schema to cover other types of publication costs. We hope that, by continuing to streamline cost documentation, we will promote fair and transparent scholarly publishing.
More information about our project can be found at:
https://www.opencost.de/en/project/
DEAL Konsortium. (2022). DEAL agreements. MPDL Services gGmbH. https://deal-konsortium.de/en/agreements
Directory of Open Access Journals (DOAJ). (2024). Directory of Open Access Journals. https://doaj.org/
Electronic Journals Library. (2022). About the EZB. https://ezb.ur.de/about.phtml?lang=en
German Science and Humanities Council | Wissenschaftsrat. (2022). Recommendations on the Transformation of Academic Publishing: Towards Open Access. https://doi.org/10.57674/0gtq-b603
JOIN². (2023). JOIN² - Just anOther INvenio INstance. https://join2.de/
OpenAPC. (n.d.-a). OpenAPC Home. Bielefeld University Library. https://openapc.net/
OpenAPC. (n.d.-b). OpenAPC OLAP server. https://olap.openapc.net/
OpenAPC. (n.d.-c). OpenAPC treemaps. https://treemaps.openapc.net/
OpenAPC. (2023a). openapc-olap/HOWTO.md. https://github.com/OpenAPC/openapc-olap/blob/master/HOWTO.md
OpenAPC. (2024a). OpenAPC data. https://github.com/OpenAPC/openapc-de/tree/master/data
OpenAPC. (2024b). apc_de_additional_costs.csv. https://github.com/OpenAPC/openapc-de/blob/master/data/apc_de_additional_costs.csv
Open Archives Initiative. (n.d.). Open Archives Initiative Protocol for Metadata Harvesting. https://www.openarchives.org/pmh/
openCost. (2023). The openCost metadata schemas. https://github.com/opencost-de/opencost
openCost. (2024). openCost XSD. https://github.com/opencost-de/opencost/blob/main/doc/opencost.xsd
Schweighofer, B., & Wagner, A. (2023). openCost on the road to publication cost transparency. In openCost: The Road to Publication Cost Transparency. Verlag Deutsches Elektronen-Synchrotron DESY. https://bib-pubdb1.desy.de/record/583603
Schweighofer, B. (2024, June 11-13). Enhancing Cost Transparency: The Role of Persistent Identifiers in the openCost Metadata Schema. [Conference presentation]. PIDfest Conference on Persistent Identifiers, Prague, Czech Republic. https://doi.org/10.3204/PUBDB-2024-04976
10.1146/katina-121824-1