Can the World’s Research Ecosystem Be Openly Indexed?
OpenAlex offers a promising and reliable alternative to traditional subscription-based citation databases. But its creators still have a lot of work to do.
OpenAlex (Priem et al., 2022) is a free, web-based academic search engine and comprehensive bibliographic catalog of scientific authors, institutions, and publications that has gained attention for its free-to-use datasets, user-friendly interface, and effective indexing of diamond open access journals. It has indexed over 260 million scholarly works from 260,800 sources (OpenAlex, 2024l) and continues to grow.
This review evaluates the latest version of OpenAlex, which has a significantly improved graph data model and offers a valuable alternative to traditional citation databases for scientists and research institutions (Simard, 2024; Tay et al., 2021).
OpenAlex employs artificial intelligence and automated algorithms to index scientific articles and journals. Named after the ancient Library of Alexandria, it began operations in January 2022 using the open dataset from the Microsoft Academic Graph (MAG) (Sinha et al., 2015), the second-largest academic search engine after Google Scholar (OpenAlex, 2024a).
OpenAlex is stewarded by OurResearch, the fifth organization to commit to the Principles for Open Scholarly Infrastructure (POSI) (OpenAlex, 2024; POSI, 2024). Its web-based tools and services are maintained by Impactstory, Inc. (OpenAlex, 2024m). As observed in a comparative study by Scheidsteger and Haunschild (2022), it doesn’t capture patents. Otherwise, in terms of bibliometric data sources, OpenAlex operates effectively as an expansion of the MAG (Culbert et al., 2024).
OpenAlex’s data is licensed under Creative Commons Zero (CC0) (OpenAlex, 2024g), allowing for its free use and distribution.
Features and Functionality
Since its launch in 2022, OpenAlex has achieved several significant milestones: the introduction of full-text search (August 2022), a customer support ticket system (December 2022), a premium subscription offering (March 2023), and enhanced author disambiguation (July 2023) (OpenAlex, 2023a).
It provides data and analytics through three main channels: data snapshots, OpenAlex API, and OpenAlex Web (user interface, UI) as illustrated in Figure 1.
Data snapshots can be downloaded freely, with standard snapshots updated approximately once a month. Full periodic snapshots of data are updated fortnightly and are suitable for users needing comprehensive datasets. The OpenAlex Application Programming Interface (API), a fast, modern Representational State Transfer (REST) API, enables real-time information retrieval from the underlying database, supporting advanced research capabilities. The free tier of the REST API allows up to 100,000 requests per day; users generating heavier query loads are encouraged to use the data snapshot instead. The web-based graphical user interface, which is built on the REST API, allows for easy browsing and searching of the data.
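As a rough illustration of the REST channel, the sketch below assembles a query URL for the works endpoint using only the Python standard library. The base URL, the `search`, `filter`, and `per-page` parameters, and the two filter keys reflect my reading of the OpenAlex technical documentation; the helper name `build_works_url` is my own. Treat all of these as assumptions to check against the current documentation rather than as a definitive client.

```python
from urllib.parse import urlencode

# Assumed base URL for the works endpoint; verify against the current docs.
BASE_URL = "https://api.openalex.org/works"


def build_works_url(search=None, filters=None, per_page=25):
    """Return a query URL for the works endpoint (hypothetical helper)."""
    params = {}
    if search:
        params["search"] = search
    if filters:
        # Filters are sent as one comma-separated list of key:value pairs.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    params["per-page"] = per_page
    # urlencode percent-escapes the colons and commas in the filter string.
    return f"{BASE_URL}?{urlencode(params)}"


# Example values are illustrative only.
url = build_works_url(
    search="open access",
    filters={"publication_year": 2023, "open_access.is_oa": "true"},
)
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client; the function itself makes no network request.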
The underlying code of OpenAlex is fully open-source and available on the GitHub OurResearch Repositories (GitHub, 2024). The cloud-based platform repositories (Figure 2) allow OpenAlex users to store, share, and showcase the code and files.
OpenAlex automatically indexes new journals and articles, requiring no action from end users. It primarily retrieves new records from Crossref, along with sources like PubMed and arXiv repositories that meet the criteria for new works. The search results are highly relevant and well-displayed on OpenAlex’s user-friendly web interface.
In addition to access to sources, OpenAlex offers citation information, including the number of citations, h-index, and i10-index.
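For readers unfamiliar with these two indicators, the sketch below computes them from a plain list of citation counts using their standard definitions: the h-index is the largest h such that h works each have at least h citations, and the i10-index is the number of works with at least 10 citations. The function names and the sample counts are invented for illustration, and this is not necessarily how OpenAlex computes the values internally.

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Invented citation counts for a hypothetical author.
counts = [25, 12, 10, 8, 3, 0]
print(h_index(counts), i10_index(counts))  # prints "4 3"
```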
Users can opt for a paid subscription that provides data updates as frequently as every few hours and higher API limits (OpenAlex, 2024d).
Scholarly Data Sources and Metadata
The two main data sources for OpenAlex are the MAG and Crossref, supplemented by ORCID, the Research Organization Registry (ROR), the Directory of Open Access Journals (DOAJ), Unpaywall, PubMed, PubMed Central, the ISSN International Centre, and other institutional repositories (OpenAlex, 2024a, 2024f).
When compared to other scholarly data sources (e.g., Clarivate’s Web of Science, Elsevier’s Scopus, Dimensions, and Crossref), OpenAlex has a significantly higher volume of works, with approximately 50,000 added daily (OpenAlex, 2024j). As of this review, OpenAlex has started ingesting DataCite and has added HAL as a primary source for new works (OurResearch, 2024c).
As of December 29, 2024, OpenAlex has metadata for more than 260 million works, including journal articles and books, 23 percent of which are open access; over 100 million authors; 260,000 sites hosting works (including journals, conferences, repositories, and ebook platforms); 110,000 institutions; 4,500 topics; and 65,000 Wikidata concepts linked to works through an automated multi-tag hierarchical classifier (OpenAlex, 2024e).
Schema and Persistent Identifier
OpenAlex’s API entities and graph data model have undergone significant changes since 2022 (Figure 3a). The current dataset (Figure 3b) comprises a heterogeneous directed graph consisting of eight types of scholarly API entities: WORKS, AUTHORS, SOURCES, INSTITUTIONS, TOPICS, PUBLISHERS, FUNDERS, and GEOS, with hundreds of millions of entities and billions of connections among them.
Works are scholarly documents such as journal articles, books, datasets, and theses. They are connected to CONCEPTS and objects: “grants,” “locations,” and “authorship.”
Authors are individuals who create works, connected to them via the “authorship” object, which formalizes the claim that “author A, affiliated with Institution(s) I, is a creator of work W.” Each Institution is identified by a ROR ID.
Sources are defined as places that host works. They are linked to the Publishers and connected to Works via “locations” objects.
Likewise, Funders provide financial support for research, and are connected to Works via “grants” objects. Works in OpenAlex are tagged with primary Topics based on a scoring model that considers information about the work, including its title, abstract, source (journal) name, and citations.
Geos refers to the geographic categorization of scholarly data; OpenAlex uses United Nations data to allow filtering by continent, Global South or North, and region.
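The relationships described in this section can be pictured as a small set of linked record types. The sketch below is a deliberately simplified model written with Python dataclasses; the class and field names are my own shorthand, the sample values are placeholders, and the real OpenAlex schema has many more fields, so it should be read as an aid to understanding rather than as the actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Institution:
    ror_id: str  # each Institution is identified by a ROR ID
    name: str


@dataclass
class Author:
    name: str


@dataclass
class Authorship:
    """Claim: author A, affiliated with institution(s) I, is a creator of work W."""
    author: Author
    institutions: List[Institution] = field(default_factory=list)


@dataclass
class Source:
    name: str
    publisher: str  # Sources are linked to Publishers


@dataclass
class Work:
    title: str
    authorships: List[Authorship] = field(default_factory=list)
    locations: List[Source] = field(default_factory=list)  # where the work is hosted
    grants: List[str] = field(default_factory=list)  # funder names
    primary_topic: str = ""


def institutions_of(work):
    """Collect the distinct institution names attached to a work, in order."""
    names = []
    for authorship in work.authorships:
        for institution in authorship.institutions:
            if institution.name not in names:
                names.append(institution.name)
    return names


# Placeholder data; the ROR ID below is not a real identifier.
uni = Institution(ror_id="https://ror.org/EXAMPLE", name="Example University")
work = Work(
    title="An example article",
    authorships=[Authorship(Author("A. Researcher"), [uni])],
    locations=[Source(name="Example Journal", publisher="Example Press")],
    grants=["Example Funder"],
    primary_topic="Example Topic",
)
print(institutions_of(work))
```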
Future Developments
OpenAlex continues to evolve. Planned improvements, based on user feedback, include bug fixing, adding entities and user interface features, and refining the author and affiliation data, which will involve creating a full-fledged system of user-claimed author profiles.
In March 2024, OpenAlex received a US $7.5 million grant from Arcadia to become a sustainable and completely open index of the world’s research ecosystem (OurResearch, 2024a, 2024b). In November 2024, OpenAlex received a 2-year grant from the Navigation Fund totaling US $688,000 to enhance the OpenAlex user interface (OurResearch, 2024d). An Advisory Board will be established in early 2025 to advise on strategy, budget, fundraising, partnerships, staffing, and organizational structure.
When I examined OpenAlex in October 2024, bugs had recently been found in its author name disambiguation, with some works being linked to incorrect authors. OpenAlex claims that the issue was caused by its integration with ORCID and that it should be resolved in January 2025 (OpenAlex, 2024c).
I examined the product from the perspectives of frontend researchers, backend developers, and institutional administrators, applying common research intelligence use cases, as demonstrated in the Understand your University’s Research with OpenAlex Webinar (OpenAlex, 2023b).
Research Analyses
OpenAlex enables institutions with limited budgets for subscriptions to databases like Web of Science or Scopus to understand research works both within their institutions and internationally, without cost, as illustrated in Figure 4.
By utilizing the OpenAlex API, users can identify potential collaborating institutions, visualize research analytics with tools like VOSviewer, conduct research impact analyses in R, or create dashboards for research insights with their preferred entities (Figure 5).
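To make the collaboration use case concrete, the sketch below counts how often other institutions appear alongside a home institution in a list of work records. It assumes each record is a dictionary with an `authorships` list whose entries carry an `institutions` list with a `display_name`, which matches my understanding of the API response shape; the helper name and the sample records are invented, and no network call is made.

```python
from collections import Counter


def count_collaborators(works, home_institution):
    """Count how often other institutions co-appear on the home institution's works."""
    counts = Counter()
    for work in works:
        # Distinct institution names on this work, across all authorships.
        names = {
            inst["display_name"]
            for authorship in work.get("authorships", [])
            for inst in authorship.get("institutions", [])
        }
        if home_institution in names:
            counts.update(names - {home_institution})
    return counts


# Invented sample records shaped like the assumed response structure.
sample_works = [
    {"authorships": [
        {"institutions": [{"display_name": "University A"}]},
        {"institutions": [{"display_name": "University B"}]},
    ]},
    {"authorships": [
        {"institutions": [{"display_name": "University A"}, {"display_name": "Institute C"}]},
        {"institutions": [{"display_name": "University B"}]},
    ]},
    {"authorships": [
        {"institutions": [{"display_name": "Institute C"}]},
    ]},
]

collaborators = count_collaborators(sample_works, "University A")
print(collaborators.most_common(2))
```

Each institution is counted once per work, so the totals represent co-authored works rather than co-author pairs.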
As I learned in a recent OpenAlex webinar, research dissemination often occurs independently of universities, leaving many institutions without a comprehensive record of their researchers’ contributions. This gap can limit the ability of institutions and funders to make informed management decisions without relying on costly third-party databases.
OpenAlex provides a completely open database of the global research ecosystem, equipping research leaders with the data they need about the research conducted by their institutions and peers. During the webinar, I learned how to leverage OpenAlex for common bibliometric analyses, including: identifying areas of research strength, assessing current and potential research collaborators, and providing objective evidence of impact and positioning within a field for reports and grant proposals.
Although I have not yet implemented the OpenAlex model in this capacity, I can clearly see its potential to transform how institutions analyze and utilize their research data.
Search Capabilities and Relevancy
OpenAlex has received positive feedback from users for its openness, quality datasets, accessibility, and availability of API (OpenAlex, 2024k). I particularly appreciated its intuitively designed platform.
The full-text search function is robust. Users can use keyword search with Boolean operators and apply up to 49 filters in their search results (Figure 6). I found that the search engine returned highly relevant results.
That said, the relevancy and accuracy of search results can vary. There are occasional inconsistencies in the indexing of certain works or authors, especially with new entries that haven’t been fully validated. When multiple authors have similar names, it can be difficult for users to identify the author they are looking for in the search results. There are still problems with author disambiguation, even though some improvement has been made.
For example, my author record contains an incorrect institution. When I reported this to OpenAlex support, I received a response within two days explaining that the problem was caused by an algorithm’s failure to match the affiliation text on my publication to a ROR record and stating that it would be resolved within a few weeks. At the time of this writing, seven weeks later, the record has still not been updated.
Account Management
Users can create an optional account to set alerts for new items and to save searches. The sign-up process is straightforward, and the password requirements are minimal—only five digits with no additional security measures and no verification of the email address provided. Once an account is created, the user cannot delete it.
Community and Customer Support
OpenAlex has recently implemented a new help desk system to centralize the answers to common questions, a proactive development that reflects their commitment to keeping users informed and engaged. The FAQ section in the Help Center is great for common issues. For more complicated inquiries, users should submit a ticket or post in the OpenAlex community forum (Figure 7).
When I submitted a ticket, I received a quick, useful response. However, the time it takes to resolve issues varies and is often beyond the control of the customer support team.
Posting in the community forum generally results in quicker responses and advice from other users. Along with the mailing list, it serves as a valuable source of important information, including about new features. The community fosters collaboration and support.
OpenAlex also offers real-time assistance during regular office hours in US Eastern Time (ET), although the timing is inconvenient for users in other time zones. An inaugural Virtual User Conference, held in May 2024, was a notable success, providing participants the opportunity to share insights and best practices. I appreciate OpenAlex’s plans to make this an annual event.
In addition, OpenAlex provides a comprehensive suite of resources, including technical documentation and webinars, designed to help users navigate the platform with confidence and improve their skills. These resources laid a helpful foundation for my use of OpenAlex.
The core OpenAlex service is free. As a nonprofit project, OpenAlex generates revenue from two primary sources: paid subscriptions and consulting services.
Optional Paid Subscription and Services
OpenAlex offers two tiers of paid subscriptions—Premium and Institutional—that provide users with enhanced features, including more frequent data updates and higher API limits. Users who opt for a Premium subscription also receive priority support. Subscriptions can be cancelled with 30 days’ notice (OpenAlex, 2024m).
OpenAlex also offers a range of paid consulting services, including affiliation and author curation services, custom research classification, disambiguation, and custom bibliometric analyses and reports (OpenAlex, 2024d). I did not evaluate OpenAlex’s paid subscriptions or services for this review.
Warranty and Terms of Service
The service is provided “as-is,” as outlined in the OpenAlex Terms of Service (October 2024). OpenAlex does not warrant the accuracy, completeness, or usefulness of the information presented on or through its service.
Users should be sure that they understand and comply with the data export restrictions. OpenAlex reserves the right to modify or discontinue services at any time. The platform is committed to user privacy and data protection, emphasizing transparency in how data is handled.
The three channels for data—OpenAlex Web, OpenAlex API, and data snapshots—are all free and require no registration or permission.
While OpenAlex does not require authentication for general access, users handling sensitive data or integrating the API into their systems are advised to implement additional security measures such as API keys, secure connections, and user access control.
Scopus, Web of Science, Google Scholar, and Dimensions all compete with OpenAlex. Of these, Scopus and Web of Science are most comparable in terms of coverage and data sources. But OpenAlex is distinguished by its commitment to open access and transparency (Priem et al., 2022).
OpenAlex covers a wide range of works, including those in languages other than English. The complete dataset is available under a CC0 license, promoting transparency and enabling unrestricted reuse. This fosters collaboration and knowledge-sharing across disciplines, particularly benefiting researchers in low-resource settings.
OpenAlex could do a better job disambiguating entities like authors and institutions, which can have serious implications for metadata, especially in high-stakes research evaluations (Priem et al., 2022). With the rapid monthly additions of sources and data, users are bound to wonder about trustworthiness (Culbert et al., 2024). Initial studies indicate that OpenAlex’s metadata completeness trails that of traditional citation indexes like Scopus or Web of Science, especially when it comes to abstracts, affiliation details, and references (Alperin et al., 2024; Culbert et al., 2024). For instance, when comparing a set of common articles, Web of Science and Scopus identified an average of 6.4 and 6.2 more references respectively than OpenAlex (Culbert et al., 2024). Similarly, a study looking at backward citation searches showed that OpenAlex had a mean error rate of -23.9 percent for captured references, while Web of Science Core Collection and Scopus had a mean error rate of +2.9 percent and +2.8 percent respectively (Gusenbauer, 2024).
The open nature of the platform adds to the difficulty of maintaining data quality. The incorrect information in my own author record demonstrates the work that still needs to be done on this front, especially when compared to similar tools.
The API allows users to tag their own text with OpenAlex’s topics, keywords, and concepts (OpenAlex, 2024i). This helps with better categorization and analysis of research subjects and allows researchers to associate their work with relevant academic themes. But making the most of the feature requires some technical know-how, and the accuracy of tagging varies.
Users should keep in mind that the API limits each non-premium user to 100,000 requests per day, with a cap of 10 calls per second. Users with high-volume needs may find this restriction challenging, especially for large-scale analyses.
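One way to stay within these limits is to throttle requests on the client side. The sketch below is a minimal guard, assuming the figures quoted above (10 calls per second and 100,000 per day); the class name `Throttle` is my own, and the "day" is approximated as a rolling 24-hour window starting from the first call, which may not match how the service itself counts.

```python
import time


class Throttle:
    """Client-side guard for a per-second and a per-day request budget."""

    def __init__(self, per_second=10, per_day=100_000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.last_call = None
        self.day_start = None
        self.calls_today = 0

    def wait(self):
        """Block until the next request is allowed; raise if the daily budget is spent."""
        now = self.clock()
        # Start a new 24-hour window on the first call or once the old one expires.
        if self.day_start is None or now - self.day_start >= 86_400:
            self.day_start = now
            self.calls_today = 0
        if self.calls_today >= self.per_day:
            raise RuntimeError("daily request budget exhausted")
        # Space consecutive calls at least min_interval apart.
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
        self.calls_today += 1


# Usage: call wait() immediately before each request.
throttle = Throttle()
for _ in range(3):
    throttle.wait()
    # A real client would issue its HTTP request here.
```

Spacing calls evenly is a conservative choice; it never sends bursts, at the cost of some throughput compared with a sliding-window counter.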
Furthermore, the reliance on artificial intelligence and automated algorithms for indexing leads to the occasional inclusion of erroneous or irrelevant content. As a result, users might need to spend some time verifying the credibility of indexed works.
OpenAlex pulls from sources like Crossref and MAG and does not have indexing rules, giving users the freedom to choose which data they want to include. It is important for users to understand that the exclusivity criteria (which include having an English abstract, specific publication statuses, the exclusion of theses and dissertations, the type of peer review, a relatively low citation count for journals, and ensuring geographical diversity among authors) could reinforce their biases (OpenAlex, 2023a).
Less tech-savvy users might struggle with a lack of user-friendly support tools. In the technical documentation, I came across inconsistent information and even some missing pages. For instance, the section on Author Profile Curation should point to a Google Form that was announced on October 26, 2024, following the author profile issues that resulted from the ORCID integration, but the page is missing, as are several others detailing where the works in OpenAlex come from. There is also a discrepancy in the technical documentation about the number of works added daily: one section says 50,000, while another says 10,000.
Comparing OpenAlex in late 2024 to the 2022 version, it is clear that OurResearch has made significant advancements in service offerings, data quality, and organizational structure.
As OpenAlex continues to grow, the new Advisory Board will help to strengthen its framework and ensure the platform stays responsive to user needs and the scholarly community. It will also play a key role in succession planning and maintaining operational continuity. To provide the necessary insight into best practices, user needs, and trends in scholarly research, the board should include a range of international experts from different backgrounds such as academic leadership, research institutions, data science, and software development.
It is crucial for OpenAlex to communicate clearly with its users, especially after major updates like the integration with ORCID. Users should report any issues they encounter and join the OpenAlex Users Group Community to stay informed about updates, concerns, and feature releases.
In conclusion, OpenAlex presents a promising and reliable alternative to traditional subscription-based citation databases for researchers, university administrators, research institutions, and government funding bodies that are interested in research activities and potential collaborations. But there is a lot of work ahead. By focusing on improving data quality and user experience, and by actively engaging with its community, OpenAlex can strengthen its position in the scholarly ecosystem.
Alperin, J., Portenoy, J., Demes, K., Larivière, V., & Haustein, S. (2024). An analysis of the suitability of OpenAlex for bibliometric analyses. arXiv. https://arxiv.org/pdf/2404.17663
Culbert, J., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., & Mayr, P. (2024). Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus. arXiv. https://arxiv.org/pdf/2401.16359
GitHub. (2024). OurResearch Repositories: OpenAlex. Retrieved on December 28, 2024, from https://github.com/orgs/ourresearch/repositories
Gusenbauer, M. (2024). Beyond Google Scholar, Scopus, and Web of Science: An evaluation of the backward and forward citation coverage of 59 databases’ citation indices. Research Synthesis Methods, 15(5), 802–817. https://doi.org/10.1002/jrsm.1729
OpenAlex. (2023a). OpenAlex Support Webinar: Introducing OpenAlex: An open, comprehensive index of scholarship. Retrieved on October 30, 2024, from https://youtu.be/dKJgLK3wrTM
OpenAlex. (2023b). OpenAlex Support Webinar: Understand your University’s research with OpenAlex. Retrieved on October 31, 2024, from https://youtu.be/dKJgLK3wrTM
OpenAlex. (2024a). OpenAlex About: Comparison with other scholarly data sources. Retrieved on November 1, 2024, from https://openalex.org/about#comparison
OpenAlex. (2024b). OpenAlex Community. Retrieved on November 1, 2024, from https://groups.google.com/g/openalex-users
OpenAlex. (2024c). OpenAlex Community: Recent bugs in name disambiguation. Retrieved on December 29, 2024, from https://groups.google.com/g/openalex-users/c/86O7SFfocI8
OpenAlex. (2024d). OpenAlex Pricing: Paid Subscriptions. Retrieved on December 23, 2024, from https://help.openalex.org/hc/en-us/articles/24397762024087-Pricing
OpenAlex. (2024e). OpenAlex Search and Analyze the World’s Research. Retrieved on December 23, 2024, from https://openalex.org/
OpenAlex. (2024f). OpenAlex Support: About the data. Retrieved on November 1, 2024, from https://help.openalex.org/hc/en-us/articles/24397285563671-About-the-data
OpenAlex. (2024g). OpenAlex Support: About Us. Retrieved on November 1, 2024, from https://help.openalex.org/hc/en-us/articles/24396686889751-About-us
OpenAlex. (2024h). OpenAlex Technical Documentation: Entities Overview. Retrieved on November 2, 2024, from https://docs.openalex.org/api-entities/entities-overview
OpenAlex. (2024i). OpenAlex Technical Documentation: Aboutness Endpoint. Retrieved on December 29, 2024, from https://docs.openalex.org/api-entities/aboutness-endpoint-text
OpenAlex. (2024j). OpenAlex Technical Documentation: Works. Retrieved on December 23, 2024, from https://docs.openalex.org/api-entities/works
OpenAlex. (2024k). OpenAlex Testimonials. Retrieved on November 2, 2024, from https://openalex.org/testimonials
OpenAlex. (2024l). Sources. Retrieved on December 6, 2024, from https://openalex.org/sources
OpenAlex. (2024m). Terms of Services. Retrieved on November 2, 2024, from https://openalex.org/OpenAlex_termsofservice.pdf
OurResearch. (2024a). Our Projects. Retrieved on November 1, 2024, from https://ourresearch.org/projects
OurResearch. (2024b). OurResearch Blog: Open Science. Retrieved on November 1, 2024, from https://blog.ourresearch.org/category/open-science/
OurResearch. (2024c). OurResearch Blog: OpenAlex 2024 in Review. Retrieved on January 5, 2025, from https://blog.ourresearch.org/openalex-2024-in-review/
OurResearch. (2024d). OurResearch Blog: OurResearch Receives $688k Grant from Navigation Fund to Enhance the OpenAlex User Interface. Retrieved on January 5, 2025, from https://blog.ourresearch.org/ourresearch-receives-688k-grant-from-navigation-fund-to-enhance-the-openalex-user-interface/
POSI. (2024). The Principles of Open Scholarly Infrastructure: Posse “Who has committed to the POSI principles?” Retrieved on November 1, 2024, from https://openscholarlyinfrastructure.org/posse/
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts (arXiv:2205.01833). arXiv. https://doi.org/10.48550/arXiv.2205.01833
Scheidsteger, T., & Haunschild, R. (2022). Comparison of metadata with relevance for bibliometrics between Microsoft Academic Graph and OpenAlex until 2020. Zenodo. https://doi.org/10.5281/zenodo.6975102
Simard, M. (2024). The Open Access Coverage of OpenAlex, Scopus, and Web of Science. arXiv. https://arxiv.org/abs/2404.01985
Sinha, A., Shen, Z., Song, Y., Ma, H., Eide, D., Hsu, B., & Wang, K. (2015). An Overview of Microsoft Academic Service (MAS) and Applications. In WWW ’15 Companion: Proceedings of the 24th International Conference on World Wide Web, 243–246. https://doi.org/10.1145/2740908.2742839
Tay, A., Martín-Martín, A., & Hug, S. (2021). Goodbye, Microsoft Academic – hello, open research infrastructure? Impact of Social Sciences Blog. http://eprints.lse.ac.uk/id/eprint/111325
10.1146/katina-01062025-1