
Illustration by Sarah Bissell for Katina Magazine/Underlying image via Rawpixel

Like It or Not, AI Has Arrived in Archives. Now Is the Time for Archivists to Take the Reins.

Chronic resource constraints have diminished archivists’ capacity to make primary sources legible to users. AI may intensify the crisis. But it may offer a way out.

By Emilie Hardman


Archival description has always relied on aggregation. How else to manage the sprawling records of European bureaucracy that helped give rise to the field of archival science in the first place? In a purely analog era, aggregation and summary description were necessities of access. Over time, archivists came to defend this approach as the art of “just enough” description: a calibrated labor that balanced evidentiary care against finite resources.

In the last few decades, however, chronic resource constraints have pushed many repositories past calibration and into triage, thinning description further and resetting the threshold for what is considered acceptable. The question now is whether we will accept that lowered threshold as the new normal, or rebuild descriptive capacity so we can make primary sources legible in the digital systems where research increasingly happens, especially as AI becomes a common layer of discovery and interpretation. This is a question of the future of archival work.

That question of legibility hinges on our sense of what is “enough.” In 2024, I prepared a report on the current state of archives and special collections work for JSTOR Stewardship. In my fieldwork, archival processing teams often referred to the “golden minimum.” The golden minimum (which has a more gilded ring than its parent concept, “MPLP,” or “More Product, Less Process”) is description sufficient for baseline use under conditions of constraint. The problem, as most any reference archivist or special collections librarian will tell you, is that what is sufficient for the baseline is not always actually, well, sufficient.

Broken Descriptive Scaffolds

In digital environments, the mismatch between the golden minimum and users’ actual needs is starker than in analog environments. Material may be considered accessible by archivists, yet remain effectively unavailable if users cannot actually discover it. That problem is well known, but we talk less about a related issue: that when users do find something entirely applicable to their research, they do not always have enough information to make sense of what they are seeing. (I say “they,” but I also mean myself.) Without adequate descriptive scaffolding, digital primary source materials that are searchable may not be richly usable, especially for those who do not already know the names, terms, and contexts that unlock a collection. Why, for example, I had to wonder recently, was I looking at a Truman Capote story printout in a digital collection focused on HIV, AIDS, and the arts? There are reasons, culturally important ones, but in the moment such an encounter is the definition of inscrutable.

That confusion is not incidental. It is a predictable outcome of a discovery model that has long prioritized findability over fine-grained intelligibility. Digitization did not create this problem, but it has exposed it. Within the contextual container of a finding aid, meaning accrues through provenance and the hierarchical logic of inheritance. When items from physical archives are pulled into other systems and encountered in isolation, description often stops functioning as description. An online collection can be easy to misread, not because anyone failed in their professional work, but because the descriptive cues were never designed to stand alone. For example, a folder label like “1989” may be sufficient in a finding aid, but it does little to help a user interpret a digitized item encountered on its own. Presented online, an image from that folder requires fuller description to be intelligible as a benefit flyer advertising a special appearance by New York’s famed Rollerena Fairy Godmother.

This is the archival landscape into which AI has arrived. This is, of course, a technology shift, but it also creates the conditions for two more important changes: a transformation of the role of archivists and the emergence of new expectations from the users of primary sources. AI will change what staff are asked to produce, what leaders come to expect, and what users assume is possible. Those new expectations will either intensify the crisis of capacity in repositories by expanding what is demanded without expanding what is supported, or help reframe archival labor around public value and a new centrality for the vast primary source record within the scholarly research ecosystem.

What comes next will not be determined by whether archives “adopt AI.” It will be determined by whether the field directs AI tools toward user-serving ends in ways that preserve accountability, context, and care.

Bringing AI into Archives

Any honest discussion about the utility of AI in archives and special collections should start by acknowledging and contextualizing the field’s hostility to it. In many quarters, the reaction is not mild skepticism. The AI for Access project has presented participant comments from its survey work that the authors call “visceral reactions as data.” They include blunt thoughts: “AI is injurious to the human spirit.” And blunter: “[AI] tools are SLOP.” Still others compress multiple concerns into a single verdict, describing AI as “environment-destroying, labor-devaluing, untrustworthy garbage.”

Those words are fired up, but they are not only rhetorical heat; they are likely also professional memory. They reflect a pattern that archivists and librarians have experienced repeatedly: technology introduced as a substitute for labor, tools procured without adequate governance, and efficiency narratives used to justify disinvestment. They also reflect specific and credible concerns: privacy and surveillance risks in collections that contain sensitive personal information, documented bias and representational harm, extractive data practices, and environmental costs. On top of all that sits a professional conviction that archival description is not simply text production. It is judgment work, context-building, and responsibility for how records can be used.

For these reasons, such reactions should not be treated as an obstacle to be overcome with enthusiasm or force. Rather, they should be treated as a constraint that clarifies what will fail. If AI is pursued as a throughput mandate, it will deskill judgment work into content production, reward volume over trust, and tempt institutions to publish text that looks authoritative but has not earned authority.

These reactions are also evidence that the field has learned, through experience, what happens when technology is introduced as substitution rather than service. That lesson matters now because AI is already becoming an inextricable layer in the scholarly and educational ecosystem. The question is not whether scholars and students will encounter AI-mediated discovery for archival materials. They already do. Instead, we should be asking whether primary source materials will be legible within that ecosystem and how archivists can act within that ecosystem as professionals with important contributions to make.

Directing AI Toward Archival Ends

If archives and special collections practitioners do not shape how AI intersects with archival records, other systems will. Those systems will privilege what is already well-described, already standardized, already “clean” enough to be handled without human oversight. The result will not be a dramatic disappearance of archives, but a quieter drift toward invisibility, where primary sources become less present in teaching and research because they are harder to interpret than sources packaged for generic retrieval, and where archival research becomes even more unevenly distributed, concentrated among those with the methodological training, time, and institutional access to do the work of interpretation that systems fail to support.

Going forward, the success of archival work will increasingly be judged not only on whether archives are available digitally, but on whether they are usable at scale and within the interfaces where users will expect orientation to and clarity around what they are viewing. AI does not provide the only means by which to effect the needed changes, but it provides one of the most significant opportunities to adjust our response to the chronic atmosphere of constraint that has brought us to this moment. The future can look more like a response to opportunity than to constraint. We can, further, endeavor to make moves that will situate primary sources at the center of the scholarly research ecosystem, powered by a new sense of capacity for description, context, interlinking, and overall stewardship within a fundamentally changed world.

If that is the aspiration, then it helps to say plainly what it is not. It is not a promise that AI will finally allow us to “finish” archival description. It is not a fantasy of item-level completeness. It is also not, or should not be, a permission slip for leadership to demand more output without expanding support. The most seductive promise in the current moment is speed. AI is often framed as a backlog solution, and it is easy to understand why, in a field shaped by scarcity and deferral, that pitch lands. But if “efficiency” becomes the primary story, the field will lose control of outcomes. The core question is not how quickly repositories can generate more description. The core question is whether users can locate, understand, and responsibly interpret primary sources at scale.

Legibility is the bridge between access and use. It is the difference between encountering a record as a fragment and encountering it as evidence. And it is often produced through surprisingly small interventions.

This is where my Truman Capote printout turns from a private moment of confusion into a diagnostic tool. On its own, the material reads as odd literary ephemera. A set of relational cues changes the encounter: the item is part of Robert Coffman’s papers. Coffman, an actor at San Francisco’s Theatre Rhinoceros, the United States’ oldest continuously producing queer theater, used annual Capote readings as holiday benefits during the AIDS crisis. The events provided a gathering space, a fundraising platform, and a comforting cultural steadiness in a period shaped by loss and profound grief. That dimension of the crisis does not announce itself from the item alone, but even this small set of contextual signposts opens up powerful nuance and quickly reveals something new about the crisis. The record has not changed. The conditions of interpretation have. The value of AI here is not that it “understands” the item, but that it can quickly and easily help surface the names, affiliations, events, and institutional relationships that make the item legible.

If we are serious about centering primary sources in the scholarly research ecosystem, this kind of shift has to become commonplace. Not because every encounter should be frictionless, but because too many encounters are stalled by the wrong kind of friction: not the productive difficulty of interpretation, but the preventable opacity that comes from absent relational and contextual cues.

Relational Description

Of course, such cues are currently absent from the archival record for the same reason that there are only a handful of words used to describe two cartons of Coffman’s photographs in his archive, four of which are “The photographs are unarranged.” There isn’t time or money enough in the world. The consequence is not only thin description, but thin relationality, too: the absence of the cues that tell a reader how an item belongs, what it is adjacent to, and what interpretive frame makes it legible.

The aspiration to make relationships first class is not new, and it is not a naïve linked data dream either. Encoded Archival Context for Corporate Bodies, Persons, and Families (EAC-CPF) made agents and their relationships describable as entities rather than only as names embedded in narrative description. Records in Contexts (RiC) extends that relational logic more broadly, proposing archival description built around entities and relationships rather than only hierarchical containers. The point is not standards for their own sake. It is portability: relationships that can travel across systems, persist across collections, and become user-facing pathways for discovery and interpretation.

The persistent stumbling block has always been operational. Relationship-rich description is labor intensive to create and maintain. It is also unevenly distributed work, with capacity clustered in some environments and absent in others. AI has great potential for proposing candidate entities and candidate links that professionals can accept, reject, and refine. AI can also help generate entry points at scales that would otherwise be impossible. It can support vocabularies and suggest patterns. The goal is discovery that behaves more like research, where relationships are surfaced as navigational pathways.
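To make the accept, reject, and refine loop concrete, here is a minimal Python sketch, assuming a deliberately simple, hypothetical data model: each machine-proposed link is a subject, a relation, and a target, and nothing becomes user-facing until a named reviewer has acted on it. The class and function names, the relation labels (which are not drawn from EAC-CPF or RiC vocabularies), and the “model-x” and “archivist-a” identifiers are illustrative inventions; only the Coffman and Theatre Rhinoceros details come from the example above.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REFINE = "refine"


@dataclass(frozen=True)
class CandidateLink:
    """A machine-proposed relationship between two entities (hypothetical schema)."""
    subject: str
    relation: str              # illustrative label, not an EAC-CPF or RiC term
    target: str
    proposed_by: str           # which tool or model suggested the link
    status: str = "pending"    # becomes "accepted" or "rejected" after review
    reviewed_by: Optional[str] = None


def review(link: CandidateLink, decision: Decision, reviewer: str,
           corrected_relation: Optional[str] = None) -> CandidateLink:
    """Return a copy of the link carrying the archivist's decision."""
    if decision is Decision.REJECT:
        return replace(link, status="rejected", reviewed_by=reviewer)
    if decision is Decision.REFINE:
        if corrected_relation is None:
            raise ValueError("a refinement must supply a corrected relation")
        return replace(link, relation=corrected_relation,
                       status="accepted", reviewed_by=reviewer)
    return replace(link, status="accepted", reviewed_by=reviewer)


def publishable(links: List[CandidateLink]) -> List[CandidateLink]:
    """Only links a named reviewer has accepted become user-facing pathways."""
    return [l for l in links if l.status == "accepted" and l.reviewed_by is not None]


# Hypothetical candidates a model might propose for the Coffman example.
candidates = [
    CandidateLink("Robert Coffman", "memberOf", "Theatre Rhinoceros", "model-x"),
    CandidateLink("Robert Coffman papers", "documents",
                  "annual Capote holiday benefit readings", "model-x"),
    CandidateLink("Robert Coffman", "authorOf", "Truman Capote story printout", "model-x"),
]

reviewed = [
    review(candidates[0], Decision.REFINE, "archivist-a", corrected_relation="actorAt"),
    review(candidates[1], Decision.ACCEPT, "archivist-a"),
    review(candidates[2], Decision.REJECT, "archivist-a"),  # Capote, not Coffman, wrote the story
]

for link in publishable(reviewed):
    print(f"{link.subject} --{link.relation}--> {link.target} (reviewed by {link.reviewed_by})")
```

The design choice that matters here is that review produces a record of who decided, not just a decision: an accepted link carries its reviewer, so the pathway a user follows stays attributable to professional judgment rather than to the model that proposed it.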

Agentic Systems

Up to now, much of the conversation about AI in archives and special collections has centered on generative models. But AI development is shifting toward systems that do multi-step work. They retrieve iteratively, compare candidates, call tools, and assemble outputs across steps, a constellation of approaches known as agentic AI.

Primary-source research is rarely a one-query task. Researchers triangulate. They test terms. They follow a name, then search variants, then look for related organizations, then try to determine whether a date is a creation date or a depicted date, then return to series context to understand what kind of record they are dealing with. Conventional search tends to flatten this process into the retrieval of fragments. Agentic systems, at least in principle, could support something closer to guided orientation.

But agentic systems also raise the stakes because they are not only producing text. They are producing an account of what matters. When such a system presents context without showing how it was assembled, it can substitute coherence for accountability. The more fluent the scaffolding becomes, the easier it is for users to mistake it for authority.
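The accountability concern can also be sketched. The Python below is not an agentic system: there is no model and no dynamic planning, only a fixed chain of lookups over small in-memory tables standing in for a catalog and authority data. Every identifier and value in those tables (the item ID, the name variant, the series note, the date) is an invented placeholder, apart from the Coffman and Theatre Rhinoceros affiliation taken from this essay. What it illustrates, under those assumptions, is one way assembled context can carry a visible trace of how each piece was obtained, and can flag an unresolved question (here, whether a date is a creation date or a depicted date) instead of smoothing it over.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, in-memory stand-ins for a catalog and authority data.
# Every identifier and value here is an invented placeholder.
CATALOG: Dict[str, Dict[str, str]] = {
    "item-001": {"title": "Capote story printout", "collection_creator": "R. Coffman",
                 "series": "series-A", "date": "1989", "date_type": "unknown"},
}
NAME_VARIANTS: Dict[str, str] = {"R. Coffman": "Robert Coffman"}
AFFILIATIONS: Dict[str, List[str]] = {"Robert Coffman": ["Theatre Rhinoceros"]}
SERIES_NOTES: Dict[str, str] = {"series-A": "Benefit event materials (placeholder)"}


@dataclass
class Orientation:
    item_id: str
    context: Dict[str, object] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # one entry per step taken

    def note(self, key: str, value: object, source: str) -> None:
        """Store a fact and record which lookup produced it."""
        self.context[key] = value
        self.trace.append(f"{key} <- {source}")


def orient(item_id: str) -> Orientation:
    """Chain several lookups, logging each so the assembled context is inspectable."""
    o = Orientation(item_id)
    record = CATALOG[item_id]
    o.note("title", record["title"], f"CATALOG[{item_id!r}]")

    # Follow the name: resolve a variant form to a preferred form, if one is known.
    raw_name = record["collection_creator"]
    name = NAME_VARIANTS.get(raw_name, raw_name)
    o.note("creator", name, f"NAME_VARIANTS[{raw_name!r}]")

    # Look for related organizations attached to that name.
    o.note("affiliations", AFFILIATIONS.get(name, []), f"AFFILIATIONS[{name!r}]")

    # Flag, rather than guess, whether the date is a creation or a depicted date.
    if record["date_type"] == "unknown":
        date_text = f"{record['date']} (creation vs. depicted: unresolved)"
    else:
        date_text = f"{record['date']} ({record['date_type']})"
    o.note("date", date_text, f"CATALOG[{item_id!r}].date_type")

    # Return to series context to see what kind of record this is.
    o.note("series", SERIES_NOTES.get(record["series"], "no series note"),
           f"SERIES_NOTES[{record['series']!r}]")
    return o


result = orient("item-001")
for key, value in result.context.items():
    print(f"{key}: {value}")
print("How this was assembled:")
for step in result.trace:
    print("  " + step)
```

A real system would choose its next step rather than follow a script, but the trace is the part worth keeping: it gives a user, or an archivist reviewing the system, a way to check the scaffolding rather than simply trust its fluency.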

The Work Ahead

In the analog world, the archival encounter was enacted through recognizable forms and practices. The finding aid not only provided the data for a call slip but also taught users how to navigate the possibilities of opaque masses of material. The reading room supplied a social and procedural frame for interpretation. Reference work offered a live form of orientation: an iterative translation between a researcher’s question and the archive’s structures, where context was supplied, terms were tested, and materials became usable. Those structures did not eliminate difficulty, but they made difficulty productive, and they kept archivists centered in the work in valuable ways.

What is arriving now is not a finished technology regime, but an evolving expectation about what digital systems should provide at the point of encounter. That expectation is not, at its core, a demand for frictionless access. It is a request for more capacity: for materials to be reviewed at scale in ways that can surface rights, permissions, and ethical concerns that often delay availability; for richer context; for stronger linkages; for forms of support that help users interpret what they find. If archives meet those expectations only by defending inherited constraints, we will preserve our familiar forms while conceding the space where scholarly attention is increasingly organized.

AI is an opportunity to rescale and reassert the value of the primary source record. The center of archival labor shifts, then, toward authoring the terms under which machine assistance operates. Engagement with AI is not capitulation to novelty or a forced hand grasping for speed and throughput. It is a way out of “golden minimums” as a horizon of professional imagination. It is the chance to treat archival work as a renewed contribution to the research ecosystem, translated into infrastructures that can support legibility, context, and responsible use at a scale that repositories have rarely had the resources to pursue.
