A photorealistic illustration of a tiny life preserver sitting on the keyboard of a laptop

CREDIT: dencg via Shutterstock

With Government Data at Risk, These Volunteers Are Taking Action

Read or watch Lauren Collister’s interview with the Data Rescue Project’s Lynda Kellam, who discusses the effort to preserve government data and the special role librarians and archivists have to play.

By Lauren Collister

The Data Rescue Project is a coordinated effort among a group of data organizations to create a central information resource for data rescue-related efforts and data access points for public US governmental data that are currently at risk. The project was recently profiled by The New Yorker.

We wanted to understand the integral role of librarians and archivists in this work. Lauren Collister, a member of Katina’s editorial team, spoke with Lynda Kellam, a volunteer organizer of the Data Rescue Project, who also works as the Snyder-Granader Director of Research Data and Digital Scholarship at Penn Libraries at the University of Pennsylvania. Watch their whole conversation here, or read it, edited for length and clarity, below.


I’d love it if you could give a quick overview of the Data Rescue Project: what you’re doing and who’s contributing to it.

The Data Rescue Project is a grassroots volunteer effort to rescue data and raise awareness of the importance of government data. Primarily, we are working with federal data, but there’s data collected at all levels of government. The project emerged out of efforts of three data librarianship organizations: IASSIST (International Association for Social Science Information Service and Technology), RDAP (Research Data Access & Preservation), and Data Curation Network. All three are closely intertwined, with overlapping memberships but very different focuses and interests.

The project also expanded beyond that because we have connections with people—librarians and non-librarians—in other communities who are interested in government data. For example, we have partners like PEDP (Public Environmental Data Partners) and the PEGI (Preservation of Electronic Government Information) project.

We’ve also, through this process, met other partners. A great example is SUCHO, Saving Ukrainian Cultural Heritage Online, a group that formed in response to the Ukrainian war. Another is Safeguarding Research and Culture.

Those are just a few of the groups we work with, beyond the larger ones like the Internet Archive and End of Term Web Archive. The project was meant to harness the grassroots energy of people who wanted to do something but weren’t sure how, or how to get started, or maybe weren’t aware of the existing efforts to preserve electronic government information and government data. We have a wide range of contributors: librarians, but also researchers and tech people.

In addition to amplifying each other’s efforts, our goal is to rescue as much data as we can, primarily focusing on social data because PEDP is doing environmental data.

I wonder if you could share a particular project or rescue effort that you worked on or interacted with, maybe something you’re particularly proud of. What needed to be done, and what was that work like?

One of the things we created at the beginning was a spreadsheet to help as we went through the workflows of data rescue, to make sure that people were able to collect the metadata needed to make things preservable for the long run.

That has led to a lot of things we’re proud of. We’ve worked with partners to find a preservation home for IMLS (Institute of Museum and Library Services) data. That was one of the early things that we made sure to get. Every time we finish an agency, or are able to get a lot of the data from an agency, we consider it a win.

The thing I’m most proud of is the Data Rescue Project Tracker, which is a catalog of data rescue efforts beyond just ours. It’s a tool where people can look for a dataset and see who’s captured it and where it’s been backed up. That includes not just Data Rescue Project volunteers but also PEDP, ICPSR (Inter-university Consortium for Political and Social Research), and others.

You mentioned SUCHO, and that starts to answer my next question: right now it seems as if a lot of eyes are focused on the United States, but I wonder if there are other international efforts like this. Does it happen with administration transitions elsewhere? What does this look like globally? Or do you have participants working on United States data from around the world, for example?

We definitely do.

SUCHO has been a great partner. The person who works with us is actually based in Germany. SUCHO was a widely popular effort to back up cultural heritage digital objects and websites, and he’s been great in helping us set up our infrastructure—our communications channels, our website, our newsletter, all of that.

It was interesting—we had more attention in the French press in the beginning than in the American press. The first few articles were French. One was in Le Monde in mid-February, which was surprising. That garnered a lot of interest from people who are in Europe.

The lead person for Safeguarding Research and Culture is based in Germany. We can handle smaller datasets, but if it’s something that’s going to require large-scale scraping, we can turn to him and his group.

There’s been a Canadian data rescue event as well.

To answer the question about whether we see this in other places, not to this scale, necessarily, but one of the things that we are hoping to do is create a model for people who do experience this kind of situation. And it doesn’t have to just be in relation to administrative changes or regime change, a political situation. It also can be in response to disasters or other events that damage the infrastructure of a country. A rapid response approach to saving data in any kind of situation.

What is the relationship between the current work that you’re doing and data rescue efforts seen in 2017? Is it some of the same people, same infrastructure? How are you building on that past work?

The lead group at that time was called Data Refuge. That group doesn’t really exist anymore—a lot of those people went on to different careers or positions. So the infrastructure wasn’t quite there for us to be able to just pick it up and put it into what we’re dealing with now. Plus, they were mostly focused on environmental data.

What did exist even then was the End of Term Web Archive efforts, because there’s a recognition that presidential websites change with every administration. And then out of 2017 EDGI (Environmental Data and Governance Initiative) and PEDP were created, as well as ICPSR’s DataLumos, which is where we archive the data.

That’s the infrastructure that was created in 2017 that we’ve been able to use for our efforts. There are a lot of people who have told us that they were active back in 2017. But then we also have a lot of people who weren’t part of the 2017 effort.

So we pay homage to Data Refuge, but we’re not the same thing.

You mentioned DataLumos and using open infrastructure tools. I know on your website, you’ve mentioned that there are a lot of freely available or open tools that folks can use to help with this work. Is using open infrastructure essential to your mission? How does it contribute to or facilitate this work? Or are you more tool-agnostic—whatever gets the job done?

Definitely whatever gets the job done. One of the things that was attractive about DataLumos—a crowdsourced repository for government data—is that it had a preservation mission.

Our big priority is to avoid commercialization of the data. Government data is in the public domain unless it’s restricted for some reason. People do add value and create things that can be monetized, but we wanted to make sure that it’s free and open for people to access.

We’ve been able to do that through DataLumos and also through the Internet Archive—we’ve nominated a lot of links to the Wayback Machine or to the End of Term Web Archive.

So I would say open infrastructure is critical. We’re trying to use whatever we can spin up quickly without money. Until recently, we weren’t taking donations. So it’s been important to have that open infrastructure available.

You mentioned that librarians and archivists play a key role, though you have volunteers of all kinds. Could you tell us more about the role that librarians and archivists play in doing this work and how they use their unique skill sets?

A lot of us are data librarians. So one of the first things we did was draw on the curation workflows that had been created by Data Curation Network. MIT also created one specific to government data. We really wanted to amplify those workflows so that people knew what was involved in curating, backing up, and preserving data in particular for the long term, since data has different needs from a website.

Web crawling is certainly something we support and do, and it’s something that has to be done in certain cases, especially for a large website. We also recognize that for data, web-crawled websites are not long-term preservation. We really want to make sure that the data, the documentation, the metadata can be preserved by an organization that actually has data in mind.

Metadata has been a big part of that both for data rescues and for the tracker. As we were creating the tracker, we were thinking through the kinds of metadata someone would need in order to be able to access the data and use it efficiently. So those two skill sets are especially important.

We’ve also had people with technical skills—they can do web scraping, those kinds of things. We’ve been putting them to use.

But we can find a task for anybody. One thing we’ve been pushing is this idea of a data use story—making sure that people are telling how they’re using the data, why it’s important. Librarians could really help us get more stories out there.

I talked to a journalist today who asked me what’s missing from media coverage, and I think one of the things that’s missing is a true understanding of what the impact will be if we lose this data for the long term. That it’s not just about researchers, it’s not just about universities. The data has an impact on real people, on services that we provide in the community. Librarians can help connect those dots and be ambassadors for government data in their communities.

Do you have a data use story that really resonates with you or that others have found powerful?

It depends on the community you’re in. We’ve put two stories on our website so far. One is about IPEDS—the Integrated Postsecondary Education Data System. A person was talking about working with high school students and using IPEDS to help them decide where they’re going to college.

Another one talks about the Youth Risk Behavior Surveillance System, a large-scale survey that looks at youth risk behaviors and the impact that changing variables that relate to gender would have on services provided to people who are transgender. That story did a really good job of connecting the dots down to services that are actually provided by a community organization.

If people are interested in sharing stories, we would love to collect them. But they are also welcome to just talk about them in their own communities.

It sounds like you’re capturing a snapshot of the data. Are there data sources that are still being updated or changed in some way? Do you have to go back and re-get them? What’s that process like? Do you ever consider something “done” in this space?

That’s a great question. A lot of this will be ongoing, we think. At this point, our goal is to get the data. It may take a few months to be able to fully understand what has happened, to take a reckoning.

People ask us what percentage of things we think are completely gone. I don’t think it’s a great percentage of the data that’s gone, at least in the social data that we’re looking at. It’s more that the federal workers who supported the data, the data stewards, are gone. So the concern is: if the people who are employed to be the experts in that data are no longer there, who is going to take care of that data?

It’s harder to answer your question because of that question. We’ll have to do some matching of datasets to see—have they been updated? Have they been altered? But that’s going to be a different process that will need to take place after we’re sure that the administration is done with the reorganizations.

It sounds like there’s going to need to be a future project in which you’ll both look back at what you have saved and look forward at the work you need to keep doing to make sure it remains available.

Yes. Providing volunteer support to ICPSR has been a big part of our mission. They’ve created a tool where people help remediate the metadata from some of the things that they rescued.

That effort is really important because they have a large staff, but everybody there has a job; they’re not focused on curating rescued data. So I think there will be space for volunteers, even months from now, to help think through ways that we can help ICPSR with their metadata, or that we can go back and look at the data we’ve collected and update it in some way.

For listeners who want to get involved or support the project, how can they do that? Where should they go? And what are some of the things that need to be done now, and how can they find out about some of these future projects?

Definitely subscribe to our website, which is where our newsletter goes out. We typically put out a post twice a week. You can follow us on Bluesky for rapid response information.

The FAQ on our website has information on getting more heavily involved. There’s a volunteer form, and we’ll reach out to you and add you to our communication channel, which is called Mattermost.

As things start to slow down, we’ll use Mattermost to coordinate the metadata cleanup and comparison of datasets—things like that.

You can also always create a data rescue event. We don’t officially organize these for people, but we have resources that you can use. We are happy to advise on doing one and suggest places to look for data to rescue. We’ve started encouraging people to look at state-level data.

If somebody wants to gather community stories, we would love to amplify those in any way we can.

We’ve become a member of Open Collective Europe, which gives us 501(c)(3) status, so people have been able to give us small amounts of money. That helps us cover our communications channels.

The final thing is letting other people know why government data matters. Talking to your family or friends and telling them about what’s happening and why this matters on a personal level. That’s really important.

I want to end with a question I always like to ask: is there anything you wish I’d asked you, or a topic you’d like to share more about?

I do want to emphasize that we are volunteers. People are working on weekends and evenings to do this.

So there may be things that could be improved. I encourage people to reach out to us and let us know if they see incorrect metadata in our tracker or if they see something that could use some improvement. That is volunteering, and that’s helping us. We love feedback.
