Confronting the Challenges of Sensitive Open Data
When governments collect sensitive data about private individuals, personal privacy and governmental transparency come into conflict. How should we resolve this tension?
Government agencies collect an immense amount of data about members of the public, often including sensitive information. When this information is accessible to third parties, how should we resolve tensions between personal privacy and government transparency (Savage & Monroy-Hernández, 2018)?
In this article, we’ll use the term “sensitive open data” to describe data containing private information about individuals that becomes available to the public through a range of mechanisms, from court orders to statutory mandates. Collectively, this data illuminates the variety of ways in which open data can help or harm individuals, making it a useful edge case for exploring the tensions between personal privacy and government transparency and for understanding the social impact of open data in an increasingly data-intensive world.
While data ownership is a thorny topic in the private sector, it seems more straightforward in the public sector. Consider a single person living in a studio apartment in New York City whose apartment is broken into while they’re grocery shopping. They call the police, officers respond, and details of the incident are entered into New York Police Department (NYPD) systems.
All things being equal, it is not unreasonable for this person to assume this data is ultimately theirs. It helps to walk through the logic: (1) the data is about the individual, (2) it has been collected and stored using taxpayer-funded infrastructure, and (3) governments serve the public. These facts lead to an intuitive sense that the data should be owned by our theoretical crime victim. In reality, though, anyone can access such data under a relevant legal statute, provided their access request is approved, all without the victim’s knowledge.
For the last three years, the two of us have grappled with the fact that publicly available data can entail invisible risks to those whose personal information is included. Our focus has been data about policing in the United States (i.e., data collected or generated as part of policing activities), a substantial amount of which is accessible to the public via laws like the Freedom of Information Act. In contrast to data about crime, policing data contains a wealth of information about both typical and unusual officer interactions with the public. In fact, we cannot tell what is typical behavior without access to policing data (separate from other criminal justice data), since officers interact with the public for a range of reasons, from roadside assistance to wellness checks, that may or may not result in further legal action.
The tension between transparency and privacy inherent to policing data reflects the evolution and expansion of officers’ role within society. In contrast to television screenplays that focus on its exceptional aspects, policing is often boring: officers patrol streets, do paperwork, assist motorists. Likewise, the media report on dramatic events that then become a focus of public calls for transparency and accountability, leaving the public unaware of whether these events are rare or reflective of a pattern.
Statutes exist in the United States and other countries to make policing data accessible to the public for purposes of transparency. Access to sensitive open data about policing is now normal, even if technology complicates the status quo (Breitenbach, 2015; Reporters Committee for Freedom of the Press, 2023). In fact, some jurisdictions either require or provide avenues for the release of specific kinds of data (e.g., body-worn camera footage) if an incident meets certain criteria (Houston Police Department, 2021; New Orleans Police Department, 2023).
The value of such transparency to a range of interested parties for basic purposes, such as legal proceedings about an event, is clear. But reactive transparency does not support scientific research about police systems as organizations, let alone provide an accurate picture of how policing operates in practice or the data rights at stake. Some even argue this approach has led to excessive openness that may compromise police operations (Lin, 2015; Newell, 2021).
Furthermore, officer interactions with the public involve individuals who may be victims, who are never arrested, or who (even if arrested) have not been convicted of a crime. The data collected during such interactions also includes information about individual officers, reflecting a special confluence of identifiers and interested parties. When that data becomes open, victims, suspects, bystanders, and officers collectively experience a loss of privacy that may be mandated by law and whose potential harms can only be mitigated, not prevented.
In the United States, one state codified the shared interest of these parties by providing access to body-worn camera recordings to those involved: the subject(s) of the recording and the officer(s) (Illinois Attorney General, 2024). Meanwhile, legislation in another state prioritizes the right to privacy over the need for transparency, for example, by encouraging encryption of communication over police scanners to protect sensitive personal information (California Department of Justice, 2020; S.B. 719, 2023). Such variation in how to balance openness with privacy protections reflects significant disagreement about how to responsibly provide access to sensitive data.
These features are not unique to policing data. Almost any government service—from public housing, to child welfare services, to public health initiatives—involves members of the public, government employees, and record keeping that may be subject to future public scrutiny (US Department of Justice, n.d.).
In the European Union (EU), under the General Data Protection Regulation (GDPR), data privacy protections are more robustly elaborated than they are in the United States but still involve a weighing of interests (transparency vs. privacy). For at least the last fifteen years, European jurisprudence has favored the protection of personal privacy (European Commission v. The Bavarian Lager Co. Ltd., 2010), a principle that has continued under GDPR, but this approach has led to predictable concerns about transparency vis-à-vis journalistic access (Erdos, 2019) as well as highlighted inconsistency across member states (Erdos, 2016).
While the GDPR reflects a substantial difference in how data rights are established and enforced in the EU vs. the United States, we refer to all legal mechanisms enabling non-governmental access to government-collected data about individuals as “open records laws” to avoid confusion, acknowledging that the scope of sensitive open data may be narrower in the EU than in American contexts.
The risks associated with sensitive open data require identifying and justifying a strategy for negotiating the privacy/utility tradeoff (Dwork, 2009). The first challenge: for information subject to open records laws, privacy-preserving strategies are moot, as the release of sensitive information cannot be prevented.
The second challenge arises from the first: If organizations must provide access to sensitive information, then just knowing what information is accessible (i.e., the names of data elements) can place individuals at risk, as it identifies data that must be made available on request. This means access to metadata can be just as harmful as access to the data.
To address these challenges, data providers—including organizations that collect sensitive data, as well as those who facilitate access to it—must be clear about why sensitive data must be openly available, what protections are needed to prevent foreseeable harms, and who benefits.
The importance of addressing consent in sensitive open data cannot be overstated, but we have models that we know work. For example, many kinds of lifesaving, emergency research in the United States require an exception from informed consent (EFIC), if only because the person involved is incapacitated (e.g., a study of resuscitation techniques) (US Food and Drug Administration, 2013). The problem of sensitive open data is different, but it benefits from insights gained in conducting this type of research, in which data about a group of people must be obtained without informed consent from each individual.
For all emergency research utilizing EFIC, US Food and Drug Administration (FDA) regulations require community consultation and public disclosure of research outcomes (Exception from Informed Consent Requirements for Emergency Research, 1996). Public research presentations are the most common method of community consultation and are associated with high acceptance rates of EFIC studies by community members (Dickert et al., 2021; Fehr et al., 2015). Other approaches, like surveys, focus groups, and interviews, have also been used (Dickert et al., 2021).
Similar mechanisms for community engagement could not only help ensure that the benefits of sensitive open data outweigh the harms but could also, in the future, enable communal ownership of such data. How? While data trusts and similar entities have gained traction as vehicles designed for responsible data stewardship (Hardinges et al., 2021), securing community buy-in to these solutions is difficult, especially in light of the potential for exploitation (Micheli et al., 2021; Chouldechova et al., 2023; Irani et al., 2023; Sambasivan et al., 2023). Proven models for community consultation offer a practical foundation for enhancing both the management and accessibility of sensitive open data.
Data sovereignty is a critical issue for Indigenous communities wary of extractive practices, a conversation that predates current debates (Kukutai and Taylor, 2016). We speak about the American context here, which is influenced by Canadian efforts to support First Nations (Carroll et al., 2020), but the tensions involved emerge in multiple contexts around the world (e.g., Australia; see Lovett et al., 2020). We cannot speak to these contexts individually but highlight relevant aspects of Indigenous data sovereignty in the United States as an example.
The FAIR principles—published in 2016 to promote best practices in scientific data sharing—are designed to make data “Findable, Accessible, Interoperable, and Reusable” (Wilkinson et al., 2016). A complementary set of principles, the CARE Principles—“Collective Benefit, Authority to Control, Responsibility, and Ethics”—were developed by the International Indigenous Data Sovereignty Interest Group, through consultations with Indigenous Peoples, academic experts, government representatives, and other affected parties, in response to increasing concerns regarding the secondary use of data belonging to Indigenous communities. According to their authors, the CARE Principles integrate Indigenous worldviews that center “people” and “purpose” to address critical gaps in conventional data frameworks by ensuring that Indigenous Peoples benefit from data activities and maintain control over their data (Carroll et al., 2020).
Prior to the development of the CARE Principles, there was a growing awareness of Indigenous data sovereignty and governance among tribes, often driven by the expansion of data-intensive projects and increased advocacy for data supporting tribal values and wellbeing (Carroll et al., 2019; Rodriguez-Lonebear, 2016; Rainie et al., 2017; NCAIPRC, 2017; NCAI, 2018). In turn, tribes developed and adapted governance mechanisms for improved data stewardship. Carroll et al. (2019) describe several case studies of tribal data governance. We highlight two relevant cases:
In 2014, the National Congress of American Indians Policy Research Center (NCAIPRC) organized for five tribes to each pilot a unique community-based data project (NCAIPRC, 2017). One community, the Pueblo of Laguna, developed proprietary census software in partnership with the University of New Mexico. As a result of effective community engagement, the project led to increased census participation. The software remains tribe owned and will support future data collections (NCAIPRC, 2017). As Carroll et al. (2019) explain, this example demonstrates that tribes can independently manage data and utilize external expertise to develop technology that reflects their priorities.
Other tribal organizations have promoted meaningful participation and shared decision-making in human research protection initiatives through the creation of Institutional Review Boards (IRBs), commonly referred to as research ethics committees outside of the United States. The first tribal IRB, the Navajo Nation Human Research Review Board, was created in 1996; it exercises sovereignty over all human research activities in the Navajo Nation and oversees studies for the Navajo Area Indian Health Service (Navajo Nation Human Research Review Board, n.d.). Notably, all data from research conducted under this IRB’s authority belongs to the Navajo Nation. Today, 11 Indian Health Service IRBs exist alongside a growing number of independent tribal IRBs (Indian Health Service, n.d.). This model has also been adapted by non-Indigenous communities, for example, in the Bronx, where the Bronx Community Research Review Board reviews local community research studies.
The application of the CARE Principles has been proposed across various domains, such as earth science (O’Brien et al., 2024), ecology (Jennings et al., 2023), and archaeology (Gupta et al., 2023). Initiatives to integrate these principles into research infrastructure not only highlight the importance of Indigenous community governance in enhancing the quality and reproducibility of research and data; they also align research with community values and facilitate responsible data stewardship.
While the CARE and FAIR principles were formally articulated through networks including significant US participation, similar frameworks have been developed independently across the Global South. The Swakopmund Protocol established collective ownership and benefit-sharing requirements for traditional knowledge across nine African nations (Costantine, 2022). India’s Traditional Knowledge Digital Library, operational since 2001, implements a distinctive approach to findability and access that prioritizes protection over openness. And Brazilian law mandates Indigenous and traditional community representation in data governance through the Genetic Heritage Management Council (da Silva and de Oliveira, 2018). These parallel developments reflect ongoing global efforts to balance open data ideals against Indigenous data sovereignty, the ethical tension the CARE Principles identify.
Applying the approaches described above to sensitive open data respects the autonomy of individuals who have rarely given consent for others to access their private information. It also ensures that access to data about community members benefits their communities, allowing the ideal of open data as a public resource to be realized without resource extraction or economic exploitation.
Nothing suggests making these changes will be easy. But failing to act would leave in place a system that centralizes sensitive open data in ways that parallel what Couldry and Mejias (2021) call “data colonialism” in corporate contexts, extracting value from people’s information while withholding its benefits from data rights holders. Given the unique public value of sensitive open data, these are challenges worth overcoming.
Breitenbach, S. (2015, September 22.) States grapple with public disclosure of police body-camera footage. Stateline. https://stateline.org/2015/09/22/states-grapple-with-public-disclosure-of-police-body-camera-footage
California Department of Justice. (2020.) Confidentiality of information from the California Law Enforcement Telecommunications System (CLETS) [Information bulletin]. https://oag.ca.gov/sites/all/files/agweb/pdfs/info_bulletins/20-09-cjis.pdf
S.B. 719, 2023–2024 Reg. Sess. (Cal. 2023). https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202320240SB719
Chouldechova, A., Black, E., Wolf, C. T., & Opoku-Agyemang, K. (2023.) A case for rejection: The sociopolitical context of ‘Rejection’ in HCI. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1546–61. Association for Computing Machinery. https://doi.org/10.1145/3593013.3594093
Costantine, J. (2022.) The Swakopmund protocol for the protection of expressions of folklore: A review of implementation in Rwanda and Tanzania. Journal of Intellectual Property Law & Practice, 17 (10): 834–43. https://doi.org/10.1093/jiplp/jpac088
Couldry, N., & Mejias, U. A. (2021.) The decolonial turn in data and technology research: What is at stake and where is it heading? Information, Communication & Society, 24 (16): 1–17. https://doi.org/10.1080/1369118X.2021.1986102
Dickert, N. W., Metz, K., Fetters, M. D., Haggins, A. N., Harney, D. K., Speight, C. D., & Silbergleit, R. (2021.) Meeting unique requirements: Community consultation and public disclosure for research in emergency setting using exception from informed consent. Academic Emergency Medicine, 28 (10): 1183–1194. https://doi.org/10.1111/acem.14264
Dwork, C. (2009.) The differential privacy frontier. In Theory of Cryptography Conference, 496–502. Springer. https://doi.org/10.1145/1557019.1557079
Erdos, D. (2016.) European Union data protection law and media expression: Fundamentally off balance. International & Comparative Law Quarterly, 65 (1): 139–83. https://doi.org/10.1017/S0020589315000512
Erdos, D. (2019.) European data protection regulation, journalism, and traditional publishers: Balancing on a tightrope? Oxford University Press.
European Commission v. The Bavarian Lager Co. Ltd. Case C-28/08 P. Court of Justice of the European Union, June 29, 2010. 2010 E.C.R. I-0000. https://curia.europa.eu/juris/liste.jsf?language=en&num=C-28/08.
Fehr, A. E., Pentz, R. D., & Dickert, N. W. (2015.) Learning from experience: A systematic review of community consultation acceptance data. Annals of Emergency Medicine, 65 (2): 162–171.e3.
Gupta, N., Martindale, A., Supernant, K., & Elvidge, M. (2023.) The CARE Principles and the reuse, sharing, and curation of Indigenous data in Canadian archaeology. Advances in Archaeological Practice, 11(1):76-89. https://doi.org/10.1017/aap.2022.33
Hardinges, J., Tennison, J., Shore, H., & Scott, A. (2021.) Data trusts in 2021. Ada Lovelace Institute. https://www.adalovelaceinstitute.org/report/legal-mechanisms-data-stewardship/
Houston Police Department. (2021.) General Order 800-03: Critical Incident Video Public Release. https://www.houstontx.gov/police/general_orders/800/800-03%20Critical%20Incident%20Video%20Public%20Release.pdf
Illinois Attorney General. (2024.) 2024 FOIA webinar on law enforcement videos. https://illinoisattorneygeneral.gov/Page-Attachments/2024%20FOIA%20Webinar%20on%20Law%20Enforcement%20Videos.pdf.
Indian Health Service. (n.d.) Institutional review boards (IRBs). U.S. Department of Health and Human Services. Retrieved September 7, 2025, from https://www.ihs.gov/dper/research/hsrp/instreviewboards/
Irani, L. (2023.) Rejection as a form of agency and refusal in computing. Communications of the ACM, 66 (11): 30–32. https://doi.org/10.1145/3630107
Irani, L., Vertesi, J., & Dourish, P. (2023.) The last mile: Where language, culture, and technology meet in data work. In Companion of the 2023 ACM International Conference on Supporting Group Work, 365–68. Association for Computing Machinery. https://doi.org/10.1145/3617694.3623261
Jennings, L., Anderson, T., Martinez, A., Sterling, R., Chavez, D. D., Garba, I., Hudson, M., Garrison, N. A., & Carroll, S. R. (2023.) Applying the ‘CARE Principles for Indigenous Data Governance’ to ecology and biodiversity research. Nature Ecology & Evolution, 7, 1547–1551. https://doi.org/10.1038/s41559-023-02161-2
Kukutai, T., & Taylor, J. (Eds.). (2016.) Indigenous data sovereignty: Toward an agenda. ANU Press. https://www.jstor.org/stable/j.ctt1q1crgf
Lin, R. (2015.) Police body worn cameras and privacy: Retaining benefits while reducing public and officer concerns. Duke Law & Technology Review, 14 (1): 346–65. https://scholarship.law.duke.edu/dltr/vol14/iss1/15/
Lovett, R., Jones, R., & Maher, B. (2020.) The intersection of indigenous data sovereignty and closing the gap policy in Australia. In T. Kukutai and J. Taylor (Eds.). Indigenous Data Sovereignty and Policy. Routledge.
Micheli, M., Jarke, J., & Heiberg, M. (2021.) The datafication of the public sector: Models, strategies, and governance. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 644–54. Association for Computing Machinery. https://doi.org/10.1145/3442188.3445923
National Congress of American Indians Policy Research Center (NCAIPRC). (2014.) Research recommendations. National Congress of American Indians.
National Congress of American Indians Policy Research Center (NCAIPRC). (2017.) Recommendations from tribal experiences with tribal censuses and surveys.
National Congress of American Indians (NCAI). (2018.) Resolution KAN-18-011: Support of US Indigenous data sovereignty and inclusion of tribes in the development of tribal data governance principles.
Navajo Nation Human Research Review Board. (n.d.) About NNHRRB. Retrieved September 7, 2025, from https://nnhrrb.navajo-nsn.gov/aboutNNHRRB.html
New Orleans Police Department. (2023.) Chapter 82.1.1: Records release and security. In New Orleans Police Department Operations Manual. https://nopdconsent.azurewebsites.net/Media/Default/Documents/Policies/Chapter%2082.1.1%20-%20Records%20Release%20and%20Security.pdf
Newell, B. (2021, August 15.) Body cameras help monitor police but can invade people's privacy. The Conversation. https://theconversation.com/body-cameras-help-monitor-police-but-can-invade-peoples-privacy-160846
O’Brien, M., Duerr, R., Taitingfong, R., Martinez, A., Vera, L., Jennings, L. L., Downs, R. R., Antognoli, E., Brink, T. T., Halmai, N. B., David-Chavez, D., Carroll, S. R., Hudson, M., & Buttigieg, P. L. (2024.) Earth science data repositories: Implementing the CARE principles. Data Science Journal, 23. https://doi.org/10.5334/dsj-2024-037
Rainie, S. C., Schultz, J. L., Briggs, E., Riggs, P., & Palmanteer-Holder, N. L. (2017.) Data as strategic resource: Self-determination and the data challenge for United States Indigenous nations. International Indigenous Policy Journal, 8 (2). https://doi.org/10.18584/iipj.2017.8.2.1
Reporters Committee for Freedom of the Press. (2023.) Police body-worn cameras: A primer for newsrooms. https://www.rcfp.org/resources/bodycams/
Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020.) The CARE Principles for Indigenous data governance. In Open Scholarship Press Curated Volumes: Policy. https://openscholarshippress.pubpub.org/pub/xx3kj9rv
Rodriguez-Lonebear, D. (2016.) Building a data revolution in Indian country. In T. Kukutai and J. Taylor (Eds.), Indigenous data sovereignty: toward an agenda, 253–272. Australian National University Press. https://doi.org/10.22459/CAEPR38.11.2016.1
Carroll, S. R., Rodriguez-Lonebear, D., & Martinez, A. (2019.) Indigenous data governance: Strategies from United States Native nations. Data Science Journal, 18 (1): 31. https://doi.org/10.5334/dsj-2019-031
Sambasivan, N., Arnesen, E., Hutchinson, B., & Prabhakaran, V. (2023.) Re-imagining algorithmic fairness in India and beyond. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 85–96. Association for Computing Machinery. https://doi.org/10.1145/3593013.3593989
Savage, S., & Monroy-Hernández, A. (2018.) Participatory militias: An analysis of an armed movement to protect communities in Mexico. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1–13. Association for Computing Machinery. https://doi.org/10.1145/3287560.3287577
da Silva, M., & de Oliveira, D. R. (2018.) The new Brazilian legislation on access to the biodiversity (Law 13,123/15 and Decree 8772/16). Brazilian Journal of Microbiology, 49 (1): 1–4. https://doi.org/10.1016/j.bjm.2017.12.001
US Department of Justice. (n.d.) FOIA.gov FAQ. Retrieved December 1, 2025, from https://www.foia.gov/faq.html.
Exception from Informed Consent Requirements for Emergency Research, 21 CFR § 50.24 (1996). https://www.ecfr.gov/current/title-21/section-50.24
US Food and Drug Administration. (2013.) Exception from informed consent requirements for emergency research [Guidance document]. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/exception-informed-consent-requirements-emergency-research
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J., da Silva Santos, L. B., Bourne, P., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016.) The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18
https://doi.org/10.1146/katina-010626-1
Copyright © 2025 by the author(s).
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits use, distribution, and reproduction in any medium for noncommercial purposes, provided the original author and source are credited. See credit lines of images or other third-party material in this article for license information.