Abstract
Demand from the scientific community for access to imaging data is growing, and sharing images is the next level of data sharing and transparency. To meet this need, we must establish robust anonymisation rules that ensure data privacy and regulatory compliance.
The PHUSE Anonymization of Imaging Data Working Group is inviting the community to review a new guideline focused on DICOM metadata, pixel-level anonymisation, defacing practices, and workflow considerations for external data sharing. This guideline reflects extensive cross-industry collaboration and aims to establish standards that protect participant privacy while preserving research value. Community feedback at this stage is critical for ensuring that the guideline meets established anonymisation standards, is technically rigorous and globally applicable, and reflects real-world research needs. By participating in this review, stakeholders advance their shared goals of transparency, interoperability, and responsible data sharing across the life sciences ecosystem.
Executive Summary
The pharma industry needs consistent, modern standards for anonymising imaging data as AI-driven re-identification risks grow and data sharing expands. A collaborative Imaging Data Anonymization Guideline is now being endorsed by PHUSE to establish practical minimum requirements for anonymising DICOM metadata, pixel data, and workflows. Community review is essential to ensure the guideline is ready for real-world use. Your feedback will help shape a standard the entire industry can rely on.
Why PHUSE Is Supporting This Guideline
Across pharma, academia, and technology partners, imaging data is becoming a cornerstone of clinical development, AI/ML research, and real-world evidence (RWE) generation. Yet despite its expanding value, imaging data remains one of the least standardised and most challenging data types to anonymise effectively.
The Imaging Data Anonymization Guideline directly addresses this gap. Developed by imaging experts, data scientists, regulators, and technology partners, the guideline sets clear expectations for anonymising DICOM metadata and pixel data, handling private tags, applying defacing techniques, and enabling responsible external data sharing.
As a neutral, cross-industry collaboration forum, PHUSE is uniquely positioned to convene diverse expertise and promote consensus-driven standards that reduce fragmentation, increase confidence, and protect participant privacy across the research ecosystem.
What the Guideline Covers at a High Level
The guideline delivers a comprehensive, end-to-end view of imaging anonymisation requirements, including:
1. DICOM Metadata
Guidance for high-risk tags, sequences, private tags, UIDs, dates, textual fields, and consistent transformations across multimodal datasets (see the sketch following this list).
2. Pixel Data Protections
Requirements for detecting and removing burned-in annotations, addressing identifiable anatomical structures and selecting validated defacing methods.
3. File Naming & Workflow Considerations
Recommendations for folder structures, file-label handling, QC processes, alignment with clinical datasets, and operational feasibility.
4. Practical Minimum Standards
Clear, implementation-friendly recommendations balancing privacy protection with scientific utility – aligned with HIPAA, GDPR, and DICOM de-identification profiles.
Together, these components establish a repeatable, defensible baseline for anonymising and sharing imaging data consistently and securely.
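To make the metadata and file-handling components more concrete, below is a minimal sketch of what one anonymisation pass might look like, assuming the open-source pydicom library. The tag selection, the salt, and the UID-remapping scheme are illustrative assumptions, not the guideline's actual requirements; the authoritative handling rules live in the guideline itself and in the DICOM PS3.15 de-identification profiles.

```python
# Minimal metadata anonymisation sketch using pydicom. The tag list and
# remapping scheme below are illustrative assumptions, not the guideline's
# required handling rules.
import hashlib
import pydicom

# Hypothetical subset of high-risk identifying attributes to blank.
TAGS_TO_BLANK = [
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName",
]

def remap_uid(uid: str, salt: str = "study-specific-secret") -> str:
    """Derive a deterministic replacement UID so references such as
    StudyInstanceUID stay consistent across a multimodal dataset."""
    digest = hashlib.sha256((salt + uid).encode()).hexdigest()
    # '2.25.<decimal>' is the DICOM convention for UUID-derived UIDs.
    return "2.25." + str(int(digest[:32], 16))

def anonymise(path_in: str, path_out: str) -> None:
    ds = pydicom.dcmread(path_in)
    for keyword in TAGS_TO_BLANK:
        if keyword in ds:
            setattr(ds, keyword, "")
    # Remap rather than delete UIDs, preserving study/series linkage.
    for keyword in ("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"):
        if keyword in ds:
            setattr(ds, keyword, remap_uid(str(getattr(ds, keyword))))
    # Private tags often carry identifying free text; removing them
    # wholesale is a conservative default.
    ds.remove_private_tags()
    ds.save_as(path_out)
```

In keeping with the file-naming and workflow recommendations, a workflow built on this pattern would also derive output file names from the remapped identifiers (for example, the new SOPInstanceUID) so that original file labels cannot leak identifying information.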
Why Community Review Is Essential
AI-assisted re-identification, increasingly multimodal studies, regulatory expectations, and the growth of imaging-based RWE all underscore that no organisation can solve anonymisation challenges alone.
A community review allows us to:
- Validate the guideline across diverse infrastructures, modalities and workflows
- Ensure feasibility for sponsors, CROs, academic centres and smaller institutions
- Identify edge cases, operational challenges and areas needing clarification
- Strengthen global alignment and build trust across stakeholders
- Refine the standard for broad, practical adoption.
Your perspective – technical, operational, regulatory or scientific – will directly influence the quality and usability of the final guideline.
How This Work Strengthens Industry Collaboration
Clear anonymisation guidelines unlock something the industry has long struggled with: safe, scalable image sharing across institutions, vendors and therapeutic areas.
By aligning on what robust anonymisation looks like, the guideline helps organisations:
- Share imaging datasets confidently for secondary research
- Enable AI model development, validation, and benchmarking
- Participate in multi-site and federated learning initiatives
- Submit consistent and compliant imaging data to regulators
- Reduce redundant work across sponsors and CROs
- Build interoperable, reusable imaging pipelines.
In practice, the guideline becomes a foundation for collaboration, innovation and responsible data use across the pharma ecosystem.
Escalating Re-Identification Risks in an AI-Driven Era
As AI models grow more sophisticated, traditional anonymisation methods for medical imaging are no longer sufficient. Facial reconstruction and recognition technologies now match MRI head scans to publicly available photographs – even after standard defacing – introducing new and significant privacy risks.
The Evidence Is Clear
Research studies examining worst-case re-identification scenarios have shown:
- Re-identification rates approaching 50% when matching anonymised MRI reconstructions against very large public image databases (in the hundreds of millions of images)
- Open-source facial recognition tools achieving nearly 60% accuracy, demonstrating that risk is not limited to commercial systems
- Diffusion models capable of memorising over a third of their training samples and reproducing more than two-thirds of those memorised samples in near-identical form, raising serious concerns about unintentional data leakage in imaging AI pipelines.
These findings emphasise a simple truth: anonymisation must evolve as quickly as the models that threaten it. The guideline incorporates contemporary evidence to strengthen pixel-level protections and recommend defacing techniques resilient to modern AI capabilities.
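To make the pixel-level concern concrete, a minimal redaction sketch is shown below. It assumes uncompressed, single-frame pixel data and the pydicom stack (with NumPy installed), and it takes the annotation region as a given input; in practice, burned-in text must first be located by a validated detection step (e.g. OCR-based screening), and defacing of head scans requires dedicated, validated tooling rather than simple rectangular masking.

```python
# Illustrative burned-in-annotation redaction; assumes uncompressed,
# single-frame pixel data. The region coordinates are hypothetical inputs
# from an upstream detection step.
import pydicom

def redact_region(path_in: str, path_out: str,
                  top: int, left: int, bottom: int, right: int) -> None:
    ds = pydicom.dcmread(path_in)
    arr = ds.pixel_array.copy()      # decoded via NumPy (must be installed)
    arr[top:bottom, left:right] = 0  # blank the annotated region
    ds.PixelData = arr.tobytes()     # write pixels back, dtype unchanged
    ds.BurnedInAnnotation = "NO"     # record that annotations were addressed
    ds.save_as(path_out)
```

Any such redaction would be followed by the QC review the guideline describes, confirming both that identifiers are gone and that diagnostically relevant content is preserved.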
Privacy Barriers to Federated Learning and Multi-Site Collaboration
Federated learning (FL) has enormous potential to advance medical AI by enabling institutions to train models without sharing raw data. Yet, inconsistent anonymisation remains one of the largest obstacles preventing sites from participating.
What the Research Shows
- Reviews across oncology, neurology and rare disease imaging identify non-standard anonymisation as a top barrier to launching and scaling FL initiatives.
- Privacy and compliance concerns remain the primary reasons why institutions decline to join multi-site imaging studies.
- Leading FL frameworks stress that consistent de-identification rules are essential for minimising leakage risk and enabling cross-site comparability.
The guideline provides a harmonised baseline that simplifies onboarding, builds trust and accelerates collaborative AI development.
The Growing Demand for Standardisation in Data Sharing
The research community’s appetite for high-quality imaging data is expanding. However, many datasets remain inaccessible because contributors lack confidence in their ability to anonymise imaging data safely and consistently.
This unmet need slows progress across:
- Multicentre clinical research
- AI and machine learning development
- Real-world evidence generation
- Regulatory submissions requiring imaging data.
Proof Points
- Major repositories such as The Cancer Imaging Archive (TCIA) have highlighted challenges with inconsistent de-identification and modality gaps, limiting downstream usability.
- Efforts such as the OMOP (Observational Medical Outcomes Partnership) Medical Imaging Extension demonstrate that imaging cannot be fully integrated into modern research ecosystems without harmonised anonymisation and metadata standards.
Clear guidance directly removes a major barrier to unlocking imaging datasets for broader research use.
Meeting the Unmet Need: The Research Community Wants Access to Images
Researchers consistently report that imaging datasets are among the most difficult to access due to unclear anonymisation expectations, regulatory uncertainty and operational burdens.
By establishing practical, consensus-driven minimum standards, the guideline:
- Increases confidence in data sharing
- Reduces institutional privacy and compliance risk
- Encourages broader participation in repositories and consortia
- Expands access for AI researchers seeking high-quality, compliant training data.
In short, the absence of standardised anonymisation guidance has slowed scientific progress – this document helps change that.
What You Can Do Now
Here’s how you can contribute:
1. Review the guideline and share your feedback.
Your insights will help refine the standard for real-world use.
2. Share the guideline across your imaging, biometrics, data management and AI teams.
Broader visibility accelerates internal alignment.
3. Highlight edge cases, operational realities, or institutional needs.
These contributions guide future supporting materials.
Looking Ahead: Materials Coming Soon
Based on community feedback, the Working Group plan to develop:
- Technical implementation guidance
- Detailed workflow examples
- Expanded recommendations for modalities or use cases beyond the initial scope.
These additions will not replace the guideline but will extend its practicality, helping organisations adopt and operationalise the standard effectively.
Conclusion
Imaging data is one of the most powerful assets in modern research – but also one of the most sensitive. With re-identification risks rising, AI capabilities accelerating, and multi-site collaboration expanding, the need for clear, standardised anonymisation guidance has never been more urgent.
The Imaging Data Anonymization Guideline is a major step forward, but its strength depends on the community shaping it.
Your feedback ensures we build a standard that is usable, scalable, and ready for the future.