Data Synthesis Platform Available for PHUSE Members

At a PHUSE workshop in September, we organised a half-day session on synthetic data and its applications. The session was hands-on: attendees used R to synthesise datasets and evaluate the utility of the generated data. The response was positive, and there was strong interest in giving PHUSE members a broader capability to learn more about data synthesis.

We are now making a data synthesis platform available as part of the PHUSE Open Data Repository (PODR), in partnership with Replica Analytics Ltd. The platform is free to use for non-commercial purposes and allows users to gain first-hand experience with data synthesis.

Data synthesis is an analytic approach for creating “fake” data. A generative model is trained to capture the statistical properties and patterns in the original data, and is then used to produce new, synthetic data. Because the generated records are sampled from the model, they do not have a one-to-one mapping to the original records, yet they retain the analytic utility of the original data.
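To make the idea concrete, here is a minimal sketch in Python of the train-then-sample pattern. It is not the method used by the PODR platform, which is not described here. It assumes a very simple generative model (a multivariate Gaussian) and an invented two-column dataset; the function names `fit_gaussian_model` and `sample_synthetic` are hypothetical.

```python
import numpy as np


def fit_gaussian_model(data):
    """Train step: estimate the mean vector and covariance matrix of the data."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return mean, cov


def sample_synthetic(model, n_rows, rng):
    """Generate step: draw new rows from the fitted model, not from the data."""
    mean, cov = model
    return rng.multivariate_normal(mean, cov, size=n_rows)


rng = np.random.default_rng(0)

# Stand-in "original" data, invented for illustration: two correlated columns.
age = rng.normal(50, 10, size=500)
sbp = 90 + 0.8 * age + rng.normal(0, 8, size=500)
original = np.column_stack([age, sbp])

model = fit_gaussian_model(original)
synthetic = sample_synthetic(model, n_rows=500, rng=rng)

# The synthetic rows are new draws, yet the correlation between the
# two columns should be close to that of the original data.
original_corr = np.corrcoef(original, rowvar=False)[0, 1]
synthetic_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(original_corr, synthetic_corr)
```

A Gaussian model only preserves means, variances, and linear correlations. Real synthesis tools typically use richer models to capture non-linear patterns and categorical variables.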

There are two data synthesis tools available on the PODR.

The first is an interactive tool for data synthesis. It allows users to upload datasets, synthesise them, and then generate comprehensive utility metrics to see how similar the generated data are to the original data. A simple workflow modelling approach is used to define the data sources and the transformations that should be applied to the data along the pipeline. Those accustomed to working with data will be familiar with this general approach.
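As an illustration of what a utility metric can look like, the sketch below computes the Hellinger distance between the histograms of one numeric column in the original and synthetic data. This is one commonly used similarity measure, chosen here as an assumption; the specific metrics reported by the PODR tool are not listed in this post. The function name `hellinger_distance` and the sample data are invented for the example.

```python
import numpy as np


def hellinger_distance(x, y, bins=20):
    """Hellinger distance between histograms of two numeric samples.

    Returns a value in [0, 1]: 0 means identical histograms,
    1 means the histograms do not overlap at all.
    """
    # Use shared bin edges so the two histograms are comparable.
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    # Convert counts to proportions.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


rng = np.random.default_rng(1)

# Invented example column and two candidate synthetic versions of it.
original = rng.normal(50, 10, size=1000)
good_synthetic = rng.normal(50, 10, size=1000)  # same distribution
poor_synthetic = rng.normal(70, 10, size=1000)  # shifted distribution

close = hellinger_distance(original, good_synthetic)
far = hellinger_distance(original, poor_synthetic)
print(close, far)
```

A lower distance indicates that the synthetic column follows the original distribution more closely. A full utility assessment would also compare relationships between variables, not just single columns.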

The second is an R package that implements the same synthesis and utility evaluation functionality. The R package is available in a Jupyter Hub, which is configured to communicate with the synthesis engine. Because the interactive tool and the Jupyter Hub work together, synthesised data can be exported to the Jupyter Hub and analysed further there. The combined toolset allows users to synthesise and analyse data within the same environment, moving easily between the two tools.

The ultimate objective is to demonstrate the capabilities of data synthesis and to enable the user community to learn about this technology, which is gaining interest in the health sector and beyond. The PODR synthesis platform will be updated over time to include additional capabilities and to incorporate feedback from this community.

If you are a PHUSE member and would like to get an account and receive additional information about the data synthesis platform, please email workinggroups@phuse.eu.
