5th

June

2025

Estimands in Real-World Evidence Studies Webinar
10 days to go!
Get Involved

4th

July

2025

APAC Connect 2026 - Call for Papers is Open!
39 days to go!
Get Involved

16th

November

2025

EU Connect 2025 – Registration is Open!
174 days to go!
Get Involved
  1. Home
  2. /
  3. Communications
  4. /
  5. PHUSE Blog
  6. /
  7. Evolving Trends in Regulatory Decision-Making: The Integration of Real-World Data (RWD)

Evolving Trends in Regulatory Decision-Making: The Integration of Real-World Data (RWD)

– Sugandha Dangi, RWD Manager, Biostatistics, GSK

What is the main difference between Phase IV (postmarket research) and RWD?

Real-world data is collected from locations such as physicians’ offices, pharmacies, billing centres, hospitals and claims offices.

When we integrate data sources, we call that real-world data. Postmarket research, on the other hand, is a broader term. If I want to conduct postmarket research – for example, on a newly launched drug by GSK – I would study how patients are responding to the drug. This could include tracking drug uptake and assessing its safety profile.

This research can be done in several ways:

● Using real-world data, such as patient records and health databases

● Referring to white papers written by experts

● Conducting primary market research, such as surveys and discussions with doctors.

Postmarket research is a broad term, and real-world data serves as one of the tools for postmarket research.

How do you transform data in different formats into OMOP format?

As programmers, my team don’t generate the data themselves. Instead, they purchase data from vendors – Fortune 500 companies specialising in big data. I’ve worked with one such data vendor – IQVIA.

These datasets are already transformed and formatted how my team need. They are ready to use, HIPAA compliant, and anonymised. We simply ingest these databases into our systems and we’re ready to go.

Is RWD helpful for new drugs or existing drugs?

I would say both. We do a lot of postmarketing studies where we analyse the safety profiles of drugs and track the uptake of new drugs (ones that have just been launched).

We study both patient uptake and physician uptake, looking at which regions have more patients using the drug and which physician specialties are adopting it. We also check for off-label use and track how many patients are switching drugs – and which ones.

This level of detail helps us see whether our drug is being added to the standard of care. A patient might already be on a treatment, and for better efficacy or results, the physician might prescribe our drug as well. We also analyse patient adherence – whether they use the drug for five months and then drop off. If they stop, then why?

Real-world data won’t tell us why, but it forms the basis for further research into patient compliance. Therefore, we can answer many of these types of questions using real-world data, both for new and existing drugs.

How accurately does RWD reflect real-world conditions? Sometimes real conditions are not reported, such as AES

When we talk about the USA, real-world data is quite reliable, because most of the population is covered through insurance. The younger population is covered through commercial insurance, while the older population is covered through Medicare. This data is generally available and considered reliable.

When it comes to adverse events, it’s crucial to conduct a feasibility analysis of the data. If I know the specific adverse event I want to report, I’ll assess the data sources I have. For example, I might have four or five licensed datasets, or I could reach out to external vendors with non-licensed data. If I find that my in-house data poorly represents the adverse event of interest, I’ll check if vendors have better coverage.

So, yes, this is an important consideration. Ensuring good representation in the real-world data helps us generate solid results and meaningful interpretations.

There is a variety of wearable devices with different technologies for measuring parameters. Are there any standards imposed before selecting the type of device or manufacturer?

There’s always a different team in real-world data that handles this – who are more on the data science side. I know certain standards need to be imposed before selecting the type of device or manufacturer.

That said, I don’t have much insight into digital biomarkers specifically. But I do know that a vendor is always selected, a contract is drawn up and then the details of the type of device or manufacturer are finalised. I don’t have in-depth knowledge of this, but if we’re talking about OMOP CDM, there are a few steps involved:

  1. Profiling the available data
  2. Mapping and standardising
  3. Using SQL ETL such as WhiteRabbit
  4. Testing and then launching into production

How do we get access to the RWD dataset? Are there any good vendors or providers?

Well, no real-world data is good data or bad data. There are a lot of vendors in the market specialising in indications, so it completely depends on your requirements.

For example, if a company specialises in oncology, there are many oncology-rich datasets with proper biomarkers and variables for maintenance therapy. If you’re looking at vaccines, then there are different sets of databases. For rare diseases, there are registries specifically for those.

What are the different challenges of working with RWD in programming compared with Phase III studies?

Real-world data is huge and messy. Keep in mind that this is data filled in by administrators in doctor’s offices or in pharmacies. They’re human – mistakes happen. For example, if I want to see a lab value and need a unit, say milligrams (MG), it might be written as MG (uppercase), mg (lowercase), milligram or some other variation. Programming-wise, this doesn’t work.

Some people might even write it in grams, so we then have to convert it into milligrams. This type of standardisation is one of the biggest challenges we face – it’s very time-consuming.

The second challenge is missing data. There are many instances when key information, like a patient’s date of birth, is missing or left blank. If, for example, their weight is missing, how do we handle that? In these situations, we ask a team of statisticians to impute the data.

In India, not all prescriptions are digital nor are all healthcare details documented. How do we then ensure reliability of RWD? Have you encountered this challenge during regulatory submissions?

I haven’t worked with the Indian market yet. I’ve primarily worked with the UK and the US, where there aren’t these challenges. My colleagues in India who work with real-world data often mention these issues, and it’s why real-world data in India isn’t considered very credible.

That said, in India, I’m sure they take a sample size and then extrapolate the results. However, these results tend to have a high error margin of plus or minus 20%, which is significant. In the UK or the US, we typically have an error margin of plus or minus 2%. So, yes, this is definitely a challenge.

Can you share any insight into data quality and standardisation issues/challenges?

The data vendor takes care of almost 70% of the data quality. They handle the standardisation of the real-world data, which typically includes information such as your name, prescription, age and the doctor who saw you. This data is then recorded in rows and columns. So, at this stage, the vendors are standardising the data.

But the job doesn’t end there. When the data comes to us in rows and columns, we, as programmers, still find a lot of inconsistencies. The units might be missing when dealing with lab data, the age might be missing, and sometimes the diagnosis code is incorrect.

This is especially true in electronic health record data, where a lot of the information is text-based. Anything in text is a programmer’s worst nightmare. To handle this, we’re increasingly turning to machine learning algorithms to standardise the data.

How do you handle missing data, especially in retrospective studies?

If the sample size is large, we generally don’t take the missing data into account and tend to exclude those patients. So, first, we do a feasibility check and look for missing data. If there’s a lot of missing data, we typically turn to other data sources. If not, we pass it over to a team of statisticians who will impute the data for us. As programmers, we don’t usually handle that part.

Screenshot 2021-02-03 at 12.12.30.png

Webinar Wednesday.jpg

Sughanda.png

Sugandha answered these questions live in February’s Webinar Wednesday – watch it here (from 33:34).

If you have further questions for Sugandha or you’d like to know more about her answers above, reach out to communications@phuse.global.

Screenshot 2021-02-03 at 12.12.30.png

Generic SDE.jpg

Explore insights from her presentation at the Bengaluru Single Day Event in November 2024 – her slides are available here

SDE Bengaluru.png

Screenshot 2021-02-03 at 12.12.30.png

#32888 Phuse Real World Data Spring Event Banner Dec 2024 v1.jpg

If you’re interested in real-world data, catch up on the insights you missed at the Real World Data Spring Event.

Held virtually 9–10 April, the event brought together experts to discuss the latest advancements, challenges and innovations in the field.

Presentation slides and session recordings are available here.