Enhancing Line of Therapy Derivation Using Real-World Data From Electronic Health Records Via Integration of Medical Claims Data
In an interview with the Journal of Clinical Pathways, Smita Agrawal, PhD, executive director at ConcertAI, speaks about her study on deriving lines of therapy by integrating data from multiple data sources to improve accuracy. The study was presented at the 2023 ASCO Annual Meeting in Chicago, IL.
Transcript:
Dr. Smita Agrawal: Hi, I'm Smita Agrawal, executive director at ConcertAI.
Can you give some background about your study and what prompted you to undertake it?
Dr. Agrawal: Yes. ConcertAI is a real-world data company, and we aggregate real-world oncology data from hundreds of clinics across the US. This data is then standardized, normalized, and made fit for research purposes for our clients, mostly pharma clients, who then use it for various research studies, for trial design and feasibility analysis, and for health economics and outcomes research.
So for all of those use cases, one of the key things that is required is identifying the lines of therapies that these patients are treated with along their patient journey.
This information is not readily available in the EHR data. What we get there is just discrete mentions of the drugs that these patients take during their clinical journey. We have developed an elaborate framework to derive the line of therapy for these patients from this data. However, our biggest data asset, our RWD360 data set, which contains more than 7 million patient lives, is actually derived only from the structured data; it's not curated. So there is, as everyone knows, a little bit of messiness and noisiness in this data.
Our framework deals with a lot of the noisiness in the data.
To counter the messiness, we decided to integrate other data sources from which we could get information about the drugs patients were getting. For this study, we used a claims data set where we saw more than 90% overlap with our EHR data.
That's the reason we went ahead and did this analysis to see if we could derive more accurate lines of therapies for this data set.
Can you briefly describe how the study was conducted?
Dr. Agrawal: Yes. As I mentioned, we had already developed an elaborate framework for deriving lines of therapies from the EHR-only data. We used the same framework, but then integrated an open claims dataset, which has more than 90% overlap with our RWD360 dataset. And then we did various analyses to see what was the best way to use this data in an integrated fashion.
What we found was that the easiest way to integrate this data and get more accurate lines of therapy was to integrate the medication data coming from the claims with the data coming from EHR, and then build the lines of therapies using this integrated data set.
There were some nuances in how we integrated this data based on the framework, but overall it was a pretty simple process once we had that framework in place.
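As an illustration of the integration step described in this answer, here is a minimal Python sketch. It is not ConcertAI's framework, whose rules are not given in the interview. The record fields, the function names `merge_records` and `derive_lines`, the 28-day regimen window, and the 90-day gap rule are all assumptions chosen only to show the idea: pool the medication events from both sources first, then derive lines of therapy from the pooled events.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DrugRecord:
    patient_id: str
    drug: str
    administered: date
    source: str  # "ehr" or "claims" (hypothetical labels)


def merge_records(ehr, claims):
    """Pool both sources, dropping duplicate (patient, drug, date) events."""
    seen = {}
    for rec in list(ehr) + list(claims):
        key = (rec.patient_id, rec.drug.lower(), rec.administered)
        seen.setdefault(key, rec)  # the EHR copy wins when both sources agree
    return sorted(seen.values(),
                  key=lambda r: (r.patient_id, r.administered, r.drug.lower()))


def derive_lines(records, regimen_window_days=28, gap_days=90):
    """Toy rule for ONE patient's date-sorted records (an assumption, not the
    study's rule): a drug joins the current line if it is already part of it
    or starts within the regimen window; a new drug outside the window, or
    any drug after a long treatment gap, opens a new line."""
    lines = []
    for rec in records:
        drug = rec.drug.lower()
        if lines:
            cur = lines[-1]
            in_window = (rec.administered - cur["start"]
                         <= timedelta(days=regimen_window_days))
            long_gap = rec.administered - cur["end"] > timedelta(days=gap_days)
            if not long_gap and (drug in cur["drugs"] or in_window):
                cur["drugs"].add(drug)
                cur["end"] = rec.administered
                continue
        lines.append({"start": rec.administered,
                      "end": rec.administered,
                      "drugs": {drug}})
    return lines


# Made-up example: the claims feed supplies a drug and a dose the EHR missed.
ehr = [
    DrugRecord("p1", "carboplatin", date(2023, 1, 1), "ehr"),
    DrugRecord("p1", "pembrolizumab", date(2023, 6, 1), "ehr"),
]
claims = [
    DrugRecord("p1", "Carboplatin", date(2023, 1, 1), "claims"),  # duplicate
    DrugRecord("p1", "paclitaxel", date(2023, 1, 8), "claims"),
    DrugRecord("p1", "carboplatin", date(2023, 1, 22), "claims"),
]

merged = merge_records(ehr, claims)
ehr_only_lines = derive_lines(merge_records(ehr, []))
integrated_lines = derive_lines(merged)
```

In this made-up example both derivations yield two lines, but the first integrated line contains two drugs instead of one and runs to January 22 rather than ending on the day it starts, which mirrors the kind of gap filling described in the interview.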
What were the key findings of your study?
Dr. Agrawal: Yes. As I mentioned, because real-world data pulled from EHRs can have significant messiness, we were expecting some enhancement in our lines of therapies, and that's what we observed.
Our findings were multifold. First of all, we found that there were many patients for whom previously we were not able to derive a line of therapy. But now with this integrated data, we were able to derive lines of therapies for these patients. There was also an increase, somewhere in the range of 6% to 34%, in the accuracy of the lines of therapies that we were deriving.
When I say accuracy, I mean we could derive lines of therapies for a larger number of patients. We could derive more lines of therapies for the same patients. So if there were gaps in their therapies that we were observing from the EHR-only data, these were getting filled in with the claims data. The lines of therapies of some patients were longer, and in some cases there were more drugs in a particular line of therapy.
So all of these together resulted in us deriving better and more accurate lines of therapies for these patients.
Of course, we did extensive validation on this integrated derivation of our line of therapies because we do have access to the unstructured data as well. So we could go into the notes and then confirm in cases where we were deriving different lines of therapies from the claims, whether they were more accurate than the lines of therapies derived only from the EHR data.
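The kinds of improvement listed in this answer (more patients with a derived line, more lines per patient, longer lines, and more drugs per line) can be tallied with a short Python sketch. The data layout, the function names `summarize` and `compare`, and the example values are hypothetical. These counts are not the study's accuracy measure, which, as described above, was checked against the unstructured notes.

```python
from datetime import date


def summarize(lines_by_patient):
    """Tally simple counts over a {patient_id: [line, ...]} mapping, where
    each line is a dict with "start", "end", and "drugs" (assumed layout)."""
    all_lines = [ln for lines in lines_by_patient.values() for ln in lines]
    return {
        "patients_with_line": sum(1 for lines in lines_by_patient.values() if lines),
        "total_lines": len(all_lines),
        "total_line_days": sum((ln["end"] - ln["start"]).days for ln in all_lines),
        "total_drugs_in_lines": sum(len(ln["drugs"]) for ln in all_lines),
    }


def compare(ehr_only, integrated):
    """Return integrated-minus-EHR-only differences for each tally."""
    before, after = summarize(ehr_only), summarize(integrated)
    return {name: after[name] - before[name] for name in before}


# Made-up example: p2 had no derivable line from EHR data alone.
ehr_only = {
    "p1": [{"start": date(2023, 1, 1), "end": date(2023, 1, 1),
            "drugs": {"carboplatin"}}],
    "p2": [],
}
integrated = {
    "p1": [
        {"start": date(2023, 1, 1), "end": date(2023, 1, 22),
         "drugs": {"carboplatin", "paclitaxel"}},
        {"start": date(2023, 6, 1), "end": date(2023, 6, 1),
         "drugs": {"pembrolizumab"}},
    ],
    "p2": [{"start": date(2023, 3, 1), "end": date(2023, 3, 22),
            "drugs": {"docetaxel"}}],
}

delta = compare(ehr_only, integrated)
```

A positive value in `delta` means the integrated derivation found more of that item than the EHR-only derivation; whether the extra lines are actually correct still has to be confirmed against a reference such as clinical notes.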
Is there anything else you would like to add?
Dr. Agrawal: There are many real-world data sources out there, and everyone has their own derivation of line of therapy, which may be slightly different. But we hope that this framework can be adopted universally, and we hope that more and more people start to use multiple data sources together to derive better and more accurate lines of therapies, which in turn will benefit the studies that these data sources are being used for downstream.
© 2023 HMP Global. All Rights Reserved.