Why data science is more than just a placebo for delivering better clinical trials.

In the wake of Covid-19, the development of a vaccine against the virus has become crucial for the world to begin to overcome the chaos and devastation the virus is continuing to cause.

As part of the drug development process, clinical trials are critical to delivering a new treatment for the coronavirus and are a natural requirement to meet stringent drug-approval regulations. However, there are two facts that stand out when we begin to talk about clinical trials. Firstly, a single clinical trial can easily cost more than $100 million, and secondly, only 14 percent of drugs in clinical trials go on to secure regulatory approval. There is an opportunity to reduce the cost, complexity and time involved in completing clinical trials with the right platforms and processes in place, for example with the adoption of less static and more adaptable open source technologies.

Former FDA Commissioner Dr. Scott Gottlieb viewed this as a strategic imperative, particularly when it comes to one of the biggest ambitions in healthcare today – effective, efficient precision medicine. To get there, Gottlieb recognised something that other industries have been working on for a while: the need for better data – better data sharing to combine different data sources, better data integration to bring in insights from electronic health records, and better data integrity to support organisations like the FDA or the MHRA in making informed decisions to deliver better health outcomes. In short, it is data science that could prove transformative in taking today’s healthcare innovations through successful clinical trials to deliver tomorrow’s treatments – Covid-19 included.

The healthcare research industry is no stranger to statistical analysis, yet there is still much it could learn from a more holistic data science approach. After all, while it seems that some of the “fail fast, fail often” attitude of Silicon Valley is echoed in the current approach to success in clinical trials, there are tools and approaches from the data-driven business model that clinical trials could learn from. There are three phases of the process where this warrants consideration.

Trial planning

Nearly every stage of the clinical trial process is marred by challenges in enrolment timelines. According to a study from 2014, roughly 80% of trials failed to meet their enrolment timelines, and 30% of phase III trial terminations were due to enrolment difficulties rather than issues with the drug itself. Making informed decisions based on data can help with every step from trial design and assessment of operating characteristics through to site selection and participant recruitment. For example, historical site performance characteristics, data on the rate of the illness in a population, and competitor intelligence on trial sites can all be combined with the right expertise to find the best sites for efficient drug trials. Meanwhile, existing patient data files and social media or forum data could both help find and recruit study participants. However, this is only feasible if researchers can adopt the types of natural language processing tools that other businesses use to ingest and analyse this type of unstructured raw data.
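As a rough illustration of the site-selection idea, the sketch below combines three signals into a single ranking. Everything in it is an assumption made for the example: the `Site` fields, the weights, and the simple min-max scaling are hypothetical, and a real exercise would rest on validated data and clinical judgement.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    past_enrolment_rate: float  # patients per month in earlier trials (hypothetical field)
    local_prevalence: float     # cases per 1,000 people nearby (hypothetical field)
    competing_trials: int       # rival trials recruiting in the same area (hypothetical field)


def min_max(values):
    """Scale values to the 0..1 range; identical values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_sites(sites, weights=(0.5, 0.3, 0.2)):
    """Return site names ordered from most to least promising.

    The weights are illustrative assumptions, not recommended values.
    """
    enrol = min_max([s.past_enrolment_rate for s in sites])
    prev = min_max([s.local_prevalence for s in sites])
    comp = min_max([s.competing_trials for s in sites])
    w_enrol, w_prev, w_comp = weights
    scored = [
        # More competition is worse, so that signal is inverted.
        (w_enrol * e + w_prev * p + w_comp * (1 - c), s.name)
        for s, e, p, c in zip(sites, enrol, prev, comp)
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Made-up sites, purely for illustration.
sites = [
    Site("Site A", past_enrolment_rate=8.0, local_prevalence=12.0, competing_trials=1),
    Site("Site B", past_enrolment_rate=3.0, local_prevalence=5.0, competing_trials=4),
    Site("Site C", past_enrolment_rate=5.0, local_prevalence=12.0, competing_trials=0),
]
ranking = rank_sites(sites)
```

A weighted score like this is easy to explain to a study team, which is its main appeal here. The choice of weights is exactly where subject-matter expertise has to come in.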

Trial execution and monitoring

Monitoring trials and helping them progress through each stage of the process is one of the most impactful areas for data science. Before any trial even begins, researchers can model the quantitative characteristics of the proposed study design to predict the probability of being able to make a decision at the end of the trial, which helps de-risk the initial expenditure.
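One common way to model a design before committing to it is simulation. The sketch below estimates how often a simple two-arm trial with a yes/no outcome would reach a statistically significant result, given assumed response rates. The response rates, sample sizes, and the choice of a pooled two-proportion z-test are all assumptions made for illustration. Real designs are usually more involved.

```python
import math
import random


def two_proportion_z(successes_a, successes_b, n_per_arm):
    """z statistic for the difference between two proportions (pooled standard error)."""
    p_a = successes_a / n_per_arm
    p_b = successes_b / n_per_arm
    pooled = (successes_a + successes_b) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:
        return 0.0  # every patient had the same outcome, so there is no evidence either way
    return (p_a - p_b) / se


def simulate_power(p_control, p_treatment, n_per_arm, n_sims=2000, z_crit=1.96, seed=1):
    """Fraction of simulated trials whose two-sided z-test exceeds z_crit."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        control = sum(rng.random() < p_control for _ in range(n_per_arm))
        treated = sum(rng.random() < p_treatment for _ in range(n_per_arm))
        if abs(two_proportion_z(treated, control, n_per_arm)) > z_crit:
            significant += 1
    return significant / n_sims


# Assumed scenario: 30% response on control, 50% on treatment.
power_small = simulate_power(0.3, 0.5, n_per_arm=30)
power_large = simulate_power(0.3, 0.5, n_per_arm=100)
false_positive_rate = simulate_power(0.3, 0.3, n_per_arm=100)
```

Comparing `power_small` with `power_large` shows how the chance of a clear decision changes with enrolment. This is the kind of trade-off a team would want to see before the budget is spent.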

In phase one trials, researchers need to understand how a drug performs, what effective dosages could be, and any potential side effects. In a standard statistical analysis, researchers may be able to correlate a dose and an effect but may miss vital contextual data that could better inform treatment. In a data science approach, all variables are included – and data such as patient demographics can be cross-correlated with both treatment efficacy and side effects to understand the impact of different variables (age, weight, other conditions, diet and so on) on treatment. This can also provide wider context on the important factors affecting treatment that can shape the future phases of the study, as well as inform a potential precision medicine strategy for the drug. In phases two and three, data science-driven approaches to overcoming data silos (a common problem in modern enterprises where departments have historically worked in isolation) and to embracing diverse data sources can also help to create better models of how the drug will perform outside of the clinical setting and predict real-world success. All of this is important for creating both a successful trial and a successful long-term product.
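To make the phase-one point concrete, the sketch below breaks response and side-effect rates down first by dose alone and then by dose and age band together. The records are synthetic and invented purely for illustration, and the `rates_by` helper is a hypothetical name. A real analysis would use a proper statistical model, not raw subgroup rates.

```python
from collections import defaultdict

# Synthetic records, made up for illustration only:
# (dose_mg, age_band, responded, had_side_effect)
records = [
    (10, "18-40", True, False),
    (10, "18-40", False, False),
    (10, "65+", False, False),
    (10, "65+", False, True),
    (20, "18-40", True, False),
    (20, "18-40", True, False),
    (20, "65+", True, True),
    (20, "65+", False, True),
]


def rates_by(records, key_fn):
    """Group records with key_fn and return response and side-effect rates per group."""
    totals = defaultdict(lambda: [0, 0, 0])  # [patients, responders, side effects]
    for record in records:
        _, _, responded, had_side_effect = record
        bucket = totals[key_fn(record)]
        bucket[0] += 1
        bucket[1] += int(responded)
        bucket[2] += int(had_side_effect)
    return {
        key: {"n": n, "response_rate": r / n, "side_effect_rate": s / n}
        for key, (n, r, s) in totals.items()
    }


by_dose = rates_by(records, lambda rec: rec[0])
by_dose_and_age = rates_by(records, lambda rec: (rec[0], rec[1]))
```

In this toy data, the dose-only view hides that the higher dose's side effects fall entirely on the older group. That is the sort of context a dose-versus-effect correlation alone would miss.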

Trial reporting

Finally, good data practices and a successful trial outcome are meaningless without being able to prove the integrity and accuracy of the data to the regulatory body at the end of the study to gain approval. By capturing model data and producing an audit trail of all decisions made and the reasons behind them, researchers have the best chance of convincing regulators that a successful trial outcome is enough to green-light the drug. Think of it like accounting: how many multibillion-dollar corporations are balancing their books by hand? The same should hold true for proving regulatory compliance.
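One simple way to picture an automated audit trail is a log in which each entry carries a hash of the entry before it, so later edits become detectable. The sketch below is a minimal illustration of that idea. The field names and the `append_entry` and `verify` helpers are hypothetical, and this is not a validated compliance system.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 of the entry body, serialised with sorted keys for stability."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, decision, reason):
    """Append a decision and its reason, chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"decision": decision, "reason": reason, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True only if every entry is unmodified and correctly chained."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {key: entry[key] for key in ("decision", "reason", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Illustrative entries only.
log = []
append_entry(log, "Exclude site 12 from interim analysis", "Protocol deviation reported")
append_entry(log, "Use model version 3 for dose-response", "Better fit on held-out data")
intact = verify(log)

log[0]["reason"] = "Edited after the fact"
tampered = verify(log)
```

Here `intact` is `True` and `tampered` is `False`, because changing an earlier reason no longer matches its stored hash.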

All of this creates a compelling case for how better data-driven decision making could lead to more efficient, effective clinical trials for a coronavirus vaccine. However, it’s not as easy as flying in a team of data scientists or a black-box AI solution to try and magically generate the right answer. Subject-matter expertise is critical for interpreting and understanding the data and insights that come out of these types of datasets, while in-depth data science skill sets may not yet be prevalent among the researcher community. In this likely scenario, it’s about building a cross-departmental collaboration of technologists and scientists who can work together to deliver the best outcome.

The benefit of this over some advanced plug-and-play solution is one of trust. Only by creating a common understanding of the goals of the project can the data scientists ensure their models and solutions meet the needs of the researchers, and can the researchers feel confident in making decisions based on the findings of the data scientists over more traditional regression and t-test type statistics. But with a shared understanding and common goals, adding effective data science initiatives into the clinical trials process will be critical not just in helping drug companies improve the efficiency of their go-to-market and approvals process, but in delivering real differences in clinical outcomes for patients who are currently, or may be, affected by Covid-19.

by Bruce Seymour, Account Director, Life Sciences at Mango Solutions