Ask the Experts: advancing real-world data in the precision medicine era

Currently, there are many hurdles that need to be overcome with real-world data (RWD) before it can be completely integrated into precision medicine oncology. An important issue is how RWD is being collected. One avenue for RWD collection is biospecimens. However, novel technologies such as ‘lab-on-a-chip’ are starting to emerge.

In our ‘Ask the Experts’ feature, a panel with expertise including research, clinical practice and policy development will provide perspectives on the applications, challenges and future of the collection of biospecimens, and how RWD can be used to advance the field of precision medicine.

Our experts are Lawrence Johnson (ICON, MD, USA), Deborah Collyar (Patient Advocates In Research, CA, USA) and Don Ingber (The Wyss Institute for Biologically Inspired Engineering, MA, USA). Meet the experts here.

Read the discussion below, which covers the major challenges of collecting biospecimens and the future of RWD and precision medicine.


  Do you think the relevance and usefulness of collecting RWD using biospecimens has changed over the last decade?

  What are the major challenges and implications of using biospecimens in oncology?

  How do you think these challenges can be overcome?

  Are there any emerging technologies that you think could advance the field of ex-vivo RWD in oncology?

  How could advancing RWD collection impact the precision medicine field?

  How do you think the landscape of RWD collection in oncology will evolve over the next 5 years?


Do you think the relevance and usefulness of collecting RWD using biospecimens has changed over the last decade?

Lawrence Johnson: Yes. Our ability to analyze biospecimens has improved with advances in bioinformatics and next-generation sequencing, including whole-genome sequencing and gene expression profiling on anything from formalin-fixed paraffin-embedded tissue to fresh tissue. We can now link that data to other longitudinal data, including RWD on individual patients, such as longitudinal complete blood counts (CBC) and other analytes, biomarkers, pathology results, and/or radiology results. This provides a truly holistic picture, making it easier to identify patients suitable for a particular study, to find novel therapeutic targets, to develop better ways of monitoring patients, and to identify new indications for older, off-patent drugs.

The paradigm of how we view patients and diseases has changed.  Physiology is not static.  Neither should patients or their disease be viewed as static, but rather as an ongoing process of change. Tumors and the body’s immune response change over time.  A cancer’s stage of genetic evolution will determine how it responds to a given therapy and how the patient’s individual immune system responds to the cancer. In addition, an individual’s immune system response will vary over time.  It is no longer sufficient to make a binary yes or no determination or measurement with regard to a normal or abnormal reference range at an isolated point in time when examining a biospecimen or analyte.  Rather, it is essential to assess where patients are in their disease progression and their ability to respond immunologically as well as to the specific therapy.  That information can now be combined with other information, including RWD such as from the patient’s electronic medical record (EMR), in order to determine the most appropriate intervention for that point in time in the individual patient’s disease process.

Don Ingber: I think that this has always been useful but the amount and quality of data, and richness of information that we can gather has increased enormously with multi-omics, transcriptomics, proteomics and so forth. That being said, yes, I do think the value of collecting RWD using biospecimens has increased, as we can now leverage the data for personalized medicine, diagnostics and therapeutics.

Deborah Collyar: I think it has changed, partly because we were not using the term ‘RWD’ too much before the last decade. Biospecimens have been around for a long time, and of course RWD has been around for a long time but we were not using it in research very much. It is important to point out that RWD is not the same as real-world evidence (RWE). We have to turn it into evidence, and that isn’t what people actually want. What people want is real-world answers.

RWD by itself doesn’t get us very far. It is a raw tool, just like biospecimens are, and we have to turn that data into something useful through analytics. This gets us to RWE, but to get to real-world answers we have to turn the analytics into something that is actually clinically relevant. What is in this for patients?

What are the major challenges and implications of using biospecimens in oncology?

Lawrence Johnson: There are legal and regulatory hurdles to using biospecimens, and the regulations often differ by region and/or country. For example, the General Data Protection Regulation (GDPR) in the European Union imposes limitations on what can be done with biological samples and specifies which patient permissions must be obtained, even for new studies using older, previously collected and stored specimens.

There are some complicating factors that must be considered when analyzing biospecimens. There is often no provision for stratifying data from specimens or analytes by the study participants’ age and/or race, although ‘normal’ physiology changes with age. For example, there is a normal decline in renal function with the normal loss of nephrons as we age, and certain analytes may vary by race, such as lower average absolute neutrophil counts in Africans. Most laboratories fail to account for these differences when establishing a normal reference range. They use what I view as a flawed but still acceptable approach, using the current Clinical & Laboratory Standards Institute regulatory standards in order to establish what is ‘normal’ by assuming that 120 subjects is an adequate number to determine a normal reference range without adjusting it for age, ethnicity, or other factors. Just as it is important to look at relative changes in individuals over time and absolute changes with regard to a reference range, we must remember that the derived reference range itself is often flawed if determined using more conventional methods. This was best demonstrated to me when I analyzed 1.5 million uniquely matched patients with data going back to 1998 in order to assess CBC reference ranges, finding previously unreported differences by both gender and decade of age, as well as in what constitutes a normal platelet count in a second or third trimester pregnant patient versus a non-pregnant or first trimester pregnant patient.
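To make the stratification point concrete, here is a minimal Python sketch. It assumes that a conventional reference range is the nonparametric central 95% of a reference sample (2.5th to 97.5th percentile), which the discussion above does not spell out. The function names, the group labels and the synthetic values are hypothetical and for illustration only; this is not the method used in the study described above.

```python
import random
import statistics


def reference_interval(values):
    """Return the 2.5th and 97.5th percentiles of a reference sample."""
    # n=40 yields 39 cut points spaced 2.5% apart; the first is the
    # 2.5th percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def stratified_intervals(records):
    """records: iterable of (stratum, value) pairs -> {stratum: (low, high)}."""
    groups = {}
    for stratum, value in records:
        groups.setdefault(stratum, []).append(value)
    return {stratum: reference_interval(vals) for stratum, vals in groups.items()}


# Synthetic hemoglobin-like values in g/dL (assumption: NOT real patient data).
rng = random.Random(0)
records = [("group_a", rng.gauss(15.0, 1.0)) for _ in range(120)]
records += [("group_b", rng.gauss(13.5, 1.0)) for _ in range(120)]

pooled = reference_interval([value for _, value in records])
by_group = stratified_intervals(records)

print("pooled range:", pooled)
for stratum, (low, high) in sorted(by_group.items()):
    print(stratum, round(low, 2), round(high, 2))
```

In this toy setup the pooled range is wider than either group's own range, so a value that is unusual for one group can still fall inside the pooled ‘normal’ range, which is the concern raised above.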

When assessing data from biospecimens and analytes, it is important to look at the velocity of change, the consistency of directionality of change and the relative level of change within individual patients, in order both to determine the significance of a change and to narrow the differential diagnostic possibilities. For example, a 4 g/dL drop in a patient’s hemoglobin value from 16 g/dL to 12 g/dL would be viewed as normal if the assessor used the binary thinking of normal versus abnormal based on a static reference range derived using only 120 ‘normal’ patients. However, there is a meaningful difference in the level of significance as well as the differential diagnostic considerations if the drop in hemoglobin occurred over 10 years versus 24 hours, or if this drop were significantly different from whatever average hemoglobin values this individual patient had over the last 10 years. This points to the value of relative individual longitudinal assessments of data rather than working with static ‘snapshots’ and a view of normal versus abnormal using a flawed reference range.
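The hemoglobin example can be worked through in a few lines of Python. This is a rough sketch under stated assumptions: the reference range of 12.0 to 17.5 g/dL, the dates, the baseline history and the function name `change_metrics` are all hypothetical, chosen only so that both readings count as ‘normal’ on a static view.

```python
from datetime import datetime

# Hypothetical static reference range in g/dL (an assumption for illustration).
REF_LOW, REF_HIGH = 12.0, 17.5


def change_metrics(earlier, later, baseline_values):
    """earlier/later are (datetime, hemoglobin g/dL); baseline_values are prior results."""
    (t0, v0), (t1, v1) = earlier, later
    days = (t1 - t0).total_seconds() / 86400
    delta = v1 - v0
    baseline = sum(baseline_values) / len(baseline_values)
    return {
        "delta_g_dl": delta,                      # magnitude and direction
        "velocity_g_dl_per_day": delta / days,    # how fast the value moved
        "deviation_from_baseline_g_dl": v1 - baseline,
        "in_static_range": REF_LOW <= v1 <= REF_HIGH,
    }


history = [16.1, 15.9, 16.0]  # hypothetical earlier results for this patient

# Same 16 -> 12 g/dL drop, over ten years versus over 24 hours.
slow = change_metrics((datetime(2010, 1, 1), 16.0), (datetime(2020, 1, 1), 12.0), history)
fast = change_metrics((datetime(2020, 1, 1, 8), 16.0), (datetime(2020, 1, 2, 8), 12.0), history)

for label, m in (("10 years", slow), ("24 hours", fast)):
    print(label, m)
```

Both cases end inside the static range and have the same 4 g/dL drop, but the velocity differs by a factor of several thousand, which is the information a static ‘normal versus abnormal’ check discards.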

Don Ingber: I think one of the major challenges is patient-to-patient variability, as every patient is an individual. This is frustrating for scientists who work with highly controlled, reproducible models like engineered mouse models. On the other hand, this is what clinicians deal with every day and this is the real world. This is now our challenge. I think that variability actually can be a positive here, but it means you have to get larger numbers of fresh specimen samples, and it matters a great deal in terms of how quickly you can isolate the cells, or how you fix and collect them. There is also a microbiome in these samples, as they are real tissues that have living bacteria. This means complications could arise during culturing. Also, there is enormous variability in different regions of the same cancer. For example, you can get cancerous, necrotic, pre-malignant and ostensibly normal regions all near each other. You really have to do a full characterization using image analysis, immunohistochemistry and genomics, and then pursue the cell biology part of it as well.

In terms of implications, I am part of a Cancer Research UK (London) Grand Challenges Grant that is called STORMing Cancer. We are interested in cancer progression, especially inflammation-associated cancers. One of the most exciting things is that we are getting primary tissue specimens and not just doing transcriptomics and multi-omics, but we are using them as sources of cells. In addition to isolating cancer cells, we are also collecting epithelial cells from dysplastic, metaplastic and surrounding healthy appearing regions, as well as associated fibroblasts, endothelial cells and immune cells. My own group builds organ-on-a-chip microfluidic culture devices using these different patient-derived cells to recreate and study these different stages of cancer progression in vitro, and identify how the microenvironment influences cancer development.

Other members of the STORMing Cancer team are doing high-resolution imaging and are able to see what it is like in vivo. Others are working on proteomics, transcriptomics – both in whole genome and single cell. I think we are trying to get the most out of the richness of collecting specimens.

Deborah Collyar: From a patient standpoint, for example, if we need to find out what biological markers change throughout cancer treatment, one of the things that we have been doing for a long time with neoadjuvant therapy is serial biopsies. Patients have a biopsy before and after neoadjuvant treatment, and then after surgery. It is important to consider how these biopsies are collected to avoid any additional invasive procedures. It is really important to ensure clear and careful communication with the potential donors because they need to understand why mandatory collection is important, whether in an existing clinical trial or a separate protocol for future research.

Another challenge is relevance and how to turn that data into something that can be useful for a broader population of humans. The variability between people is something that we need to be able to address. I know that the organ-on-a-chip offers us a better possibility of getting there, but again the collection of biospecimens must be very precise and specific. I think that this means we have to think about how to collect biospecimens, how to store them differently, and how to communicate to patients and donors what we will be doing with the information. This is important when people decide to donate their biospecimens for research. We need to make sure they are used well, and that the information and data can be shared to advance the field as much as possible.

How do you think these challenges can be overcome?

Lawrence Johnson: With respect to patient privacy laws, it is important to have a clear data collection strategy from the outset, with an understanding of the different regulatory requirements in each region and or country.

However, because of the statistical firepower provided by such a large number of patients, it is still possible to develop a robust database with very meaningful data on millions of patients, even if data collection, such as through a patient registry, is confined to a single country because of GDPR restrictions.

Obtaining patients’ specific consent upfront to pseudonymize their biological data and use it in a specific way has the best chance of complying with GDPR, although we have seen some inconsistencies in how ethics committees respond to that approach. If this was not done in advance and the goal is to link RWD with biospecimen data post hoc, the linkage may be done through a variety of methods, such as hashing algorithms or a master patient index unique to the patient and to ICON. While this method is not perfect in matching data across datasets, it has come a long way in recent years, and in my own personal experience the master patient index method has been accepted in the USA by the FDA.
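As a rough illustration of hash-based linkage, the Python sketch below derives a keyed token (HMAC-SHA256) from normalized identifiers and joins two datasets on that token instead of on names. It is an assumption-laden toy, not ICON's master patient index or any regulator-endorsed procedure: the field names, the secret key and the sample rows are hypothetical, and exact-match tokens like these do not handle typos or changed names.

```python
import hashlib
import hmac


def link_token(secret_key, given_name, family_name, birth_date):
    """Derive a pseudonymous linkage token from normalized identifiers."""
    parts = (given_name, family_name, birth_date)
    normalized = "|".join(p.strip().lower() for p in parts)
    # A keyed hash, so tokens cannot be recomputed without the secret key.
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(secret_key, rwd_rows, specimen_rows):
    """Join RWD rows to specimen rows on the token; identifiers are dropped."""
    specimens_by_token = {
        link_token(secret_key, r["given"], r["family"], r["dob"]): r["specimen_id"]
        for r in specimen_rows
    }
    linked = []
    for r in rwd_rows:
        token = link_token(secret_key, r["given"], r["family"], r["dob"])
        if token in specimens_by_token:
            linked.append({
                "token": token,
                "hemoglobin_g_dl": r["hemoglobin_g_dl"],
                "specimen_id": specimens_by_token[token],
            })
    return linked


SECRET = b"demo-key-not-for-production"  # hypothetical key

rwd_rows = [
    {"given": "Ada ", "family": "EXAMPLE", "dob": "1970-01-01", "hemoglobin_g_dl": 12.0},
    {"given": "Sam", "family": "Sample", "dob": "1980-05-05", "hemoglobin_g_dl": 14.2},
]
specimen_rows = [
    {"given": "ada", "family": "Example", "dob": "1970-01-01", "specimen_id": "S-001"},
]

linked = link_records(SECRET, rwd_rows, specimen_rows)
print(linked)
```

The linked rows carry only the token and the clinical values, which is the sense in which the data are pseudonymized; whether that satisfies a given regulator or ethics committee is a separate question, as noted above.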

For data analysis, it is important to look at multiple moving and static variables of almost any kind longitudinally, using more sophisticated mathematics such as partial differential equations rather than simple correlation matrices. Researchers should examine data without any preconceived notions, including about what might be meaningful, significant or related, in order to find novel associations and not just those that are seemingly intuitive. Preconceptions pose a significant limitation in the discovery of novel biomarkers, companion diagnostic assays and the development of precision medicine, among other things.

The difficulty of extracting insights from free text or natural language with artificial intelligence (AI) and machine learning can be circumvented, at least to some extent, through the broader use of standardized and comprehensive templates for physicians’ notes, pathology reports, radiology reports, etc., melded with technology often used in virtual or remote clinical trials.

Don Ingber: One way to overcome the challenges is by bringing full force to bear and having a multi-interdisciplinary type of approach where many teams with different expertise bring their own analytical capabilities so that you are doing the imaging and high-resolution mapping via immunohistochemistry or in situ hybridization at the same time as transcriptomics, proteomics and in vitro model development. On top of this, having the clinician there to characterize the clinical phenotype of that patient, and the pathologist to characterize the histological phenotype of the region that the specimen was collected from, provides a relevant context without which interpretation of results would be difficult. The study should then be designed with an awareness of the clinical intricacy so that the experimental conditions and variables explored are as relevant to patient populations as possible, which is not often done in scientific studies because there is such a disconnect between clinical medicine and basic research. I am an MD PhD, so I have some experience with bridging the gap, but in the past, even scientists with clinical training approached their basic research differently compared to how they approached a clinical problem. Now the fields are merging, and we really need to get back to human biology.

Deborah Collyar: It is incredibly important to set standards that are followed by every institution that collects data and biospecimens, which sounds like a tall order, but it can be done. There are a lot of people who want to see that happen and standards have been developed, but we are talking about an institutional culture change. Institutions have been inadequate in the way they approach research, and that approach needs to be restructured. Policy bodies and funders need to step up to this and state that these standards must be used, or we will not fund you.

Are there any emerging technologies that you think could advance the field of ex-vivo RWD in oncology?

Lawrence Johnson: We are seeing some emerging technologies for collecting data, including histopathology. Some novel technologies can assess cellular and tissue morphology as well as detect tumors using lasers and/or sound waves in conjunction with a supercomputer to construct an image akin to a glass tissue slide. This represents a marriage of radiology and pathology that can be done without resorting to a sometimes highly invasive tissue biopsy with its associated comorbidities.

At ICON we are also using and developing a platform capable of importing and ingesting disparate data (such as EMR data, longitudinal registry data and biospecimen data), which is key to turning RWD into RWE.  The system is able to standardize and normalize data based on different units of measure and, to some extent, different assay platforms.  This provides more than a data mart or a holding cell for the data.  It has the ability to apply AI and machine learning to the data in order to derive sometimes novel insights.  These types of analytical tools move beyond simple correlation matrices to looking at multiple variables, both static and moving, simultaneously over time.
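To show what standardizing on units of measure can look like in the simplest case, here is a small Python sketch that converts hemoglobin results reported in different mass-concentration units to g/dL before they are pooled. It is only an illustration under stated assumptions: the table `TO_G_PER_DL`, the function name and the sample rows are hypothetical, and nothing here describes how the platform mentioned above actually works.

```python
# Conversion factors to g/dL for hemoglobin reported as mass concentration.
# 1 g/L = 0.1 g/dL; 1 mg/dL = 0.001 g/dL.
TO_G_PER_DL = {"g/dL": 1.0, "g/L": 0.1, "mg/dL": 0.001}


def normalize_hemoglobin(value, unit):
    """Convert a hemoglobin result to g/dL, rejecting unknown units."""
    try:
        factor = TO_G_PER_DL[unit]
    except KeyError:
        # Fail loudly rather than silently mixing incompatible units.
        raise ValueError(f"unsupported hemoglobin unit: {unit!r}")
    return value * factor


# Hypothetical rows from three sources using different conventions.
raw_rows = [
    {"source": "emr", "value": 12.0, "unit": "g/dL"},
    {"source": "registry", "value": 120.0, "unit": "g/L"},
    {"source": "lab_feed", "value": 12000.0, "unit": "mg/dL"},
]

normalized = [
    {"source": r["source"], "hemoglobin_g_dl": normalize_hemoglobin(r["value"], r["unit"])}
    for r in raw_rows
]
print(normalized)
```

All three rows describe the same concentration once converted. Differences between assay platforms are a harder problem that a unit table alone does not address.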

Don Ingber: Certainly, we use organoids, which is a great advanced technology to isolate stem cells from different organs. Then, we take the cell organoids, which are little closed balls surrounded by matrix that are very hard to get access to, break them up and put them into our organs-on-chips, which allow us to then interface these epithelial cells with other tissues such as connective tissues, endothelium, circulating and tissue-resident immune cells, etc. I think organoids and organs-on-chips are great advances in that space. The big advantage of organs-on-chips is that you can replicate host inflammatory and immune responses to infections and diseases, such as cancer, in addition to testing drugs under clinically relevant conditions. For example, you can flow drugs through the organ-on-a-chip like you do in a patient, create a clinically relevant, dynamic drug exposure profile, and then analyze host responses in a patient-specific manner if you line the chips with patient-derived cells.

I have already mentioned a lot of emerging technologies, including single-cell and bulk transcriptomics, proteomics, metabolomics, glycomics – all the omics are great! There are new types of imaging where they are developing multiplexing so that you can map out three dimensions and screen 200 antibodies at once. In STORMing Cancer, we have team members who are using computational approaches to map out networks of cell-to-cell communications, not just protein–protein or gene–gene, so that we can study the cellular basis of the disease mechanism. The entire biological hierarchy from molecules to proteins, genome to cells, cells to tissues, and tissues to whole organs physiologically linked within the body in a spatially and physically relevant context, that’s really where the future lies.

Deborah Collyar: That isn’t my area of expertise, but I do know about organ-on-chips. I am hoping to work in more detail with them on the project ‘STORMing Cancer’. I think this project has real potential because it uses real samples from real people. I am sure there will still be some issues because the cells will be taken out of the body but again, as long as we have standards that keep viability for biospecimens, I think we have a real opportunity to help create better tools, drugs and interventions that actually work for humans.

I work in oncology as well as anti-infective areas where they have been all about looking for susceptibility or resistance in the bug, but in a test tube and not a human. A human’s immune system, for example, doesn’t come into play when they’re developing new anti-infective drugs, and this is a major issue. This is why I got involved, because cancer patients are immune compromised and some of us have long-term effects from the therapies. Anti-infectives is one of those because our immune systems can be compromised, as we’re seeing with COVID-19 for instance, and that can also be long-term.

All of these issues are important for science to acknowledge and consider. I realize that they cannot be brought into early basic science, but they can be brought in somewhere along the translational research path.

How could advancing RWD collection impact the precision medicine field?

Lawrence Johnson: Through the use of RWD, researchers can have a much more complete understanding of patients from a diagnostic standpoint, which helps in knowing if they will respond to a given therapy.  Having RWD also makes it easier to make the business case for developing a companion diagnostic assay, especially if the data provide clues as to where to start development.  This can dramatically cut companion diagnostic development costs and timelines.

There is every reason to believe that as we develop a more detailed map of a tumor’s genome and protein expression, we will identify the same or similar therapeutic targets from patient to patient, in conjunction with commonalities such as a tumor’s morphologic appearance, which is determined, at least to some extent, by the tumor’s genome. That appearance can be assessed using more objective measures provided by digital pathology and image analysis that go beyond what the human eye and brain can discern through a microscope. This would mean that we could apply the principles of precision medicine to a larger number of patients, ensuring both a higher rate of patient compliance and better individual patient outcome compared to using less specific, often cytotoxic, chemotherapy. Traditional thinking suggests that precision medicine narrows the number of patients that might benefit, but this ignores the fact that there are some basic biological commonalities that we are only now starting to discover.

Don Ingber: From my work on organs-on-chips, I can see a future where you can literally have your own personal cancer chip, liver chip, kidney chip, etc., and test drugs for you. They can be linked together fluidically through endothelium-lined vascular channels to test whole-body responses to drugs. For example, an oral drug can be introduced into the lumen of the intestine chip to explore how it is absorbed, and then you can measure how it is metabolized by the liver chip and cleared by the kidney chip, and whether it produces toxicity in the marrow chip. Genomic and transcriptomic analysis could be carried out to determine whether you have a particular mutation that scales with disease phenotype or response to therapy. Immuno-oncology, which is obviously one of the most exciting fields in cancer now and difficult to study in vitro or in animals, is another example. I think that using all human patient-derived samples and their own immune cells could allow you to study immuno-oncology in vitro using this type of organ-on-a-chip approach.

We are now using computational approaches, including algorithms that can repurpose existing approved drugs for new applications based on multi-omics data to confront the STORMing Cancer challenge as well as many other diseases. We are doing that for COVID-19 right now by leveraging transcriptomics data from COVID-19 patients. The intersection of AI and bioinformatics, with high-dimensional data sets, means that we have access to large numbers of readouts across thousands of genes or proteins or metabolites, as well as powerful new molecular dynamics simulation approaches, which I am confident will lead to new approaches to drug development.

Deborah Collyar: I think the oncology field is leading on understanding more about biomarkers and the interaction of those biomarkers in humans. That is where we are developing better agents and better approaches through immunotherapy, and in combination with other therapies.

RWD collection can help us understand some of the differences in cancer cell responses to different treatments because of the variability between people. Systems like organ-on-chips may be able to help us if they can get their hands on different samples from different people and start to understand variability.

I do want to say a little bit about things that sometimes the research field takes for granted or assumes is correct. For instance, in the USA, the question to begin with is how many EMRs do we have? The data can look very different depending on who entered it in, and there are many times where patients know there are errors in their electronic health records and cannot get them corrected. When you start with data errors, you are not going to get relevant answers. It can literally be ‘garbage in, garbage out’.

How do you think the landscape of RWD collection in oncology will evolve over the next 5 years?

Lawrence Johnson: All signs indicate that the use of RWD will explode, especially as health systems are forming partnerships or consortia and beginning to share data. Interestingly, combining data from more generalist health systems with data from major cancer centers will address the almost inherent referral bias seen in data from major academic centers. In general, we will see more collaboration between stakeholders to leverage the statistical firepower of their combined data sets: both more generalist patient data sets, which might serve as controls, and more specialized patient data sets, which increase the yield of patients with rare diseases. For example, we expect to see partnerships between sponsors, insurers, and governments in various permutations.

It is highly likely that we will utilize more virtual or remote clinical trials, including the benefits of remote sampling devices. We already have devices to assess blood glucose, blood pressure and some movement disorders, and we can expect to be able to do something similar to quantify other analytes.  Ultimately, this should lead to faster investigator and patient recruitment, more consistent, reliable and easier to analyze data, greater patient participation, greater patient retention, shorter times to completion of clinical trials and increased cost savings.

Currently, we are using rather primitive methods of data analysis. We will move away from simple correlation matrices of static data points at specific points in time to more sophisticated mathematical data analysis. For example, rather than restricting ourselves to looking at absolute changes in values versus a binary and most likely flawed reference range obtained from a small population, we will be able to look at relative changes over time within an individual patient. We will be able to take into account other features including the magnitude of change within the individual patient, the velocity or temporality of change within that patient, and the directionality of change. We will also collect new types of data in an ongoing manner, often using remote, non-invasive devices to gather data that previously required a visit to a doctor or hospital and an invasive technique.

Don Ingber: It is happening so quickly, but I think right now it’s all one-offs. For example, we had to get the STORMing Cancer grant to enable our clinicians to collect living cells from patient surgical resections that we can use to construct organs-on-chips at the same time as we are taking neighboring tissue samples for histological, transcriptomic, and proteomic analysis. The transcriptomics and the histology are already happening often in the cancer community. I hope that in the next five years it is done in a more systematic way along with living cell isolation. Certainly, there are groups that are collecting and multiplexing slides of histological sections that you can buy. There are companies that isolate cells or sell you whole organs, but there is not a concerted effort to standardize ways to isolate patient-derived cells and make organoids or culture cells like fibroblasts or endothelium from the same patient, which then can be used to build organs-on-chips or be made available to other groups that study them in various ways. I think that would be a huge plus for the entire cancer community.

I think as people find benefit, it will happen. Some drug companies are beginning to collect immune cells from patients for immuno-oncology studies, and a few are even carrying out some of these studies with organs-on-chips, but this is being done as one-offs. It’s not done clinically yet. I hope that we get there soon.

Deborah Collyar: It is really important to bring in a multi-stakeholder approach where we have different fields of science as well as clinical science and patient representatives working on these things together. It truly can be the bench-to-bedside and back again approach. I think that is the only way we will actually be able to create RWD collections and biospecimen collections that represent the true human situation and conditions that we need to show what will be relevant to actual people.


Meet the experts:

Lawrence Johnson

Johnson, MD, FCAP, FASCP, oversees ICON’s global network of fully accredited laboratory facilities to support large, global studies. He has an interest in the use of big data, RWE and novel methods of data assessment. He performed the largest CBC reference range study, looking at 1.5 million uniquely matched patients with data going back to 1998, finding differences by both gender and decade of age. Lawrence is board certified & recertified by the American Board of Pathology in clinical, anatomic and hematopathology. He also serves as a College of American Pathologists (CAP) inspector and checklist reviewer.


Don Ingber

Ingber is the Founding Director of the Wyss Institute at Harvard University, Judah Folkman Professor of Vascular Biology at Harvard Medical School and Boston Children’s Hospital, and Professor of Bioengineering at the Harvard John A. Paulson School of Engineering and Applied Sciences. He received his B.A., M.A., M.Phil., M.D. and Ph.D. from Yale University. Ingber is a pioneer in the field of biologically inspired engineering, and his work has led to major advances in cancer research as well as mechanobiology, angiogenesis, tissue engineering, systems biology, nanobiotechnology and translational medicine. Ingber has been a recipient of a DoD Breast Cancer Innovator Award, an American Cancer Society Faculty Research Award, and a Cancer Research UK Grand Challenge grant. He is also a member of the National Academy of Medicine, National Academy of Inventors, American Institute for Medical and Biological Engineering, and the American Academy of Arts and Sciences.

Deborah Collyar

Deborah Collyar has been a patient engagement leader since her first cancer diagnosis. She founded Patient Advocates in Research (PAIR) international communication network in 1996. Deborah infuses hundreds of patient advocates into research programs and delivers innovative ways to gather input from thousands of patients. Her work encompasses many diseases, programs and policies at grassroots, national and international levels and emphasizes patient issues throughout development, clinical trials and health literate communication with providers and patients. She is a speaker, blogger, author, team member, trainer and faculty at professional workshops.

