Cover Story

The Use of Artificial Intelligence in Frozen Section Histopathology

February 2024
© 2024 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of The Dermatologist or HMP Global, their employees, and affiliates. 

Nahid Y. Vidal, MD, FAAD, FACMS, is a double board-certified dermatologist and dermatologic surgeon and currently serves as chair of the division of dermatologic surgery at Mayo Clinic in Rochester, MN. She also serves as principal investigator for innovative research focused on generative artificial intelligence in cutaneous surgery and oncology.

There have been numerous technologic advances in artificial intelligence (AI) for digital pathology applicable across specialties. AI models typically focus either on augmenting clinical decision-making, such as tumor detection and identifying inflammatory disease, or on pathology workflow tasks, including tissue and slide quality evaluation, education, and segmentation. Most research and innovation in digital pathology has focused on permanent (formalin-fixed) sections. We recently conducted a systematic review of the use of AI in frozen section histopathology.1

Frozen sections are most notably used in Mohs micrographic surgery, where the Mohs surgeon both extirpates the tumor and interprets the frozen section pathology to ensure 100% margin clearance intraoperatively. However, there has been a school of thought that frozen section pathology has less value: because the tissue is frozen quickly, frozen sections can contain artifacts such as ice crystals or tissue fragmentation. As a result, many assumed that digitized frozen sections would yield lower quality images and that AI models could not be successfully built from them. We focused our review on understanding the current state of AI for frozen sections and quickly learned that there is a paucity of literature on the topic. We hope the review will inform and inspire dermatologist-innovators to lead advancements in this area and to have a seat at the table.

Key Findings

One of the key findings from our review was that despite the relevance of frozen sections to Mohs surgery and other surgical specialties that rely on frozen tissue diagnosis during surgery, there is a paucity of AI research on frozen sections. Our review included only 18 studies that fulfilled inclusion criteria. Overall, we found that models trained on datasets that included frozen sections could be more generalizable and often performed better when applied to different types of tissue preparations, whereas models developed using only formalin-fixed permanent sections performed poorly when tested on frozen section images. The main advantage of including frozen sections in AI training is that it improves the diversity of the training sets in pathology-based AI and, thus, their generalizability.

The second key finding was that much of what is published, not just in frozen section AI but in AI generally, is missing key pieces of information necessary to know the true value of a model and to replicate or fully interpret the results in context. It might be impressive that a model achieves an AUC (area under the receiver operating characteristic curve) of 0.8 or 0.9, but it is hard to know the translational value to clinical practice: does it serve the needs of the specialty, or the needs of the patient? For many of the studies, it was hard to determine the clinical relevance of the model's success without the information needed to fully assess the study or publication. There is also a trend of publishing impressive performance numbers for a model without providing all the critical pieces of information needed to assess bias and generalizability.
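
To make the metric concrete, the sketch below (illustrative only, with invented labels and scores) shows what a slide-level AUC measures: the probability that a randomly chosen tumor slide is ranked above a randomly chosen benign one. It is a statement about ranking, not about clinical utility.

```python
# Illustrative only: invented slide-level labels and model scores.
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = tumor present on the slide
y_score = [0.92, 0.71, 0.64, 0.38, 0.45, 0.12, 0.55, 0.60]  # model probabilities

# AUC is the chance a random tumor slide scores higher than a random benign slide
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 0.94 here; says nothing by itself about clinical value
```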

We also learned that many studies did not capture the range of performance that a multi-rater or multi-observer design would have provided. The gold standard in traditional studies is to have multiple observers or raters with minimal variability between them. In the AI frozen section literature, often only 1 physician was compared with the AI model's performance, which is hard to generalize beyond that individual's skill and ability. And because many studies relied on the interpretation of a single reader, one also wonders whether the results were prone to error, or whether bias was introduced into the model. When the physician's performance was compared with the model's, the physician's native environment was not always used. For example, as a Mohs surgeon, I look at physical glass slides to render frozen pathology diagnoses and margin analysis in real time, without waiting for digital scanning. In almost all the AI frozen section studies, the model's performance was compared with that of an expert looking at a whole slide digital image on a computer. This potentially underestimates the physician's ability and overestimates the model's, and leaves the model's relevance unclear across a diversity of settings: different labs, practices, tumors, and anatomic regions.
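
For context on what a multi-rater design adds, here is a minimal sketch (hypothetical reads, invented data) of the kind of agreement statistic, Cohen's kappa, that only becomes possible once more than one reader is involved:

```python
# Illustrative only: hypothetical reads from two independent physicians.
from sklearn.metrics import cohen_kappa_score

reader_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # 1 = tumor, 0 = margin clear
reader_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# Kappa corrects raw agreement for the agreement expected by chance alone
kappa = cohen_kappa_score(reader_a, reader_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level
```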

It was not surprising that most of the studies in our review focused on basal cell carcinoma (BCC), as it is “low-hanging fruit” for slide-level diagnostic model development and the natural place to begin for proof of concept. In fact, several studies have demonstrated high AUC performance by AI models diagnosing BCC on whole slide images of frozen sections. However, BCC is one of the more readily identifiable tumor morphologies on frozen sections, with a recurrence rate of less than 1% when treated with Mohs surgery. So, one must ask: What is the clinical translational value of a model that can diagnose BCC on frozen sections? We are likely better off as a specialty looking at segmentation or augmentation of the process, rather than autonomous diagnostics, to either improve our accuracy with other tumors or to develop models that address time-consuming, repetitive, and burdensome tasks.

An important unanswered question regarding AI and frozen sections is whether, and how, these efforts can improve patient outcomes. Can we prevent missed diagnoses in difficult frozen section cases? Can we augment dermatology, dermatologic surgery, and beyond with AI frozen section models that do not add time and cost? Can we make progress in AI on frozen sections that addresses patient access issues or inequities in dermatology? Ultimately, we learned that models based on frozen section tissue can be developed with decent performance, but in some of the studies the feasibility and translational value remain unknown.

Challenges and Limitations

One of the consistent challenges for studies focused on frozen section AI, which I have also personally encountered, is that it is hard to determine how large a training dataset needs to be. For many of these studies, the number of whole slide images selected was arbitrary: dataset size varied from the hundreds to the thousands of whole slide images, or was not reported at all. We also do not know the quality of the whole slide images included. Were these perfect, textbook-type images, on which we would predict a model will do quite well, or were randomly selected tissue samples scanned to create the whole slide images? I would argue that the latter would be more likely to help us determine a model's value in clinical practice. Furthermore, data selection and utilization should consider the intended purpose of the model. For example, in real-life practice, we see frozen sections with fragmentation that becomes “whole” only when read across multiple sequential sections, and we make judgment calls when we see tissue that does not overtly show tumor but is heavily inflamed.
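
There is no formula for the right dataset size, but one standard empirical check, sketched below with synthetic stand-in data rather than real whole slide images, is a learning curve: train on progressively larger subsets and watch whether held-out performance is still improving.

```python
# Synthetic stand-in data; in practice X would be features from whole slide images.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% up to 100% of the training fold
    cv=5, scoring="roc_auc",
)
for n, scores in zip(sizes, test_scores):
    # If AUC is still climbing at the largest size, the dataset is too small
    print(f"n_train={n:4d}: held-out AUC = {scores.mean():.3f}")
```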

Another challenge, specific to several studies in cutaneous pathology, was false-positive predictions due to inflammatory cell aggregates. Essentially, the image might have been blurred or harder to interpret, which raises the question: How many difficult pathologic images were in the training set and then used to assess the model's performance? Future studies in AI and dermatology research should ensure not only the completeness and utility of the dataset, so that it reflects everything we could potentially encounter in real life, but also its diversity, so that we are looking at multiple skin types, multiple anatomic sites, and the multiple types of tissue that might be presented to a pathologist. We may also learn, in designing models, what degree of expert annotation is needed. Once a model is developed, you do not always know its blind spots. One thing we could do better in AI research in general is understanding the blind spots of every model that is published. By understanding the blind spots, we can investigate whether bias is built into the model, assess in what ways the model may harm patients, and learn how to iterate toward more feasible and safer deployment into practice. Ensuring that our published datasets are diverse, so that people are not drawing conclusions from bad data, is imperative to avoid bias.
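
One concrete way to look for blind spots, sketched below with invented data and field names, is to report performance per subgroup (eg, anatomic site or skin type) rather than as a single pooled number; a model can look acceptable overall while failing badly on one slice.

```python
# Invented data and field names: per-site sensitivity instead of one pooled number.
from collections import defaultdict

# Each record: (anatomic_site, true_label, model_prediction); 1 = tumor present
results = [
    ("face", 1, 1), ("face", 1, 1), ("face", 0, 0), ("face", 1, 1),
    ("ear", 1, 0), ("ear", 1, 1), ("ear", 0, 0),
    ("scalp", 1, 0), ("scalp", 1, 0), ("scalp", 0, 1),
]

tp = defaultdict(int)   # detected tumors per site
pos = defaultdict(int)  # actual tumors per site
for site, truth, pred in results:
    if truth == 1:
        pos[site] += 1
        tp[site] += int(pred == 1)

# The pooled sensitivity (4/7) hides that scalp tumors are missed entirely
for site in sorted(pos):
    print(f"{site:>6}: sensitivity = {tp[site] / pos[site]:.2f} (n={pos[site]})")
```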

Practical Applications

There is a potential role for AI in the overall flow of practice within dermatology. For example, an AI model could recommend workflows based on predicted risk, allowing us to triage which cases should be read out first. Perhaps we could then have an augmentation model that recommends a second look at suspicious high-risk regions to prevent potential misses. Such tools could support clinical workflows without overly influencing clinical decision-making.
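
As a toy illustration of the triage idea, the sketch below (all identifiers and thresholds are invented) simply orders cases by a model's risk score and flags high-risk ones for a second look; the surgeon still reads every case.

```python
# All identifiers and thresholds are invented; illustrative workflow only.
cases = [
    {"case_id": "MS-101", "risk": 0.12},
    {"case_id": "MS-102", "risk": 0.87},
    {"case_id": "MS-103", "risk": 0.55},
    {"case_id": "MS-104", "risk": 0.91},
]

SECOND_LOOK = 0.80  # arbitrary flag threshold for this sketch

# Read out the highest-risk slides first; the physician renders every diagnosis
for case in sorted(cases, key=lambda c: c["risk"], reverse=True):
    flag = "  <- suggest second look" if case["risk"] >= SECOND_LOOK else ""
    print(f"{case['case_id']}: risk {case['risk']:.2f}{flag}")
```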

Specific to Mohs surgery, the role of diagnostic AI in frozen sections remains unclear. To have value, a model would need to assist with or perform a task that we find useful. I can see the value of a model that identifies poorly differentiated squamous cell carcinomas (SCCs) or high-risk features (eg, perineural invasion) on hematoxylin-eosin staining without the use of special immunohistochemistry (IHC). The value of IHC is well known, and its utilization during Mohs surgery is increasing. However, IHC adds significant time and cost, and smaller practices or those in resource-poor settings may not have access to it. It would be a game changer if AI could replace the need for frozen section IHC: it would save time and cost, and could improve patient outcomes.

To add another layer, we can consider using frozen sections as part of multimodal data. In dermatology, our typical flow is that a patient comes in with a suspicious lesion, which is then biopsied. The biopsy is usually a small piece of tissue used to render a diagnosis, but it might miss key histopathologic features that are valuable in staging a tumor such as SCC. For example, we may not see perineural invasion because of the (appropriately) smaller size and shallower depth of a tangential biopsy, and we may not be able to ascertain whether the tumor extends deeper than the subcutaneous fat. These details are used in staging SCCs and help us make critical management decisions. In the big picture, imagine if we could one day combine clinical information, digital images of permanent sections and Mohs frozen sections, and genomics to predict specific outcomes for individual patients.

In the more immediate future, we can use AI on frozen sections to help with our Mohs surgery flow: to provide preliminary assessments of tissue completeness (recuts), to provide preliminary assessments of pathology (getting the patient ready for another stage), and to allow more efficient triage and scheduling of cases.

Collaboration Between AI and Clinicians

In our review, we learned that a physician plus AI performed better than either AI alone or the physician alone on a limited task. This supports one of our other general conclusions: there is a place for high-quality AI to augment clinical practice if we are aware of, and control for, its potential risks. A model diagnosing in a vacuum does not perform as well as a physician using that model, because a physician does not render diagnoses in a vacuum but rather has access to additional context. This is critical not only in dermatologic innovation, but far beyond our specialty. We need to think about AI as making us better at what we already do, rather than as a replacement for us.

The best role for AI is not only to make our jobs easier and make us more accurate, but also to help us understand how to individualize care and ultimately achieve the best outcomes for our patients. We are human, and we are going to miss things sometimes; AI can help ensure that we do not. Beyond pathology, there is also a role for AI in reducing the burden of the repetitive, time-consuming, and costly processes we currently have. For example, ambient listening can help us with documentation, and generative AI can help us create content. There are also advancements in AI-assisted microscopy that may change when and how we perform biopsies at the bedside; we could potentially assess in situ vs invasive tumors at the bedside and make management decisions for low-risk lesions without a procedural biopsy. This may be a paradigm shift.

As an optimist, I like to think that AI is going to help us all. However, I also think AI can be dangerous when not used in a patient-centric way and when physician experts are not involved in the entire process: from conception to deployment. As long as we can prioritize patients, we will be okay. But for us to do that, we need dermatologists and dermatologic surgeons to have a seat at the table. We need dermatology leaders who are engaged, invested, and interested in AI so we can provide input early on before models are cleared/approved for commercial use by the US Food and Drug Administration.

Conclusion

Whether we want it or not, AI is here, and we do not need to be AI scientists to understand it. The best way to ensure that AI is working for us and not against us is to not only be engaged and have a seat at the table, but also to embrace the change and empower each other to drive the direction that AI takes our specialty. In this way, we can be a part of the conversation and ensure we are addressing the issues that are salient to us as dermatologists and our patients.


Disclosure: Dr Vidal has received a research grant from the Totz Career Development Award and is a grant recipient for generative AI.

Reference

1. Gorman BG, Lifson MA, Vidal NY. Artificial intelligence and frozen section histopathology: a systematic review. J Cutan Pathol. 2023;50(9):852-859. doi:10.1111/cup.14481