Artificial Intelligence in Dermatology: Cure or Curse?
Artificial Intelligence (AI) was a hot topic of 2023. From the skyrocketing valuation of Nvidia Corporation to the drama surrounding the ouster and rapid reinstatement of OpenAI CEO Sam Altman, AI dominated news cycle after news cycle. And for good reason: AI represents a group of quasi-magical tools that promise to liberate humanity from menial, time-consuming tasks; revolutionize human health; create unlimited wealth; and rocket us into space and beyond — or will it? How will it change patient care, enhance our understanding of skin, and ultimately affect the practice of dermatology? In my view, the direct application of AI and related technologies will not revolutionize the day-to-day practice of dermatology or its subspecialties in the near future. However, it will undoubtedly improve productivity, accelerate research, ease administrative burden, and reduce costs.
Human or Machine?
AI is a broad term describing technology designed to mimic human reasoning. It is not a new concept. In 1950, British mathematician Alan Turing proposed that machines, even if they cannot think, may be engineered to appear as though they are thinking. Hence, the Turing Test refers to assessing whether an output was generated by a human or a machine. In his seminal paper on the topic, Turing described 3 players: a computer, a human, and a judge. The judge poses questions, and both the computer and the human respond. If the judge cannot tell the difference between the human and the computer, then the computer has demonstrated intelligence.1
I started working with health care technology in the fall of 2019 when I was a first-year dermatology resident. Our goal was to integrate the American Academy of Dermatology’s (AAD) Qualified Clinical Data Registry, DataDerm, for the first time into Epic electronic health records (EHRs). Part of my job was to design strategies to extract data from patient charts in a reliable and replicable fashion, a task referred to as mapping. After 2 years of experimentation, mapping, and close collaboration with AAD’s DataDerm team, we had achieved a functioning and accurate integration. I realized the tremendous potential value of the aggregated data that would be available in DataDerm. If we could analyze these large datasets in an experimentally valid manner, we could answer complex clinical questions that had previously eluded our grasp, but it would take thousands of hours of sifting through de-identified and scrambled data to accomplish. So, I started to learn about tools that could make such a task manageable.
In today’s parlance, machine learning (ML) encompasses many related models, each representing a unique mathematical approach to one simple objective: organizing data. The easiest way to understand these different models is by thinking about use cases. Why were these algorithms developed? Each model was geared toward different data input scenarios, resulting in the evolution and blending of strategies. These tactics form the foundations of AI’s original philosophical purpose: to allow a computer to replicate output that is indistinguishable from a human’s. ML models include supervised learning (SVL), unsupervised learning, and deep learning (Figure). Each strategy approaches organizing data in a different way, and together they comprise crucial tools for managing big data.
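To make the distinction concrete, the following is a minimal, illustrative sketch in Python using the open-source scikit-learn library, which is not referenced in this article; the lesion measurements and labels are invented. It contrasts unsupervised clustering, which groups unlabeled data by pattern alone, with supervised classification, which learns from manually assigned labels.

```python
# Illustrative only: contrasting unsupervised and supervised learning on the
# same toy data. Requires scikit-learn (pip install scikit-learn).
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical lesion measurements: [diameter in mm, border irregularity score]
X = [[2.0, 0.10], [2.5, 0.20], [3.0, 0.15],   # small, regular
     [7.0, 0.80], [8.5, 0.90], [9.0, 0.85]]   # large, irregular

# Unsupervised learning: no labels are provided; the algorithm groups the
# points purely by the patterns it finds.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Unsupervised clusters:", clusters)

# Supervised learning: each training example is paired with a manually
# assigned label, and the model learns the mapping from input to output.
y = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 0 = reassuring, 1 = concerning
classifier = LogisticRegression().fit(X, y)
print("Supervised prediction for a new lesion:", classifier.predict([[6.0, 0.7]]))
```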
A Deep Dive Into Big Data
Health care generates more data than any other industry. By 2025, the total volume of health care data worldwide is estimated to exceed 10 000 exabytes.2 In more familiar terms, that is 10 trillion gigabytes. Big data refers to a volume of information that cannot be managed using traditional analytics tools.3 Some of that data is immediately usable, translatable, and quantifiable. This class of data is referred to as structured data because it is stored in a specified format as soon as it is generated. Hence, it has a predetermined structure and its values are restricted. This property stems from the schema-on-write nature of the data, which requires a programmer to define the possible range and relationship of values or characters in the dataset.
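As a rough illustration of schema-on-write, the sketch below (using Python’s built-in sqlite3 module; the table and fields are hypothetical) declares the structure and allowed values before any data is stored, so a value outside the declared range is rejected the moment it is written.

```python
# Illustrative sketch of schema-on-write using Python's built-in sqlite3 module.
# The table and fields are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vitals (
        patient_id  INTEGER NOT NULL,
        systolic_bp INTEGER CHECK (systolic_bp BETWEEN 40 AND 300),
        recorded_on TEXT NOT NULL  -- ISO 8601 date string
    )
""")

# A row that fits the schema is accepted at write time...
conn.execute("INSERT INTO vitals VALUES (101, 122, '2024-02-05')")

# ...while a value outside the declared range is rejected as it is written.
try:
    conn.execute("INSERT INTO vitals VALUES (102, 9999, '2024-02-05')")
except sqlite3.IntegrityError as err:
    print("Rejected by schema:", err)
```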
Examples of structured data in an EHR include lab values, demographics, vital signs, and billing codes. Structured data is translatable from one system to another due to its relative simplicity, and data standards have been developed over the past 20 years to facilitate interoperability and electronic exchange of certain structured data. In short, if the data can easily be input into a spreadsheet, it is probably structured. But structured data is a small slice of the health care data pie, representing only about 20% of the total.4 The other 80% or more of health care data is unstructured and represents the core of big data in health care. It also represents 99% of the headaches associated with data management, interoperability, and research.
Unstructured data is everything that does not meet the criteria of structured data: clinical note text, a pathology report, photographs, imaging, and more. This is where ML approaches can be especially useful. A human sifting through all that data and attempting to organize it would take thousands of hours. Instead, why not train a supervised ML model to do the sifting for you? SVL models are trained on a set of manually annotated data that matches each input to the desired output. This approach contrasts with unsupervised machine learning, where unlabeled data is grouped by patterns. To train an SVL model to identify whether an image is a cat or a dog, a training set of images (the control) is manually annotated as cat or dog. The experimental data remain unlabeled, and the job of the SVL model is to classify whether the images in the experimental arm are cats or dogs. In practice, a selection of the output is reviewed for accuracy and re-annotated manually if any incorrect associations are noted. Thus, the accuracy of the model improves with each run until the desired sensitivity and specificity are achieved.
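That workflow can be sketched in a few lines of Python. This is purely illustrative: the feature vectors stand in for images, the labels are invented, and the scikit-learn classifier is my choice rather than anything named in this article.

```python
# Illustrative sketch of the supervised workflow described above. The feature
# vectors stand in for images; all data and labels are invented.
from sklearn.ensemble import RandomForestClassifier

# Manually annotated training set (0 = cat, 1 = dog).
X_train = [[0.20, 0.90], [0.30, 0.80], [0.80, 0.10], [0.70, 0.20]]
y_train = [0, 0, 1, 1]

# Unlabeled "experimental" data the model must classify.
X_unlabeled = [[0.25, 0.85], [0.75, 0.15], [0.50, 0.50]]

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("Initial predictions:", model.predict(X_unlabeled))

# A reviewer checks a selection of the output; incorrect labels are fixed,
# folded back into the training set, and the model is retrained.
reviewed_labels = [0, 1, 1]  # hypothetical manually corrected annotations
X_train = X_train + X_unlabeled
y_train = y_train + reviewed_labels
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
```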
An archetypal SVL model was applied to dermatology at Stanford in 2017 to classify skin lesions as either benign or malignant. The study used a specialized supervised ML model known as a convolutional neural network (CNN).5 Neural networks consist of a series of nodes that mimic the interconnectedness of neurons in the brain, analyzing data layer by layer. Convolutional refers to the filters present in the specialized layers. Each convolutional layer is trained to detect specific instances, patterns, or values, such as the edge of a shape or a color. The output of a convolutional layer is a matrix of values corresponding to an abstract map of properties. The deeper the data permeates into the model, layer by layer, the more complete the abstract picture becomes.
At some point, criteria are met to determine a probability that an image contains a defined output, such as a pigmented lesion or the scaling of a hyperkeratotic actinic keratosis (AK). The point of this architecture is to allow the model to refine itself through training. This type of algorithm is designed to detect patterns, which makes it especially useful for image recognition: an image is broken into pixels and the value of each pixel is evaluated. In the above study, a training set of 129 450 clinical images of 2032 different diseases was used. The study concluded that the CNN performed on par with or better than the 21 board-certified dermatologists who served as the gold standard.5
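For readers curious what such an architecture looks like in code, below is a minimal sketch of a small convolutional classifier written with the Keras API. It is not the Stanford model; the layer counts, image dimensions, and training details are arbitrary placeholders.

```python
# A minimal sketch of a small convolutional neural network for binary image
# classification. This is NOT the Stanford model; all sizes are arbitrary.
# Requires TensorFlow (pip install tensorflow).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),                      # an RGB image broken into pixels
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # filters detect edges and colors
    layers.MaxPooling2D(),                                  # condense the resulting feature map
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # deeper layers capture larger patterns
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                  # probability the lesion is malignant
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would call model.fit(images, labels) on a manually annotated image set.
```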
Although the study represented a significant feat of technology, the real-world application of image-recognition technology remains a question. Google is currently developing DermAssist in response to the “almost 10 billion Google Searches related to skin, nail, and hair issues” processed each year.6 In 2021, Google published an article in JAMA Network Open.7 Three key observations from this study should be highlighted: 1) dermatologist agreement with the reference diagnosis after use of the AI assistant was unchanged, 2) the top prediction of the AI was accurate in 63% of cases, and 3) about 8% of cases were excluded from analysis due to equivocal biopsy results.7 DermAssist was granted a Conformité Européenne marking as a Class I medical device in the European Union.
I could see these products being helpful adjuncts for generating a differential when used by dermatologists who can evaluate the output with a critical eye. Unfortunately, the market for dermatologists who want a second AI opinion on the differential is probably too small to draw interest or to serve at a reasonable cost compared with the market for AI dermatologists. An interesting retrospective study in the future will be one assessing how clinicians respond to output from these models. Does the 2% chance that the lesion is malignant prompt referral? Or does it reinforce misdiagnosis of a dermatofibrosarcoma protuberans, resulting in serious morbidity a few years later? Only time will tell.
Large Language Models
One promising ML tool that could radically shift our practice paradigms for the better is natural language processing (NLP), which uses probabilistic models to interpret text or speech data. In the future, incoming patient messages may be flagged for more rapid follow-up if a negative sentiment is detected.8 A number of platforms that convert voice input from patient encounters into generated clinical notes, such as Microsoft’s DAX Copilot, are just now coming to market. This game-changing function would allow dermatologists and our staff to focus on patients instead of EHRs. Historically, use of NLP in health care was limited by the need to exhaustively annotate training set data via a supervised ML approach. This paradigm shifted radically with the introduction of large language models (LLMs), a new branch of NLP.
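As a concrete, if oversimplified, illustration of the message-flagging idea, the sketch below trains a tiny sentiment classifier on a handful of invented portal messages and flags a new message for rapid follow-up. A production system would rely on far larger training data or a pretrained language model; everything here is hypothetical.

```python
# Illustrative only: a tiny sentiment-based triage model for portal messages.
# Messages and labels are invented. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_messages = [
    "Thank you, the new cream is working well",
    "Appreciate the quick refill, feeling much better",
    "The rash is spreading and I am getting worried",
    "Still in a lot of pain and no one has called me back",
]
train_labels = [0, 0, 1, 1]  # 0 = neutral/positive, 1 = negative sentiment

triage = make_pipeline(TfidfVectorizer(), LogisticRegression())
triage.fit(train_messages, train_labels)

incoming = "My biopsy site looks worse and hurts more every day"
if triage.predict([incoming])[0] == 1:
    print("Flag for rapid follow-up:", incoming)
```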
LLMs are a subset of NLP using self-supervised ML models. They form the foundation of generative AI products, such as OpenAI’s generative pre-trained transformer ChatGPT and Google Bard.9 Despite claims to the contrary, these technologies are not ready to directly answer patient medical questions. For example, I asked ChatGPT 4 to describe the difference in management of melanoma in situ and a severely dysplastic nevus. The platform informed me with great confidence that the key difference was that a severely dysplastic nevus is managed by “regular monitoring and surveillance.” A recent study using ChatGPT to simulate discussions about AK confirmed that my experience reflects realistic expectations for performance at this time. The chatbot answered fewer than one-third of questions about AK appropriately. As the authors note, the purpose of an LLM is to simulate human-like responses, not to serve as a compendium of accurate, evidence-based recommendations.10
Even if medical chatbots are not ready for direct patient interaction, the opportunities for NLP to reduce our administrative burden and allow us to refocus on patients are promising. The most ethical, economical, and efficacious applications of these technologies avoid interfering with the patient-physician interaction and instead facilitate and streamline communication and administrative support. Any application that crosses the line into providing or implying a clinical recommendation is a medical device, according to the US Food and Drug Administration (FDA).11
Regulators are paying close attention to AI’s growing role. More medical device guidance addressing software development has been published in the last 2 years than in the previous 20 years combined. The FDA regulates AI/ML software applications in health care under the umbrella of software as a medical device. To ensure patient safety, more stringent requirements are being applied to the approval and deployment of AI applications, especially those that continue to learn on new datasets while in the clinical setting. Two key regulatory checkpoints apply to these software applications: 1) verification and 2) validation. In verification, the precision of the ML platform is interrogated: does the software reliably perform the same way each time, as engineered? In validation, the accuracy of the application is evaluated: do the algorithm outputs reflect reality when compared to a gold standard in a use case? Does it work? As these technologies evolve, clinical applications of AI/ML are inescapable. These regulatory safeguards will provide some level of protection to patients. However, drift is inevitable in any dynamic system, and biases that are difficult to anticipate are sure to be introduced.
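Validation ultimately reduces to comparing algorithm output against a gold standard. A minimal sketch, with invented labels, of how sensitivity and specificity would be computed in such a check:

```python
# Illustrative sketch of the validation checkpoint: comparing algorithm output
# with a gold standard and reporting sensitivity and specificity.
# All labels below are invented (1 = malignant, 0 = benign).
gold_standard = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g., biopsy-confirmed diagnoses
model_output  = [1, 0, 0, 1, 0, 1, 1, 0]   # the algorithm's predictions

pairs = list(zip(gold_standard, model_output))
tp = sum(1 for g, m in pairs if g == 1 and m == 1)  # true positives
fn = sum(1 for g, m in pairs if g == 1 and m == 0)  # false negatives
tn = sum(1 for g, m in pairs if g == 0 and m == 0)  # true negatives
fp = sum(1 for g, m in pairs if g == 0 and m == 1)  # false positives

sensitivity = tp / (tp + fn)   # how often true disease is detected
specificity = tn / (tn + fp)   # how often non-disease is correctly cleared
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```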
Opportunities to Improve Patient Care
How can we best leverage ML to improve patients’ lives right now? Administrative tasks should be a prime target. At Northwestern Medicine, I am directing a grant project under the Mansueto Institute focused on expediting prior authorizations for specialty drugs using ML approaches. With insurance companies implementing AI approaches to evaluate prior authorizations, an ML-powered arms race is imminent. Several startups have surfaced over the past year pursuing both sides of this issue.
Clinical trial matching is another natural target. Recruitment time for industry-sponsored phase III trials has increased by almost 50% over the past decade or so.12 The complexity of patients has grown in proportion to our knowledge of diseases and our arsenal of therapeutic options. ML strategies, including NLP, are being leveraged to identify eligible patients in the right locations. ML research tools taming masses of unstructured data will reveal new insights into diseases, associations, and risk factors, and will probably even uncover new maladies.
AI also offers substantial opportunity for health care learners and continuing education. AI-driven simulations can replicate complex medical scenarios, allowing students to practice decision-making in a risk-free environment. Virtual patient encounters, powered by ML algorithms, can enhance clinical reasoning skills and offer a dynamic experience. AI can assist in crafting personalized learning paths, tailoring educational content to students’ needs and reinforcing areas that require additional focus. By leveraging AI/ML to build new educational programs, dermatologists and the larger network of health care providers can enhance their proficiency in delivering high-quality patient care.
Conclusion
AI/ML is democratizing access to data, resources, and insights at an exponential rate. As with any technological progress, there are many pitfalls we will need to navigate. My hope is that this wave of innovation will unlock a new era of research, knowledge, and focus. Burnout will be mitigated as administrative tasks are minimized, and patient-dermatologist relationships will flourish with renewed focus in the exam room and access to novel therapies, disease insights, and educational tools. Let me reassure you: you will not be replaced by AI. The future has never been brighter for dermatology and innovation in our field.
Dr Pearlman is a micrographic surgery and dermatologic oncology fellow in the department of dermatology at Northwestern Feinberg School of Medicine in Chicago, IL.
Disclosure: The author is the founder and project leader of Rx.AI, which has been funded by the Northwestern Medicine Mansueto Innovation Institute. He has received grants related to DataDerm research from the American Academy of Dermatology (AAD). He is a member of the AAD DataDerm Oversight Committee. He holds stock and serves as the CEO of Suneco Technologies, Inc., which is a member company of JLabs @ NYC, a Johnson & Johnson Innovation hub. He has served on advisory boards for Castle Biosciences. He has no commercial interest in any product mentioned in the manuscript.
References
1. Turing AM. Computing machinery and intelligence. Mind. 1950;59(236):433-460.
2. Boehncke K, Duparc G, Sparey J, Valente A. Tapping into new potential: realising the value of data in the healthcare sector. L.E.K. December 4, 2023. Accessed February 5, 2024. https://www.lek.com/insights/hea/eu/ei/tapping-new-potential-realising-value-data-healthcare-sector
3. Batko K, Ślęzak A. The use of big data analytics in healthcare. J Big Data. 2022;9(1):3. doi:10.1186/s40537-021-00553-4
4. Kong HJ. Managing unstructured big data in healthcare system. Healthc Inform Res. 2019;25(1):1-2. doi:10.4258/hir.2019.25.1.1
5. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118. doi:10.1038/nature21056
6. Bui P, Liu Y. Using AI to help find answers to common skin conditions. The Keyword. Google. May 18, 2021. Accessed February 5, 2024. https://blog.google/technology/health/ai-dermatology-preview-io-2021
7. Jain A, Way D, Gupta V, et al. Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Netw Open. 2021;4(4):e217249. doi:10.1001/jamanetworkopen.2021.7249
8. Baxter SL, Saseendrakumar BR, Cheung M, et al. Association of electronic health record inbasket message characteristics with physician burnout. JAMA Netw Open. 2022;5(11):e2244363. doi:10.1001/jamanetworkopen.2022.44363
9. Naveed H, Khan AU, Qiu S, et al. A comprehensive overview of large language models. arXiv. December 27, 2023. Accessed February 5, 2024. https://arxiv.org/abs/2307.06435
10. Lent HC, Ortner VK, Karmisholt KE, et al. A chat about actinic keratosis: examining capabilities and user experience of ChatGPT as a digital health technology in dermato-oncology. JEADV Clin Pract. 2023;1-8. doi:10.1002/jvc2.263
11. Webster P. Medical AI chatbots: are they safe to talk to patients? Nat Med. 2023;29(11):2677-2679. doi:10.1038/s41591-023-02535-w
12. Brøgger-Mikkelsen M, Zibert JR, Andersen AD, et al. Changes in key recruitment performance metrics from 2008–2019 in industry-sponsored phase III clinical trials registered at ClinicalTrials.gov. PLoS One. 2022;17(7):e0271819. doi:10.1371/journal.pone.0271819