Taking the Garbage Out: Addressing AI Bias in Quality Measurement
Abstract
Use of artificial intelligence (AI) in health care quality measurement can enhance the precision and efficiency of performance assessment. However, the use of AI in measurement raises concerns about biases that could perpetuate disparities and affect vulnerable individuals. Recent national discussions, such as the US Centers for Medicare & Medicaid Services’ (CMS) information session titled “AI in Quality Measurement” and the Biden-Harris Administration’s Executive Order on AI, have emphasized the need for ethical, transparent, and equitable AI applications in health care quality measurement. Addressing bias is crucial to ensuring that AI tools do not exacerbate existing inequalities, but instead contribute to fair quality assessment and high-quality outcomes. This article explores the use of AI in health care quality measurement and tactics to mitigate bias in AI-enhanced measurement.
Introduction: A Catalyst for Thought
Recent literature has shed light on the expanding role of artificial intelligence (AI) in health care, particularly for its use in quality measurement. Studies demonstrate AI’s potential to significantly enhance health care delivery by improving clinical operations, bolstering quality and safety, and refining the mechanisms of quality reporting.1,2 Projections suggest that AI adoption could save the health care industry between $200 billion and $360 billion annually,3 thereby underscoring its potential to yield substantial economic and operational benefits.
However, alongside these optimistic projections, a critical discourse has emerged regarding the potential pitfalls of AI integration in health care, particularly concerning bias. For example, research has documented instances in which AI applications in health care exacerbate, rather than mitigate, disparities,4 indicating a need for nuanced exploration of AI’s limitations, especially biases that may skew AI-derived insights.
Recent policy initiatives aim to address the challenges of AI in health care. The Biden-Harris Administration’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, for example, promotes a framework to help ensure AI is ethically applied in public policy, including in health care. This Executive Order highlights the importance of developing AI tools that are not only effective, but also equitable and transparent in their function.5 In addition, the US Centers for Medicare & Medicaid Services (CMS) recently convened an information session titled “AI in Quality Measurement.”6 The session underscored the growing importance of advanced computational technologies for improving the accuracy and efficiency of measurement, reflecting a broader push within the health care community toward innovation-driven solutions.
The timing is right for stakeholders to critically assess AI’s potential to transform health care quality measurement. This assessment must rigorously address the dual objectives of harnessing AI’s transformative potential for quality measurement while vigilantly curbing its capacity to reinforce existing health care biases. To that end, this article focuses on exploring the use of AI in health care quality measurement and identifying tactics to mitigate bias in AI-driven quality assessment.
The Promises and Perils of AI in Quality Measurement: A Deep Dive on Biases
AI integration in health care represents a transformative shift, including in quality measurement, which focuses on evaluating the effectiveness, efficiency, and equity of health care services. Quality measures help quantify health care processes, outcomes, patient perceptions, organizational structures, and systems that are associated with delivering high-quality health care.7 AI tools in quality measurement encompass a range of technologies and applications designed to enhance precision, efficiency, and effectiveness of quality measurement processes. Examples of these tools include AI-enabled data collection systems that automate and refine the gathering of clinical information,8,9 AI measurement and analytical tools that assess and interpret quality data,8 and AI-driven analytics platforms that can integrate and analyze large datasets to identify trends and actionable insights.10 AI in the context of quality measurement is tasked with not only gathering and analyzing data, but also ensuring that the insights derived are devoid of bias that could skew outcomes. Such an endeavor necessitates a deep understanding of both the potential and the limitations of AI technologies in accurately capturing and interpreting complex health care data.
AI, particularly large language models (LLMs), is increasingly pivotal in enhancing the accuracy, reliability, and efficiency of health care quality measurement.10 LLMs are trained on large datasets, enabling them to comprehend and generate natural language, along with other forms of content.11 By leveraging LLM capabilities, AI can automate the extraction of complex clinical concepts from patient visit notes, ensuring faster coding with greater precision, while significantly reducing administrative burden.8,9 This advancement enables clinicians, hospitals, and health plans to more efficiently report data within value-based programs, optimizing both patient care and resource use.
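To illustrate the general shape of such a workflow, the sketch below shows LLM-assisted abstraction of a single quality-measure data element (here, HbA1c documentation, a common diabetes-care measure element) from a visit note. The call_llm helper, the prompt wording, and the JSON schema are hypothetical placeholders rather than any specific vendor’s interface, and outputs of this kind would still require human review and validation.

```python
# Illustrative sketch of LLM-assisted abstraction of one quality-measure data
# element from a visit note. `call_llm` is a hypothetical placeholder for an
# organization's validated LLM service; the prompt and JSON schema are assumptions.
import json

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to an approved, validated LLM service."""
    raise NotImplementedError("Connect to your organization's approved LLM service")

def extract_hba1c_documentation(note_text: str) -> dict:
    """Ask the model whether an HbA1c result is documented and, if so, its value.
    Ambiguous or low-confidence outputs should go to a human abstractor."""
    prompt = (
        "From the visit note below, state whether an HbA1c result is documented. "
        'Respond only with JSON of the form {"hba1c_documented": true, "value": 7.2}.\n\n'
        + note_text
    )
    return json.loads(call_llm(prompt))
```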
Furthermore, AI has the potential to transform quality measurement by providing novel insights and revealing connections between different metrics, potentially creating new avenues for improving clinical outcomes and facilitating a more equitable health care system. For instance, AI can surface patterns and correlations that may not be immediately obvious to human analysts,2,10,12 which may highlight areas where quality measures might unjustifiably penalize or favor certain groups. Additionally, AI can detect potential harms and unintended consequences of existing quality measures, offering insights to refine and enhance those measures.13 Such analyses help ensure that measures are valid and reliable and can be applied fairly across diverse populations. Consequently, AI not only supports existing quality measurement frameworks, but also sets the stage for developing increasingly sophisticated metrics that more accurately reflect the nuances of patient care and health system performance.
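As a simplified illustration of this kind of analysis, the sketch below correlates hypothetical measure scores with a social-risk indicator and flags measures whose scores track that indicator closely enough to warrant human review. The column names, the binary risk flag, and the 0.2 threshold are assumptions for illustration; a real analysis would require risk adjustment and clinical interpretation, not correlation alone.

```python
# Illustrative sketch: scanning quality-measure scores for associations with a
# social-risk indicator that might signal a measure unfairly penalizes some groups.
# Column names, the binary risk flag, and the 0.2 threshold are illustrative assumptions.
import pandas as pd

def flag_equity_signals(scores: pd.DataFrame, risk_flag: str,
                        threshold: float = 0.2) -> dict[str, float]:
    """Correlate each measure column with a binary social-risk flag and return
    measures whose absolute correlation exceeds `threshold` for human review."""
    measure_cols = [c for c in scores.columns if c != risk_flag]
    correlations = scores[measure_cols].corrwith(scores[risk_flag])
    return {m: round(r, 2) for m, r in correlations.items() if abs(r) > threshold}
```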
Although AI offers significant benefits by enhancing data fidelity and enabling real-time analytics, it also poses challenges, such as data privacy concerns,9 the risk of algorithmic bias,14 and the need for substantial training on datasets that accurately represent diverse populations.15 Literature highlights that without appropriate training on representative datasets, AI systems might replicate existing disparities in health care provision or introduce new biases, thereby compromising the integrity of quality assessments.15-17 Most AI tools used for quality measurement are trained on clinical data that reflect the patterns of the majority population, which may overlook the unique needs and perspectives of minority groups.8 Additionally, the financial burden associated with accessing these large datasets and preparing them for use in AI tools is substantial. Organizations must invest in costly data cleaning and preprocessing to ensure datasets are suitable for AI models.18,19 Strategies to reduce these costs, such as collaborative data sharing initiatives and funding support for data infrastructure development, are important considerations.
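One small, concrete piece of this work is checking whether a candidate training dataset resembles the population a measure will be applied to. The sketch below compares subgroup shares in a dataset against external benchmark shares and flags under-represented groups; the column name, benchmark values, and tolerance are hypothetical and would in practice be drawn from census, enrollment, or registry data.

```python
# Illustrative sketch: comparing a training dataset's demographic mix against
# population benchmarks before it is used to train a quality-measurement model.
# The "race_ethnicity" column and the benchmark shares are hypothetical placeholders.
import pandas as pd

def check_representation(df: pd.DataFrame, column: str,
                         benchmarks: dict[str, float],
                         tolerance: float = 0.05) -> list[str]:
    """Return groups whose share in the data falls short of the benchmark
    by more than `tolerance` (absolute difference in proportion)."""
    observed = df[column].value_counts(normalize=True)
    flagged = []
    for group, expected_share in benchmarks.items():
        actual_share = observed.get(group, 0.0)
        if expected_share - actual_share > tolerance:
            flagged.append(group)
    return flagged

# Example usage with hypothetical data and benchmark shares
training_data = pd.DataFrame({"race_ethnicity": ["A"] * 900 + ["B"] * 80 + ["C"] * 20})
benchmarks = {"A": 0.60, "B": 0.25, "C": 0.15}
print(check_representation(training_data, "race_ethnicity", benchmarks))
# -> ['B', 'C']: groups under-represented relative to the benchmark
```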
The complex, opaque nature of AI algorithms can impede their acceptance, necessitating clear communication and rigorous validation protocols to build confidence in their use. Therefore, a focused analysis of AI’s application in quality measurement is imperative to navigate these challenges effectively and to harness AI’s full potential in improving health care quality.
Navigating Complexities: Strategies for Bias Mitigation in AI-Driven Quality Measurement
Successful deployment of AI to support quality measurement requires a comprehensive, multi-stakeholder strategy dedicated to understanding, identifying, and mitigating bias. Below are tactics to address bias in AI-driven quality measurement.
Mitigating Bias in AI Systems Through Comprehensive Training Guidelines
To fully harness the transformative potential of AI in health care quality measurement, it is crucial to implement comprehensive guidelines for training AI systems that rigorously address potential biases. The process of developing these guidelines should be deliberate and emphasize the collection and utilization of diverse datasets that represent the broad spectrum of real-world patient demographics, medical conditions, and socioeconomic backgrounds. The guidelines should be peer reviewed, published, and continuously updated to reflect advancements in AI technology and emerging insights from ongoing research. Such data diversity is essential for mitigating biases and enhancing the effectiveness of AI in stratifying and evaluating quality measures across various patient groups.
The guidelines should be crafted through engagement with a multi-stakeholder group that includes AI developers, health care providers, medical societies, health plan representatives, patients and patient advocates, quality measurement experts, ethicists, and policymakers. Together, these stakeholders can ensure that AI models are not only technically proficient, but also ethically sound and socially responsible. For instance, AI and quality measure developers must work closely with health care professionals to understand the clinical implications of biases and to integrate clinical insights into AI training processes. Meanwhile, ethicists and patient advocates can ensure that AI systems uphold patient dignity and do not inadvertently perpetuate health care disparities.
Moreover, policies mandating adherence to comprehensive training guidelines for AI models are necessary. These policies could require regular bias auditing and re-training of AI models to ensure that AI systems evolve in response to new data and insights. Additionally, policies promoting transparency in AI operations, including required disclosure of datasets and algorithms used as well as actions taken to mitigate bias, are critical. There is a growing recognition of the need for policy frameworks to manage AI in health care. For example, over 50 lawmakers in the House and Senate have recently advocated for CMS to increase oversight of AI use in Medicare Advantage coverage decisions.20 By integrating robust training protocols through policy efforts, AI systems can provide more accurate, fair, and inclusive assessments of health care quality, thereby supporting enhanced patient outcomes and contributing to the integrity and reliability of quality measurement.
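A bias audit of the kind such policies might require can be approximated, in its simplest form, as a routine comparison of model performance across subgroups. The sketch below computes sensitivity (recall) of an AI measurement tool by demographic group against gold-standard labels and flags groups that trail the best-performing group; the column names and the 0.10 gap threshold are illustrative assumptions rather than regulatory standards.

```python
# Illustrative bias-audit sketch: comparing an AI tool's sensitivity (recall)
# across demographic subgroups and flagging gaps beyond a chosen threshold.
# Column names and the 0.10 gap threshold are assumptions for illustration.
import pandas as pd
from sklearn.metrics import recall_score

def audit_subgroup_recall(results: pd.DataFrame, group_col: str,
                          max_gap: float = 0.10) -> pd.Series:
    """`results` needs columns y_true (gold-standard label), y_pred (AI output),
    and `group_col`. Returns recall per subgroup and prints groups whose recall
    trails the best-performing subgroup by more than `max_gap`."""
    recalls = results.groupby(group_col)[["y_true", "y_pred"]].apply(
        lambda g: recall_score(g["y_true"], g["y_pred"], zero_division=0)
    )
    best = recalls.max()
    for group, value in recalls.items():
        if best - value > max_gap:
            print(f"Audit flag: recall for {group} ({value:.2f}) trails best ({best:.2f})")
    return recalls
```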
Educating Stakeholders on AI’s Role in Quality Measurement
Key stakeholders, including policymakers, measure developers, and health care providers, must be well informed of the nuances of AI applications as its use expands within health care quality measurement. Continuous education and training are vital to help these stakeholders understand the potential biases in AI tools, methodologies used in developing AI models, and critical strategies needed to effectively use AI-generated data.
For policymakers, specialized workshops and briefing sessions could provide guidance in their efforts to govern the use of AI in health care, ensuring these technologies serve the public interest without reinforcing existing disparities. These educational efforts should focus on how AI technologies and tools should be designed and monitored to prevent bias and ensure equitable health outcomes. Similarly, quality measure developers and health care providers require ongoing training to stay current with the latest advancements and ethical considerations in AI. Training may include professional development courses, certifications, and seminars that delve into the application of AI in clinical settings, emphasizing ethical considerations, bias mitigation, and nuanced interpretation of AI-generated insights. For instance, measure developers should be trained on integrating AI tools that proactively identify and correct biases in health outcomes data, thereby crafting measures that genuinely reflect equitable health care practices. Health care providers, on the other hand, must understand how to responsibly apply these quality measures in a clinical context to ensure that AI-generated insights enhance patient care across diverse populations.
Implementing Systematic Monitoring and Evaluation of AI Models in Health Care Quality Measurement
The successful integration of AI in health care quality measurement hinges not only on the initial design and training of AI models, but also on the ongoing monitoring and evaluation of their performance post-deployment. To ensure these models function as intended without exacerbating existing disparities or introducing new biases, a structured framework for continuous oversight is essential. This strategy involves regular assessments of AI-driven measurement, systematic detection of any biases, and implementation of corrective measures when necessary.
Key stakeholders in this process include AI and measure developers, health care administrators, clinicians, and regulatory bodies. Notable regulatory bodies include the US Food and Drug Administration (FDA), which oversees the safety and effectiveness of AI applications in health care,21 CMS, and the Agency for Healthcare Research and Quality (AHRQ). Additionally, AI associations, such as the Association for the Advancement of Artificial Intelligence (AAAI), and quality organizations, such as the National Committee for Quality Assurance (NCQA), are instrumental in establishing standards and guidelines for AI in the health care sector. AI and measure developers are responsible for incorporating feedback mechanisms into AI and measurement systems that can constantly gather and analyze performance data. Health care administrators and clinician leaders must ensure AI systems are seamlessly integrated into clinical workflows and promote a culture in which measure users are encouraged to report discrepancies and anomalies in AI-assisted outcomes. Furthermore, regulatory bodies must establish and enforce standards for AI performance, including adherence to ethical guidelines for AI in health care and mandatory reporting of AI evaluation results.
The monitoring and evaluation process can be operationalized through several practical steps. First, AI tools should be deployed with established baseline metrics against which performance can be regularly compared. This involves analyzing historical data and using automated, real-time analytics to detect potential biases early and trigger alerts for human oversight, thereby providing opportunities for course correction. Second, periodic reevaluation can determine whether AI is performing as expected across various patient demographics and conditions. Third, it is crucial to implement a feedback loop in which insights from monitoring and evaluation are used to refine AI models.
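In practice, the baseline-comparison step can start very simply: store the metrics established at deployment and compare each new evaluation period against them. The minimal sketch below flags metrics that have drifted below baseline by more than a tolerance and logs an alert for human review; the metric names, baseline values, tolerance, and logging-based alert are illustrative assumptions, not prescribed standards.

```python
# Minimal monitoring sketch: comparing a deployed model's current performance
# against pre-deployment baseline metrics and raising an alert when drift
# exceeds a tolerance. Metric names, values, and the alert mechanism are illustrative.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai_quality_monitor")

BASELINE = {"overall_accuracy": 0.91, "recall_group_a": 0.88, "recall_group_b": 0.86}
TOLERANCE = 0.05  # maximum acceptable drop from baseline

def evaluate_drift(current_metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that dropped more than TOLERANCE below baseline."""
    drifted = []
    for name, baseline_value in BASELINE.items():
        current_value = current_metrics.get(name)
        if current_value is None:
            logger.warning("Metric %s missing from current run", name)
            continue
        if baseline_value - current_value > TOLERANCE:
            logger.warning("Drift alert: %s fell from %.2f to %.2f",
                           name, baseline_value, current_value)
            drifted.append(name)
    return drifted

# Hypothetical current-period metrics, e.g., computed from the latest quarter of data
print(evaluate_drift({"overall_accuracy": 0.90, "recall_group_a": 0.80, "recall_group_b": 0.87}))
# -> ['recall_group_a']
```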
Reflection and Conclusion: The Path Ahead
The transformative potential of AI in enhancing quality measurement is undeniable. However, integrating AI into health care systems is not without significant challenges. Developing AI use standards, careful implementation, robust training efforts, ongoing evaluation, and continuous dialogue among key stakeholders can help ensure that AI tools do not inadvertently lead to unintended consequences or perpetuate biases. Research and experience at the intersection of AI and health care quality measurement are ongoing and evolving. The path ahead will require a balanced approach that embraces AI’s possibilities while vigilantly addressing its ethical, technical, and practical challenges. By committing to unbiased measurement and continuous improvement, the health care industry can harness the power of AI to transform quality measurement in ways that are both innovative and just, promoting better health outcomes for all patients.
Author Information
Affiliations:
1Real Chemistry, New York, NY
Correspondence:
Palak Patel, MHA
199 Water St 12th floor
New York, NY 10038
Phone: (1) 240-646-2622
Email: ppatel@realchemistry.com
Disclosures:
M.D., P.P., and T.V. are employed as consultants by Real Chemistry, an independent provider of analytics-driven, digital-first research, marketing services, and communications to the health care sector.
References
1. Sahni NR, Carrus B. Artificial intelligence in U.S. health care delivery. N Engl J Med. 2023;389(4):348-358. doi:10.1056/nejmra2204673
2. Gala D, Behl H, Shah M, Makaryus AN. The role of artificial intelligence in improving patient outcomes and future of health care delivery in cardiology: a narrative review of the literature. Healthcare. 2024;12(4):481. doi:10.3390/healthcare12040481
3. Sahni N, Stein G, Zemmel R, Cutler D. The potential impact of artificial intelligence on health care spending. National Bureau of Economic Research. 2023;NBER Working Paper No. w30857. doi:10.3386/w30857
4. Abràmoff MD, Tarver ME, Loyo-Berrios N, et al; Foundational Principles of Ophthalmic Imaging and Algorithmic Interpretation Working Group of the Collaborative Community for Ophthalmic Imaging Foundation. Considerations for addressing bias in artificial intelligence for health equity. NPJ Digit Med. 2023;6(1):170. doi:10.1038/s41746-023-00913-9
5. President of the United States. Executive Order No. 14110. Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Federal Register. 2023;88(210). https://www.govinfo.gov/content/pkg/FR-2023-11-01/pdf/2023-24283.pdf
6. US Centers for Medicare & Medicaid Services. Artificial intelligence (AI) in quality measurement. Published March 19, 2024. Accessed April 17, 2024. https://www.youtube.com/watch?v=oc966JKB-Gw
7. US Centers for Medicare & Medicaid Services. Quality measures. CMS.gov. Accessed June 21, 2024. https://www.cms.gov/medicare/quality/measures
8. Lareau D. AI’s impact on patient care, clinical coding and quality metrics. Forbes. March 29, 2024. Accessed June 21, 2024. https://www.forbes.com/sites/forbestechcouncil/2024/03/29/ais-impact-on-patient-care-clinical-coding-and-quality-metrics/
9. Moura L, Jones DT, Sheikh IS, et al. Implications of large language models for quality and efficiency of neurologic care. Neurology. 2024;102(11). doi:10.1212/wnl.0000000000209497
10. Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23(1):689. doi:10.1186/s12909-023-04698-z
11. What are large language models (LLMs)? IBM. Accessed July 29, 2024. https://www.ibm.com/topics/large-language-models
12. Koçak B, Ponsiglione A, Stanzione A, et al. Bias in artificial intelligence for medical imaging: fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagn Interv Radiol. Published online July 2, 2024. doi:10.4274/dir.2024.242854
13. Yelne S, Chaudhary M, Dod K, Sayyad A, Sharma R. Harnessing the power of AI: a comprehensive review of its impact and challenges in nursing science and healthcare. Cureus. 2023;15(11):e49252. doi:10.7759/cureus.49252
14. Bekbolatova M, Mayer J, Ong CW, Toma M. Transformative potential of AI in health care: definitions, applications, and navigating the ethical landscape and public perspectives. Healthcare. 2024;12(2):125. doi:10.3390/healthcare12020125
15. Arora A, Alderman JE, Palmer J, et al. The value of standards for health datasets in artificial intelligence-based applications. Nat Med. 2023;29(11):2929-2938. doi:10.1038/s41591-023-02608-w
16. Nazer LH, Zatarah R, Waldrip S, et al. Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digit Health. 2023;2(6):e0000278. doi:10.1371/journal.pdig.0000278
17. Hirani R, Noruzi K, Khuram H, et al. Artificial intelligence and healthcare: a journey through history, present innovations, and future possibilities. Life. 2024;14(5):557. doi:10.3390/life14050557
18. Aldoseri A, Al-Khalifa KN, Hamouda AM. Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges. Appl Sci. 2023;13(12):7082. doi:10.3390/app13127082
19. Jain M. How much does it cost to build a generative AI? ScaleupAlly. Published July 2, 2024. Accessed September 13, 2024. https://scaleupally.io/blog/cost-to-build-generative-ai
20. Suter T. Bipartisan lawmakers call for increased AI oversight in Medicare Advantage coverage decisions. The Hill. Published June 25, 2024. Accessed July 2, 2024. https://thehill.com/policy/healthcare/4739152-bipartisan-lawmakers-increased-ai-oversight-medicare-advantage-coverage/
21. How FDA regulates artificial intelligence in medical products. The Pew Charitable Trusts. Published August 5, 2021. Accessed July 29, 2024. https://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2021/08/how-fda-regulates-artificial-intelligence-in-medical-products