Navigating AI Readiness: A Framework for Health Data Set Quality in Clinical AI and ML
A recent study published in JAMA Network Open addresses a critical issue in developing and applying artificial intelligence (AI) in health care systems: the lack of data quality frameworks for creating AI-ready data sets.
"The lack of data quality frameworks to guide the development of AI-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care," the researchers said.
The qualitative study involved semistructured interviews with 20 data set experts, all of whom were health data set creators and 18 of whom were also ML researchers. The interviews, conducted in English through a secure video conferencing platform between August 23, 2022, and January 5, 2023, aimed to identify the factors that contribute to a data set's AI readiness. The researchers invited 93 experts to participate, and 20 accepted. The participants represented 16 databases across various sectors of health care, and the team used purposive sampling, "the intentional selection of information-rich individuals," to ensure a diverse sample.
The researchers then used thematic analysis to identify patterns and themes in the interview data.
Three Themes Identified
- Intrinsic elements of data set AI readiness;
- Drivers of AI-ready data; and
- Contextual elements of data set AI readiness
Contextual elements of data set AI readiness comprised 2 subthemes: fitness and societal impact. Drivers of AI-ready data comprised several subthemes, including data availability, data quality standards, documentation, team science, and incentivization.
Participants detailed what is needed to create a successful and usable AI model. For example, one participant stressed the need for "extremely clear labels," meaning, "here is exactly how this decision was made, here are all the images that are neatly linked to this EHR data, and all of the EHR data is in these very clean, joinable tables, and there are no missing values."
Participants stressed that, for data sets to be considered AI-ready, creators must produce high-quality health data and mitigate the risks associated with data reuse in ML research.
According to the study's findings, AI readiness requires a holistic appraisal of multiple elements, balancing transparency and ethical reflection against pragmatic constraints. The findings suggest that achieving more reliable, relevant, and ethical AI and ML applications for patient care will require strategic updates to data set creation practices.
"The AI readiness of health data sets is a key factor in clinical AI and ML innovation. This qualitative study developed a grounded framework for AI data set quality. Our work suggests that the concept of data set AI readiness is complex and requires the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints," the researchers concluded.
Reference
Ng MY, Youssef A, Miner AS, et al. Perceptions of data set experts on important characteristics of health data sets ready for machine learning: A qualitative study. JAMA Netw Open. 2023;6(12):e2345892. doi:10.1001/jamanetworkopen.2023.45892