Bias in LLM-Based Triage Systems

A recent study has revealed latent biases in large language models (LLMs) used for triage in emergency departments. The research, published on arXiv, examines how these artificial intelligence systems may discriminate against patients on the basis of racial, social, and economic factors.

Analysis Using Proxy Variables

The researchers evaluated bias effects using 32 patient-level proxy variables, each represented by a positive and a negative qualifier. They drew on both public datasets (MIMIC-IV-ED Demo, MIMIC-IV Demo) and restricted-access datasets (MIMIC-IV-ED and MIMIC-IV). The results showed discriminatory behavior in triage scenarios, mediated by the proxy variables.
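The study's own probing code is not reproduced here, but the idea behind a proxy-variable probe can be pictured with a minimal sketch: the same clinical vignette is prefixed with a positive or a negative qualifier for one proxy variable, and the model's predicted triage acuity is compared across the two framings. Everything below is illustrative, not the authors' implementation: `predict_acuity` is a hypothetical wrapper around whatever model is under test, and the vignette and qualifiers are examples rather than the study's actual 32 variables.

```python
# Hypothetical sketch of a counterfactual proxy-variable probe.
# `predict_acuity` stands in for an LLM call that returns an ESI-style
# acuity level (1 = most urgent, 5 = least urgent).

from typing import Callable

BASE_VIGNETTE = (
    "A 54-year-old patient presents to the emergency department with "
    "chest pain radiating to the left arm, onset one hour ago."
)

# Each proxy variable is expressed as a positive and a negative qualifier
# prepended to the same clinical vignette (illustrative examples only).
PROXY_QUALIFIERS = {
    "housing_status": ("The patient is stably housed.", "The patient is unhoused."),
    "insurance": ("The patient is privately insured.", "The patient is uninsured."),
}


def acuity_shift(predict_acuity: Callable[[str], int], proxy: str) -> int:
    """Return the change in predicted acuity between the positive and
    negative framing of one proxy variable."""
    pos, neg = PROXY_QUALIFIERS[proxy]
    acuity_pos = predict_acuity(f"{pos} {BASE_VIGNETTE}")
    acuity_neg = predict_acuity(f"{neg} {BASE_VIGNETTE}")
    # A nonzero shift means the proxy variable alone changed the triage decision.
    return acuity_neg - acuity_pos


if __name__ == "__main__":
    # Stand-in model: always answers acuity 2, so every shift is zero.
    constant_model = lambda prompt: 2
    for proxy in PROXY_QUALIFIERS:
        print(proxy, acuity_shift(constant_model, proxy))
```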

Tendency to Modify Perceived Severity

The study also highlighted a systematic tendency for LLMs to shift the severity they assign to a patient whenever specific tokens appear in the input context, regardless of whether those tokens are framed positively or negatively. This suggests that these systems still rely on noisy, non-causal signals that do not accurately reflect the patient's true condition.
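One way to picture this polarity-insensitive effect is to compare both framings of a proxy variable against a qualifier-free baseline: if both shifts point in the same direction, the mere presence of the proxy token is moving the predicted severity. The sketch below is an assumption-laden illustration, again using a hypothetical `predict_acuity` wrapper rather than the study's code.

```python
# Illustrative check for a polarity-insensitive severity shift.
# `predict_acuity` is a placeholder for the model under test.

def polarity_insensitive_shift(predict_acuity, vignette, pos_qualifier, neg_qualifier):
    """Return (shift_pos, shift_neg) relative to the unqualified vignette.

    If both shifts have the same sign, the presence of the proxy token
    changes the predicted severity independently of how it is framed.
    """
    baseline = predict_acuity(vignette)
    shift_pos = predict_acuity(f"{pos_qualifier} {vignette}") - baseline
    shift_neg = predict_acuity(f"{neg_qualifier} {vignette}") - baseline
    return shift_pos, shift_neg
```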

Need for Responsible Implementation

The study's findings emphasize the importance of ensuring the safe and responsible deployment of artificial intelligence technologies in clinical settings. Further work is needed to improve the accuracy and fairness of these systems and to prevent them from perpetuating existing forms of discrimination.