Fine-tuning LLMs for the Medical Sector in Low-Resource Languages

Clinical documentation is essential for patient safety and continuity of care. Yet the administrative burden of electronic health record (EHR) systems contributes to physician burnout, a problem exacerbated in low-resource languages such as Finnish, where tooling and training data are scarce.

A recent study explored the effectiveness of fine-tuning a large language model (LLM), specifically LLaMA 3.1-8B, for medical transcription in Finnish. The model was trained on a validated corpus of simulated clinical conversations, created by students at Metropolia University of Applied Sciences.

Methodology and Results

Fine-tuning was carried out with controlled pre-processing and optimization, and effectiveness was evaluated via cross-validation. The results show low exact n-gram overlap (BLEU = 0.1214), moderate longest-common-subsequence overlap (ROUGE-L = 0.4982), but high semantic similarity (BERTScore F1 = 0.8230) with the reference transcripts.
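This pattern of scores is what one would expect when a model paraphrases rather than copies: exact word sequences differ while meaning is preserved. The sketch below illustrates the mechanics with minimal stdlib implementations of clipped n-gram precision (the core of BLEU) and ROUGE-L F1. The example sentences and tokenization-by-whitespace are illustrative assumptions; the study's actual tokenizer, BLEU configuration, and BERTScore model are not specified here.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision for a single n, the building block of BLEU."""
    cand, ref = candidate.split(), reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Each candidate n-gram counts only up to its frequency in the reference.
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return overlap / max(sum(cand_ngrams.values()), 1)

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    a, b = candidate.split(), reference.split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(a)][len(b)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(b)
    return 2 * precision * recall / (precision + recall)

# A paraphrase shares the reference's meaning but few exact word sequences
# (hypothetical sentences, not taken from the study's corpus):
ref = "the patient reports chest pain since yesterday evening"
hyp = "since yesterday evening the patient has experienced chest pain"
print(ngram_precision(hyp, ref, n=2))  # bigram overlap is partial
print(rouge_l_f1(hyp, ref))            # LCS-based overlap is moderate
```

An embedding-based metric such as BERTScore would rate these two sentences as near-equivalent, which is why semantic similarity can be high even when surface-level overlap is modest.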

These results suggest that fine-tuning can be an effective approach for transcribing medical discourse in Finnish and support the creation of domain-specific, privacy-oriented LLMs for the medical field. Further research is needed to validate and extend this approach.

For teams evaluating on-premise deployments, AI-RADAR offers analytical frameworks at /llm-onpremise for weighing the relevant trade-offs.