Detecting LLM Text in Korean: Intuition Is Not Enough

Distinguishing human-written Korean text from text generated by a large language model (LLM) is a challenge, even for expert linguists. Evaluators often rely too heavily on the formal correctness of a passage and overlook subtler cues.

Structured Training for Detection

A recent study explored whether detection skills can be learned and improved through structured training. The study introduced LREAD, a rubric grounded in national Korean writing standards and adapted to flag micro-level artifacts such as punctuation, spacing, and linguistic register.
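To make the idea of micro-level diagnostics concrete, here is a minimal sketch of the kind of surface checks a reviewer might script before reading for register. The checks, thresholds, and the function name micro_diagnostics are illustrative assumptions, not part of LREAD.

```python
import re

# Hypothetical surface checks inspired by rubric-style micro-diagnostics.
# These heuristics are illustrative only; they do not reproduce LREAD.

def micro_diagnostics(text: str) -> dict:
    """Collect simple punctuation, spacing, and register signals from Korean text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    # Punctuation: mixing full-width (。，) and half-width (. ,) marks is a
    # surface artifact worth flagging for a human reviewer.
    fullwidth_punct = len(re.findall(r"[。，！？]", text))
    halfwidth_punct = len(re.findall(r"[.,!?]", text))

    # Spacing: a space *before* a sentence-final mark is unusual in edited Korean.
    space_before_punct = len(re.findall(r"\s+[.,!?]", text))

    # Register: crude proxy for formal polite endings (…니다) per sentence.
    formal_endings = len(re.findall(r"니다", text))

    return {
        "sentences": len(sentences),
        "fullwidth_punct": fullwidth_punct,
        "halfwidth_punct": halfwidth_punct,
        "space_before_punct": space_before_punct,
        "formal_ending_ratio": formal_endings / max(len(sentences), 1),
    }

sample = "이 보고서는 결과를 요약합니다. 분석은 세 단계로 진행되었습니다 ."
print(micro_diagnostics(sample))
```

Such a script does not decide authorship by itself; it only surfaces the punctuation, spacing, and register signals that a trained reader then weighs against the rubric.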

Surprising Results

In a three-phase longitudinal protocol with Korean linguistics students, detection accuracy increased from 60% to 100%, and agreement among evaluators rose sharply (Fleiss' kappa from -0.09 to 0.82). Trained participants outperformed state-of-the-art LLM detectors, thanks to their ability to apply language-specific micro-diagnostics. The results suggest that rubric-supported human evaluation can complement automated detectors, especially in non-English settings.
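For reference, Fleiss' kappa measures chance-corrected agreement among more than two raters. The sketch below shows the standard computation on made-up labels; the data and the fleiss_kappa name are illustrative and are not taken from the study.

```python
from collections import Counter

# Minimal sketch of Fleiss' kappa for multi-rater agreement.
# The rater labels below are invented for illustration.

def fleiss_kappa(ratings: list[list[str]]) -> float:
    """ratings[i] holds the label each rater gave to item i (same rater count per item)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # n_ij: how many raters put item i into category j.
    counts = [Counter(item) for item in ratings]

    # P_i: observed agreement on item i.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items

    # p_j: proportion of all assignments that went to category j.
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Five short texts, three raters each, labelled "human" or "llm".
labels = [
    ["human", "human", "human"],
    ["llm", "llm", "human"],
    ["llm", "llm", "llm"],
    ["human", "llm", "human"],
    ["llm", "llm", "llm"],
]
print(round(fleiss_kappa(labels), 3))  # prints 0.444 for this toy data
```

A value near 0 means agreement is roughly what chance would produce, while values approaching 1 indicate strong consensus, which is what makes the reported jump from -0.09 to 0.82 notable.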

For teams evaluating on-premise deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks at /llm-onpremise for weighing these options.