GhazalBench: A New Benchmark for LLMs and Persian Poetry
A new study introduces GhazalBench, a benchmark designed to evaluate how large language models (LLMs) interact with Persian ghazals. Persian poetry, particularly ghazals, plays a significant cultural role in Iran, with verses by poets such as Hafez frequently quoted and paraphrased.
GhazalBench assesses two complementary abilities: producing faithful prose paraphrases of couplets, and retrieving canonical verses under varying semantic and formal cues. The tests revealed a dissociation: models generally capture the poetic meaning, yet they struggle to recall the exact verse in completion-based settings. Recognition-based tasks, where the model picks the correct verse from candidates instead of generating it, narrow this gap.
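To make the completion-versus-recognition distinction concrete, here is a minimal Python sketch of how the two settings could be scored. It is not GhazalBench's actual scoring code: the `ItemResult` fields, the normalization rules, and the exact-match criterion are all assumptions chosen for illustration.

```python
import unicodedata
from dataclasses import dataclass


def normalize(verse: str) -> str:
    """Normalize a verse so trivial orthographic variants still match.

    The rules below are illustrative assumptions, not the benchmark's own.
    """
    text = unicodedata.normalize("NFKC", verse)
    # Map Arabic yeh and kaf to their Persian counterparts.
    text = text.replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")
    # Treat the zero-width non-joiner as an ordinary space.
    text = text.replace("\u200c", " ")
    # Drop combining marks (optional vowel diacritics).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse runs of whitespace.
    return " ".join(text.split())


@dataclass
class ItemResult:
    """Hypothetical record of one benchmark item under both settings."""
    canonical: str          # the reference verse
    completion_output: str  # what the model generated when asked to complete
    chosen_index: int       # candidate the model picked in the recognition task
    correct_index: int      # index of the canonical verse among the candidates


def score(results: list[ItemResult]) -> dict[str, float]:
    """Return completion accuracy, recognition accuracy, and their gap."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    completion = sum(
        normalize(r.completion_output) == normalize(r.canonical) for r in results
    ) / n
    recognition = sum(r.chosen_index == r.correct_index for r in results) / n
    return {
        "completion": completion,
        "recognition": recognition,
        "gap": recognition - completion,
    }
```

A positive `gap` would correspond to the pattern described above: the model can identify the canonical verse when it sees it, but cannot reproduce it verbatim. Exact match after normalization is a strict criterion; an edit-distance threshold would be a more lenient alternative.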
In a comparison with English sonnets, recall performance was significantly higher on the English material, suggesting that the observed limitations stem from exposure during training rather than inherent architectural constraints. GhazalBench is available on GitHub for further analysis and development.
For those evaluating on-premise deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects.