QASM-Eval: Training LLMs for Hardware-Oriented Quantum Programming

The Challenge of Quantum Programming in the NISQ Era

The field of quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is highly constrained by noise. To address these limitations, hardware-facing capabilities beyond simple gate-sequence circuit specification are often required. These include mid-circuit measurement and classical feedback for quantum error correction (QEC), precise timing control for dynamical decoupling (DD), and pulse-level waveform access for calibration.

OpenQASM-3 was introduced to expose precisely these capabilities, providing a hardware-level programming interface. However, despite the rapid progress of Large Language Models (LLMs) in code generation, no dataset has been specifically designed to train and evaluate LLMs on OpenQASM-3 programs that use its advanced hardware-oriented features. QASM-Eval is presented as the first comprehensive dataset designed to fill this gap.
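To make these hardware-facing features concrete, the sketch below embeds a short OpenQASM 3 program in a Python string and flags which constructs it uses. The program is illustrative and not taken from QASM-Eval; the `detect_features` helper and its keyword patterns are hypothetical, chosen only for this example, and a real tool would parse the program rather than match keywords.

```python
import re

# A small OpenQASM 3 program (illustrative, not taken from QASM-Eval).
PROGRAM = """
OPENQASM 3.0;
include "stdgates.inc";

qubit[2] q;
bit c;

h q[0];
c = measure q[0];        // mid-circuit measurement
if (c == 1) {            // classical feedback on the measured bit
    x q[1];
}
delay[100ns] q[1];       // explicit timing control
h q[1];
"""

# Hypothetical keyword patterns, assumed for this sketch only.
FEATURE_PATTERNS = {
    "measurement": r"\bmeasure\b",
    "classical_feedback": r"\bif\s*\(",
    "timing": r"\b(delay|stretch|duration|box)\b",
    "pulse": r"\b(defcal|cal)\b",
}


def detect_features(source):
    """Return the set of feature names whose pattern occurs in the source."""
    # Drop // line comments so words inside comments are not counted.
    code = re.sub(r"//.*", "", source)
    return {
        name
        for name, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, code)
    }


if __name__ == "__main__":
    print(sorted(detect_features(PROGRAM)))
```

The example program combines measurement, classical feedback, and a timed delay, but contains no pulse-level calibration block, so the "pulse" entry is not reported for it.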

QASM-Eval: A Dataset for Quantum Hardware Control

QASM-Eval distinguishes itself by explicitly targeting OpenQASM-3's hardware-facing features, rather than focusing on quantum algorithm design or reasoning. The dataset comprises an expert-verified test set of 100 tasks and a training set of 4,000 tasks. These systematically cover classical logic, timing scheduling, pulse control, and complex real-world workflows, providing a robust foundation for LLM training.

To check the validity of generated programs, the development team implemented an extended verifier that checks syntax, quantum states, and program timelines. Initial evaluations revealed that state-of-the-art LLMs struggle with OpenQASM-3 coding tasks. Targeted fine-tuning on QASM-Eval, however, yielded significant performance gains, demonstrating the dataset's effectiveness as a training tool.
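The verifier's internals are not described here. As a rough illustration of what a program-timeline check could involve, the sketch below schedules operations as soon as possible on each qubit and reports the idle intervals, which are the windows where a technique such as dynamical decoupling would apply. The function names, operation list, and durations are all hypothetical and are not QASM-Eval's implementation.

```python
from collections import defaultdict


def build_timeline(ops):
    """ASAP-schedule ops given as (name, qubits, duration_ns).

    Returns the schedule as (name, qubits, start, end) tuples and a dict
    mapping each qubit to the time at which it becomes free.
    """
    qubit_free_at = defaultdict(int)  # every qubit is free at t = 0
    schedule = []
    for name, qubits, duration in ops:
        # An op starts once all of its qubits are available.
        start = max(qubit_free_at[q] for q in qubits)
        end = start + duration
        for q in qubits:
            qubit_free_at[q] = end
        schedule.append((name, qubits, start, end))
    return schedule, dict(qubit_free_at)


def idle_gaps(schedule):
    """Return (qubit, gap_start, gap_end) for each idle interval before an op."""
    last_end = {}
    gaps = []
    for _name, qubits, start, end in schedule:
        for q in qubits:
            prev = last_end.get(q, 0)
            if start > prev:
                gaps.append((q, prev, start))
            last_end[q] = end
    return gaps


# Hypothetical operations and durations in nanoseconds.
OPS = [
    ("h", (0,), 50),
    ("measure", (0,), 1000),
    ("x", (1,), 50),
    ("cx", (0, 1), 300),
]

if __name__ == "__main__":
    schedule, free_at = build_timeline(OPS)
    for entry in schedule:
        print(entry)
    print("idle gaps:", idle_gaps(schedule))
```

In this toy schedule, qubit 1 finishes its gate early and then waits for the measurement on qubit 0 to complete before the two-qubit gate can start; a timeline check would surface that wait explicitly.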

Context and Implications for LLM Deployments

The emergence of specialized datasets like QASM-Eval underscores a crucial trend in the LLM landscape: the necessity of fine-tuning for niche applications and highly technical domains. While general-purpose LLMs excel at broad tasks, their application to specific sectors like quantum programming requires deep customization. This approach is particularly relevant for organizations evaluating on-premise or hybrid deployments for their AI workloads.

The ability to train and customize LLMs with proprietary or domain-specific data, such as that offered by QASM-Eval, is a key factor for data sovereignty and infrastructure control. For CTOs, DevOps leads, and infrastructure architects, the capability to perform fine-tuning in controlled, air-gapped environments can be critical for addressing compliance and security requirements. This type of specialization enables the development of highly reliable and performant AI assistants for complex tasks, reducing reliance on external cloud services for sensitive data or critical processes.

Future Prospects and AI-RADAR's Role

QASM-Eval provides a benchmark and training foundation to accelerate the development of reliable LLM assistants for hardware-facing quantum programming in the NISQ era. Its availability as an open-source resource (data and code are on GitHub) supports research and innovation in a rapidly evolving field. Progress of this kind helps make quantum computing more accessible and controllable through advanced programming interfaces.

For companies exploring the opportunities offered by LLMs and quantum computing, understanding the trade-offs between self-hosted and cloud solutions is vital. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate Total Cost of Ownership (TCO), data sovereignty, and concrete hardware specifications. These frameworks support informed decisions about AI and LLM deployments, even in highly specialized domains like quantum computing.