ByteShape has announced the release of two new large language models (LLMs) focused on code generation: Devstral-Small-2-24B-Instruct-2512 and Qwen3-Coder-30B-A3B-Instruct.

Model Details

  • Devstral-Small-2-24B-Instruct-2512: Optimized for GPUs, especially the NVIDIA RTX 40 and 50 series. It requires more computational resources but delivers superior performance when the context fits within the supported window.
  • Qwen3-Coder-30B-A3B-Instruct: Designed to run on a wide range of hardware, including resource-constrained devices such as the Raspberry Pi 5 (with 16 GB of RAM), where it achieves approximately 9 tokens per second (TPS) while retaining about 90% of BF16 quality.

The choice between the two models depends on specific needs. Devstral delivers higher performance but demands more powerful hardware, while Qwen3-Coder is more versatile and runs even on modest devices. ByteShape provides GGUF quantizations of both models, tuned for different hardware.
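To see why quantization makes the hardware difference matter, a rough back-of-envelope estimate of weight-file size helps: a GGUF file stores each weight at some effective bit width, so its size is roughly parameters × bits ÷ 8. The bit widths below are typical approximations for common GGUF quant types, not ByteShape's published figures, and the sketch ignores KV cache and runtime overhead:

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF weight-file size in GB (weights only).

    KV cache, activations, and runtime overhead are extra, so usable
    RAM/VRAM must exceed this figure.
    """
    return params_billion * bits_per_weight / 8

# ~4 effective bits per weight is typical of mid-range GGUF quants
# (assumed value, varies by quant mix):
print(gguf_size_gb(30, 4.0))  # 30B model -> 15.0 GB, tight fit in 16 GB
print(gguf_size_gb(24, 4.0))  # 24B model -> 12.0 GB
```

This is consistent with the article's claim that a heavily quantized 30B model can squeeze onto a 16 GB Raspberry Pi 5, while leaving little headroom for long contexts.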

For those evaluating on-premise deployments, the trade-off is between performance and hardware requirements. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating these alternatives.