ByteShape has announced the release of two new large language models (LLMs) focused on code generation: Devstral-Small-2-24B-Instruct-2512 and Qwen3-Coder-30B-A3B-Instruct.

Model Details

  • Devstral-Small-2-24B-Instruct-2512: Optimized for GPUs, especially the NVIDIA RTX 40 and 50 series. It requires more computational resources but delivers superior performance when the context fits within the supported window.
  • Qwen3-Coder-30B-A3B-Instruct: Designed to run on a wide range of hardware, including resource-constrained devices such as the Raspberry Pi 5 (with 16 GB of RAM), where it achieves approximately 9 tokens per second (TPS) while retaining about 90% of BF16 quality.

The choice between the two models depends on specific needs. Devstral delivers higher performance but demands more powerful hardware, while Qwen3-Coder is more versatile and runs even on modest devices. ByteShape provides GGUF quantizations of both models, tuned for different hardware.
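To see why quantization makes the hardware difference matter, a rough back-of-envelope estimate of weight-file size helps: a GGUF file stores each weight at some effective bit width, so its size is roughly parameters × bits ÷ 8. The bit widths below are typical approximations for common GGUF quant types, not ByteShape's published figures, and the sketch ignores KV cache and runtime overhead:

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF weight-file size in GB (weights only).

    KV cache, activations, and runtime overhead are extra, so usable
    RAM/VRAM must exceed this figure.
    """
    return params_billion * bits_per_weight / 8

# ~4 effective bits per weight is typical of mid-range GGUF quants
# (assumed value, varies by quant mix):
print(gguf_size_gb(30, 4.0))  # 30B model -> 15.0 GB, tight fit in 16 GB
print(gguf_size_gb(24, 4.0))  # 24B model -> 12.0 GB
```

This is consistent with the article's claim that a heavily quantized 30B model can squeeze onto a 16 GB Raspberry Pi 5, while leaving little headroom for long contexts.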

For those evaluating on-premise deployments, the trade-off is between performance and hardware requirements. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating these alternatives.