Comparison between StepFun 3.5 Flash and MiniMax 2.1
A user shared their experience comparing two large language models (LLMs): MiniMax 2.1 Q3_K_XL and StepFun 3.5 Flash IQ4_XS. The goal was to evaluate both models in a daily-use context, with a focus on speed and intelligence.
Performance and resource usage
MiniMax 2.1 proved to be a fast and responsive model, well suited to everyday use. StepFun 3.5 Flash, despite its strong reasoning ability, showed significantly longer processing times, especially for tasks such as generating commit messages from small code diffs. The user noted that they used a modified version of llama.cpp to enable tool-calling support with StepFun 3.5 Flash.
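A setup like the one described could be sketched as a `llama-server` invocation along these lines (a minimal sketch, not the user's actual command: the model filename is an assumption, `-c 65536` matches the 64k context mentioned below, and `--jinja` is the stock llama.cpp switch for chat-template-based tool calling, which the article says required a modified build for this model):

```shell
# Sketch: serving a GGUF quant with llama.cpp's llama-server.
# The model filename is hypothetical; flags shown are stock llama.cpp.
llama-server \
  -m step-3.5-flash-IQ4_XS.gguf \
  -c 65536 \
  --jinja
```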
Hardware specifications and VRAM
The tests were performed on an AMD Ryzen platform using the Vulkan backend. StepFun 3.5 Flash, with a 64k context window, required approximately 107GB of VRAM. The reported metrics indicate a prompt evaluation time of 4098.41 ms (7.28 ms per token, 137.37 tokens per second) and a generation (eval) time of 188029.67 ms (54.34 ms per token, 18.40 tokens per second).
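The reported figures are internally consistent and can be rederived from the per-token latencies alone; a short sketch (the numbers are taken from the article, the helper names are my own):

```python
# Sketch: rederive llama.cpp-style throughput figures from the
# per-token latencies reported in the article.

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens/second."""
    return 1000.0 / ms_per_token

def token_count(total_ms: float, ms_per_token: float) -> float:
    """Estimate how many tokens a phase processed from its totals."""
    return total_ms / ms_per_token

# Prompt-evaluation phase: 4098.41 ms total at 7.28 ms/token
prompt_tps = tokens_per_second(7.28)    # ~137.4 tok/s, as reported
# Generation phase: 188029.67 ms total at 54.34 ms/token
gen_tps = tokens_per_second(54.34)      # ~18.4 tok/s, as reported

print(f"prompt: {prompt_tps:.2f} tok/s, generation: {gen_tps:.2f} tok/s")
```

This also implies the prompt was roughly 560 tokens and the response roughly 3,460 tokens, which fits the commit-message use case described earlier.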
For those evaluating on-premise deployments, there are trade-offs to consider between a model's reasoning ability and hardware resource requirements. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these trade-offs.