Comparison between StepFun 3.5 Flash and MiniMax 2.1
A user shared their experience comparing two large language models (LLMs): MiniMax 2.1 Q3_K_XL and StepFun 3.5 Flash IQ4_XS. The goal was to evaluate both models in a daily-use context, with a focus on speed and intelligence.
Performance and resource usage
MiniMax 2.1 proved to be a fast and responsive model, well suited to everyday use. StepFun 3.5 Flash, despite its strong reasoning ability, showed significantly longer processing times, especially for tasks such as generating commit messages from small code diffs. The user noted that they used a modified version of llama.cpp to enable tool-calling support with StepFun 3.5 Flash.
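A setup like the one described could be sketched as a `llama-server` invocation along these lines (a minimal sketch, not the user's actual command: the model filename is an assumption, `-c 65536` matches the 64k context mentioned below, and `--jinja` is the stock llama.cpp switch for chat-template-based tool calling, which the article says required a modified build for this model):

```shell
# Sketch: serving a GGUF quant with llama.cpp's llama-server.
# The model filename is hypothetical; flags shown are stock llama.cpp.
llama-server \
  -m step-3.5-flash-IQ4_XS.gguf \
  -c 65536 \
  --jinja
```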
Hardware specifications and VRAM
The tests were performed on an AMD Ryzen platform using the Vulkan backend. StepFun 3.5 Flash, with a 64k context window, required approximately 107GB of VRAM. The reported metrics indicate a prompt evaluation time of 4098.41 ms (7.28 ms per token, 137.37 tokens per second) and a generation (eval) time of 188029.67 ms (54.34 ms per token, 18.40 tokens per second).
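The reported figures are internally consistent and can be rederived from the per-token latencies alone; a short sketch (the numbers are taken from the article, the helper names are my own):

```python
# Sketch: rederive llama.cpp-style throughput figures from the
# per-token latencies reported in the article.

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens/second."""
    return 1000.0 / ms_per_token

def token_count(total_ms: float, ms_per_token: float) -> float:
    """Estimate how many tokens a phase processed from its totals."""
    return total_ms / ms_per_token

# Prompt-evaluation phase: 4098.41 ms total at 7.28 ms/token
prompt_tps = tokens_per_second(7.28)    # ~137.4 tok/s, as reported
# Generation phase: 188029.67 ms total at 54.34 ms/token
gen_tps = tokens_per_second(54.34)      # ~18.4 tok/s, as reported

print(f"prompt: {prompt_tps:.2f} tok/s, generation: {gen_tps:.2f} tok/s")
```

This also implies the prompt was roughly 560 tokens and the response roughly 3,460 tokens, which fits the commit-message use case described earlier.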
For those evaluating on-premise deployments, there are trade-offs to consider between a model's reasoning ability and hardware resource requirements. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these trade-offs.