Step-3.5-Flash: an efficient new model

A new model, Step-3.5-Flash, stands out for its performance relative to its parameter count. According to the available data, it outperforms DeepSeek v3.2 on several coding- and agent-focused benchmarks despite activating far fewer parameters per token. Both are mixture-of-experts designs, so only a fraction of each model's total parameters is used for any given token; a quick comparison follows the list below.

  • Step-3.5-Flash: 196B total parameters / 11B active parameters
  • DeepSeek v3.2: 671B total parameters / 37B active parameters
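
To make the comparison concrete, here is a quick back-of-the-envelope calculation using the figures above. Treating per-token compute as roughly proportional to active parameters is a simplification (attention and other costs do not scale with parameter count alone), but it captures the headline difference:

```python
# Active-parameter arithmetic for the two models listed above.
models = {
    "Step-3.5-Flash": {"total_b": 196, "active_b": 11},
    "DeepSeek v3.2": {"total_b": 671, "active_b": 37},
}

for name, p in models.items():
    share = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B active ({share:.1%} of total)")

# Both models activate a similar fraction of their weights (~5.5%),
# but Step-3.5-Flash activates about 3.4x fewer parameters per token:
print(f"Active-parameter gap: {37 / 11:.1f}x")
```

Both models activate a similar share of their total weights; the efficiency claim rests on Step-3.5-Flash activating roughly 3.4x fewer parameters for each generated token.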

Step-3.5-Flash is available on Hugging Face, opening the door to applications that demand computational efficiency and low latency. Models with fewer active parameters can bring significant savings in cost and hardware requirements, especially in on-premise deployment scenarios.
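
As a minimal sketch of what trying the checkpoint locally could look like with the Hugging Face transformers library: the repo ID stepfun-ai/Step-3.5-Flash below is a hypothetical placeholder, so check the actual model card for the exact identifier, license, and loading options.

```python
# Minimal sketch: loading an open-weights checkpoint from Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stepfun-ai/Step-3.5-Flash"  # hypothetical repo ID; verify on the model card

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    device_map="auto",       # spread weights across the available GPUs
    trust_remote_code=True,  # MoE checkpoints often ship custom modeling code
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

One caveat worth noting for hardware planning: only 11B parameters are active per token, but all 196B weights still have to be held in memory (or offloaded), so the savings show up in per-token compute and latency rather than in total memory footprint.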

For those evaluating on-premise deployments, the trade-offs deserve careful consideration; AI-RADAR offers analytical frameworks for exactly that at /llm-onpremise.