A user shared their experience using Qwen3-code-next, a large language model (LLM) focused on code generation, on a Mac Studio Ultra equipped with 128GB of RAM.

Setup and Initial Tests

The test ran entirely locally on the Mac Studio Ultra. Initial results were positive: the model handled basic tool tasks such as reading and writing files, browsing the web, and checking the system time.

Real Development Challenge

The main challenge was to rewrite KittenTTS-IOS for Windows, a medium-difficulty project involving ONNX and Swift libraries such as Misaki for English phonetics. The goal was a simple CLI around the KittenTTS model, avoiding complex phonetic manipulation.

Problems Encountered

Despite a promising start, several issues surfaced as the project's complexity grew. In particular, the model struggled to manage larger contexts, leading to frequent timeouts and manual restarts. The user also noted that the model wasted tokens figuring out how to save files, filling the context with throwaway work. Memory management and prompt processing became the bottleneck, slowing the process significantly.

Optimization Attempts

The user tried to improve performance by raising the timeout and quantizing the KV cache to 8 bits in LM Studio, with inconclusive results. Despite these difficulties, the model eventually produced an audio file containing a voice, though the output was unintelligible for lack of an adequate phonetic dictionary.
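For reference, LM Studio's KV-cache quantization setting corresponds roughly to llama.cpp's cache-type options; a sketch of the equivalent launch via llama.cpp's own server follows. The model path, context size, and timeout value are placeholders, and the exact flag names should be checked against the installed llama.cpp version:

```shell
# Sketch: serve a GGUF model with an 8-bit KV cache via llama.cpp's
# llama-server, roughly what LM Studio's KV-cache setting maps to.
# Note: V-cache quantization in llama.cpp requires flash attention (-fa).
llama-server -m qwen3-coder.gguf -c 65536 \
    -fa --cache-type-k q8_0 --cache-type-v q8_0 \
    --timeout 600   # server read/write timeout, in seconds
```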

Final Evaluation

The user gave the model a score of 5/10, noting that while it can get work done given considerable patience, it is not comparable to the performance of larger models, including paid ones. Slow prompt processing, especially with large contexts, remains a significant limitation.