A user shared their experience with the Nemo 30B language model, highlighting its ability to handle large context windows on consumer hardware.

Performance and Hardware

The test was performed on a single RTX 3090 graphics card paired with 32 GB of system RAM. The user reported a generation speed of 35 tokens per second, which they considered adequate for summarizing long texts such as books or scientific articles. CPU offloading was noted as an option, though one better suited to advanced users.
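To put the reported 35 tokens per second in perspective, a rough back-of-envelope calculation shows how long generating a summary would take. The summary length below is an illustrative assumption, not a figure from the original report, and prompt-processing time is ignored:

```python
def generation_time_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough wall-clock time to generate output_tokens at a steady rate,
    ignoring prompt-processing overhead (hypothetical estimate)."""
    return output_tokens / tokens_per_second

# e.g. a 2,000-token summary at the reported 35 tokens/second:
print(round(generation_time_seconds(2000, 35.0)))  # about a minute
```

At this rate, even multi-thousand-token summaries finish in a couple of minutes, which matches the user's assessment that the speed is adequate for long-document work.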

Comparison with Other Models

Nemo 30B was compared with the Seed OSS 36B model, which reached roughly 20 tokens per second, making Nemo 30B the faster of the two. This makes Nemo 30B an interesting option for those looking to run large language models locally with large context windows.
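The speed difference between the two models can be expressed as a simple relative speedup, using the throughput figures reported above:

```python
def relative_speedup(faster_tps: float, slower_tps: float) -> float:
    """Ratio of two throughput figures in tokens per second."""
    return faster_tps / slower_tps

# Nemo 30B at 35 tok/s vs. Seed OSS 36B at ~20 tok/s:
print(relative_speedup(35.0, 20.0))  # 1.75
```

In other words, based on these reported numbers, Nemo 30B generates text roughly 1.75 times faster than Seed OSS 36B on this hardware.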