Taalas, a startup specializing in inference hardware, has released a demo chatbot and an API, both powered by an ASIC chip developed in-house.
High-Speed Inference
The platform achieves an inference speed of 16,000 tokens per second running the Llama 3.1 8B model. The small model was chosen deliberately, to validate the concept of accelerating inference with dedicated hardware. Taalas is now focusing its efforts on more complex models.
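To put that throughput in perspective, a quick back-of-the-envelope calculation shows how fast a typical chat reply would arrive at 16,000 tokens per second. Everything except the headline figure is an illustrative assumption:

```python
# Back-of-the-envelope latency at the reported throughput.
# Only the 16,000 tok/s figure comes from the article; the rest is illustrative.
TOKENS_PER_SECOND = 16_000  # reported by Taalas for Llama 3.1 8B

def generation_time_ms(num_tokens: int, tps: int = TOKENS_PER_SECOND) -> float:
    """Time to generate num_tokens at a steady rate of tps, in milliseconds."""
    return num_tokens / tps * 1000

# A typical 500-token chat reply would stream in roughly 31 ms.
print(f"{generation_time_ms(500):.2f} ms")  # → 31.25 ms
```

At that rate, generation is effectively instantaneous from the user's perspective; perceived latency would be dominated by network round trips rather than the model itself.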
Free Access
While it develops more advanced solutions, Taalas offers free access to its demo, letting users experience the chip's capabilities directly. Both a demo chatbot and an inference API are available.
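A minimal sketch of what calling such an inference API might look like. The endpoint URL, model name, and payload shape below are assumptions modeled on common OpenAI-compatible chat APIs, not Taalas's documented interface:

```python
import json

# Hypothetical request builder: API_URL and the field names are assumptions,
# not Taalas's actual API. Check the official docs before use.
API_URL = "https://api.example.com/v1/chat/completions"  # placeholder URL

def build_request(prompt: str, model: str = "llama-3.1-8b") -> str:
    """Serialize a chat-completion request body as JSON."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # streaming makes the high token throughput visible
    })

body = build_request("Explain what an inference ASIC is.")
# The body would then be POSTed to API_URL with an Authorization header.
```

The request is only constructed here, not sent; wiring it to an HTTP client and the real endpoint is left to the reader once the actual API details are known.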
For those evaluating on-premise deployments, there are trade-offs to consider carefully. AI-RADAR offers analytical frameworks on /llm-onpremise to support these evaluations.