Local LLM Execution in Browser
A demo has been released that runs the GPT-OSS (20B) model entirely locally in a web browser. The implementation uses the WebGPU API, offering an alternative to remote server execution.
Technical Details
The demo is based on Transformers.js v4 (in preview) and ONNX Runtime Web. The GPT-OSS (20B) model has been optimized and converted to the ONNX format to ensure adequate performance in the browser environment. Both the source code for the demo and the optimized ONNX model are available on Hugging Face.
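As a rough sketch of how such a setup is typically wired together with Transformers.js, the snippet below checks for WebGPU support and shows the general shape of loading a text-generation pipeline on the WebGPU backend. The model identifier is a placeholder, not the demo's actual repository id, and the quantization setting is an assumption; consult the demo's source on Hugging Face for the real configuration.

```javascript
// In the actual demo this would come from the library itself, e.g.:
//   import { pipeline } from "@huggingface/transformers";
// Here the pipeline factory is passed in as a parameter so the sketch
// stays self-contained.

// Detect whether the current environment exposes WebGPU at all.
function hasWebGPU() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}

// Sketch of the loading path (not invoked here: it needs a browser with
// WebGPU and a multi-gigabyte model download). "<onnx-model-id>" is a
// placeholder for the optimized ONNX model published on Hugging Face.
async function loadGenerator(pipeline) {
  if (!hasWebGPU()) {
    throw new Error("WebGPU is not available in this environment");
  }
  return pipeline("text-generation", "<onnx-model-id>", {
    device: "webgpu", // run on the GPU instead of the WASM/CPU backend
    dtype: "q4",      // assumed: quantized weights to fit browser memory
  });
}
```

Once loaded, the generator would be called like any Transformers.js text-generation pipeline, with inference happening entirely on the user's GPU.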
For readers evaluating on-premise deployments, AI-RADAR analyzes the trade-offs in detail in the /llm-onpremise section.