A recent online thread raised an interesting question: why are the NVFP8 and MXFP8 formats absent from popular frameworks such as llama.cpp and vLLM, and more generally from the open-source model-quantization community?
The context
NVFP8 and MXFP8 are 8-bit floating-point formats that promise better accuracy than conventional FP8 by scaling values in small blocks rather than per tensor, especially on NVIDIA's new Blackwell architecture, which supports such block-scaled formats natively. The question posed is why there isn't more interest in developing and integrating these formats, given the potential gains in performance and accuracy.
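To make the block-scaling idea concrete, here is a minimal NumPy sketch of an MXFP8-style quantizer, assuming 32-element blocks that share a power-of-two scale and E4M3 element storage. The function name is illustrative and not any framework's API, and the final rounding onto the FP8 grid is only approximated by clipping.

```python
import numpy as np

def quantize_mxfp8_block(x, block_size=32, e4m3_max=448.0):
    """Hypothetical sketch of MXFP8-style block quantization:
    each block of `block_size` values shares one power-of-two scale,
    and elements are meant to be stored on an FP8 (E4M3) grid."""
    x = np.asarray(x, dtype=np.float32)

    # Pad so the tensor splits evenly into blocks.
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared per-block scale: a power of two chosen so the block's
    # largest magnitude fits within the E4M3 representable range.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax = np.where(amax == 0.0, 1.0, amax)
    scales = 2.0 ** np.floor(np.log2(amax / e4m3_max))

    # Divide by the shared scale; real kernels would then round each
    # value to the nearest representable E4M3 number. Here we only
    # clip to the E4M3 range to keep the sketch short.
    elements = np.clip(blocks / scales, -e4m3_max, e4m3_max)

    # Dequantize with elements * scales (per block).
    return elements, scales
```

Because the scale is recomputed every 32 values instead of once per tensor, a single outlier only degrades its own block, which is the intuition behind the accuracy claim above.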
For those evaluating on-premise deployments, the trade-offs between performance, accuracy, and hardware support must be weighed carefully. AI-RADAR offers analytical frameworks at /llm-onpremise to evaluate these aspects.