Kioxia: SSDs are the Answer for Agentic AI Amid HBM Costs and DRAM Limits

Kioxia Proposes SSDs for Agentic AI: A Solution to Memory Challenges

At Computex, Kioxia drew attention to an issue central to the future of artificial intelligence, and to "agentic AI" systems in particular: memory management. The company argued that current High Bandwidth Memory (HBM) solutions are often prohibitively expensive, while conventional DRAM is difficult to scale to the growing demands of Large Language Models (LLMs) and more complex AI architectures.

Against this backdrop, Kioxia put forward a bold proposal: Solid State Drives (SSDs) could be an effective answer. This perspective is particularly relevant for organizations that want to run AI in self-hosted or on-premise environments, where Total Cost of Ownership (TCO) and efficient use of hardware are decisive factors. Agentic AI, understood as systems capable of autonomously planning and executing complex tasks, requires rapid access to large volumes of data and models, which puts significant pressure on memory infrastructure.

Technical Challenges of Memory for AI

High Bandwidth Memory (HBM) has become a de facto standard for high-end GPUs dedicated to LLM training and inference, thanks to its exceptional bandwidth. However, its high cost and limited capacity per chip represent a significant obstacle, especially when dealing with models with billions of parameters or extended context windows, which require tens or hundreds of gigabytes of VRAM. HBM integration is complex and expensive, directly influencing the final price of accelerator cards.
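
To make the "tens or hundreds of gigabytes" figure concrete, here is a rough back-of-the-envelope sketch. The model shape used in the example (parameter count, layer count, key/value heads, head size, context length) is a hypothetical illustration, not a configuration cited by Kioxia, and the formulas ignore activations and runtime overhead.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int) -> int:
    """Memory for the attention key/value cache of one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


if __name__ == "__main__":
    # Hypothetical model: 70 billion parameters stored in 16-bit precision.
    weights = weight_bytes(70_000_000_000, 2)
    # Hypothetical shape: 80 layers, 8 KV heads of size 128, 32,768-token context.
    cache = kv_cache_bytes(80, 8, 128, 32_768, 2)
    print(f"weights:  {weights / 1e9:.1f} GB")
    print(f"KV cache: {cache / 1e9:.1f} GB per sequence")
```

Under these assumptions the weights alone exceed what a single accelerator typically carries in HBM, and every additional concurrent sequence adds its own cache on top.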

DRAM, on the other hand, offers more capacity per dollar than HBM and is easier to deploy at scale. Its bandwidth, however, is significantly lower than that of HBM, which can create bottlenecks when model weights are loaded and unloaded or when data is accessed during inference. For the most intensive AI workloads, DRAM latency and throughput may not be sufficient, limiting both the system's overall performance and the horizontal scalability of the solution.
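
The bandwidth bottleneck described above comes down to simple arithmetic: the time to move a block of weights is its size divided by the sustained bandwidth of the tier it comes from. The sketch below illustrates this. The bandwidth values are round placeholders chosen only to show the order-of-magnitude effect; they are not measured figures for any HBM, DRAM, or SSD product.

```python
def transfer_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Idealized time to move num_bytes at a sustained bandwidth."""
    if bytes_per_second <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bytes_per_second


# Placeholder bandwidths in bytes per second (assumptions, not vendor specs).
PLACEHOLDER_BANDWIDTH = {
    "fast tier": 1_000e9,
    "middle tier": 100e9,
    "slow tier": 10e9,
}

if __name__ == "__main__":
    model_bytes = 140e9  # hypothetical 140 GB of weights
    for tier, bandwidth in PLACEHOLDER_BANDWIDTH.items():
        seconds = transfer_seconds(model_bytes, bandwidth)
        print(f"{tier}: {seconds:.2f} s to stream the weights once")
```

Each tenfold drop in bandwidth multiplies the streaming time by ten, which is why the placement of frequently used weights matters so much.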

SSDs as a Strategic Alternative for On-Premise Deployment

Kioxia's proposal to use SSDs as a key component for agentic AI is part of a broader search for more economical and scalable solutions. SSDs have higher latency and lower throughput than HBM and DRAM, but they offer significantly more capacity per unit of cost. This makes them good candidates for scenarios that involve managing large datasets, loading portions of models on demand, or applying memory offloading techniques, in which model weights are moved between VRAM and slower but larger-capacity storage.
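
The offloading idea can be sketched in a few lines. This is a minimal illustration, not Kioxia's design or any framework's API: the class `OffloadedWeights`, its methods, and the layer names are hypothetical, a Python dict stands in for fast memory, and plain files in a directory stand in for the SSD. The fast tier holds a bounded number of layers and evicts the least recently used one when full.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


class OffloadedWeights:
    """Keeps at most `fast_capacity` layers in memory; all layers live on disk."""

    def __init__(self, storage_dir: Path, fast_capacity: int):
        self.storage_dir = storage_dir
        self.fast_capacity = fast_capacity
        self.fast_tier = OrderedDict()  # layer name -> bytes, oldest first
        self.slow_reads = 0             # how often the slow tier was touched

    def store(self, name: str, weights: bytes) -> None:
        """Write a layer to the slow tier (a file per layer)."""
        (self.storage_dir / f"{name}.bin").write_bytes(weights)

    def fetch(self, name: str) -> bytes:
        """Return a layer, loading it from the slow tier on a miss."""
        if name in self.fast_tier:
            self.fast_tier.move_to_end(name)  # mark as most recently used
            return self.fast_tier[name]
        weights = (self.storage_dir / f"{name}.bin").read_bytes()
        self.slow_reads += 1
        self.fast_tier[name] = weights
        if len(self.fast_tier) > self.fast_capacity:
            self.fast_tier.popitem(last=False)  # evict least recently used
        return weights


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = OffloadedWeights(Path(tmp), fast_capacity=2)
        for i in range(3):
            store.store(f"layer{i}", bytes([i]) * 1024)
        for name in ["layer0", "layer1", "layer0", "layer2", "layer1"]:
            store.fetch(name)
        print("reads served from the slow tier:", store.slow_reads)
```

Real systems move tensors rather than raw bytes and overlap transfers with computation, but the trade-off is the same: every miss in the fast tier costs a trip to slower storage.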

For companies considering LLM deployment in on-premise environments, integrating SSDs can result in a more favorable TCO. It extends the effective memory capacity available to models without requiring investment in a disproportionate number of expensive HBM-equipped GPUs. This strategy can be particularly useful for workloads that do not need maximum-speed access to all data simultaneously, or for running multiple models concurrently on shared infrastructure. Hardware optimization thus becomes a key factor in keeping such deployments affordable while maintaining data sovereignty and control over the infrastructure.

Future Prospects and Considerations for AI Infrastructure

Kioxia's vision highlights a fundamental trade-off in AI infrastructure design: balancing performance, cost, and capacity. There is no single solution for all workloads, and the choice of memory hierarchy will depend on the specific needs of each project. For on-premise deployments, where managing operational and capital costs is critical, a hybrid approach integrating SSDs, DRAM, and HBM can offer the necessary flexibility.
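
One simple way to reason about such a hybrid hierarchy is greedy placement: put the most frequently accessed data in the fastest tier that still has room, and let colder data fall through to larger, slower tiers. The sketch below is an illustrative assumption, not a method described by Kioxia. The function `place_by_hotness`, the tier capacities, and the workload items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gb: float


def place_by_hotness(items, tiers):
    """Assign each item to the fastest tier with enough free space.

    items: list of (name, size_gb, accesses_per_second) tuples.
    tiers: list of Tier objects ordered from fastest to slowest.
    Returns a dict mapping item name to tier name.
    """
    remaining = [tier.capacity_gb for tier in tiers]
    placement = {}
    # Hottest items choose first, so they get the fastest available tier.
    for name, size_gb, _ in sorted(items, key=lambda item: item[2], reverse=True):
        for index, tier in enumerate(tiers):
            if size_gb <= remaining[index]:
                remaining[index] -= size_gb
                placement[name] = tier.name
                break
        else:
            raise ValueError(f"no tier has room for {name}")
    return placement


if __name__ == "__main__":
    # Hypothetical capacities and workload, for illustration only.
    tiers = [Tier("HBM", 80), Tier("DRAM", 512), Tier("SSD", 4000)]
    items = [
        ("active_model_weights", 60, 1000.0),
        ("kv_cache", 30, 500.0),
        ("idle_agent_state", 200, 5.0),
        ("cold_models", 900, 0.1),
    ]
    for name, tier in place_by_hotness(items, tiers).items():
        print(f"{name} -> {tier}")
```

A production placement policy would also weigh transfer cost and access patterns over time, but even this greedy version shows how the three tiers complement each other rather than compete.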

This discussion underscores how important it is for CTOs, infrastructure architects, and DevOps leads to evaluate the available hardware options carefully. Adopting SSDs for agentic AI can be one component of a broader strategy to build resilient, scalable, and cost-effective AI infrastructure, especially where data sovereignty and regulatory compliance demand self-hosted solutions. AI-RADAR continues to monitor these developments, providing analysis of the frameworks and architectures that support such deployment decisions.