A couple of blocks in Huaqiangbei, Shenzhen's chaotic electronics district, are enough to see how quickly the parallel market for AI-bound GPUs is moving. A first-hand report from the district found that modified GeForce RTX 5090 cards with 96GB of VRAM can be ordered with a one-week lead time, for a total cost of around $8,200.
A modified card in Huaqiangbei
The seller laid out the pricing: the base RTX 5090 sells for 36,000 yuan, and replacing the VRAM to reach 96GB costs an additional 20,000 yuan. The total comes to around 56,000 yuan, roughly $8,200. This is not an official NVIDIA product but a hack that pairs the consumer Blackwell GPU with an amount of VRAM normally reserved for professional cards like the RTX 6000.
Similar cards had already surfaced in AliExpress listings, but this first-hand account confirms that a real supply chain stands behind those ads, with a quoted delivery time of one week and the option for customers to send in their own 5090 for the upgrade.
Why 96GB of VRAM matters for on-premise inference
For those running LLM inference locally, VRAM is the primary bottleneck. Models with tens of billions of parameters, even after aggressive quantization, struggle to fit in the 24-32GB typical of top-tier consumer GPUs. With 96GB, it becomes possible to serve larger models without resorting to the cloud, keeping control of the data and cutting latency.
The mod doesn't add CUDA cores or change the memory bandwidth, so raw compute performance remains that of the 5090. Yet for self-hosted scenarios focused on small-batch throughput or experimentation, generous memory headroom can be the difference between a viable deployment and one bogged down by constant CPU-GPU transfers.
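As a rough illustration of the memory argument above, the sketch below estimates how much VRAM a model's weights occupy at a given quantization level and checks whether they fit on a card. The function names, the 70-billion-parameter example, and the 20% overhead margin for KV cache and activations are assumptions made for illustration, not figures from the report; real footprints depend on the runtime, context length, and batch size.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an ASSUMED overhead margin fit in vram_gb.

    overhead_fraction is a hypothetical allowance for KV cache and
    activations; tune it for the actual workload.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at common precisions
    for bits in (16, 8, 4):
        size = weight_memory_gb(70, bits)
        print(f"70B @ {bits}-bit: {size:.0f} GB weights, "
              f"fits 32GB: {fits(70, bits, 32)}, fits 96GB: {fits(70, bits, 96)}")
```

Under these assumptions, a 70B model quantized to 4 bits needs about 35GB for weights alone, which already exceeds a 32GB card but leaves ample room on a 96GB one.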
The TCO calculation: $8,200 without a warranty
The most striking number in the report is the price: at $8,200, the modified card comes uncomfortably close to the cost of an RTX 6000 with an official warranty, which the source pegged at roughly $11,000. A saving of roughly $2,800 may not justify the risk of an unsupported card, with possible driver instabilities and untested longevity.
However, someone who already owns a 5090 could consider the upgrade alone, for just 20,000 yuan (about $2,800). In that case the TCO becomes more compelling: a modest additional investment turns a consumer GPU into an asset capable of handling AI workloads that would normally require enterprise hardware.
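The arithmetic behind these comparisons can be laid out in a few lines. The sketch below uses only the prices quoted in the report; the constant names are illustrative, and the per-gigabyte figures assume that the official card also carries 96GB and that a stock 5090 has 32GB, so the upgrade adds 64GB. It deliberately ignores warranty, failure risk, and resale value, which a real TCO analysis would need to price in.

```python
# Prices as quoted in the report
BASE_CARD_CNY = 36_000      # RTX 5090 from the seller
VRAM_UPGRADE_CNY = 20_000   # VRAM replacement to reach 96GB
MODDED_TOTAL_USD = 8_200    # quoted dollar total for the modified card
OFFICIAL_CARD_USD = 11_000  # RTX 6000 price cited by the source
UPGRADE_ONLY_USD = 2_800    # quoted dollar cost of the upgrade alone

# Assumptions for illustration (not stated as such in the report)
TOTAL_VRAM_GB = 96          # capacity of both the modified and the official card
STOCK_5090_VRAM_GB = 32     # capacity of an unmodified RTX 5090


def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price divided by VRAM capacity, in dollars per GB."""
    return price_usd / vram_gb


total_cny = BASE_CARD_CNY + VRAM_UPGRADE_CNY
gap_usd = OFFICIAL_CARD_USD - MODDED_TOTAL_USD
added_gb = TOTAL_VRAM_GB - STOCK_5090_VRAM_GB

if __name__ == "__main__":
    print(f"Modified card total: {total_cny} yuan")
    print(f"Gap to the official card: ${gap_usd}")
    print(f"Modified card: ${usd_per_gb(MODDED_TOTAL_USD, TOTAL_VRAM_GB):.2f} per GB")
    print(f"Official card: ${usd_per_gb(OFFICIAL_CARD_USD, TOTAL_VRAM_GB):.2f} per GB")
    print(f"Upgrade only:  ${usd_per_gb(UPGRADE_ONLY_USD, added_gb):.2f} per added GB")
```

Seen this way, the upgrade-only path is the cheapest per gigabyte, which is why the case looks strongest for someone who already owns the card.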
Beyond the anecdote: what the parallel market signals
The emergence of these modified cards isn't a hobbyist curiosity, but a sign of the hunger for video memory that the AI ecosystem is stoking. Demand for on-premise solutions pushes unofficial suppliers to fill the gap left by traditional vendors, who artificially segment consumer and professional lineups.
For those evaluating on-premise deployment, this story confirms that selection criteria go beyond benchmarks: operational risks, warranty, software compatibility, and lifecycle costs must all be weighed. AI-Radar offers analytical frameworks at /llm-onpremise to navigate such decisions, without losing sight of the fact that a hack, however clever, remains a trade-off between performance and reliability.