When AI demand meets supply chain limits

ASE Technology Holding COO Tien Wu stated that the current wave of AI demand is pushing advanced packaging capacity far beyond forecasts, with saturation expected to last at least through 2030. ASE, one of the world's largest semiconductor packaging and test providers, is a critical link in the chain that turns silicon wafers into AI accelerators ready for the data center.

This is not just about volume. Packaging techniques such as CoWoS (Chip-on-Wafer-on-Substrate), essential for pairing logic dies with high-bandwidth memory (HBM), require complex manufacturing processes and a level of precision that only a handful of global players can deliver. Wu's statement indicates that, despite massive investment, demand will continue to outstrip supply for years.

The ripple effect on on-premise LLM hardware

For those evaluating or managing self-hosted infrastructure for large language models (LLMs), this is not an industry footnote but a concrete planning factor. The availability of GPUs like the NVIDIA H100 or B200, and of future generations, depends not only on foundries producing the dies but also on the ability to package them into functional modules. Every delay in packaging translates into longer lead times and higher costs for organizations ordering servers, on-premise clusters, and bare metal solutions.

When calculating the Total Cost of Ownership (TCO) of a local deployment, hardware price swings play a decisive role. If procurement windows stretch out, organizations risk having to revise their technical roadmaps or place orders far in advance, tying up capital. Data sovereignty and infrastructure control – key strengths of the on-premise model – collide with a supply chain made rigid by AI demand.
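To make the effect of lead times on TCO concrete, here is a minimal sketch in Python. It assumes a simple-interest model for capital tied up between payment and delivery; the function names, the 8% cost of capital, the 26-week lead time, and the 10% price increase are all hypothetical illustration values, not figures from ASE or any vendor.

```python
def carrying_cost(prepaid_amount: float,
                  annual_cost_of_capital: float,
                  lead_time_weeks: float) -> float:
    """Cost of capital tied up while waiting for delivery (simple interest)."""
    return prepaid_amount * annual_cost_of_capital * lead_time_weeks / 52.0


def effective_hardware_cost(list_price: float,
                            price_change: float,
                            annual_cost_of_capital: float,
                            lead_time_weeks: float,
                            prepaid_fraction: float = 1.0) -> float:
    """Purchase price after a price swing, plus the carrying cost of the
    portion paid up front during the lead time."""
    price = list_price * (1.0 + price_change)
    prepaid = price * prepaid_fraction
    return price + carrying_cost(prepaid, annual_cost_of_capital, lead_time_weeks)


if __name__ == "__main__":
    # Hypothetical cluster: 1,000,000 list price, 10% price increase,
    # 8% annual cost of capital, 26-week lead time, fully prepaid.
    cost = effective_hardware_cost(1_000_000, 0.10, 0.08, 26)
    print(f"Effective hardware cost: {cost:,.0f}")
```

Under these assumptions, doubling the lead time doubles the carrying cost, which is why a stretched procurement window shows up in the TCO even when the list price does not move.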

Reading the supply chain to anticipate deployment choices

ASE's message adds another variable to the choice between cloud and on-premise. Cloud providers can negotiate framework contracts and reserve stock, which smooths out price variability. Organizations that need to keep data and models within their corporate perimeter, whether because of GDPR constraints or air-gapped applications, cannot simply wait for the storm to pass.

In this setting, supply chain analysis becomes as important as inference benchmarks. Knowing that packaging capacity will remain tight encourages modular approaches: adopting hardware that can be brought online gradually, evaluating multi-vendor solutions, and considering alternative chips (such as FPGA- or ASIC-based accelerators) that might face less production pressure. For teams fine-tuning LLMs on local nodes, procurement planning is no longer an annual exercise but a continuous watch.

Beyond the bottleneck: what changes after 2030

Wu's remarks are less a doom prophecy than a market signal: the semiconductor industry is reorganizing to serve an AI workload growing at unprecedented speed. Investments in new packaging fabs, announced by ASE and competitors, are expected to start balancing the scales only toward the end of the decade. Until then, those designing on-premise deployments of ever larger models – 70 billion parameters and beyond – will have to plan around scarce hardware with long lead times.

The lesson for local AI practitioners is that time-to-hardware belongs in every business case. Compute capability can be extraordinary, but if the chips don't arrive, enterprise-scale inference remains a theoretical exercise. Monitoring the announcements of major packaging providers becomes, unlikely as it may seem, an activity to pair with quantization tests and serving framework selection.