The role of high-performance computing in scaling artificial intelligence-centric semiconductor architectures

Authors

Botlagunta Preethish Nandan
SAP Delivery Analytics, ASML, Wilton, CT, United States

Synopsis

Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities for scaling up and scaling out the computing system. In particular, heterogeneous chiplet architectures are favored for continuing to scale up and scale out the system while reducing the design complexity and cost of traditional monolithic chip design. However, interconnecting computing resources and orchestrating heterogeneous chiplets is the key to success. This section will first discuss the diversity and evolving demands of different AI workloads. Then it will discuss how chiplets bring better cost efficiency and shorter time to market. It will further discuss the challenges in establishing chiplet interface standards, packaging, and security. Finally, it will discuss the software programming challenges in chiplet systems.

Computing workloads are evolving fast, and new demands are emerging to make AI computation more efficient. Next-generation AI algorithms and training regimes are driven to explore better performance while pushing the limits of the hardware platform. Additionally, novel AI applications are being proposed to solve scientific computing challenges more efficiently. All these new workloads aim to maximize performance on a particular task, but they also lead to diverse architectures ranging from specialized to general-purpose designs. Such exploding diversity largely overwhelms the effectiveness and efficiency of any single accelerator, making traditional architectural paradigms less applicable.
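
To make this concrete, the short roofline-style sketch below shows how workloads with different arithmetic intensities are bound by different resources on the same device, which is one reason a single fixed architecture cannot serve them all well. The peak-compute and bandwidth figures, and the workload intensities, are hypothetical values chosen for illustration, not measurements of any real accelerator.

# Roofline-style check: which resource bounds a workload on a given device.
# The device numbers and workload intensities below are hypothetical,
# chosen only to illustrate why diverse workloads favor different designs.

PEAK_TFLOPS = 300.0   # assumed peak compute, TFLOP/s
PEAK_BW_TBPS = 2.0    # assumed memory bandwidth, TB/s

def attainable_tflops(arith_intensity: float) -> float:
    """Attainable TFLOP/s for a kernel with the given FLOPs-per-byte ratio."""
    return min(PEAK_TFLOPS, PEAK_BW_TBPS * arith_intensity)

# Illustrative arithmetic intensities (FLOPs per byte moved).
workloads = {
    "LLM decode (memory-bound)": 2.0,
    "dense GEMM (compute-bound)": 400.0,
    "sparse scientific solver": 0.5,
}

for name, ai in workloads.items():
    perf = attainable_tflops(ai)
    bound = "compute" if perf >= PEAK_TFLOPS else "bandwidth"
    print(f"{name:28s} -> {perf:7.1f} TFLOP/s ({bound}-bound)")

Under these assumed numbers, the dense GEMM saturates the compute roof while the LLM decode and sparse solver sit far below it, starved by bandwidth; hardware tuned for one profile is inefficient for the others.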

AI accelerators are already widely used to boost the throughput of training and inference, and they are expected to keep evolving and become even more ubiquitous in broader areas, including data analysis and simulation, to achieve higher efficiency. However, because these architectures are dedicated to accelerating AI workloads specifically, their capabilities must evolve along with the target AI algorithms. AI training requires huge amounts of data to move frequently across chips and chiplets, which leads to escalating energy consumption and cost. An emerging approach is memory-side computing, which places the model weights in a 3D-stacked DRAM chip to alleviate the data-exchange overhead. However, this raises new power-delivery and thermal issues that need to be researched and resolved (Ali et al., 2024; Poduval et al., 2024).
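
The data-movement argument can be illustrated with a back-of-envelope estimate. The sketch below compares the energy of streaming a model's weights over an off-package link against keeping them resident in 3D-stacked memory near the compute. The energy-per-bit constants and the model size are assumed, illustrative values, not figures taken from the cited works.

# Back-of-envelope estimate of weight-movement energy for one full pass.
# All energy-per-bit constants are illustrative assumptions, not measured
# values for any particular chip, interconnect, or memory stack.

# Assumed energy per bit moved (picojoules). Off-package access is
# typically one to two orders of magnitude costlier than near-memory access.
ENERGY_PJ_PER_BIT_OFF_CHIP = 10.0   # chip-to-chip / off-package DRAM
ENERGY_PJ_PER_BIT_NEAR_MEM = 0.5    # weights resident in 3D-stacked DRAM

def movement_energy_joules(weight_bytes: int, pj_per_bit: float) -> float:
    """Energy to stream the full weight set once at the given cost per bit."""
    return weight_bytes * 8 * pj_per_bit * 1e-12

# Hypothetical 70B-parameter model in 16-bit precision: ~140 GB of weights.
weights_bytes = 140 * 1024**3

off_chip = movement_energy_joules(weights_bytes, ENERGY_PJ_PER_BIT_OFF_CHIP)
near_mem = movement_energy_joules(weights_bytes, ENERGY_PJ_PER_BIT_NEAR_MEM)

print(f"off-chip : {off_chip:.1f} J per full weight pass")
print(f"near-mem : {near_mem:.1f} J per full weight pass")
print(f"ratio    : {off_chip / near_mem:.0f}x")

Under these assumptions a single pass over the weights costs roughly 12 J off-package versus 0.6 J near memory, a 20x gap that is repeated on every training step, which is why memory-side computing is attractive despite its power-delivery and thermal challenges.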

Published

7 May 2025

How to Cite

Nandan, B. P. (2025). The role of high-performance computing in scaling artificial intelligence-centric semiconductor architectures. In Artificial Intelligence Chips and Data: Engineering the Semiconductor Revolution for the Next Technological Era (pp. 98-113). Deep Science Publishing. https://doi.org/10.70593/978-93-49910-47-8_7