Over the past couple of process nodes, the chip industry has come to grips with the fact that Moore’s Law is slowing down or ending for many market segments. What isn’t clear is what comes next, because even if chipmakers stay at older nodes, they will face a series of new challenges that will drive up costs and increase design complexity.
Chip design has faced a number of hurdles just to get to this point, including a performance ceiling for processors and power limitations for all types of chips. There are issues involving RC delay, various types of noise, and increased challenges around photomasks and lithography. Through all of this, the drive to smaller nodes has provided many of the answers. But as complexity and cost continue to skyrocket at each new node, the benefits of scaling no longer apply to the majority of designs, and the economic case for scaling is eroding for all but the largest-volume chips.
“In the past 35 years, the semiconductor industry and the EDA/IP industry have been spending most of their precious talents and resources readying and porting designs to new process nodes, generation after generation,” says Chi-Ping Hsu, executive director at Avatar Integrated Systems. “Moore’s Law definitely brought significant integration and scaling that were never imaginable before. But at the same time, it sucked all the talent to do mostly scaling/porting/integration type of engineering, instead of new innovations.”
While the death of Moore’s Law has been predicted for many years, its arrival is certainly not the end of the road. In fact, it may be the opposite. “The end of Moore’s Law could be the best thing that has happened in computing since the beginning of Moore’s Law,” said R. Stanley Williams, research scientist for HP Labs. “Confronting the end of an epoch should enable a new era of creativity.” (Computing in Science & Engineering. IEEE CS and AIP – March/April 2017).
Or as Russell Klein, HLS Platform program director for Mentor, a Siemens Business, puts it, “If the design is unable to take advantage of smaller-geometry silicon processes, then designers will need to find creative ways to address this gap.”
Constrained by cost
Cost is a key variable here, and there are two aspects to it: design cost and manufacturing cost. “The economics of the most advanced silicon geometries dictate that only the highest-volume devices will be migrated to them,” says Klein. “As a result, some designs will not benefit from the performance increase and power reduction that designers have come to rely upon over the years.”
On the manufacturing side, the rising cost is evident in both the manufacturing processes and the equipment needed to make the chips. “The announcement from GlobalFoundries that it is stopping investment in the 7nm process node is one of the key indicators that Moore’s Law is truly dead,” says Sergio Marchese, technical marketing manager for OneSpin Solutions. “It’s just getting too hard and too expensive.”
Design costs also have been escalating exponentially, as shown in Figure 1. While some of those costs are associated with the additional complexity of design rules at the smaller geometries, much of the increase is due to the growing complexity of the designs themselves.
Fig. 1. Design costs at recent nodes. Source: Handel Jones, IBS
Constrained by verification
As design complexity increases, verification complexity rises by at least as much. “The approach of ‘better design’ suits a system that is deterministically designed,” says Gajinder Panesar, CTO for UltraSoC. “This means you can write the individual block, and nothing changes. But real systems are not deterministic.”
Panesar explains the dilemma. “At best you’ll get some improvements through better design, but this leads to even more complexity. You need better performance, and so you need to partition software and for that you need to understand how the system is performing, which means the need for insights: data and analysis on the performance. Increasingly problematic, but necessary, is when designs utilize cores from different processor architectures or custom blocks. This type of heterogeneous design means you’re not just worried about how each block or core performs on its own, you’re challenged with knowing how each one interacts with the other cores, and how it adds to the complexity of the entire system.”
One approach is to raise the level of abstraction. “RTL development is running out of steam,” says Mentor’s Klein. “The algorithmic complexity of these new systems will strain the traditional development methodology. Verification at the RTL level is nearing a breaking point. Developing accelerators in hardware is a task well suited to high-level synthesis (HLS). Exploring the architectural alternatives needed to achieve the optimal result, while impossible in RTL, can be achieved through HLS. Further, by moving verification to a higher level of abstraction, and therefore significantly reducing the effort, HLS addresses one of the critical problems of the hardware design community.”
Another approach is to use FPGA resources, which can defer some verification tasks but add others. “FPGA synthesis tools make complex design transformations to optimize device utilization and achieve performance and power targets,” says OneSpin’s Marchese. “That’s great, but the resulting netlist must be verified carefully.”
Constrained by performance
Perhaps the biggest problem is the performance constraint. “At more mature technologies there may have been some observable benefits when migrating between technology nodes,” says Tom Wong, director of marketing, design IP at Cadence. “But as you go down to 28nm and below, SoC performance is dictated more by interconnects (metal system) than transistor performance. Mainstream CPUs for PCs/laptops have hovered between 2GHz and 3GHz because Moore’s Law scaling no longer can give you performance gains in terms of clock speed. This is when CPU designs went from single-core to dual-core to quad-core. Also, running devices at a high clock rate can get you into trouble with heat (thermal issues) and high packaging and cooling costs. Unless you are designing for servers in the datacenter, low power is the most important spec. Even for chips used in modern-day datacenters, they are not performance at all costs.”
Spreading the load across multiple processors has not been an easy transition. “The benefits of parallel software execution have been known for years, but the software developers still shun changing their comfortable paradigm,” warns Klein. “Tilera produced a multi-core processor that was shown to be three times more power efficient than Intel’s x86 platform. It was unable to find commercial success, despite claims that power consumption was one of the biggest problems faced by system developers.”
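Why spreading the load is hard can be made concrete with Amdahl’s law, a standard model that is not cited by anyone quoted here: if a fraction p of a program’s runtime can be parallelized across n cores, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below assumes the parallel part scales perfectly and ignores synchronization and memory-contention overhead; the workloads in the demo are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup under Amdahl's law.

    Assumes the parallel portion scales perfectly and ignores
    synchronization and memory-contention overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: how much of the code has been parallelized.
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 16)
        )
        print(f"p={p}: {row}")
```

Code that is only half parallel can never run more than 2x faster, however many cores are added, which is one reason restructuring the software matters as much as the core count.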
Adding processors isn’t free. “In order for the CPU, or collection of CPUs, to function as desired, it is imperative to address the serious bottleneck issue that arises from the lack of memory bandwidth,” adds Kalpesh Sanghvi, SoC and systems solutions manager for Open-Silicon. “Memories like HBM play a critical role in bridging this gap.”
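The bandwidth bottleneck Sanghvi describes is often reasoned about with the roofline model, a standard performance model that the article does not mention: attainable throughput is capped either by peak compute or by memory bandwidth multiplied by arithmetic intensity (operations per byte moved from memory). The peak and bandwidth figures below are hypothetical placeholders, not measurements of any real chip.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      ops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, bandwidth_gbs * ops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (ops/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical SoC: 1,000 GFLOP/s of compute fed by 50 GB/s of DRAM bandwidth.
    peak, bw = 1000.0, 50.0
    for intensity in (2.0, 20.0, 40.0):
        gflops = attainable_gflops(peak, bw, intensity)
        print(f"{intensity:>5} ops/byte -> {gflops:.0f} GFLOP/s")
```

With these made-up numbers, a kernel doing 2 operations per byte reaches only 100 GFLOP/s, a tenth of peak, and adding cores would not help; only more bandwidth, for example from a memory such as HBM, raises that ceiling.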
Off-chip memory interfaces also are getting faster. “Smartphones today use LPDDR4 at 3200 speed,” says Cadence’s Wong. “Today, there is a migration to LPDDR4/4X at 4266 to improve performance in the memory subsystem.”
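Reading “3200” and “4266” as data rates in megatransfers per second (MT/s), which is the usual convention for LPDDR4, the theoretical peak bandwidth is the data rate times the bus width in bytes. The sketch assumes a 64-bit memory bus (for example, four 16-bit LPDDR4 channels); that width is a hypothetical configuration, not one stated in the article.

```python
def peak_dram_gbs(data_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s x bytes per transfer.

    Ignores refresh, bank conflicts and command overhead, so real
    sustained bandwidth is lower.
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    BUS_WIDTH_BITS = 64  # assumption: e.g., four 16-bit LPDDR4 channels
    for rate in (3200, 4266):
        gbs = peak_dram_gbs(rate, BUS_WIDTH_BITS)
        print(f"LPDDR4 at {rate} MT/s: {gbs:.1f} GB/s")
```

Under that assumption the move from 3200 to 4266 raises peak bandwidth by about a third, from 25.6 GB/s to roughly 34.1 GB/s.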
But it is not just memory bandwidth that becomes a constraint. “More complex systems means you’re putting more onto a chip,” says UltraSoC’s Panesar. “The problem here is less about compute, and more about bottlenecks on interconnect. You need to understand how I/O is transporting data across an SoC. You need to understand how computation is affected, which again means you need insights or you don’t get the performance improvements you expect and are designing for.”
Both memory and I/O interfaces are being improved. “The CPU speed and performance growth pushes the memory requirements to scale further,” explains Sanghvi. “This is the reason companies have started working on 3D memories like HBM, which allows for the packing of more memory vertically, and addresses the requirement for higher bandwidth at lower power. In addition, the requirement for I/O interface speeds and overall bandwidth has increased drastically. These added requirements have pushed the envelope further, resulting in new complex technology solutions like 56G PAM-4 and 112G PAM-4.”
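PAM-4 signaling uses four amplitude levels, so each symbol carries two bits and the symbol rate is half the bit rate of conventional two-level (NRZ) signaling. The sketch below uses the nominal 56G and 112G figures (real standards specify slightly different line rates) and a common Gray-coded level mapping, assumed here for illustration, in which adjacent levels differ by one bit.

```python
import math
from typing import List

# A common Gray-coded PAM-4 mapping (adjacent levels differ by one bit),
# assumed here for illustration.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def bits_per_symbol(levels: int) -> float:
    """Bits carried by one symbol of an N-level signal."""
    return math.log2(levels)


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol (baud) rate needed to carry a given bit rate."""
    return bit_rate_gbps / bits_per_symbol(levels)


def pam4_encode(bits: List[int]) -> List[int]:
    """Map pairs of bits onto the four PAM-4 amplitude levels."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


if __name__ == "__main__":
    for rate in (56, 112):
        print(f"{rate}G: NRZ {symbol_rate_gbaud(rate, 2):.0f} GBd, "
              f"PAM-4 {symbol_rate_gbaud(rate, 4):.0f} GBd")
    print(pam4_encode([0, 0, 1, 0, 1, 1]))
```

Halving the symbol rate eases the channel’s bandwidth requirement, at the cost of tighter noise margins, since four levels must share the same voltage swing that NRZ splits into two.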
Wong notes that other interfaces are being pushed, as well. “We have seen the MIPI interface move from v1.1 (1.5Gbps per lane) to MIPI D-PHY v1.2 (2.5Gbps per lane), again to increase system throughput (performance). Similarly, PCIe2 (5GT/s) interfaces have given way to PCIe3 (8GT/s) in SSD controllers and flash storage interfaces.”
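The PCIe figures Wong cites are raw transfer rates. Usable bandwidth also depends on line coding: PCIe 2.0 uses 8b/10b encoding and PCIe 3.0 uses 128b/130b. The sketch below applies only that coding overhead and ignores packet and protocol overhead; the function names and the x4 lane count in the demo are illustrative choices.

```python
from fractions import Fraction

# Raw signaling rate (GT/s per lane) and line-code efficiency for the two
# PCIe generations mentioned above. Packet/protocol overhead is ignored.
PCIE_GENERATIONS = {
    "PCIe2": (5.0, Fraction(8, 10)),     # 8b/10b: 8 payload bits per 10 line bits
    "PCIe3": (8.0, Fraction(128, 130)),  # 128b/130b: 128 payload bits per 130
}


def effective_gbps_per_lane(generation: str) -> float:
    """Usable bits per second per lane after line coding, in Gb/s."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return gt_per_s * float(efficiency)


def link_gbytes_per_s(generation: str, lanes: int) -> float:
    """One-direction link throughput in GB/s for a given lane count."""
    return effective_gbps_per_lane(generation) * lanes / 8


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        print(f"{gen}: {effective_gbps_per_lane(gen):.2f} Gb/s per lane, "
              f"x4 link {link_gbytes_per_s(gen, 4):.2f} GB/s")
```

The raw rate rises 1.6x, from 5 to 8 GT/s, but because 128b/130b wastes far less of the line than 8b/10b, usable per-lane bandwidth nearly doubles.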
One trend that is firmly underway is to add application-specific processors. “We are now seeing specialized low-power DSP cores being used to address audio, video and baseband applications,” says Wong. “Additionally, we are seeing the emergence of neural net processors to aid in applications such as facial recognition, object detection and a slew of requirements needed to address the computational needs of autonomous vehicle chips.”
That approach can be extended even further. “Another alternative is to move functions from software into hardware, as accelerators,” says Klein. “Hardware has much higher performance and lower power as compared to software and is more efficient in its usage of silicon area.”
There is a middle ground emerging, as well. “One of the trends is to use more application-specific logic,” says Marchese. “These on-chip and off-chip accelerators can be implemented through FPGAs, embedded FPGAs, or custom-processor architectures.”
These levels of heterogeneity create additional challenges, however. “Heterogeneous computing systems are destabilizing behaviors even further and introducing even more uncertainty in the system,” warns Panesar. “There’s an increasing need for on-the-ground visibility across the design and the system.”
Others point to similar issues. “With heterogeneous architectures, you can no longer rely on a simple bus architecture,” adds Wong. “Take a look at all modern SoCs and see what they have in common: fabric. They all have a NoC (network on chip) to connect these specialized cores and manage the traffic in these complicated systems.”
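As a rough illustration of what that fabric does, the sketch below models a 2D-mesh NoC with dimension-ordered (XY) routing, one simple and widely taught scheme. Commercial NoCs use many topologies and routing policies, so the mesh, the router coordinates and the function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh
Link = Tuple[Coord, Coord]  # directed hop between neighboring routers


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers a packet visits, travelling along X first, then along Y.

    Dimension-ordered routing is deterministic and deadlock-free on a
    mesh without wraparound links, which makes it a common baseline.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


def link_loads(flows: List[Tuple[Coord, Coord]]) -> Dict[Link, int]:
    """Count how many flows cross each directed link (a crude congestion estimate)."""
    loads: Dict[Link, int] = {}
    for src, dst in flows:
        path = xy_route(src, dst)
        for hop in zip(path, path[1:]):
            loads[hop] = loads.get(hop, 0) + 1
    return loads


if __name__ == "__main__":
    # Two hypothetical flows, e.g. a CPU and a DSP both sending toward memory.
    flows = [((0, 0), (2, 1)), ((1, 0), (2, 0))]
    for link, count in link_loads(flows).items():
        print(link, count)
```

In the demo, the link from router (1, 0) to (2, 0) carries both flows, the kind of shared hot spot that a simple bus hides and a NoC makes explicit.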
Constrained by area
More cores and memory, more complex I/O and increased embedded FPGA resources all add up to a higher transistor count. For chipmakers that choose to stay at a node, that translates into more area. On top of that, safety and security require additional area. “Designs for many applications are complex systems of hardware and software with discrete and heterogeneous components,” says Panesar. “They must work as the designer intended and be fully compliant with safety standards such as ISO 26262. Plus, you must ensure they are robust in the face of current and future security threats, as in the SAE J3061 cybersecurity standard.”
Area may be a hidden problem that surfaces without warning. “The design process will suddenly reach the single-die process limit,” warns Avatar’s Hsu. “It will come much, much faster than most people anticipated.”
Defining that limit isn’t so simple, though, because there are both soft and hard aspects. Soft limits include design issues such as clock skew across the chip, and they include manufacturing issues such as yield. Yield is one of the reasons why die sizes have been limited. It also was a factor in the original determination of Moore’s Law.
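The link between die size and yield is often illustrated with the simple Poisson yield model, Y = exp(-A × D0), where A is the die area and D0 is the defect density. Real fabs use more elaborate models, and the defect density in the demo is a hypothetical value, not data for any process.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.1  # hypothetical defect density, defects per cm^2
    for area in (100, 400, 800):
        print(f"{area:>4} mm^2 die: {poisson_yield(area, D0):.0%} expected yield")
```

Yield falls exponentially with area in this model, so at the same assumed defect density an 800 mm² die is expected to come out defect-free only about half as often as a 100 mm² die.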
But there is a hard limit the industry faces, as well. Reticle size has remained constant. Current lithography systems use a 4X reduction, and the standard reticle substrate is 6 inches (about 152mm) on a side, so the full mask would map to roughly 38mm on the wafer. Because optical aberrations and non-uniformity increase toward the edge of the reticle, not all of that area is usable, which means a maximum chip size of about 30mm on a side. ASML states a maximum scanned exposure field size of 26.0mm by 33.0mm. This does not change with the introduction of EUV.
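The reticle arithmetic above can be sketched in a few lines. The 26.0mm by 33.0mm field is the ASML figure quoted in the text; the function names and the example die sizes are hypothetical, and scribe lines and test structures are ignored, so die counts are upper bounds.

```python
import math

MM_PER_INCH = 25.4

# Maximum scanned exposure field quoted by ASML, in mm on the wafer.
MAX_FIELD_MM = (26.0, 33.0)


def wafer_side_mm(reticle_side_mm: float, reduction: float) -> float:
    """Size on the wafer of a feature of a given size on the reticle."""
    return reticle_side_mm / reduction


def dies_per_field(die_w_mm: float, die_h_mm: float, field=MAX_FIELD_MM) -> int:
    """Whole dies that fit in one exposure field, trying both orientations."""
    fw, fh = field
    upright = math.floor(fw / die_w_mm) * math.floor(fh / die_h_mm)
    rotated = math.floor(fw / die_h_mm) * math.floor(fh / die_w_mm)
    return max(upright, rotated)


if __name__ == "__main__":
    substrate_mm = 6 * MM_PER_INCH
    print(f"6-inch reticle at 4X: {wafer_side_mm(substrate_mm, 4):.1f} mm on the wafer")
    print(f"Largest single die: {MAX_FIELD_MM[0] * MAX_FIELD_MM[1]:.0f} mm^2")
    print(f"10 x 10 mm dies per field: {dies_per_field(10, 10)}")
    print(f"30 x 30 mm dies per field: {dies_per_field(30, 30)}")
```

Any die larger than 858 mm² (26mm × 33mm) cannot be printed in a single exposure, which is why a 30mm-square design does not fit and scaling past that point has to come from combining multiple dies in a package.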
“We may be seeing the end of single die systems, but the scaling will continue through multiple wafer-level packaging techniques,” says Hsu. “The industry is in its infancy with specialized home-grown techniques and no standards in process flow or design flow, with dies coming from multiple foundries. This will trigger an exciting semiconductor paradigm shift that will impact and change the semiconductor ecosystem and combine naturally with the system applications at large. Semiconductor and system industries will become closely intertwined. Companies that do not catch this paradigm shift will fade, and those that do will shine and prosper.”
The path forward has challenges, regardless of which process node or approach is used. “We already ate all the low-hanging fruit,” says Wong.
Continuing to follow Moore’s Law will result in increased design and manufacturing costs. Staying put at existing nodes will add design and manufacturing costs, as well, along with an increase in area. Each path has both soft and hard limits. In fact, the best path forward may prove to be a combination of approaches in which different technologies are packaged together using a More Than Moore approach.