Using Memory Differently


Chip architects are beginning to rewrite the rules on how to choose, configure and use different types of memory, particularly for AI chips and some advanced SoCs.

Chipmakers now have a number of options and tradeoffs to consider when choosing memories, based on factors such as the application and the characteristics of the memory workload, because some memory types suit certain workloads better than others. Also important are the placement of memories, the datapath to and from memory, and the volume and type of data that needs to be stored and processed.

“The industry is mass producing several types of memories that are designed to meet the key needs of different types of applications,” said Steven Woo, vice president, systems and solutions group, and distinguished inventor at Rambus Labs. “In some cases, it’s clear which memory is most appropriate for an application, while in others, there may be a choice between a couple different memories that might be suitable for a particular job. Characteristics of the application workload may help decide which is ultimately the right type of memory to choose. In other cases, factors like price, availability and implementation complexity may influence the choice of memory type.”

In general, LPDDR is used for mobile and low-power applications, DDR for mainstream computing and many consumer applications, GDDR for graphics and high-performance computing applications like AI, machine learning, ADAS (advanced driver-assistance systems) and supercomputing. HBM, meanwhile, is for the highest-performance, most power-efficient processing.
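The rough mapping above can be sketched as a first-cut lookup. This is an illustration only, with simplified workload categories chosen here for the example; real selection also weighs price, availability and implementation complexity, as Woo notes.

```python
# Illustrative mapping of application classes to memory types, following the
# article's general guidance. Category names are assumptions for this sketch.
MEMORY_BY_WORKLOAD = {
    "mobile": "LPDDR",              # low power is the priority
    "mainstream_compute": "DDR",
    "consumer": "DDR",
    "graphics": "GDDR",
    "ai_training": "GDDR",          # or HBM when bandwidth and power dominate
    "adas": "GDDR",
    "supercomputing": "GDDR",
    "highest_performance": "HBM",   # highest performance, most power-efficient
}

def pick_memory(workload: str) -> str:
    """Return a first-cut memory type for a workload category."""
    # DDR as the mainstream default when the workload is unclassified.
    return MEMORY_BY_WORKLOAD.get(workload, "DDR")
```

In practice this kind of table is only a starting point; two adjacent categories (for example, AI training) may reasonably land on either GDDR or HBM depending on which constraint dominates.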

Workloads typically are sensitive to latency, bandwidth, capacity and power consumption, and there is an increasing amount of interplay among those factors in new chip designs. Any or all of them can complicate decisions about how to select, organize and configure memories to maximize performance or minimize power.

“Memories have a number of configuration options, some of which are programmable in configuration registers (for example, device latencies), while others are provided in separate devices altogether (such as x8 and x16 device widths),” Woo said. “Some workloads are highly sensitive to latency, and configuration registers should be programmed to minimize access time. Other workloads may be sensitive to bandwidth, and devices should be organized and configured to achieve the highest throughputs. The choice of which variant to select, and how to configure it, can be quite complicated in some applications. Choices depend on what is optimal for a particular application workload running on a particular processor.”
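The latency-versus-bandwidth tuning Woo describes can be sketched abstractly. The field names and numeric values below are hypothetical, not taken from any specific DRAM datasheet; the point is that some knobs live in programmable registers while others (like device width) are fixed by the part you buy.

```python
from dataclasses import dataclass, replace

# Hypothetical register-level and device-level knobs of the kind described.
@dataclass
class MemConfig:
    cas_latency_cycles: int   # programmable in a mode register
    burst_length: int         # longer bursts favor sustained throughput
    device_width_bits: int    # x8 vs. x16 is a device choice, not a register

def tune_for_latency(cfg: MemConfig) -> MemConfig:
    # Latency-sensitive workload: program the shortest CAS latency the part
    # supports (14 cycles is an assumed minimum for this sketch).
    return replace(cfg, cas_latency_cycles=14)

def tune_for_bandwidth(cfg: MemConfig) -> MemConfig:
    # Bandwidth-sensitive workload: longest burst and widest device
    # (BL16 and x16 are assumed maxima for this sketch).
    return replace(cfg, burst_length=16, device_width_bits=16)
```

A real bring-up flow would sweep these settings against the actual workload on the actual processor, since, as Woo notes, the optimum depends on both.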

Memory also can be used in ways that are non-traditional, which complicates the choice even further.

“With a lot of the new algorithms and a lot of new strategies that are coming out for compute and analysis, memory is becoming almost an interesting utility,” said Mike Gianfagna, vice president of marketing at eSilicon. “It’s almost like it’s another circuit building block as opposed to just an off-chip or on-chip way of storing information. It’s a utility that’s linked very intimately with the algorithm.”

So while memory has always been a key part of any design, dating all the way back to the introduction of the von Neumann architecture in the 1940s, there is now much more focus on choosing, configuring and, in some cases, changing the functionality of those memories.

“For many years, one of the paramount ways that eSilicon had to differentiate itself was to customize memories for specific ASIC needs,” said Carlos Macián, eSilicon’s senior director of AI strategy and products. “In the recent past, the point was more along the lines of, ‘This particular ASIC could benefit from this particular configuration of a memory,’ and then optimizing it for a certain aspect ratio, a certain area or a certain power above and beyond what the off-the-shelf compilers could do because that had a tremendous impact on the final QoR of the product.”


Fig. 1: Basic von Neumann approach.

What’s changed
But with so much memory on the chip—in some cases it accounts for half the area of large SoCs—little changes can have a big impact. This is particularly true in AI applications, where small memories are often scattered around the chips with highly specific processing elements in order to process massive amounts of data more quickly. The challenge now is to minimize the movement of that data.

“It has been through the emergence of the AI market and AI architectures that the idea of near-memory computing or in-memory computing has found its revival,” Macián said. “Once the idea was back on the table, people started looking at ways to use that same concept for things like RISC-V processors. Obviously, all processors have memories, and it has long been very important to select the right memories to obtain a good implementation of the processor in any die. We now have processor developers asking for memory that would perform some operation on the addresses before actually retrieving the data from the memory proper in order to simplify the circuitry around the memory. We are talking here about doing huge amounts of computation in the memory or near the memory, but with key elements, key operations that simplify greatly the logic around it.”

What differentiates AI chips from other architectures is a relatively simple main computing element—the multiply accumulate (MAC) block. This block is parallelized and repeated thousands of times in an ASIC. Because of this replication, any small improvement made in the area or power, for example, has a huge overall effect in the chip, Macián explained.
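The MAC operation itself is tiny, which is what makes massive replication practical. A minimal sketch (software, not the hardware block, but functionally the same step):

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b

def dot(xs, ws):
    """A dot product is just a chain of MACs. AI ASICs replicate this block
    thousands of times so many such chains run in parallel."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc
```

Because every neural-network layer reduces to huge numbers of these identical steps, shaving even a fraction of the area or power of one MAC multiplies across the whole die, which is Macián's point.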

To get the most out of these devices, AI chip architects also rely on microarchitectures. Changes are creeping into designs on every level, in some cases challenging standard design approaches, such as whether memories need to be small, fast and low-power.

“The microarchitecture that different companies are following is different, and those decisions drive the interaction with on-chip memory,” said Prasad Saggurti, director of product marketing at Synopsys. “Usually, fast memories are larger and of a higher power than the slower ones, but what people do in their memory hierarchy approach is make faster memories smaller and closer to the processor, and then shadow it with a larger, slower and lower-power memory, along with L2 cache, L3 cache, and so on. Now, different approaches are emerging due to AI and machine learning. There are certain operations that happen a lot and happen very frequently, so you can do things like in-memory or near-memory compute. And especially if there is something like embedded vision or some kind of face recognition, there are certain transforms that occur, so those kinds of memories can be aligned with certain types of operations. You don’t just do a read and bring it back to the CPU, do your operation, then do a write. You might do a read-modify-write in the memory or very close to the memory, so you don’t have to come back to the CPU for doing that.”
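The access patterns Saggurti contrasts can be sketched in a few lines. The "memory" here is just a Python list standing in for an SRAM array; what matters is where the modify step happens, not the data structure.

```python
# Sketch of a traditional CPU round-trip vs. near-memory read-modify-write.

def rmw_via_cpu(mem, addr, delta):
    """Traditional path: the data crosses the memory interface twice."""
    value = mem[addr]        # read  -> travels back to the CPU
    value = value + delta    # compute happens on the CPU
    mem[addr] = value        # write <- travels back to the memory

def rmw_near_memory(mem, addr, delta):
    """Near-memory compute: one command carries the operation to the memory,
    which performs the read-modify-write locally. From the CPU's point of
    view this is a single transaction."""
    mem[addr] += delta
```

Both produce the same result; the difference is the number of round trips across the memory interface, which is exactly the data movement these new architectures are trying to eliminate.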

Equalization is another consideration, one that has come into play at DDR4 and higher data rates.

“An important consideration is what type of equalization is required to achieve open eyes for these newer types of memory interfaces,” said Ken Willis, a product engineering architect who focuses on signal integrity solutions at Cadence. “One of the key things is to virtually prototype the physical system topology so you can run tradeoffs and simulations. A new part of the tradeoffs is how to incorporate equalization into the signal integrity analysis. The physical and topological tradeoffs are about impedances, line lengths, via configurations, and now an additional dimension is equalization to optimize designs for all the new memory interfaces—essentially from DDR4 onwards. Everything that everyone is doing now has this additional variable of equalization. It’s one of the tradeoffs we have to make.”
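At its core, the feed-forward flavor of equalization is a short FIR filter applied to the signal: each output sample is a weighted sum of the current and neighboring samples, with the tap weights chosen to cancel inter-symbol interference. A minimal sketch, with illustrative tap values rather than anything tuned for a real channel:

```python
def ffe(samples, taps):
    """Feed-forward equalization as a simple FIR filter.

    samples: sequence of received signal values
    taps:    weights; taps[0] scales the current sample, taps[1] the
             previous one, and so on (values here are illustrative).
    """
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            k = i - j
            if 0 <= k < len(samples):   # skip taps that fall off the edge
                acc += t * samples[k]
        out.append(acc)
    return out
```

In signal integrity analysis, sweeping these tap weights alongside the physical variables (impedances, line lengths, via configurations) is the "additional dimension" Willis describes.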

Evolving role of the GPU
One of the engines that has been driving AI is the GPU. The GPU is inexpensive, readily available, and it can be used in highly parallelized operations such as processing training data. But GPUs aren’t particularly energy efficient, and there are dozens of startups and established chipmakers working on AI chips that are expected to boost performance by orders of magnitude over GPUs.

Still, GPUs are very familiar to design teams. “Comparatively, a lot of the data path is similar, so we end up seeing lots of single-port memory in GPUs,” said Saggurti. “We see a lot of two-port memory, a lot of FIFOs. We don’t see as many FIFOs in the AI CPU chips, but we see a lot of single-port SRAM. Where server chips previously had 10 or 20 megabits of SRAM, we now are seeing microarchitectures that need lots of memory because they don’t want to go off chip for anything. They want to keep all of it on-chip, so there may now be 1 or 2 gigabits of SRAM on a chip. The SRAM is an increasing percentage of area, but also an increasing percentage of power.”

This increasing focus on power requires a lot of analysis. “It’s not enough that we deliver an SRAM for a certain PPA,” Saggurti said. “It’s also important for us to do reliability analysis to understand whether the five sigma margining is sufficient, or whether it should go to six sigma. Should the memories be slowed down, given that a lot of internally self-timed SRAMs are statistically expected to behave a certain way? When there is a gigabit or two of memory, a one in a million chance is very likely to happen, so more and different analysis is required along with design specific analysis.”
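The "one in a million chance is very likely to happen" point is straightforward probability: with enough bits, even a tiny per-bit tail probability almost guarantees that some cell somewhere falls outside its margin. A quick worked sketch, assuming independent per-bit failures:

```python
import math

def p_any_failure(p_bit: float, n_bits: int) -> float:
    """Probability that at least one of n_bits cells fails its margin,
    assuming independent failures: 1 - (1 - p_bit)**n_bits.
    Computed via log1p/exp to stay accurate for tiny p_bit."""
    return 1.0 - math.exp(n_bits * math.log1p(-p_bit))

# A one-in-a-million per-bit tail is essentially certain to be hit somewhere
# in a gigabit (2**30 bits) of SRAM:
#   p_any_failure(1e-6, 2**30)  ->  effectively 1.0
```

This is why margining that was adequate at 10 or 20 megabits may need to move from five sigma toward six sigma (a far smaller per-bit tail) once a design carries a gigabit or more of SRAM.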

How some of these new architectural approaches will fare is unknown at this point, but what was taken for granted in the past as the easiest path forward is being revised. In all cases, memory is a key variable.

“Until recently you had a fairly standard memory architecture you would use in an SoC,” said Neil Hand, director of marketing, Design Verification Technology at Mentor, a Siemens Business. “If you were building a cell phone, you had cache, you had primary memory, and there weren’t a lot of creative things you could do. But when you start to look at novel compute architectures like AI, that standard architecture is just one of many.”

In the case of RISC-V, for example, Hand said design architects are building novel compute architectures because of the opportunity to play around with interesting memory configurations. “This isn’t new. On the DSP side of the world, people have been doing weird memory configurations working with various DSP IPs for a while now, and a lot of the advantages people get from these, whether it be audio processors or network processors, comes from the unique configurations of those processor memories.”

The challenge is to understand the system-level impact of those memories.

“How do you decide? That’s where some of the capabilities around virtual prototyping and early architectural exploration start to become really important, because you can now experiment with the design very early on and say at a very coarse level what memory architecture is going to work,” Hand said. “Is it having lots of different scratchpad memories? Is it having, as on an AI architecture, memory that is modifying the address as you’re reading it? Is it doing math in the memory? There are some memory vendors looking at doing AI-specific memories where processing is built into the memory—a really cool idea if you know what you’re targeting.”

What is needed is a way to do architectural exploration to see the impact of various possible choices. This can include everything from virtual modeling to other simulation tools.

“I’ve added more processors and I’ve got this great memory, but now I’ve saturated my interconnect,” Hand said. “Once you start changing the traditional memory architecture, everything becomes interconnected again. If you think of the last wave of SoCs, which was driven by mobile, most of those SoCs looked and felt the same. They were running a broad range of general-purpose applications—an OS on top of the OS, running either games or various other applications. When you start going towards AI, autonomous, IoT, industrial, they become highly specialized compute, so driving a lot of interest now in different compute architectures is the fact that you can specialize them for compute. But it also applies exactly the same to memory architectures and also fabrics and NoCs, and all of these different parts of the SoCs. You can now configure them and specialize them for your range of applications.”

Conclusion
What’s becoming clear is that memory is no longer just a checklist item, and that simply throwing more memory at a problem isn’t always the best way to improve performance or lower power. The challenge is to understand how all of this works at the system level for increasingly specific applications, use cases and data types, where memory placement, memory type and the way those memory circuits are used can have a significant impact.

Related Stories

Processing In Memory

How To Choose The Right Memory

Defining Edge Memory Requirements

Tech Talk: HBM vs. GDDR6