Apple walks Ars through the iPad Pro’s A12X system on a chip
By Samuel Axon
BROOKLYN—Apple's new iPad Pro sports several new features of note, including the most dramatic aesthetic redesign in years, Face ID, new Pencil features, and the very welcome move to USB-C. But the star of the show is the new A12X system on a chip (SoC).
Apple made some big claims about the A12X during its presentation announcing the product: that it has twice the graphics performance of the A10X; that it has 90 percent faster multi-core performance than its predecessor; that it matches the GPU power of the Xbox One S game console with no fan and at a fraction of the size; that it has 1,000 times faster graphics performance than the original iPad released eight years ago; that it's faster than 92 percent of all portable PCs.
If you've read our iPad Pro review, you know most of those claims hold up. Apple’s latest iOS devices aren’t perfect, but even the platform’s biggest detractors recognize that the company is leading the market when it comes to mobile CPU and GPU performance—not by a little, but by a lot. It's all done on custom silicon designed within Apple—a different approach than that taken by any mainstream Android or Windows device.
But not every consumer, even the "professional" consumer the iPad Pro targets, really groks just how big this gap is. How is this possible? What does this architecture actually look like? Why is Apple doing this, and how did it get here?
After the hardware announcements last week, Ars sat down with Anand Shimpi from Hardware Technologies at Apple and Apple’s Senior VP of Marketing Phil Schiller to ask about all this. We wanted to hear exactly what Apple is trying to accomplish by making its own chips and how the A12X is architected. It turns out that the iPad Pro's striking, console-level graphics performance and many of the other headlining features in new Apple devices (like Face ID and various augmented-reality applications) may not be possible any other way.
A top-level view of the A12X
The A12X is, of course, closely related to the A12 in the iPhone XS, XS Max, and XR. The A12 was the first chip built on a 7nm process to ship in a consumer device, and the A12X is the first 7nm chip in a tablet.
The A12X is made up of many components. We'd love to dive deep into exactly how this architecture works, but Apple is generally not forthcoming with details like that. AnandTech recently ran a detailed analysis of an A12 die shot, among other things, but we don't have anything like that for the A12X yet. Still, we know the big picture. The A12X's components include:
A CPU (central processing unit), which carries out most instructions that aren't handed off to more specialized processing units.
A GPU (graphics processing unit), which handles graphics, from drawing the home screen to effects in 3D games to assets for augmented reality applications.
The Neural Engine, which handles neural network and other machine learning tasks.
An IMC (integrated memory controller), which efficiently manages data going in and out of memory.
An ISP (image signal processor), which analyzes, processes, and enhances the images captured when you take a photo.
The Secure Enclave (also SEP, or Secure Enclave Processor), which handles sensitive data like biometric identifiers in a way that makes it difficult for unauthorized parties to access.
There are several other components, like the display engine, a storage controller, and an HEVC encoder and decoder, that we won't go into in much detail here.
Chief among these are the CPU, GPU, and Neural Engine, so we'll focus a bit more on those.
The iPad Pro's CPU has eight cores—four focused on performance, and four focused on efficiency. And unlike some earlier Apple chips, all cores can be active at once. This is the first device in this product line that uses this many cores simultaneously.
“We've got our own custom-designed performance controller that lets you use all eight at the same time,” Shimpi told Ars. “And so when you're running these heavily-threaded workloads, things that you might find in pro workflows and pro applications, that's where you see the up to 90 percent improvement over A10X.”
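To make the idea concrete, here's a deliberately simplified Python model of why letting all eight cores run at once helps heavily threaded work. Everything in it is an assumption for illustration: the `Core` class, the greedy `assign_threads` "scheduler," and especially the relative core speeds (performance cores at 1.0, efficiency cores at 0.3), which are placeholders rather than Apple figures. It is not a description of how Apple's performance controller actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    kind: str     # "performance" or "efficiency"
    speed: float  # relative work per second (placeholder value)


def build_cluster(perf_speed: float = 1.0, eff_speed: float = 0.3) -> list[Core]:
    """Four performance cores plus four efficiency cores, like the A12X CPU.
    The relative speeds are placeholder assumptions, not Apple figures."""
    return [Core("performance", perf_speed)] * 4 + [Core("efficiency", eff_speed)] * 4


def assign_threads(threads: int, cores: list[Core], allow_all: bool) -> list[Core]:
    """Hypothetical greedy scheduler: give each thread the fastest free core.
    With allow_all=False, only the performance cores may run at the same time."""
    eligible = sorted(cores, key=lambda c: c.speed, reverse=True)
    if not allow_all:
        eligible = [c for c in eligible if c.kind == "performance"]
    return eligible[:threads]


def throughput(threads: int, allow_all: bool) -> float:
    """Aggregate work rate for a steady stream of independent work, one thread per core."""
    return sum(c.speed for c in assign_threads(threads, build_cluster(), allow_all))


print(round(throughput(8, allow_all=False), 2))  # performance cores only
print(round(throughput(8, allow_all=True), 2))   # all eight cores at once
```

With these placeholder speeds, eight threads on all eight cores give an aggregate rate of about 5.2, versus 4.0 when only the performance cores can work. The size of that gain depends entirely on the assumed speed ratio.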
For single-core performance, Apple's marketing materials claim that the A12X is 35 percent faster than the A10X. We've come a long way from the 412MHz single-core CPU manufactured by Samsung to Apple's specifications for the original iPhone in 2007.
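Claims like "35 percent faster" and "90 percent faster" are ratios of benchmark scores. Here's a minimal Python sketch of checking one against measured results, assuming a higher-is-better score. The function names and the `tolerance` parameter are our own hypothetical choices.

```python
def percent_faster(new_score: float, old_score: float) -> float:
    """Percent improvement of new over old for a higher-is-better score.
    "90 percent faster" means new_score == 1.9 * old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score / old_score - 1.0) * 100.0


def meets_claim(new_score: float, old_score: float, claimed_percent: float,
                tolerance: float = 0.0) -> bool:
    """True if the measured gain is at least the claim minus a tolerance (in percentage points)."""
    return percent_faster(new_score, old_score) >= claimed_percent - tolerance
```

For example, a chip scoring 135 against a predecessor's 100 on some hypothetical benchmark would meet a 35 percent claim. A score of 130 would fall about five points short.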
We tested the A12X for our iPad Pro review, so let's use those benchmarks to check Apple's claims. First, here are the basic specifications of every device included in the tests.
Mobile devices
12.9-inch 2018 iPad Pro
10.5-inch 2017 iPad Pro
12.9-inch 2016 iPad Pro
Samsung Galaxy Tab S4: Qualcomm Snapdragon 835
Google Pixel 3 XL: Qualcomm Snapdragon 845
Desktops and laptops
2018 15-inch MacBook Pro with Touch Bar: Intel Core i9-8950HK at 2.9GHz (4.8GHz Turbo); AMD Radeon Pro 560X 4GB GDDR5
2017 15-inch MacBook Pro with Touch Bar: Intel Core i7-7820HQ at 2.9GHz (3.9GHz Turbo); AMD Radeon Pro 555 2GB GDDR5
2016 15-inch MacBook Pro with Touch Bar: Intel Core i7-6820HQ at 2.7GHz (3.6GHz Turbo); AMD Radeon Pro 455 2GB GDDR5
2017 iMac Pro: Intel Xeon W at 3GHz (4.5GHz Turbo); AMD Radeon Pro Vega 64 16GB HBM2
2017 iMac (5K): Intel Core i7-7700K at 4.2GHz (4.5GHz Turbo); AMD Radeon Pro 580 8GB GDDR5
2018 Dell XPS 15 2-in-1: Intel Core i7-8705G at 3.1GHz (4.1GHz Turbo); AMD Radeon RX Vega M GL 4GB HBM2
And now for the results.
We didn't quite record the claimed 35 percent improvement in single-core performance (though this is just one benchmark), but it's fairly close. The multi-core claim also checks out.
This performance is unprecedented in anything like this form factor. In addition to the ability to engage all the cores simultaneously, there's reason to believe that cache sizes in the A12, and likely therefore the A12X, are a substantial factor driving this performance.
You could also make the case that the A12X's performance in general is partly so strong because Apple's architecture is a master class in optimized heterogeneous computing—that is, smartly using well-architected, specialized types of processors for matching specialized tasks. Though the A12X is of course related to ARM's big.LITTLE architecture, Apple has done a lot of work here to get results that others haven't.
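As a rough illustration of the heterogeneous-computing idea, here's a hypothetical Python routing table. It sends each kind of task to the specialized unit from the component list earlier in this article and falls back to the general-purpose CPU. It does not reflect Apple's actual scheduling, which the company didn't describe, and the task names are invented for the example.

```python
from enum import Enum


class Unit(Enum):
    CPU = "CPU"
    GPU = "GPU"
    NEURAL_ENGINE = "Neural Engine"
    ISP = "ISP"
    SECURE_ENCLAVE = "Secure Enclave"


# Hypothetical task names mapped to the units described in this article.
ROUTES: dict[str, Unit] = {
    "render_frame": Unit.GPU,                      # graphics and UI effects
    "run_ml_model": Unit.NEURAL_ENGINE,            # neural network inference
    "process_photo": Unit.ISP,                     # image signal processing
    "handle_biometric_data": Unit.SECURE_ENCLAVE,  # sensitive identifiers
}


def dispatch(task_kind: str) -> Unit:
    """Route a task to its specialized unit; anything unrecognized goes to the CPU."""
    return ROUTES.get(task_kind, Unit.CPU)


for task in ("render_frame", "process_photo", "parse_document"):
    print(f"{task} -> {dispatch(task).value}")
```

The design point is the fallback. Specialized units take the work they're best at, and the general-purpose CPU handles everything else.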
Unfortunately, Apple wouldn't discuss any of that in too much detail with us. Whatever the specifics, this chart does a particularly good job of illustrating why this is remarkable:
The iPad Pro outperforms every MacBook Pro we tested except for the most recent, most powerful 15-inch MacBook Pro with an 8th generation Intel Core i9 CPU. Generally, these laptops cost three times as much as the iPad Pro.
"You typically only see this kind of performance in bigger machines—bigger machines with fans," Shimpi claimed. "You can deliver it in this 5.9 millimeter thin iPad Pro because we've built such a good, such a very efficient architecture."
The GPU in the A12X has seven cores—that's one additional core over the A10X, likely made possible by the move to the 7nm process. But as always, the number of cores is not all there is to it.
Shimpi offered a pitch for the GPU. "It's our first 7-core implementation of our own custom-designed GPU," he said. "Each one of these cores is both faster and more efficient than what we had in the A10X and the result is, that's how you get to the 2x improved graphics performance. It's unheard of in this form factor, this is really an Xbox One S class GPU. And again, it's in a completely fanless design."
Here's what this looks like in reality—or at least, the simulation of reality that is benchmarks.
Generally, this GPU has a huge lead in the mobile space; no other device in this category comes close. But it's not encroaching on laptop territory the way the CPU is, at least not in these sorts of benchmarks. As for why the A12X can outperform the A12 in the iPhone XS, Shimpi said memory bandwidth is part of the answer.
"The implementation is the same," he clarified. "But you do have much more memory bandwidth so there may be cases where it's actually faster than what you get on the phone if you do have a workload that has taken advantage of the fact that you have twice as big of a memory subsystem."
This impacts not just 3D graphics in games but a lot of the UI effects in iOS itself. Shimpi noted it's not just about peak memory bandwidth, but delivering bits efficiently. "Having that dynamic range is very important because there are times when you want to operate at a lower performance point in order to get efficiency and battery life," he said.
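Shimpi's point about dynamic range maps onto a familiar idea in chip design: run at the lowest-power operating point that still meets current demand. Below is a hypothetical Python sketch. Apple doesn't publish its operating points, so the three points and their throughput and power values are invented placeholders, and the selection rule is a generic one rather than Apple's.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    name: str
    throughput: float  # relative work per second (placeholder)
    power: float       # relative power draw (placeholder)


# Invented placeholder table; real values are not public.
POINTS = [
    OperatingPoint("low", 0.25, 0.1),
    OperatingPoint("mid", 0.6, 0.35),
    OperatingPoint("high", 1.0, 1.0),
]


def pick_point(demand: float, points: list[OperatingPoint] = POINTS) -> OperatingPoint:
    """Choose the lowest-power point whose throughput covers the demand.
    If no point is fast enough, run flat out at the fastest one."""
    adequate = [p for p in points if p.throughput >= demand]
    if adequate:
        return min(adequate, key=lambda p: p.power)
    return max(points, key=lambda p: p.throughput)


for demand in (0.2, 0.5, 1.5):
    print(demand, pick_point(demand).name)
```

A light workload like scrolling would stay at a low point to save battery. A demanding game would push the chip to its highest point.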
Mobile device comparisons aside, the laptop and desktop are more or less the ultimate target. "We’ll actually take content from the desktop, profile it, and use it to drive our GPU architectures. This is one of the things that you usually don't see in a lot of mobile GPU benchmarks," Shimpi explained.
But Apple repeatedly described the new iPad Pro's GPU performance as comparable to that of the Xbox One S. That's the lower end of the two Xbox One models currently on the market, and it typically runs triple-A video games at 900p resolution. It's much, much weaker than its newer sibling, the Xbox One X, which targets 4K for many games. (And resolution isn't the whole story, either.) Basically, the Xbox One S is an entry-level gaming console: not the latest and greatest, but perfectly adequate for playing today's most complex games. Generally, it's more powerful than Nintendo's Switch but less powerful than both PlayStation 4 models.
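For a sense of scale on resolution alone, assume "900p" means 1600×900 and 4K means 3840×2160 UHD. The pixel counts then work out as follows in Python:

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


P900 = pixel_count(1600, 900)   # 1,440,000 pixels
P4K = pixel_count(3840, 2160)   # 8,294,400 pixels

# 4K has about 5.76 times as many pixels per frame as 900p.
RATIO = P4K / P900
print(f"4K renders {RATIO:.2f}x the pixels of 900p")  # prints "4K renders 5.76x the pixels of 900p"
```

Pixel count is only one part of the rendering workload, of course.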
While it might not be the most powerful console, it's a striking guidepost for mobile devices. Our phones and tablets are normally nowhere near a game console when it comes to graphics performance. Still, the constant comparisons to the Xbox One S or PCs shouldn't suggest that the A12X architecture is similar. For example, the A12X shares memory between the GPU and CPU, much like Intel's integrated GPUs in laptops but unlike gaming PCs, whose discrete GPUs have their own dedicated memory. (The Xbox One also takes a shared-memory approach, but differs in other ways.) Shimpi talked about this in more detail:
Typically when you get this type of CPU and GPU performance, a combination of the two, you have a discrete memory system. So the CPU has its own set of memory and the GPU has its own set of memory, and for a lot of media workloads or pro workflows where you actually want both working on the same data set, you copy back and forth, generally over a very narrow slow bus, and so developers tend to not create their applications that way, because you don't want to copy back and forth.
We don't have any of those problems. We have the unified architecture, the CPU, the GPU, the ISP, the Neural Engine—everything sits behind the exact same memory interface, and you have one pool of memory.
On top of that, this is the only type of memory interface that iOS knows. You don't have the problem of, well, sometimes the unified pool may be a discrete pool, sometimes it may not. iOS, our frameworks, this is all it’s ever known, and so as a result developers benefit from that. By default, this is what they're optimized for, whereas in other ecosystems you might have to worry about, well, OK, sometimes I have to treat the two things as discrete; sometimes they share.
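Here's a toy Python model of the difference Shimpi describes. In the "discrete" version, a buffer is copied into a separate pool for a GPU-like step and then copied back. In the "unified" version, the same step works on the original buffer through a `memoryview`, so nothing is copied. The byte-inverting "GPU" step and the function names are hypothetical stand-ins, and real hardware costs such as bus width and cache coherency are not modeled.

```python
def invert_in_place(buf) -> None:
    """Stand-in for GPU work: invert every byte (e.g., 8-bit pixel values)."""
    for i in range(len(buf)):
        buf[i] = 255 - buf[i]


def discrete_pipeline(frame: bytearray) -> tuple[bytearray, int]:
    """Separate CPU and GPU pools: copy over, process, copy back.
    Returns the result and the number of bytes copied."""
    gpu_pool = bytearray(frame)     # copy across the (simulated) bus
    invert_in_place(gpu_pool)
    cpu_pool = bytearray(gpu_pool)  # copy the result back
    return cpu_pool, 2 * len(frame)


def unified_pipeline(frame: bytearray) -> tuple[bytearray, int]:
    """One shared pool: the 'GPU' step works on a view of the same memory."""
    with memoryview(frame) as shared:  # a view, not a copy
        invert_in_place(shared)
    return frame, 0
```

Both pipelines produce the same pixels. The discrete one moves the whole buffer twice, and that overhead is why, as Shimpi says, developers tend to avoid designs that share data between the two processors.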
The Neural Engine and machine learning
The Neural Engine is designed to run machine learning tasks locally on the device faster and more efficiently than the CPU or GPU could. It's also the part of the A12X for which Apple has claimed by far the biggest gains. In fact, there's no comparing the Neural Engine in the new iPad Pro with the former model's, because the previous model didn't have a Neural Engine at all. Looking at Apple's phones, 2017's A11 could handle 600 billion operations per second; the A12 in 2018 iPhones is capable of 5 trillion.
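Using Apple's own figures, the generational jump in Neural Engine throughput from the A11 to the A12 works out to roughly eightfold:

```latex
\frac{5 \times 10^{12}\ \text{ops/s (A12)}}{6 \times 10^{11}\ \text{ops/s (A11)}} \approx 8.3
```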
Many users aren't clear on how a neural processing unit actually helps, largely because this is new to a lot of people. The first step to understanding why Apple is focusing on this is to identify some specific situations the Neural Engine is used for. These include, but are not limited to: recognizing your face via Face ID and the TrueDepth sensor array in the iPad Pro, scanning images and powering search features in the Photos app, processing speech, and numerous augmented reality-related tasks. In a recent interview with Wired, Apple's Tim Millet said that the experiences delivered on the modern iPhone "are critically dependent on the chip."
The Neural Engine in the A12X has eight cores, but Apple was mum on details about its architecture beyond that. All Shimpi and Schiller would tell us was that it is not adapted from the company's GPUs. This silicon powers many features that are built into the iPad Pro, but third-party app developers can also tap it in various ways through Apple's Core ML framework.
Notably, the focus here is on doing machine learning tasks on the local device. There's a school of thought that says certain machine learning models would be most powerful if they could draw user data from millions of in-use devices and run on vast cloud computing networks. This route would involve collecting data from users, anonymously or otherwise.
But this is not how Apple does things. Its machine learning API allows developers to work with machine learning models in the cloud with the user's permission, but Apple doesn't provide that cloud infrastructure directly. Apple also offers app developers Create ML, a tool that lets them train models on their own development machines.
Apple says it doesn't focus on putting user data in the cloud and running models on it for two reasons. First, the company styles itself a privacy-focused alternative to its competitors. Second, many other use cases would be much more efficient when running on the local device. Shimpi noted that you would of course never want to do inference in the cloud. And Schiller pointed to an app that analyzes the user's basketball throw in real time as something that just would not be possible given the latency that would occur if you send that data to the cloud and back.
"Low latency is very important, privacy is very important," Shimpi added.
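Schiller's basketball example comes down to a per-frame time budget. Here's a minimal Python sketch with hypothetical numbers. At an assumed 30 frames per second, each frame allows about 33 milliseconds of work, so any network round trip to the cloud eats directly into that budget. The frame rate, compute time, and round-trip values are illustrative assumptions, not measurements of any app, and compute time is held fixed in both cases.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_frame_budget(fps: float, compute_ms: float, network_rtt_ms: float = 0.0) -> bool:
    """True if compute time plus any network round trip fits within one frame."""
    return compute_ms + network_rtt_ms <= frame_budget_ms(fps)


# Hypothetical inputs, for illustration only.
on_device = fits_frame_budget(30, compute_ms=10.0)
via_cloud = fits_frame_budget(30, compute_ms=10.0, network_rtt_ms=50.0)
print(f"budget: {frame_budget_ms(30):.1f} ms, on-device fits: {on_device}, cloud fits: {via_cloud}")
```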
"Fundamentally, the reason we built the chip, is in service to the product's vision and its ambition," Shimpi said. "At the end of the day we want to make sure that whatever vision we have set out for the thing, if it requires custom silicon, that we're there to deliver. For a given form factor, for a given industrial design in this thermal envelope, no one should be able to build a better, more performant chip."
The thing is, Apple's A11 was already that for the phone, and the A10X was already that for the tablet. Why keep pushing? Schiller responded to that question with a passionately delivered speech:
People say, "Well, you're looking at this company or that company." We don’t; we really focus on our own self. The competition can do whatever they're going to do. We're trying our things the best way we know how. The counter of that is, because you're not worrying about it, when you're trying to make things better, you're also not caring if you're doing a lot better than the competition. It's not good enough. We're judging ourselves on ourselves.
What do we think we can do? It becomes this incredibly self-perpetuating thing. When you realize you can create a Neural Engine, you want to create a better Neural Engine! You realize you can create great graphics, you want to create even better graphics! And it just accelerates. It picks up speed within the organization.
If you're a team that makes an incredible, great Apple-designed A-series chip, well, next year you want to make an even better one, right? That's your passion. That's what you see across all Apple, is teams that take responsibility for their things are so passionate about making that thing better and better and better. It doesn’t even matter what anyone else is doing.
We don't care if they're doing something that isn’t interesting to us. We don't care if we're lapping them. Good. We’ll lap them ten times. It doesn't matter because it's in service to the user, not the competition.
There are a couple of additional possible reasons Apple may be pushing this hard that Schiller didn't mention. First, Apple has made augmented reality a major focus in recent iOS releases, in its development APIs, and in its iPhone and iPad hardware—including the silicon. As we noted in our coverage of ARKit 2, Apple's investment there is not just about current use cases on iPhones, but racing to a future tipping point where AR may become ubiquitous.
If the company can avoid resting on its laurels and focus on building the world's most powerful consumer AR platform before competitors, it may go into that possible future in a strong position. That will require ongoing, aggressive work on its custom silicon, among other things.
And then there's the question of the Mac. To bring all of Apple's work in machine learning and other areas to the Mac, custom silicon is essentially required. Intel's and AMD's chip roadmaps don't seem to be compatible with some of Apple's apparent long-term goals. It could be that Apple is working to push itself in part because the end result is a custom-made laptop- or desktop-class CPU and GPU for the Mac platform that rivals or beats Intel's top-performing chips. (Unsurprisingly, Apple would not address its future plans for the Mac with us in our conversation.)
So we have a few possible answers to "why." Schiller's response about team culture and process comes first; the rest is just speculation on our part. But how did Apple do this? As expected, Apple wasn't forthcoming with many deep technical details here, but Schiller attributed the company's success with its custom silicon in large part to how teams work together inside Apple.
The chip team will be literally a detective on the other team saying, "OK, we're planning, we really want more insight. What exactly do you want to do, how do you want it to work? What are the bottlenecks, where can we start creating silicon that ultimately will be part of a well-crafted system?”
Those meetings happen multiple times a week. It's not like there's some big get together, once a year, just to align schedules. They literally are having these discussions weekly about—more and more—a growing number of topics. It's not a finite set. It's a growing number.
Schiller added that the process of developing these chips starts years before they are released. That begins with teams meeting and talking about how to solve specific user problems on specific devices.
For years, most of Apple's responses to how and why it does what it does have come down to this same point. There are advantages when you do everything in-house and everything is integrated from end to end, whether that's meant in a technological or an organizational sense. Apple's obviously not sharing all its technical trade secrets on how the sausage is made, but this one point seems nevertheless to be Apple's explanation for most moments when it seizes the lead in something—the phone or tablet SoC is no exception.
Where does Apple go from here?
Apple is pushing up against high-end laptop and even desktop performance here, depending on what you're using for comparison. Granted, comparing architectures can be Apples (ahem) and oranges. Apple's CPU efforts are industry-leading on the mobile side of things, but they're not perfect. While Apple focuses on performance, Qualcomm, well, doesn't—partly because it essentially has a monopoly in the Android world and may not feel it even needs to, but partly because it focuses on connectivity. (Qualcomm's modems are industry-leading, even if its CPUs are not.)
There's one intriguing bit of context for all of this that Apple won't acknowledge in its discussions with Ars or anyone else: Macs are still on Intel chips. It's obvious to those who follow the company closely why that status quo isn't providing what Apple needs to move forward in its strategies. Further, a Bloomberg report citing sources close to the company claimed that plans for a Mac with custom silicon (and we're talking CPU here, not just the T2 chip) are in the works.
Apple has come to dominate in mobile SoCs. In a lot of ways, though, Qualcomm has been an easy dragon to slay. Should Apple choose to go the custom silicon route on the Mac platform, Intel will not be quite as easy to beat. But the rapid iteration that has led to the iPad Pro's A12X makes a compelling case that it's possible.
Apple won't talk about its future plans, of course. You could say that's all in the future, but when you have a 7nm tablet chip that rivals the CPU and graphics performance of most laptops and beats two out of five of the modern gaming consoles on the market with no fan at barely over a pound and less than a quarter-inch thick... it feels a bit like at least some particular future is now.
Now, if only there were iOS versions of Final Cut, Xcode, and Logic. Powerful hardware is nothing without strong software support, and as we've noted in our review, that's the area where we need to see some improvement for the iPad Pro to truly live up to its considerable potential.
Ron Amadeo and Peter Bright contributed to this report.