What advances have been made in the Zen architecture that will allow AMD and Intel to compete with each other?

It has been many years since AMD and Intel last truly faced off. AMD fans still miss the K7 era made possible by the EV6 bus, and the glory days when AMD64 won over Microsoft's new operating system, forcing Intel to abandon its push for IA64 and license x86-64 from AMD. But the good times did not last. Once Intel shook off the shadow of NetBurst and ushered in the Core era, AMD never got the better of Intel again. Joining the ARM camp later brought no benefit either, and meanwhile the GPU division slid into an increasingly obvious decline, pinned down by NVIDIA for a long stretch. In recent years AMD has thus earned the nickname of "slide factory": relying on PowerPoint promises to keep its fans' hopes alive, hopes those fans voice with a dose of self-deprecation every time another upgrade arrives.

This time, though, AMD seems to have given people a real reason to hope. At the Hot Chips 2016 forum this week, AMD disclosed many details of its new microprocessor architecture, Zen, and they read like signs of a comeback. So what progress has Zen made that could put AMD back on the same playing field as Intel?

A shot in the arm: the micro-op cache

For its next-generation architecture, AMD chose to abandon the existing design entirely, setting "high-performance x86 processor" as the goal from day one and rebuilding the Zen core from scratch. The previous generation, Bulldozer/Excavator, exposed too many flaws in real-world testing; AMD's core architects evidently decided it was better to tear it down and rebuild than to keep patching leaks. One change in the new core stands out immediately: AMD has added a micro-op cache to Zen.
The micro-op cache sits close to the micro-op queue, sparing the core the extra cycles of re-fetching and re-decoding instructions from the lower-level instruction cache; the effect on overall execution efficiency is substantial. Intel has long kept decoded micro-ops close to the pipeline, from the Pentium 4's trace cache to the dedicated micro-op cache introduced with Sandy Bridge, with results good enough to retain it in every generation since, so there is every reason to believe AMD's adoption will pay off as well. The open question about Zen's micro-op cache is its size. If one had to guess: micro-op caches are typically small, and Intel's holds 1,536 micro-ops in an 8-way arrangement, so AMD's parameters are likely in the same ballpark, because there is little room to do otherwise.

With this addition, AMD's claim that Zen delivers at least a 40% increase in instructions per clock (IPC) over the previous generation becomes much more credible. Of course, a micro-op cache alone would not carry 40%, so AMD also widened Zen across the board: dispatch grows from 4 to 6 micro-ops per cycle (up to 6 integer and 4 floating-point micro-ops can be dispatched simultaneously); the integer/floating-point schedulers grow from 48/60 to 84/96 entries; and the load/store and retire queues are roughly 50% longer. Ideally, these wider structures, combined with more accurate branch prediction, let the core reach peak throughput sooner and sustain it longer. With the micro-op cache, Zen has made up for a key shortcoming in its core.
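The benefit of a micro-op cache can be sketched in a few lines. This is a toy model, not AMD's or Intel's real design: the capacity matches the Sandy Bridge-class figure mentioned above, but the cycle costs and the `UopCache` interface are illustrative assumptions.

```python
# Toy model of a decoded micro-op cache. On a hit, the front end skips x86
# fetch/decode and feeds stored micro-ops straight to the micro-op queue.
# DECODE_COST and HIT_COST are hypothetical cycle counts for illustration.

DECODE_COST = 4   # cycles to fetch and decode an x86 instruction block
HIT_COST = 1      # cycles to read already-decoded micro-ops from the cache

class UopCache:
    def __init__(self, capacity=1536):           # Sandy Bridge-class size
        self.capacity = capacity
        self.store = {}                          # fetch address -> micro-ops

    def fetch(self, addr, decode):
        """Return (micro-ops, cycles spent) for the block at `addr`."""
        if addr in self.store:
            return self.store[addr], HIT_COST    # hit: decoders bypassed
        uops = decode(addr)                      # miss: pay full decode cost
        if len(self.store) < self.capacity:
            self.store[addr] = uops
        return uops, DECODE_COST

# A hot loop touches the same few addresses repeatedly, so almost all
# fetches hit and the expensive decoders sit idle.
cache = UopCache()
decode = lambda addr: [f"uop{addr:#x}.{i}" for i in range(2)]
cycles = sum(cache.fetch(addr, decode)[1]
             for _ in range(100) for addr in (0x10, 0x20, 0x30))
print(cycles)  # 309: three cold decodes at 4 cycles each, then 297 hits
```

With the assumed costs, the loop spends 309 cycles instead of the 1,200 an always-decode front end would, which is the whole argument for keeping decoded micro-ops near the queue.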
Without this step, challenging Intel would have been out of the question.

Rebuilding confidence: a reworked cache hierarchy

Compared with Bulldozer, it is no exaggeration to call Zen's cache hierarchy a complete transformation. AMD avoided giving figures such as cache latency and bandwidth, but the changes on the table should genuinely help. First, Zen converts each core's 32KB L1 data cache from Bulldozer's write-through design to write-back, so the core's cached data no longer has to be synchronized with memory on every bus cycle. Burst writes can stay at L1 speed instead of waiting for far slower memory to catch up. And since loads are statistically more frequent than stores, Zen's L1 makes the load/store unit asymmetric, with more load ports.

Zooming out, AMD dismantled the Bulldozer module entirely and built the CCX: a complex that hangs four CPU cores, each with private L1 and L2 caches, off a shared 8MB L3. This L3 is not the usual lower-level cache that fills itself according to the core's prefetch and demand requests. Its job is to house lines evicted from L1 and L2, whether pushed out before they could be used or invalidated by write-back traffic; it is more refugee camp than front line, and necessarily less efficient than the L1 and L2. But because each Zen core's 8-way L2 is a full 512KB, that inefficiency is offset to a degree. And since this victim-style L3 does not duplicate what the L2 caches already hold, it reduces redundancy across the hierarchy, indirectly raising effective cache capacity and utilization.
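The two cache changes described above, write-back instead of write-through, and an L3 filled only by evictions, can be sketched together in a toy model. All sizes (in cache lines), class names, and the single "L2" stand-in for the per-core caches are illustrative assumptions, not AMD's real parameters.

```python
from collections import OrderedDict

# Toy sketch of a Zen-like hierarchy (illustrative, not AMD's design):
#  - Write-back: a store merely dirties the line; memory is written only
#    when a dirty line finally leaves the hierarchy, not on every store
#    as with Bulldozer's write-through L1.
#  - Victim-style L3: filled only by lines evicted from L2, never by
#    demand fetches, so L2 and L3 hold no duplicate lines.

class ZenLikeHierarchy:
    def __init__(self, l2_lines=4, l3_lines=16):
        self.l2 = OrderedDict()            # addr -> dirty flag, LRU order
        self.l3 = OrderedDict()
        self.l2_lines, self.l3_lines = l2_lines, l3_lines
        self.memory_writes = 0             # write-back traffic to DRAM

    def access(self, addr, write=False):
        """Touch one line; return which level served it."""
        if addr in self.l2:
            self.l2.move_to_end(addr)
            self.l2[addr] |= write         # write-back: just set dirty
            return "L2"
        if addr in self.l3:                # hit in the victim L3:
            dirty = self.l3.pop(addr)      # exclusive, so it leaves the L3
            self._fill_l2(addr, dirty or write)
            return "L3"
        self._fill_l2(addr, write)         # miss: fill from memory,
        return "MEM"                       # bypassing the L3 entirely

    def _fill_l2(self, addr, dirty):
        if len(self.l2) >= self.l2_lines:  # evict LRU L2 line into the L3
            v_addr, v_dirty = self.l2.popitem(last=False)
            if len(self.l3) >= self.l3_lines:
                _, d = self.l3.popitem(last=False)
                self.memory_writes += d    # DRAM is written only now
            self.l3[v_addr] = v_dirty
        self.l2[addr] = dirty

h = ZenLikeHierarchy()
for a in range(6):
    h.access(a, write=True)    # six dirty lines; only four fit in L2
print(h.access(0), h.memory_writes)   # the evicted line is caught by the
                                      # victim L3; no DRAM writes yet
```

Note that line 0, pushed out of the small L2, is recovered from the L3 rather than memory, and that despite six dirty stores no DRAM write has happened, which is exactly the point of the write-back and victim-cache combination.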
The modular design AMD adopted in Zen also gives the new CPU far better scalability across the product line: one architecture can cover everything from the most power-efficient mobile chips to the hottest performance flagships, avoiding the gaps that plagued the previous generation. A single CCX, for instance, can become a low-power 4-core notebook CPU against Intel's mobile i3/i5, while two CCXs together form an 8-core desktop Zen aimed squarely at the i7. What AMD did not spell out is the interconnect between CCXs: it denied speculation about an improved HyperTransport bus but offered no specifics, leaving an open question. In short, set aside whether AMD's cache efficiency can live up to its claims of roughly double the L1/L2 bandwidth and five times the L3 bandwidth; on capacity alone, surpassing Intel's current Skylake is no problem.

Splitting one into two: real SMT at last

Intel's use of simultaneous multithreading (SMT) goes back to Hyper-Threading on the Pentium 4, revived with Nehalem in 2008. Splitting one core into two threads is genuinely hard: just teaching the two threads to get along, share cache and execution resources fairly, and not starve each other is enough to give the engineers lasting headaches. Perhaps that difficulty is exactly what kept SMT out of AMD's CPUs all these years. Next year, we should finally see 8-core/16-thread AMD CPUs.

Internally, scheduling between threads in the Zen core mostly follows a time-sharing strategy. Since different threads can have very different resource-occupancy profiles, that alone is not optimal, so AMD layers its own thread tagging and discrimination scheme on top. Thread priority in Zen is arbitrated in roughly three ways.
First, the CPU analyzes each thread's data flow to decide which has the higher algorithmic priority; for resource-heavy structures such as branch prediction and integer/floating-point register renaming, a thread's priority is adjusted accordingly. Second, when a thread performs latency-sensitive operations such as TLB and load-queue accesses (which usually surface at the top as responsiveness to user input), the CPU assigns priority based on latency-demand tags. Third, strictly ordered structures like the micro-op queue use static time-sharing, with the threads simply taking turns. Everything else is handled more crudely: first come, first served, with whichever thread needs a core resource grabbing it first.

Seen from the operating system and application level, AMD's SMT looks much like Intel's Hyper-Threading: each thread is presented as a core, with none of Bulldozer's resource-sharing restrictions. Whether AMD has merely caught up with Hyper-Threading or can surpass it remains to be seen, but it is certain that Zen's floating-point performance will improve greatly over previous AMD CPUs.

Process technology: FinFET again

By now everyone has heard the term FinFET until their ears are calloused; we covered it in earlier discussions of smartphone processors, so we will keep this short. Power consumption has always been a weak spot AMD must mind when designing its CPUs, and getting TDP under 100W takes more than aggressive clock gating. Zen intends to use the GlobalFoundries 14nm FinFET process that AMD has already field-tested on the Polaris GPUs.
Moreover, AMD does not intend to simply copy the GPU recipe; it also wants a density-optimized version of the process, because die area has to be kept under control — something AMD did not address at this Hot Chips. If the current design forced a 500-square-millimeter 14nm die, it would run against AMD's usual pricing strategy, and the final product would inevitably be expensive. And given that Zen's development has been accompanied by shifting targets all along, it is hard to say how much of what we see now the finished CPU will retain.

40% on paper, 2% in reality?

For all of the above, end users are hard to fool with slides. However impressive the numbers and the architecture, they want exactly two things: a product they can actually buy at a reasonable price, and a machine that genuinely does not feel slow. At Hot Chips 2016, AMD ran a Blender demo for attendees: a 3GHz 8-core Zen against a 3GHz 8-core Broadwell-E, with Zen finishing about 2% faster under the same multithreaded custom workload. AMD disclosed no further configuration details. Carrying its record of "rising on PowerPoint," it can only choose its words cautiously, cautiously, and cautiously again; dispelling public doubt is no easy task. If AMD manages to ship Zen in volume in the first quarter of 2017 (the schedule has already slipped once, from an original target of this October), consumers will likely first find the new CPU in branded PCs. This may be AMD's chance to return to the high-end x86 battlefield and compete with Intel again — but it must make sure it holds real cards against its old rival, and does not delay again.
