Although ARM is a small company, it is the core of the entire ARM processor camp. Apart from a few companies such as Apple and Qualcomm that can develop their own ARM-compatible architectures, most vendors, such as MediaTek and HiSilicon, directly license ARM's stock Cortex-A designs. Since the start of the 64-bit era, ARM has released two cores, the big Cortex-A57 and the small Cortex-A53, but only the A53 has become popular. In the mobile phone market the high-performance A57 is used only by Samsung and Qualcomm, and it has had a difficult birth. In response, ARM announced the A57's successor, the Cortex-A72, in February this year, claiming 3.5 times the performance of the A15 and a 75% reduction in power consumption.
At the time, little was known about the Cortex-A72. We only knew that it would be manufactured on the new generation of FinFET processes, including Samsung/Globalfoundries' 14nm and TSMC's 16nm. Thanks to the advanced process, the A72 clocks higher, reaching 2.5GHz. The higher frequency will also help the A72 strengthen its presence in the server market, which is one of its target markets.
ARM recently disclosed the detailed architecture of the Cortex-A72, and AnandTech has published an analysis of it. Let's look at what the A72 improves; after all, it is likely to become the standard core in next year's flagship phones and tablets.
ARM also gave the reasoning behind the A72's name. Why not call it Cortex-A59? ARM explained that this was purely for marketing convenience: a name too close to A57 would make it hard for people to see the difference between the two cores. Do people really only tell cores apart by their numbers?
ARM previously advertised that the A72 delivers 3.5 times the performance of the earlier A15 generation with 75% lower power consumption, but this should be understood as marketing. ARM did not compare the A57 and A72 directly. Between those two alone, on the same 14/16nm process the A72 is only about 34% faster than the A57 (2.6x versus 3.5x, and the frequency difference also has to be taken into account), and on the same 28nm process its power consumption is only about 20% lower. It should also be noted that the A72's gain is in the frequency it can sustain, not simply in its peak frequency. The A57 drew too much power, so it could hold its highest frequency only briefly before it had to throttle. According to ARM's data, an A72 core consumes only 750mW at 2.5GHz on the 16nm FinFET process.
Beyond power consumption, ARM has made many optimizations to the A72 architecture. As ARM's figure shows, integer, floating-point, and memory performance have all improved to varying degrees. Although some details are still missing, IPC improves by 16-30%.
The A72 architecture is an upgrade of the A57: ARM appears to have made comprehensive improvements in performance, power consumption, and core area, the three key metrics of semiconductor design. It got there by re-optimizing almost every logic block of the A57, and the CPU architecture has changed considerably, including a new branch prediction unit and an improved decoder pipeline design.
On the instruction fetch side, ARM has redesigned the branch prediction unit to support more complex algorithms, which improves performance while reducing power consumption, the misprediction rate, and the speculation rate.
Specifically, compared with the A57, the misprediction rate is reduced by 50% and the speculation rate by 25%. Redundant branch prediction work is also suppressed: in practice, when the branch prediction unit cannot contribute effectively, it is bypassed. In addition, ARM has optimized the RAM organization by coupling the different IP blocks more tightly.
In the A72 pipeline, decode and rename performance has also been improved. The decoder itself is still 3 instructions wide, but ARM has put a lot of effort into raising its performance and lowering its power consumption. To improve performance, the effective decode bandwidth has been increased, and the decoder has gained some AArch64 instruction fusion enhancements. Power consumption has been reduced in a variety of ways, including direct decoding.
The instruction dispatch/retire unit appears to be where the largest performance-oriented changes were made. The decoder can fuse instructions, and the dispatch unit can break ops down into smaller micro-ops and send them to the execution units, so the 3-wide decode can become the equivalent of 5-wide issue at the dispatch unit. This increases decoder throughput and also the number of micro-ops the dispatch unit creates per cycle. For the A72, ARM says there are on average 1.08 micro-ops per instruction, which relieves the dispatch bottleneck that limited the A57 architecture in practice.
The execution units also have a new design, including a new generation of FP (floating-point)/Advanced SIMD units. Because the FP pipeline has been shortened from 9 to 6, latency is lower.
The latency of FMUL (floating-point multiply) has also been reduced from 5 cycles to 3, FADD (floating-point add) from 4 to 3, FMAC (floating-point multiply-accumulate) from 9 to 6, and the CVT units from 4 to 2. The overall pipeline length of the FP unit has been reduced from 19 to 16.
The integer unit has also been improved. The bandwidth of the Radix-16 divider has been doubled, and the latency of the CRC unit has been reduced to 1 cycle, with three times the bandwidth it had in the A57 architecture.
Another major performance improvement is in the L/S (load-store) unit. ARM says its bandwidth has increased by 30% thanks to a new prefetcher.
On paper, the A72's improvements are impressive. It is an innovative upgrade of the A57 architecture, with gains in performance, power consumption, and core area alike. The A57 architecture entered the market in Q3 last year, but A57-based processors from Samsung and Qualcomm are only now reaching mass production and going on sale, so it will take at least a year for the A72 core to truly enter the market.
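The figures quoted in this article can be cross-checked with a little arithmetic. The Python sketch below is not from ARM or AnandTech, and every function and variable name in it is made up for illustration. It assumes that the 2.6x and 3.5x figures are the A57 and the A72 measured against the same A15 baseline, and that the 3-wide decoder delivers three instructions every cycle, which real code will not always achieve.

```python
# Sanity checks on figures quoted in the article.
# All names are hypothetical; none come from ARM or AnandTech.


def percent_gain(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return 100 * (new - old) / old


def percent_drop(old: float, new: float) -> float:
    """Relative decrease from old to new, in percent."""
    return 100 * (old - new) / old


# Assumption: 2.6x and 3.5x are A57 and A72 performance relative to the
# same A15 baseline on a 14/16nm process.
A57_VS_A15 = 2.6
A72_VS_A15 = 3.5
a72_gain_over_a57 = percent_gain(A57_VS_A15, A72_VS_A15)

# Assumption: the decoder is fully fed and delivers 3 instructions per cycle.
DECODE_WIDTH = 3             # instructions decoded per cycle
DISPATCH_WIDTH = 5           # micro-ops dispatched per cycle (peak)
UOPS_PER_INSTRUCTION = 1.08  # average quoted by ARM
avg_uops_per_cycle = DECODE_WIDTH * UOPS_PER_INSTRUCTION
dispatch_headroom = DISPATCH_WIDTH - avg_uops_per_cycle

# FP latencies in cycles as (A57, A72), taken from the article.
FP_LATENCY_CYCLES = {
    "FMUL": (5, 3),
    "FADD": (4, 3),
    "FMAC": (9, 6),
    "CVT": (4, 2),
}
latency_drop = {
    op: percent_drop(old, new) for op, (old, new) in FP_LATENCY_CYCLES.items()
}

if __name__ == "__main__":
    print(f"A72 over A57: +{a72_gain_over_a57:.1f}%")
    print(f"Average micro-ops per cycle: {avg_uops_per_cycle:.2f}")
    print(f"Dispatch headroom: {dispatch_headroom:.2f} micro-ops per cycle")
    for op, drop in latency_drop.items():
        old, new = FP_LATENCY_CYCLES[op]
        print(f"{op}: {old} -> {new} cycles ({drop:.0f}% lower)")
```

Under these assumptions, the A72-over-A57 ratio lands between 34% and 35%, in line with the figure quoted earlier, and the average micro-op rate stays below the 5-wide dispatch peak.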