Make your PHP 7 faster (GCC PGO)

We have been working hard to improve the performance of PHP7. Last month we noticed that compiling with GCC PGO can improve WordPress performance by nearly 10%, which got us very excited.

However, PGO (Profile Guided Optimization; search for the term if you want the details) needs, as its name suggests, representative use cases to collect feedback from, which means the optimization is tied to a specific scenario.

An optimization tuned for one scenario may not help in another. It is not a universal optimization, so we cannot simply include these optimizations, nor can we release a PGO-compiled PHP7 directly.

Of course, we are trying to identify the generally applicable optimizations that PGO makes and apply them to PHP7 by hand, but this obviously cannot match what optimizing for one specific scenario achieves. So I decided to write this article as a brief introduction to compiling PHP7 with PGO, so that the PHP7 you build makes your own application faster.

First, you need to decide which scenario to use as feedback for GCC. We usually choose, within the scenario you want to optimize, the page that is visited most, takes the most time, and consumes the most resources.

Taking WordPress as an example, we choose the WordPress homepage (because the homepage is often the most visited page).
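If you are not sure which page is visited most, the web server's access log can tell you. The following is only a sketch: it assumes a log in the common "combined" format, where the request path is the seventh whitespace-separated field, and the function name top_pages is made up for this illustration.

```shell
#!/bin/sh
# top_pages [N]: read an access log on stdin and print the N most
# requested paths with their hit counts (default N = 5).
# Assumption: "combined" log format, so field 7 is the request path.
top_pages() {
    n="${1:-5}"
    awk '{ print $7 }' | sort | uniq -c | sort -rn | head -n "$n"
}

# Usage (the log path is an assumption, adjust it to your setup):
#   top_pages 3 < /var/log/nginx/access.log
```

The page at the top of that list is a reasonable first candidate for the training run.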

Let's take my machine as an example:

  1. Intel(R) Xeon(R) CPU X5687 @ 3.60GHz x 16 (with Hyper-Threading)
  2. 48 GB of memory

php-fpm runs a fixed pool of 32 workers, and opcache uses its default configuration (be sure to load opcache).

WordPress 4.1 is the optimization scenario.
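Since a forgotten opcache would distort every number that follows, it can be worth checking before benchmarking. A minimal sketch, assuming the module list printed by the -m option of the PHP binaries; the helper name opcache_loaded is hypothetical.

```shell
#!/bin/sh
# opcache_loaded: read a PHP module list on stdin and succeed
# if it mentions Zend OPcache.
opcache_loaded() {
    grep -qi 'Zend OPcache'
}

# Usage (not executed here; use the binary that actually serves requests):
#   php-fpm -m | opcache_loaded && echo "opcache is loaded"
```

Note that the CLI and FPM binaries may read different php.ini files, so check the one your benchmark actually hits.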

First, let's test the current performance of WP in PHP7 (ab -n 10000 -c 100):

    $ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking inf-dev-maybach.weibo.com (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests

    Server Software:        nginx/1.7.12
    Server Hostname:        inf-dev-maybach.weibo.com
    Server Port:            8000

    Document Path:          /wordpress/
    Document Length:        9048 bytes

    Concurrency Level:      100
    Time taken for tests:   8.957 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Total transferred:      92860000 bytes
    HTML transferred:       90480000 bytes
    Requests per second:    1116.48 [#/sec] (mean)
    Time per request:       89.567 [ms] (mean)
    Time per request:       0.896 [ms] (mean, across all concurrent requests)
    Transfer rate:          10124.65 [Kbytes/sec] received

As you can see, on this machine the WordPress 4.1 homepage currently reaches 1116.48 QPS, that is, it can serve 1116.48 homepage requests per second.
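When you repeat the benchmark several times, picking the QPS figure out of the report by hand gets tedious. A small sketch that extracts it, assuming the ab report layout shown above; the function name ab_qps is hypothetical.

```shell
#!/bin/sh
# ab_qps: read an ApacheBench report on stdin and print the value
# of its "Requests per second" line.
ab_qps() {
    awk '/^Requests per second:/ { print $4 }'
}

# Usage (not executed here):
#   ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/ | ab_qps
```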

Now, let's start teaching GCC to compile a PHP7 that runs WordPress 4.1 faster. First of all, GCC 4.0 or above is required, but I recommend GCC 4.8 or above (the current release is GCC 5.1).
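A quick way to check the compiler against that recommendation is to compare the output of gcc -dumpversion with 4.8. The helper below is a sketch; its name gcc_at_least is made up here, and it only looks at the major and minor numbers.

```shell
#!/bin/sh
# gcc_at_least VERSION MAJOR MINOR: succeed if VERSION (for example
# "4.8.5") is at least MAJOR.MINOR.
gcc_at_least() {
    ver_major=$(printf '%s\n' "$1" | cut -d. -f1)
    # -s suppresses lines without a dot, so a bare "7" yields an empty minor.
    ver_minor=$(printf '%s\n' "$1" | cut -s -d. -f2)
    ver_minor="${ver_minor:-0}"
    [ "$ver_major" -gt "$2" ] ||
        { [ "$ver_major" -eq "$2" ] && [ "$ver_minor" -ge "$3" ]; }
}

# Usage (not executed here):
#   gcc_at_least "$(gcc -dumpversion)" 4 8 || echo "consider a newer GCC"
```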

The first step is to download the PHP7 source code and run ./configure. Nothing here differs from a normal build.

Here is where it differs: first we compile PHP7 into an instrumented executable that will generate profile data when it runs:

    $ make prof-gen

Note that we use the prof-gen target (it is specific to the PHP7 Makefile, so don't try it on other projects :) ).

Then, let's start training GCC:

    $ sapi/cgi/php-cgi -T 100 /home/huixinchen/local/www/htdocs/wordpress/index.php >/dev/null

That is, php-cgi runs the WordPress homepage 100 times and generates profile information in the process.
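Before moving on, it may be worth confirming that the training run really left profile data behind. This sketch assumes that the prof-gen build uses GCC's -fprofile-generate instrumentation (an assumption about the Makefile, not something stated here), in which case the instrumented binary writes *.gcda files when it exits. The function name count_profiles is hypothetical.

```shell
#!/bin/sh
# count_profiles [DIR]: print how many GCC profile data files (*.gcda)
# exist under DIR (default: the current directory).
count_profiles() {
    find "${1:-.}" -type f -name '*.gcda' | wc -l | tr -d ' '
}

# Usage, from the PHP source tree after the training run (not executed here):
#   [ "$(count_profiles .)" -gt 0 ] || echo "no profile data found"
```

If the count is zero, the second build has nothing to learn from, so the training step should be checked first.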

Then, we start compiling PHP7 for the second time.

    $ make prof-clean
    $ make prof-use && make install
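The whole cycle can be wrapped in one small script so that it is easy to repeat for a different training page. This is a sketch only: the names step, train and pgo_build are made up for this illustration, the script must be run from the PHP7 source tree after ./configure, and setting DRY_RUN=1 prints the steps instead of executing them.

```shell
#!/bin/sh
# step CMD...: run a command, or just print it when DRY_RUN=1.
step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# train SCRIPT RUNS: execute the training page RUNS times with php-cgi,
# discarding the generated HTML.
train() {
    sapi/cgi/php-cgi -T "$2" "$1" > /dev/null
}

# pgo_build SCRIPT [RUNS]: the two-pass PGO build, in the order used above.
pgo_build() {
    script="$1"
    runs="${2:-100}"
    step make prof-gen           || return 1  # pass 1: instrumented build
    step train "$script" "$runs" || return 1  # collect profile data
    step make prof-clean         || return 1  # clean up before pass 2
    step make prof-use           || return 1  # pass 2: build using the profile
    step make install
}

# Usage (not executed here):
#   pgo_build /home/huixinchen/local/www/htdocs/wordpress/index.php 100
```

Stopping at the first failing step avoids installing a binary that was built without usable profile data.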

OK, that's it: the PGO build is complete. Now let's look at the performance of PHP7 after PGO compilation:

    $ ab -n10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking inf-dev-maybach.weibo.com (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests

    Server Software:        nginx/1.7.12
    Server Hostname:        inf-dev-maybach.weibo.com
    Server Port:            8000

    Document Path:          /wordpress/
    Document Length:        9048 bytes

    Concurrency Level:      100
    Time taken for tests:   8.391 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Total transferred:      92860000 bytes
    HTML transferred:       90480000 bytes
    Requests per second:    1191.78 [#/sec] (mean)
    Time per request:       83.908 [ms] (mean)
    Time per request:       0.839 [ms] (mean, across all concurrent requests)
    Transfer rate:          10807.45 [Kbytes/sec] received

Now we can handle 1191.78 requests per second, an improvement of about 7%. Not bad. (Hey, didn't you say 10%? How did it become 7%? Haha, as I said before, we have been analyzing what PGO optimizes and applying the generally applicable optimizations to PHP7 by hand. In other words, about 3% worth of those general optimizations are already included in PHP7, and of course this work is still ongoing.)
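The improvement follows directly from the two QPS figures: (1191.78 - 1116.48) / 1116.48 is roughly 0.067. A small helper for repeating the calculation with your own numbers; the name pct_gain is hypothetical.

```shell
#!/bin/sh
# pct_gain BEFORE AFTER: print the relative improvement in percent,
# rounded to one decimal place.
pct_gain() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

pct_gain 1116.48 1191.78   # prints 6.7
```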

So it's that simple. You can use the typical scenarios of your own product to train GCC, and with just a few simple steps you get an improvement. Why not?
