The GPUs alone consumed about 24 million kWh of electricity for a single training run of GPT-4. Why does AI consume so much electricity? Where does all that electricity go? Can the energy it turns into be recovered and reused?

Written by | Mammoth

Today, when we talk about artificial intelligence (AI), we mainly mean generative AI, and a large part of that is generative AI based on large language models. These models require large-scale data centers for training and inference. The data centers are built from enormous numbers of servers, and most of the electricity the servers consume is converted into heat, which is ultimately carried away by cooling systems, most often water cooling. In that sense, the physical hardware behind AI is one huge "electric water heater."

That may sound strange. A server is an electronic computer, and what a computer processes is information. What does information have to do with energy? The connection is real.

Processing information consumes energy

In 1961, Rolf Landauer, a physicist working at IBM, published a paper proposing what was later called "Landauer's principle": when information stored in a computer is changed irreversibly, a small amount of heat is released into the surrounding environment. The amount of heat depends on the temperature of the environment at the time; the higher the temperature, the more heat is released.

Landauer's principle links information to energy, and more specifically to the second law of thermodynamics. A logically irreversible operation destroys information, and that loss must be paid for by an increase in entropy in the physical world, which means dissipating energy.

The principle was questioned from the moment it was proposed, but over the past decade or so it has been confirmed experimentally. In 2012, Nature published a paper in which a research team measured for the first time the tiny amount of heat released when a single bit of data was erased. Several later independent experiments also confirmed Landauer's principle.

So processing information has an energy cost, and the energy actually consumed by today's electronic computers when they calculate is hundreds of millions of times this theoretical minimum. Scientists have long been looking for more efficient ways to compute in order to lower that cost. Judging from current progress, however, only when genuinely practical room-temperature superconductors can be widely used in computing devices might real energy consumption come close to the floor set by Landauer's principle.

Large AI models certainly require a great deal of computation. Their work can be roughly divided into two stages: training and inference. In the training stage, a large amount of text data is first collected and preprocessed as input. The model's parameters are initialized within a chosen architecture, the input data are processed and outputs generated, and the parameters are then adjusted repeatedly according to the difference between the output and the expected result, until the model's performance stops improving noticeably. In the inference stage, the trained parameters are loaded, the text to be processed is preprocessed, and the model generates output according to the language patterns it has learned.
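To put a number on that theoretical floor, Landauer's bound gives the minimum heat released when a single bit is erased; at room temperature (taking T ≈ 300 K as a convenient assumption) it works out to:

```latex
% Minimum heat released per erased bit (Landauer's bound), assuming T ≈ 300 K
E_{\min} = k_B \, T \ln 2
         \approx 1.38\times10^{-23}\,\mathrm{J/K} \times 300\,\mathrm{K} \times 0.693
         \approx 2.9\times10^{-21}\,\mathrm{J}
```

Any real logic operation in today's hardware dissipates vastly more energy than this per bit, which is the "hundreds of millions of times" gap described above.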
Both training and inference are, at bottom, long sequences of information reorganization, so they too are governed by Landauer's principle. It follows that the more parameters a model has, the more data it must process, the more computation it requires, and the more energy it consumes and heat it releases. Yet this is only a tiny fraction of AI's power consumption. Far more comes from another physical law we know much better: Joule's law. To see why, we have to start with integrated circuits.

The biggest energy cost comes from electric current

Today's electronic computers are built on integrated circuits, which we usually call chips. Each chip contains a vast number of transistors, which can loosely be thought of as tiny switches. Connected in series and in parallel, these switches carry out logical operations. "On" and "off" represent two states, the 1 and 0 that form the basic unit of computation, the bit, and the foundation of binary computing.

A computer flips these switches by rapidly changing voltages. Changing a voltage requires electrons to flow in or out, and that flow of electrons is an electric current. Because every circuit has resistance, the current generates heat. Joule's law tells us that the heat produced is proportional to the square of the current, to the resistance of the conductor, and to the time the current flows.

As integrated-circuit technology has advanced, the transistors on a chip have become extremely small, so the heat produced by any single transistor is tiny. The problem is that the number of transistors on a chip is beyond ordinary imagination: the 2-nanometer-class process chip IBM unveiled a few years ago packs an average of 330 million transistors per square millimeter. Even a minuscule amount of heat, multiplied by that scale, becomes considerable.

One fact that may surprise people: the power density of today's chips is several orders of magnitude higher than that of the Sun's core. A typical CPU chip runs at roughly 100 watts per cubic centimeter, or 100 million watts per cubic meter, while the power density of the Sun's core is under 300 watts per cubic meter.

When OpenAI trained the large language model GPT-4, a single training run took about three months and used roughly 25,000 Nvidia A100 GPUs. Each A100 contains 54 billion transistors, draws 400 watts, and can perform 19.5 trillion single-precision floating-point operations per second, with each operation involving the switching of many transistors. A quick calculation, sketched a little further below, shows that these GPUs alone consumed on the order of 24 million kilowatt-hours of electricity for one training run. Almost all of that electricity ended up as heat, enough to bring roughly 200,000 cubic meters of ice water (around 80 Olympic-standard swimming pools' worth) to a boil.

Why does AI need so many powerful GPUs to train? Because large language models are simply enormous. GPT-3 has 175 billion parameters, and GPT-4 is estimated to have about 1.8 trillion, roughly ten times as many.
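As a rough sanity check on the electricity figure quoted above, here is a minimal back-of-envelope sketch in Python. It takes the numbers in the text at face value (25,000 A100s at 400 watts each, about three months of training), assumes the GPUs run at rated power continuously, and ignores host servers, networking, storage and cooling overhead; the water comparison assumes ice water at 0 °C heated to 100 °C.

```python
# Back-of-envelope estimate of the GPU electricity for one GPT-4 training run,
# using only the figures quoted in the text (assumed to hold continuously).

NUM_GPUS = 25_000        # Nvidia A100 GPUs quoted in the article
POWER_PER_GPU_W = 400    # watts per A100, as quoted
TRAINING_DAYS = 90       # "about three months"

# Total electrical energy drawn by the GPUs alone.
total_power_kw = NUM_GPUS * POWER_PER_GPU_W / 1000       # 10,000 kW
energy_kwh = total_power_kw * TRAINING_DAYS * 24         # ~21.6 million kWh
print(f"GPU energy: {energy_kwh / 1e6:.1f} million kWh")

# How much ice water could that heat to boiling?
# Specific heat of water: 4186 J/(kg*K); 0 -> 100 degC; 1 m^3 of water = 1000 kg.
joules = energy_kwh * 3.6e6                               # kWh -> J
water_m3 = joules / (4186 * 100 * 1000)
print(f"Water heated from 0 to 100 degC: {water_m3:,.0f} cubic meters")

# An Olympic pool holds roughly 2,500 m^3.
print(f"Roughly {water_m3 / 2500:.0f} Olympic-sized pools")
```

Under these assumptions the GPUs alone come out at roughly 22 million kWh, consistent with the figure of about 24 million kWh above; the true total for a training run is higher once the rest of the data center is counted.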
To train a model of this size, one must iterate over a massive dataset again and again, and every iteration means calculating and adjusting billions, tens of billions, or even hundreds of billions of parameter values. All of that computation ultimately shows up as transistors switching and thin currents flowing through integrated circuits, and therefore as heat.

Energy can be neither created nor destroyed, only converted from one form to another. For electronic computers, the dominant conversion is from electrical energy into heat. Large language models are no exception, and their appetite for electricity and cooling water is causing increasingly serious environmental problems.

Recovering heat from the "electric water heater"?

Just a few days ago, a Microsoft engineer said that, in order to train GPT-6, Microsoft and OpenAI have built a huge data center and will use 100,000 Nvidia H100 GPUs, which outperform the A100 and, of course, draw even more power. These GPUs cannot all be placed in the same U.S. state, however, or they would overload the local power grid and cause it to collapse. The energy shortage created by AI's growth is already starting to show. At this year's World Economic Forum in Davos, OpenAI CEO Sam Altman argued that nuclear fusion may be the way forward for energy, although genuinely usable fusion power is likely still some way off.

The same is true for water. Over the past few years, the companies leading the race in large AI models have seen their water consumption rise sharply. In June 2023, Microsoft released its 2022 Environmental Sustainability Report, which showed water use up by more than 20%; Google's situation is similar. Some researchers believe the development of AI is the main reason for the surge, because water cooling is the most common way to cool chips that are heating up furiously.

The data centers that provide AI's hardware foundation really are like a giant "electric water heater." How can the heat they shed be kept from going to waste? The most obvious and practical answer is heat recovery: the warmth captured from a data center can, for example, supply domestic hot water and winter heating. Some companies have already begun reusing waste heat in this way, including the China Mobile Harbin Data Center and the Alibaba Qiandao Lake Data Center. It is a solution of sorts, but it does not address the root of the problem.

The AI industry is growing at a pace no other industry in human history can match. Balancing the advance of AI technology against environmental sustainability will be one of the important questions of the next few years; the tangled relationship between technological progress and energy consumption has never confronted humanity so urgently.

This article is supported by the Science Popularization China Starry Sky Project
Produced by: Department of Science Popularization, China Association for Science and Technology
Producer: China Science and Technology Press Co., Ltd., Beijing Zhongke Xinghe Culture Media Co., Ltd.