Von Neumann once said: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk!" The remark was meant to criticize blind data fitting and to stress the importance of physical pictures. On the other hand, AI equipped with enormous numbers of parameters has shown its prowess in science, and large-scale computation is expected to yield powers of theoretical induction far beyond those of the empirical paradigm. Should we reduce the complex to the simple, or accept that more is different? These two diametrically opposed research philosophies happen to run through the nearly 100-year history of physical organic chemistry.
Illustration: Blue Knight
Written by Zheng Chao (Researcher at Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences)
Background: In the previous article, "A Barely Passed Defense Helped Chemistry Turn Over Its 'Darkest Chapter'", we saw how physical chemists from Wilhelmy to van 't Hoff and Arrhenius, after nearly half a century of exploration, finally found a roadmap through the maze of chemical kinetics and created a paradigm in which physical pictures support mathematical equations in the study of chemical reactions. Chemical reactions are complex, so what form should the mathematical equations that describe them take?
Simplifying the Complex: Linear Free Energy Relationships
The Arrhenius equation gives the law by which temperature affects the reaction rate constant, a milestone in the history of chemical kinetics. However, temperature is only an external factor. To describe and deeply understand chemical reactions, especially the kinetics of complex organic reactions, one still has to start from internal factors such as the microscopic structure of the substances involved and the details of the reaction process.
Chemical reactions result from the rearrangement of the outer electrons of atoms, and the motion of electrons obeys quantum mechanics. Although the theoretical edifice of quantum mechanics was established in the 1920s, the complexity of chemical reactions posed huge obstacles to its application. As P. A. M. Dirac, one of the founders of quantum mechanics, put it, the physical laws underlying the whole of chemistry are already completely known; the difficulty is only that applying these laws leads to equations much too complicated to be solved. Organic chemists represented by C. K. Ingold and R. Robinson in Britain did not simply wait. They absorbed the basic quantum-mechanical idea that atoms share outer electrons to form chemical bonds, and in the 1930s built a qualitative theory that uses a notation of dots, lines and arrows to describe the movement of electrons in organic reactions. At the same time, they connected this theory with traditional concepts of organic chemistry, using figurative terms such as the inductive effect, the conjugation effect and the steric effect to summarize how the structure of organic molecules influences reactivity and selectivity. As a Chinese saying goes, "the art of applying a method lies in the mind of the practitioner." Beginners may find this formal logic hard to grasp and accept, but in the eyes of the "masters" of organic chemistry, reaction mechanisms deduced on this basis are enough to rationalize any organic reaction on paper.
L. P. Hammett (1894-1987)
However, concepts defined in figurative language are often ambiguous. Whether inductive, conjugation or steric, all of these effects are, in physical essence, electromagnetic interactions. Dividing the molecular world into various "effects" serves only to make it easier for human chemists to understand and use.
Can we find a way to quantify these vague but human-friendly concepts, and relate molecular structure to reactivity with mathematical equations far simpler than those of quantum mechanics? At almost the same time as Ingold and Robinson, on the other side of the Atlantic, L. P. Hammett of Columbia University demonstrated one way of simplifying the complex. Through clever theoretical design, he decoupled mutually entangled factors and solved a series of typical problems in organic reaction kinetics using nothing more than linear functions of a single variable! Hammett was a rare physical chemist who cared about organic chemistry, and his most important contribution was to connect these two very different subdisciplines of chemistry. In 1940 he published Physical Organic Chemistry: Reaction Rates, Equilibria, and Mechanisms, which set off a wave of research in this emerging field.
Hammett also devised an acidity function to characterize the acidity of concentrated acid solutions, which cannot be measured with ordinary pH values; it is based on a series of aniline indicators bearing different substituents. His best-known tool, the Hammett equation, likewise rests on a series of substituted aromatic compounds: the ionization of substituted benzoic acids, $\mathrm{XC_6H_4COOH \rightleftharpoons XC_6H_4COO^- + H^+}$. Hammett defined the substituent constant $\sigma_X = \log(K_X/K_H)$ from the ionization constants of meta- and para-substituted benzoic acids ($K_X$) and of benzoic acid itself ($K_H$) in water at 25 °C, and found that for many other reactions of meta- and para-substituted benzene derivatives
$$\log\frac{k_X}{k_H} = \rho\,\sigma_X \tag{15}$$
where $k_X$ and $k_H$ are the rate constants of the substituted and unsubstituted compounds, and the reaction constant $\rho$ measures how sensitive the reaction is to substituents. Because, at a fixed temperature, $\Delta G^{\ddagger} = -RT\ln k + \text{const}$ and $\Delta G^{\circ} = -RT\ln K$, equation (15) can also be written as
$$\Delta\Delta G^{\ddagger}_{r} = \rho\,\Delta\Delta G^{\circ}_{a} \tag{16}$$
The two sides of equation (16) are, respectively, the change in the activation Gibbs free energy of reaction r caused by introducing substituent X, and (scaled by $\rho$) the change in the reaction Gibbs free energy of the reference equilibrium a, the ionization of benzoic acid. This is an important law of physical organic chemistry: the linear free energy relationship. It predicts that (in some cases) structural changes in the starting molecules, such as the introduction of a substituent X, have proportional effects on the thermodynamics (equilibrium) and the kinetics (rate) of chemical reactions. The Hammett equation is the most important example of a linear free energy relationship.
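To make the Hammett equation $\log(k_X/k_H) = \rho\sigma_X$ and its free-energy form concrete, here is a minimal Python sketch. The σ values are commonly tabulated Hammett constants, but the rate constants are synthetic numbers generated from an assumed ρ = 2.0 for an imaginary reaction (they are not measurements), and the helper names `fit_rho` and `delta_delta_G` are made up for illustration. The sketch fits ρ as the least-squares slope through the origin (the parent compound X = H sits at σ = 0) and then checks that converting both sides to free energies with $\Delta\Delta G = -RT\ln(\text{ratio})$ gives $\Delta\Delta G^{\ddagger}_r = \rho\,\Delta\Delta G^{\circ}_a$.

```python
import numpy as np

# Gas constant (kJ mol^-1 K^-1) and temperature (K)
R = 8.314462618e-3
T = 298.15

# Hammett sigma constants (commonly tabulated values; m = meta, p = para)
sigma = {
    "H": 0.00,
    "p-OCH3": -0.27,
    "p-CH3": -0.17,
    "p-Cl": 0.23,
    "m-Cl": 0.37,
    "m-NO2": 0.71,
    "p-NO2": 0.78,
}

# HYPOTHETICAL rate constants (s^-1) for an imaginary reaction, generated from an
# assumed rho = 2.0; these are synthetic numbers, not measurements.
RHO_ASSUMED = 2.0
K_H = 1.0e-3
k = {X: K_H * 10 ** (RHO_ASSUMED * s) for X, s in sigma.items()}


def fit_rho(sigmas, log_k_rel):
    """Least-squares slope of log10(kX/kH) vs sigma, forced through the origin (X = H)."""
    s = np.asarray(sigmas, dtype=float)
    y = np.asarray(log_k_rel, dtype=float)
    return float(s @ y / (s @ s))


def delta_delta_G(log10_ratio):
    """Free-energy change (kJ/mol) matching log10 of a rate or equilibrium-constant ratio:
    ddG = -RT ln(ratio) = -ln(10) * R * T * log10(ratio)."""
    return -np.log(10.0) * R * T * log10_ratio


names = list(sigma)
log_k_rel = [np.log10(k[X] / k["H"]) for X in names]
rho = fit_rho([sigma[X] for X in names], log_k_rel)

# Free-energy form: ddG_act(reaction r) = rho * ddG_eq(benzoic acid ionization a),
# where ddG_eq follows from sigma_X = log10(K_X / K_H).
for X, lk in zip(names, log_k_rel):
    ddG_act = delta_delta_G(lk)
    ddG_eq = delta_delta_G(sigma[X])
    print(f"{X:7s} log(kX/kH) = {lk:+.2f}  ddG_act = {ddG_act:+.2f}  "
          f"rho*ddG_eq = {rho * ddG_eq:+.2f} kJ/mol")

# rho > 0: electron-withdrawing groups speed the reaction up (negative charge builds up
# at the reaction center in the rate-determining step); rho < 0: the opposite.
print(f"fitted rho = {rho:.3f}")
```

With real data one would first inspect the Hammett plot for linearity (and include an intercept as a check) before trusting the fitted ρ; here the data are exactly linear by construction, so the fit simply recovers the assumed value.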
The linear free energy relationship is not a rigorous conclusion derivable from first principles but an empirical rule. It is nonetheless of great significance. As Hammett himself pointed out, the existence of linear free energy relationships brings a comforting fact: even without a theoretical explanation for the simplicity of chemical reactions, there is no need to think that they are hopelessly complicated. Chemists usually prefer to explain linear free energy relationships through chemical pictures such as enthalpy-entropy compensation, but there is a more fundamental mathematical meaning behind this linearity. Suppose the reactivity $f_r$ is a function uniquely determined by a substituent property $\sigma$. Then, as long as this function is not "hopelessly complicated", that is, as long as it varies smoothly with $\sigma$, it can be expanded in a Taylor series about the unsubstituted parent compound ($\sigma = 0$). When substituent effects are modest, the first-order term dominates, and $f_r$ is, to a good approximation, a linear function of $\sigma$. Therefore, if a linear free energy relationship holds for a class of organic reactions, it means that some substituent property plays a decisive role in the kinetics of this type of reaction.
Figure 3. σ parameters of common substituents determined by Hammett (left), ρ values of some organic reactions (middle), and the first linear free energy relationship plot (right). Image source: J. Am. Chem. Soc. 1937, 59, 96.
For Hammett's σ parameter, the substituent property it describes can generally be understood as the sum of the inductive effect and part of the conjugation effect. Hammett left out substituents at the ortho (o) position of benzoic acid when establishing the σ parameter precisely because he realized that the steric effect of ortho substituents has a non-negligible influence on the ionization equilibrium of benzoic acid.
By excluding them, he separated the steric effect from the inductive and conjugation effects, and could describe how the rate constant changes with the simplest possible univariate linear function. Hammett's pioneering work launched the first wave of efforts to parameterize and quantify organic chemistry. His followers proposed a wide variety of substituent parameters: some specifically characterize the steric or conjugation effect of a substituent, some target a particular type of reaction, and some describe the properties of the solvent. These parameters gave chemists a new tool for studying the mechanisms of organic reactions. With them, organic chemists can use kinetic experiments (usually measuring the ratios of rate constants for a series of similar reactions) to make up for the shortcomings of spectroscopic characterization. The sign and magnitude of ρ in the Hammett equation offer a glimpse of the structural features of transient intermediates that are difficult to isolate and identify: a positive ρ indicates that negative charge builds up (or positive charge is lost) at the reaction center in the rate-determining step, while a negative ρ indicates the opposite. Kinetic anomalies can be equally revealing. The famous "non-classical carbocation", for example, was first proposed by S. Winstein and co-workers at the University of California, Los Angeles, after they observed anomalous results in kinetic studies of the solvolysis of 2-norbornyl arenesulfonates. The linear free energy relationship represented by the Hammett equation remains required material in senior undergraduate and graduate organic chemistry courses today. For an organic chemistry graduate student who spends all day at a fume hood, measuring a Hammett plot like the one in Figure 3 with one's own hands is a delightful moment! Chinese scientists have also made outstanding contributions to the study of linear free energy relationships. The physical organic chemistry research team led by Mr.
Jiang Xikui of the Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, did internationally recognized work on linear free energy relationships from the 1980s through the 1990s. (The "JJ" in the spin delocalization parameter σ•JJ discussed below comes from the pinyin initials of the surnames of Jiang Xikui and his colleague Ji Guozhen.) Free radicals are common intermediates in organic reactions. Unlike closed-shell intermediates such as carbocations, carbanions and singlet carbenes, free radicals have an unpaired electron and therefore exhibit spin delocalization effects that closed-shell species lack. How to quantitatively evaluate the influence of substituents on the spin delocalization of free radicals is an important scientific question at the intersection of physical organic chemistry and free radical chemistry. Jiang Xikui and co-workers answered it with an ingenious two-parameter linear free energy relationship.
Jiang Xikui was born into a prominent family in Nanjing (Jinling) and received a good education from childhood. After graduating from St. John's University in Shanghai, he went to the United States in 1948 and received his doctorate from the University of Washington in 1952. While working at the M. W. Kellogg Company in the United States, he discovered that chlorotrifluoroethylene reacts with sulfur trioxide to form a β-sultone, overturning the traditional view that polyfluorinated ethylenes cannot undergo electrophilic reactions. This reaction laid the foundation for the later synthesis of a series of fluorine-containing functional molecules. In 1955, Jiang gave up his high-paying job at the American company and, overcoming numerous obstacles, returned to New China, where he worked at the Institute of Chemistry and the Shanghai Institute of Organic Chemistry of the Chinese Academy of Sciences.
Drawing on his deep expertise in fluoroolefin chemistry, Jiang Xikui led China's research on fluororubber in the 1960s, prepared a variety of fluororubber products, broke the Western blockade on this key military material, and contributed to China's national defense industry. During the development of fluororubber, he discovered that α,β,β-trifluorostyrene dimerizes on heating to give diphenylhexafluorocyclobutane, and that the reaction proceeds through a 1,4-diradical intermediate. Starting from this discovery in applied research, Jiang relied on keen academic insight and unremitting effort to achieve internationally recognized results in basic research.
Figure 4. (Left) Mr. Jiang Xikui's first paper on the free radical spin delocalization parameter σ•JJ; (right) Mr. Jiang Xikui (second from right) discussing work with his assistants, including Mr. Ji Guozhen (first from right). Image source: Acta Chimica Sinica, 1984, Vol. 42, No. 6, p. 599 (left); Reference 29 (right).
For a long time, the academic community debated how to distinguish the contributions of polar effects and spin delocalization to the reactivity of free radicals. The substituent parameters reported in the literature could not correctly describe the contribution of spin delocalization. Jiang Xikui realized that the dimerization of trifluorostyrene was an excellent platform for studying how substituents affect radical reactivity, and he proposed a method to completely separate the polar effect from the spin delocalization effect within the same reaction system. First, the para-position with … Their work has been widely recognized by physical organic chemists around the world.
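The idea of separating two effects with a two-parameter linear free energy relationship can be sketched as a multiple linear regression of the form $\log(k_X/k_H) = \rho_{pol}\sigma_{pol} + \rho^{\bullet}\sigma^{\bullet}$. This is only an illustration under stated assumptions: the symbols `sigma_polar` and `sigma_spin` are placeholders for a polar constant and a σ•JJ-like spin delocalization constant, every number below is invented, and the "rate data" are synthetic values generated from assumed sensitivities plus a little noise. It is not Jiang Xikui's data or his exact procedure.

```python
import numpy as np

# HYPOTHETICAL substituent constants for six para substituents (made-up numbers for
# illustration, not literature values). sigma_polar stands for a polar constant,
# sigma_spin for a spin delocalization constant in the role of sigma•JJ.
# The first entry is the parent compound X = H, for which both constants are zero.
sigma_polar = np.array([0.00, 0.23, -0.17, 0.45, -0.27, 0.78])
sigma_spin = np.array([0.00, 0.10, 0.05, 0.35, 0.25, 0.50])

# Synthetic "measured" log10(kX/kH): generated from assumed sensitivities plus small
# random noise, standing in for real kinetic data.
RHO_POLAR_TRUE, RHO_SPIN_TRUE = -0.40, 1.20
rng = np.random.default_rng(0)
log_k_rel = (RHO_POLAR_TRUE * sigma_polar
             + RHO_SPIN_TRUE * sigma_spin
             + rng.normal(0.0, 0.005, size=sigma_polar.size))

# Dual-parameter fit log10(kX/kH) = rho_polar*sigma_polar + rho_spin*sigma_spin,
# with no intercept because both constants vanish for X = H.
A = np.column_stack([sigma_polar, sigma_spin])
coef, *_ = np.linalg.lstsq(A, log_k_rel, rcond=None)
rho_polar, rho_spin = coef

predicted = A @ coef
ss_res = float(np.sum((log_k_rel - predicted) ** 2))
print(f"rho_polar = {rho_polar:.2f}, rho_spin = {rho_spin:.2f}, residual SS = {ss_res:.2e}")
```

The separation only works if the chosen substituents vary the two properties fairly independently: if the two σ scales were strongly correlated across the series, the regression could not reliably tell the two ρ values apart, however good the overall fit looked.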
In 2002, the project led by Jiang Xikui, "Two Important Aspects of the Frontier of Physical Organic Chemistry: Research on Organic Molecular Clusters and Free Radical Chemistry", won the first prize of the National Natural Science Award. It was the first time this prize had been awarded after four consecutive years of vacancy, and the first time that basic theoretical research in organic chemistry had won the top grade of this national award.
More Is Different: Embracing the Invisible Equation
The Hammett equation is widely used in organic chemistry not only because of its simple mathematical form, but also because it reduces complex kinetic problems to a single variable that fits chemical intuition, giving chemists confidence and a basis for understanding and controlling reactivity. Along this line of thought, if chemical knowledge tells us that the kinetics of a certain type of reaction are governed by two independent factors, establishing a two-parameter regression equation is a very natural choice, just as was done for free radical spin delocalization. Following the same logic, one can write equations with even more parameters, for example
$$\log\frac{k_X}{k_H} = \rho_F\sigma_F + \rho_\chi\sigma_\chi + \rho_\alpha\sigma_\alpha + \rho_R\sigma_R \tag{19}$$
The subscripts F/χ/α/R in formula (19) represent the field effect, inductive effect, polarization effect and resonance effect, respectively. However, this approach puts us in a dilemma. Adding parameters inevitably sacrifices the chemical picture behind the equation and reduces the interpretability of the model (just like the problem posed by the many k-T relationships proposed before the Arrhenius equation was established). It also brings the risk of "overfitting": even if the final result is numerically very accurate, it may be hard to tell whether the fitted equation correctly describes a scientific law or merely records an illusion created by a biased data sample. On the subject of numerical fitting, the famous physicist F.
Dyson told an interesting story late in life. In 1953, Dyson was a young theoretical physicist at Cornell University. Using pseudoscalar meson theory, he calculated meson-proton scattering cross-sections, and the results agreed very well with the experimental values measured by E. Fermi. Dyson was overjoyed and hurried to Chicago to show the results to his senior colleague. Unexpectedly, Fermi barely glanced at the manuscript. He kindly asked Dyson to sit down and said calmly: "There are two ways of doing calculations in theoretical physics. One way, which I prefer, is to have a clear physical picture of the process you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither." Dyson was stunned, but summoned the courage to ask why Fermi did not consider pseudoscalar meson theory a self-consistent mathematical formalism. After hearing the answer, Dyson asked in despair how his calculated values could then match the experimental values so well. Fermi asked in return: "How many arbitrary parameters did you use for your calculations?" Dyson replied that there were four, and Fermi then made his famous remark: "I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Dyson understood the implication and changed his research direction after finishing this work. He later recalled: "In a few minutes, Fermi politely but ruthlessly demolished a programme of research that my students and I had been pursuing for years. If it weren't for him, we might have wandered in vain down the wrong path for several more years. ... Looking back fifty years later, we can clearly see that Fermi was right. The crucial discovery that made sense of the strong interaction was the quark. Mesons and protons are made of quarks. Before Gell-Mann (M.
Gell-Mann) discovered quarks, no theory of the strong interaction could be adequate. Fermi knew nothing about quarks and died before they were discovered. But as early as the 1950s, he had realized that the meson theory of the time was missing a key piece of the puzzle. His physical intuition told him that pseudoscalar meson theory could not be right. So it was Fermi's intuition, not any discrepancy between theory and experiment, that saved me and my students from a dead end."
Figure 5. In 2010, researchers used four complex parameters to fit the outline of an elephant, and a fifth complex parameter to make its trunk wiggle. Image source: Am. J. Phys. 2010, 78, 648.
Perhaps because of bottlenecks in both model interpretability and the effectiveness of numerical fitting, the craze for parameterizing and quantifying the kinetics of organic reactions on the basis of linear free energy relationships died down in the 1980s and 1990s. Multi-parameter linear free energy equations brought no new breakthroughs to physical organic chemistry. Among the numerous Hammett-type quantitative structure-activity relationships, chemists still favor the simplest form, equation (15). After all, being able to "translate" a mathematical relationship into a reasonable chemical picture is the most reassuring thing. Yet the kinetic behavior of organic reactions is ultimately a complex problem, and single-variable linear equations are bound to fall short in some situations. Resolving this dilemma requires new ideas. P. W. Anderson, winner of the 1977 Nobel Prize in Physics, once said in discussing condensed matter physics: "More is different." The original meaning of this phrase is that matter is organized into different levels of structure, and each level has its own distinctive properties and laws.
The complexity of the material world grows as the scale of structure increases, so reductionism does not guarantee the success of constructionism: even if every phenomenon could be reduced to the laws governing a few elementary particles, we could not rebuild the whole universe from those laws alone. Viewing the intrinsically complex problem of chemical kinetics from this perspective, can we explore a path somewhat different from the tradition of physical organic chemistry? That is, give up the expectation of intuitive chemical pictures and no longer pursue concise, analytical mathematical relationships; instead, introduce as many variables as possible and write the reactivity $f_r$ as a function of a set of property parameters $\{\sigma\}$,
$$f_r = f(\{\sigma\}) \tag{20}$$
leaving the selection of $\{\sigma\}$ and the determination of the mathematical form of $f$ to data fitting. This idea seems quite disturbing at first; after all, the "elephant wiggling its trunk" is still fresh in our minds. Moreover, the historical successes of the Arrhenius equation and the Hammett equation were achieved precisely by moving beyond blind data fitting and following clear chemical pictures. Without such guidance, can the laws of organic reaction kinetics really emerge on their own from complex data relationships and "invisible equations"?
J. N. Gray (1944-2012)
In January 2007, at a meeting of the Computer Science and Telecommunications Board of the U.S. National Research Council held in Mountain View, California, J. N. Gray, a famous computer scientist at Microsoft and winner of the 1998 Turing Award, gave a talk on the revolution in the scientific method ("eScience: A Transformed Scientific Method"). In it, he proposed dividing scientific research into four paradigms: empirical (experimental) science, theoretical science, computational science and data science.
Gray argued that scientific research begins with observing and recording natural phenomena. To obtain more accurate and more general results, people abstract simplified models from experimental phenomena and build scientific theories expressed in mathematical equations. When the complexity of scientific theories rises beyond what the human brain can handle, large-scale computer calculation becomes another way of exploring nature, alongside controlled experiments and theoretical deduction. As computing power and algorithms continue to improve, the collection of massive data replaces traditional empirical observation and large-scale computation supplements human thinking, which is expected to yield powers of theoretical induction far beyond those of the empirical paradigm. This is the "fourth paradigm" of data-intensive scientific discovery that Gray advocated. Besides being a computer scientist, Gray was also a keen sailor. About half a month after the Mountain View meeting, he sailed out alone, planning to scatter his mother's ashes near the Farallon Islands off San Francisco, and never returned. Months of searching found no trace of Gray or his boat, and five years later a California court declared him legally dead. The Mountain View talk became Gray's academic "last words" to the world, and research under the fourth paradigm has flourished since his death. In 2016, Google's DeepMind unveiled the artificial intelligence Go program AlphaGo. It combines Monte Carlo tree search with deep neural networks and improves its strength by learning from human game records and from games against itself. In public matches it defeated the top players of the time, Lee Sedol in 2016 and Ke Jie in 2017.
In 2018, DeepMind released the artificial intelligence protein structure prediction program AlphaFold (AF), followed by AF2 and AF3 in 2020 and 2024. AF learns from the amino acid sequences of known proteins and from protein structures determined experimentally, for example by X-ray crystallography. Using Transformer-based neural networks, it predicts distances and interactions between amino acid residues and produces the predicted structure of a target protein through multiple rounds of iteration, with accuracy that can rival experiment. In 2022, DeepMind announced that AF2 had predicted the structures of more than 200 million proteins, covering almost every protein whose amino acid sequence is known. The leaders of the AF team, D. Hassabis and J. Jumper, shared the 2024 Nobel Prize in Chemistry for their work on protein structure prediction (the other laureate was D. Baker of the University of Washington, an expert in protein design). So what about organic reaction kinetics? Note that the primary structure of a protein is completely encoded by a one-dimensional amino acid sequence, and its higher-order structure depends mainly on non-covalent interactions between amino acid residues. Organic reactions, by contrast, involve the breaking and forming of chemical bonds, and their mechanistic details and influencing factors are far more complex than protein folding. It may not be easy for artificial intelligence to "master" organic reaction kinetics (the author would be happy to be proven wrong). Nevertheless, research in this area is on the rise, and successful results have been published in top journals. In a sense, this work carries on, in a new era, Hammett's effort to parameterize and quantify organic reactions.
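One of the model families used in this new line of research is Bayesian optimization, the approach behind the study cited in Figure 6 (Nature 2021, 590, 89). The sketch below shows the core loop under loudly stated assumptions: a single hypothetical reaction variable scaled to [0, 1], an invented yield function (`run_experiment`) standing in for real experiments, a Gaussian-process surrogate with a squared-exponential kernel of fixed length scale 0.1, and the expected-improvement acquisition function. It is a textbook-style illustration, not the algorithm or software used in that study.

```python
import math

import numpy as np


# HYPOTHETICAL "true" yield (%) as a function of one normalized reaction condition
# x in [0, 1] (e.g. a scaled temperature). In a real campaign each call is an experiment.
def run_experiment(x):
    return 80.0 * math.exp(-((x - 0.63) ** 2) / 0.02) + 10.0 * x


def rbf_kernel(a, b, length=0.1):
    """Squared-exponential covariance between two 1-D arrays of conditions."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Gaussian-process posterior mean and standard deviation at x_query
    (zero-mean prior on standardized yields, unit signal variance)."""
    mu, sd = y_obs.mean(), y_obs.std() + 1e-9
    y = (y_obs - mu) / sd
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu + sd * (K_s.T @ alpha), sd * np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best yield seen so far (maximization)."""
    gain = mean - best - xi
    z = gain / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf


grid = np.linspace(0.0, 1.0, 201)  # candidate conditions
rng = np.random.default_rng(1)
idx = [int(i) for i in rng.choice(grid.size, size=3, replace=False)]  # initial random runs
ys = [run_experiment(grid[i]) for i in idx]

for _ in range(12):  # 12 sequential "experiments"
    mean, std = gp_posterior(grid[idx], np.array(ys), grid)
    ei = expected_improvement(mean, std, max(ys))
    ei[idx] = -np.inf  # never repeat a condition that has already been run
    nxt = int(np.argmax(ei))
    idx.append(nxt)
    ys.append(run_experiment(grid[nxt]))

best = int(np.argmax(ys))
print(f"best condition x = {grid[idx[best]]:.3f}, yield = {ys[best]:.1f}% after {len(ys)} runs")
```

Real reaction optimizations search many variables at once (catalysts, reagents, solvents, concentrations, temperatures), usually learn the kernel hyperparameters from data and propose batches of experiments per round; this one-dimensional loop only shows the basic alternation of "fit a surrogate, pick the most promising next experiment, run it".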
The basic idea can be summarized by equation (20), but the range of parameters and functional forms has been greatly broadened. Not only the Hammett parameter σ but any feature characterizing a molecule's microscopic geometric or electronic structure or its macroscopic physicochemical properties, whether for the ground state or the transition state and whether measured experimentally or computed theoretically, can serve as a descriptor. As for models, everything from simple multivariate linear regression to sophisticated Bayesian optimization and neural network algorithms has found a place. The targets of fitting and prediction are no longer limited to reaction kinetics, but also include the yield and selectivity of the target product and even the optimal reaction conditions. Coupled with high-throughput automated equipment that provides high-quality experimental data, organic chemists are ambitiously embracing a paradigm shift in synthetic methodology research.
Figure 6. A Bayesian optimization algorithm was used to predict the outcomes of the Mitsunobu reaction and the deoxyfluorination of alcohols, and found multiple sets of optimal reaction conditions that outperformed expert experience. Image source: Nature 2021, 590, 89.
While remaining optimistic, we still need to be cautious. Whether it is the linear free energy relationship represented by the Hammett equation or an artificial intelligence model of organic reaction kinetics, both are essentially forms of incomplete induction. Both try to extract experience or rules from a limited set of experimental facts, in the hope that they will generalize to unknown samples. The validity of inductive reasoning is a long-standing controversy in the history of philosophy. The Scottish Enlightenment philosopher D. Hume took a skeptical position, holding that no conclusion of inductive reasoning can be proved.
In the early twentieth century, the British philosopher B. Russell illustrated the danger of pure induction with the example of a chicken on a farm: having learned to associate the farmer's footsteps with being fed, the chicken never suspects that the next time the farmer comes, he will slaughter it. Of course, Hume also admitted that even though the conclusions of inductive reasoning cannot be proved by reason, humans still have to make such inferences and believe in them. Perhaps, as researchers in physical organic chemistry, we can set aside metaphysical speculation for now and approach all the research tools at hand with an open mind. Facing an uncertain future, let us roll up our sleeves and see what comes!
References
[1] E. A. Guggenheim, J. Chem. Educ. 1956, 33, 544.
[2] E. Farber, Chymia 1961, 7, 135.
[3] E. W. Lund, J. Chem. Educ. 1965, 42, 548.
[4] P. W. Anderson, Science 1972, 177, 393.
[5] J. Shorter, J. Chem. Educ. 1980, 57, 411.
[6] M. C. King, Ambix 1981, 28(2), 70.
[7] M. C. King, Ambix 1982, 29(1), 49.
[8] K. J. Laidler, J. Chem. Educ. 1984, 61, 494.
[9] K. J. Laidler, Arch. Rational Mech. 1985, 32, 43.
[10] M. H. Abraham, J. Phys. Org. Chem. 1994, 7, 655.
[11] X.-K. Jiang, Acc. Chem. Res. 1997, 30, 283.
[12] F. Dyson, Nature 2004, 427, 297.
[13] J. Quílez, Bull. Hist. Chem. 2006, 31, 45.
[14] G. Nagendrappa, Resonance 2007, 12(5), 21.
[15] J. Mayer, K. Khairy, J. Howard, Am. J. Phys. 2010, 78, 648.
[16] J. Quílez, Found. Chem. 2019, 21, 221.
[17] J. Quílez, Found. Chem. 2021, 23, 85.
[18] B. J. Shields, J. Stevens, J. Li, M. Parasram, F. Damani, J. I. M. Alvarado, J. M. Janey, R. P. Adams, A. G. Doyle, Nature 2021, 590, 89.
[19] E. Callaway, Nature 2022, 608, 15.
[20] M. H. Back, K. J. Laidler, Eds., Selected Readings in Chemical Kinetics, Pergamon Press, 1967.
[21] P. Coffey, Cathedrals of Science: The Personalities and Rivalries That Made Modern Chemistry, Oxford University Press, 2008.
[22] T. Hey, S. Tansley, K. Tolle, Eds., The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.
[23] N. E. Henriksen, F. Y. Hansen, Theories of Molecular Reaction Dynamics: The Microscopic Foundation of Chemical Kinetics, 2nd Ed., Oxford University Press, 2019.
[24] F. H. Westheimer, A Biographical Memoir of Louis Plack Hammett, National Academy Press, 1997.
[25] Gregory D. Walcott, Jiang Xiaoyuan, Eds., A Source Book in Chemistry, Chinese Publishing House, 2022.
[26] Chen Minbo, Science Bulletin 2016, 79(3), 196.
[27] Zhao Kaihua, Qualitative and Semi-quantitative Physics, 2nd Ed., Higher Education Press, 2008.
[28] Y. I. Soloviev, N. A. Figurovsky, Arrhenius: Life and Activities, translated by Ding You, Commercial Press, 1965.
[29] Li Zhanting, Famous Chinese Scientists of the 20th Century: Jiang Xikui, Jincheng Publishing House, 2008.
About the Author
Dr. Zheng Chao is a researcher at the Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, and a recipient of the Excellent Young Scientists Fund of the National Natural Science Foundation of China. His research interests are physical organic chemistry and chiral synthesis.
Copyright statement: Personal forwarding is welcome. No media outlet or organization may reprint or excerpt this article in any form without authorization. For reprint authorization, please send a message to the "Fanpu" WeChat public account.