Google open sources Swift for TensorFlow: Can we finally put Python aside?

At the TensorFlow Developer Summit in March this year, Google announced the Swift For TensorFlow project and said it would be open-sourced in April. At the very end of April, Google released the source code of Swift For TensorFlow on GitHub.

When it comes to the Swift language, the first thing that comes to mind is Apple, so at first glance Swift For TensorFlow seems like something only iOS developers need to care about.

In fact, it is the other way around: iOS developers do not particularly need Swift For TensorFlow, but machine learning developers do.

Swift For TensorFlow is not targeted at iOS development

Currently, if you want to integrate machine learning capabilities into iOS applications, you can use the Core ML framework provided by Apple:

The workflow of Core ML is shown in the figure above. The Core ML model can be a ready-made one found on the Internet, or one you developed yourself (usually converted from a model built with MXNet, TensorFlow, etc.). Core ML therefore does not care whether your model was written in Python plus TensorFlow, Swift plus TensorFlow, or even MXNet; ultimately, the iOS application calls a model in Core ML format through the Core ML framework.

Therefore, Swift For TensorFlow is not aimed at iOS development, but to replace Python!

This is not surprising. Although people always associate Swift with Apple, Chris Lattner, the creator of Swift, is now at Google Brain (by the way, Guido van Rossum, the creator of Python, left Google for Dropbox at the end of 2012).

Lattner tweeted before Swift was released: "Next month I will be the first and only person with 4 years of experience programming in Swift :-)"

What's wrong with Python?

Although Python is the most popular machine learning language, it actually has quite a few problems in machine learning scenarios:

  1. Deployment is cumbersome and the runtime has too many dependencies. First, bundling a pile of Python packages into a mobile application is unrealistic. Second, for operations and maintenance reasons, many companies do not want to deploy large numbers of Python packages in their production environments. The current workaround is to train the model in Python and rewrite the actual inference (serving) phase in another language such as C++, which duplicates work and slows down the development cycle.
  2. Dynamic typing, with no compile-time type checking, so many errors are not discovered until runtime. In machine learning this is especially painful, because models often need to train and run for a long time. Large Python projects rely heavily on unit tests to catch errors, but in machine learning scenarios unit tests are of little help: an ordinary program's test suite may run in under half an hour, so when an error turns up you fix it and run again, whereas a machine learning model may run for half a month before crashing on what turns out to be a trivial coding error. Imagine the frustration.
  3. Concurrency is difficult, with the notorious GIL problem. Machine learning models' appetite for computing power urgently needs to be satisfied through concurrency.
  4. Poor performance. Frameworks like PyTorch go to great lengths to work around Python's performance problems, and TensorFlow sidesteps them by relying on its graph model (see the next section) plus C++ and CUDA custom operations. But C++/CUDA custom operations bring two problems of their own:
  • C++ is a complex language, especially for researchers and data analysts with no C++ experience.
  • Customizing TensorFlow operations in C++/CUDA tightly couples them to particular hardware (CUDA means Nvidia GPUs only), making migration to new hardware difficult. This matters especially for Google, which runs its own TPUs in addition to Nvidia GPUs.
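The dynamic-typing pitfall in point 2 is easy to reproduce. The sketch below uses made-up function names purely for illustration: a type error in the logging step only surfaces after the "training loop" has already finished, whereas a statically typed compiler would have rejected it before the job ever started:

```python
def train_step(step):
    # Stand-in for an expensive computation.
    return step * 0.1

def log_progress(step, loss):
    # Bug: concatenating str and int -- undetected until this line runs.
    return "step " + step + ": loss=" + str(loss)

# "Half a month" of training runs fine...
losses = [train_step(s) for s in range(10000)]

try:
    log_progress(9999, losses[-1])  # ...and only then does it crash
except TypeError as e:
    print("crashed after training:", e)
```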

When using TensorFlow, are you really writing Python?

Let's look at a short TensorFlow code example:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[1, 1])
    m = tf.matmul(x, x)

    with tf.Session() as sess:
        print(sess.run(m, feed_dict={x: [[2.]]}))

The above is legal Python code, but if we look closely at what it actually does, we find that it builds a graph m and then runs that graph through the run method of tf.Session().
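The construction/execution split can be seen without TensorFlow at all. The toy "graph" below is not TensorFlow, just a minimal sketch of the same idea: operations build nodes instead of computing values, and nothing is evaluated until an explicit run() call with a feed dictionary:

```python
class Node:
    """A graph node: an operation plus its input nodes."""
    def __init__(self, fn, inputs):
        self.fn, self.inputs = fn, inputs

def placeholder():
    # No operation; the value is supplied later via the feed dict.
    return Node(None, [])

def mul(a, b):
    # Builds a node; does NOT multiply anything yet.
    return Node(lambda x, y: x * y, [a, b])

def run(node, feed):
    if node.fn is None:          # placeholder: look up the fed value
        return feed[node]
    args = [run(i, feed) for i in node.inputs]
    return node.fn(*args)        # only here does computation happen

x = placeholder()
m = mul(x, x)                # graph construction phase
print(run(m, {x: 2.0}))      # execution phase: prints 4.0
```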

The following code may make this more obvious. We want to iterate over a dataset; in TensorFlow, we need to write it like this:

    dataset = tf.data.Dataset.range(100)
    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()

    with tf.Session() as sess:
        for i in range(100):
            value = sess.run(next_element)
            assert i == value

As we can see, we cannot iterate over the dataset directly in Python; we have to build an iterator through the methods TensorFlow provides.

This situation can be compared to using Python to access a SQL database:

    t = ('RHAT',)
    q = 'SELECT * FROM stocks WHERE symbol=?'
    c.execute(q, t)

Here we construct a SQL statement and then have Python "execute" it. On the surface you are writing Python, but the key logic is in the SQL statement: more precisely, you are constructing SQL statements in Python and then running the constructed statements. This is metaprogramming.
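The analogy can be made concrete with Python's standard sqlite3 module (in-memory database and a made-up stocks table, for illustration only). The Python code never filters rows itself; it hands a SQL statement to the database engine to execute:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
c.execute("INSERT INTO stocks VALUES ('RHAT', 23.5)")

# The key logic lives in the SQL string, not in Python:
# Python only constructs the statement and asks the engine to run it.
t = ('RHAT',)
q = 'SELECT * FROM stocks WHERE symbol=?'
c.execute(q, t)
print(c.fetchall())  # [('RHAT', 23.5)]
```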

Similarly, in TensorFlow, you are writing Python on the surface, but in fact the key logic is in the TensorFlow graph. More precisely, you are constructing the TensorFlow graph in Python and then running the constructed graph.

In fact, on Halloween 2017 (October 31st), Google released TensorFlow Eager Execution, which allows you to program directly in Python instead of using Python to metaprogram the TensorFlow graph.

Using Eager Execution, the two TensorFlow examples above can be rewritten as:

    import tensorflow as tf
    import tensorflow.contrib.eager as tfe

    # Enable eager execution mode
    tfe.enable_eager_execution()

    x = [[2.]]
    m = tf.matmul(x, x)

    print(m)

    dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

    dataset = dataset.map(tf.square).shuffle(2).batch(2)

    # Python-style iteration over the dataset
    for x in tfe.Iterator(dataset):
        print(x)

You see, TensorFlow can be programmed in "plain" Python after all. So why did we go to so much trouble before?

Because of performance.

Machine learning, especially with modern complex models, is extremely computationally demanding. TensorFlow graphs can handle these heavy computational demands well; Python cannot.

TensorFlow graphs are designed specifically for the needs of machine learning, so they can be optimized well for performance. But that optimization is not free: to optimize better, TensorFlow graphs make many assumptions about the model (assumptions that are, from another angle, limitations), and they require construction and execution to happen in separate stages (the static graph model). This hurts the flexibility and expressiveness of models, and the lack of support for dynamic graph models was long a major pain point of TensorFlow.

Balance performance and flexibility

TensorFlow Eager Execution supports dynamic graphs, but its performance is poor (remember Python's performance and GIL issues mentioned earlier?); regular TensorFlow performs well but lacks flexibility. Is there a solution that offers both?

The machine learning community has done a lot of exploration in this area.

Traditionally, interpreter performance (TensorFlow Eager Execution is essentially an interpreter) can often be improved through JIT compilation. PyTorch, which is based on Python and supports dynamic graph models, and therefore suffers from Python's performance issues, attempts to improve performance through a tracing JIT (tracing just-in-time compilation). Simply put, a tracing JIT records frequently executed operations and compiles them into machine code, thereby optimizing performance. However, tracing JITs have problems of their own: "unrolled" operations can produce very long traces, side effects can pollute the trace and make debugging difficult, and the JIT cannot use "future code" for optimization, among others.
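A minimal sketch of the tracing idea (this is a toy, not PyTorch's actual implementation): run the function once with a recording proxy, capture the flat sequence of operations, and replay that trace later. Note how the Python loop is "unrolled" into three separate trace entries, exactly the long-trace problem mentioned above:

```python
class Tracer:
    """Proxy value that records arithmetic ops as it computes them."""
    def __init__(self, value, trace):
        self.value, self.trace = value, trace

    def __mul__(self, other):
        self.trace.append(('mul', other))
        return Tracer(self.value * other, self.trace)

def traced(f, example_input):
    trace = []
    f(Tracer(example_input, trace))   # run once, recording every op
    def compiled(x):
        # Replay the flat trace -- no Python control flow remains.
        for op, arg in trace:
            x = x * arg
        return x
    return compiled, trace

def model(x):
    for _ in range(3):   # this loop gets unrolled into the trace
        x = x * 2
    return x

fast, trace = traced(model, 1.0)
print(len(trace), fast(5.0))  # 3 40.0
```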

Therefore, TensorFlow ultimately chose the path of code generation: analyze the dynamic-graph-style model code and automatically generate the corresponding TensorFlow graph program. And it is this choice that ruled Python out.


Graph program extraction (yellow box) is the key technology of Swift For TensorFlow

Python has so many dynamic features that it cannot be reliably analyzed statically.
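A brief illustration of why (class and method names here are made up): in Python, methods can be attached to a class at runtime, and which code a name refers to can depend on runtime data, so a static analyzer cannot reliably determine what m.forward means without running the program:

```python
class Model:
    pass

def make_forward(scale):
    def forward(self, x):
        return x * scale
    return forward

# The 'forward' method only comes into existence at runtime;
# a static analyzer looking at class Model sees no such method at all.
setattr(Model, "forward", make_forward(3))

m = Model()
print(m.forward(2))  # 6
```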

That leaves only two options:

  1. Trim the Python language down to a subset that is amenable to static analysis.
  2. Switch to a different language.

In fact, Google open-sourced the Tangent project in 2017, which took the first route. Tangent was built to solve automatic differentiation, which likewise relies on code analysis, and Python is hard to analyze. However, Python classes depend heavily on dynamic features and are difficult to support in such a subset, and a language that cannot even offer class-level abstraction hardly resembles Python at all.

So, just change the language.

By the way, TensorFlow chose the route of static analysis followed by code generation and compilation, but code generation does not strictly require a compiler. Lightweight Modular Staging (LMS), proposed in 2010, supports code generation at runtime without one. However, under LMS, supporting control flow requires exotic language features that only a very few languages (such as Scala) offer, so even with LMS, Python would still have to be replaced. TensorFlow did not choose LMS not only because so few languages can do it, but also because LMS requires user intervention: in Scala, for example, data types must be explicitly wrapped in Rep types to support LMS.

Why Swift?

In fact, although there are many programming languages, the range of realistic choices is not large.

First of all, the language ecosystem is very important. Choosing a language really means choosing its ecosystem, including the development environment, debugging tools, documentation, tutorials, libraries, and users. This rules out creating a new language and rules out most academic languages.

Next, dynamism eliminates a large number of languages. As mentioned above, Python's many dynamic features make reliable static analysis impossible; dynamic languages such as R, Ruby, and JavaScript are excluded for the same reason.

Even static languages like TypeScript, Java, C#, and Scala do not qualify, because dynamic dispatch is pervasive in them. Specifically, the main abstraction features of these languages (classes and interfaces) are built on highly dynamic constructs. In Java, for example, after Foo foo = new Bar();, the call foo.m() invokes the m method of the Bar class, not the Foo class.
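The same dispatch behavior is easy to demonstrate in Python, where it is the default for every method call: the call resolves to the object's runtime class, not to the type the variable was "declared" as (Foo and Bar here mirror the Java example and are otherwise made up):

```python
class Foo:
    def m(self):
        return "Foo.m"

class Bar(Foo):
    def m(self):
        return "Bar.m"

foo = Bar()      # in Java: Foo foo = new Bar();
# Dynamic dispatch: the runtime class (Bar) decides which m runs,
# which is exactly what a static analyzer cannot resolve ahead of time.
print(foo.m())   # Bar.m
```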

Google's own Go language has the same problem: Go interfaces are also dynamically dispatched. Moreover, Go has no generics. Go's built-in map type does have some generic-like behavior baked into the language, and if TensorFlow used Go, Tensor would have to be built into the language the same way map is; but the Go community advocates lightweight design, and building Tensor into the language runs against that core philosophy.

On the same day that Swift For TensorFlow was released, Go released a new logo design

So, the remaining options are few and far between:

  • C++
  • Rust
  • Swift
  • Julia

C++ is complex and has a bad reputation for its many undefined behaviors, and it relies heavily on C macros and template metaprogramming. Rust's learning curve is very steep. Julia is an interesting language: although dynamic, it has a lot of clever machinery for type specialization, so it might be able to support TensorFlow graph extraction; however, Julia's community is smaller than Swift's, and the creator of Swift is at Google Brain, so TensorFlow ultimately chose Swift.

Of course, it should be pointed out that Swift's classes are also highly dynamic, but Swift has static constructs such as struct and enum, and these also support generics, methods, and protocols (Swift's protocols provide interface-like features and mixins). This allows Swift to be reliably statically analyzed while still offering usable high-level abstractions.

Also, remember the shortcomings of Python we mentioned earlier? Let’s see how Swift performs in these aspects:

  1. Easy deployment. Swift compiles to machine code, and ML models written in Swift can be compiled into simple, easy-to-deploy .o/.h files.
  2. Static typing gives compile-time checking, and it also lets IDEs flag errors more intelligently, which is extremely helpful in day-to-day programming.
  3. Swift does not yet support concurrency at the language level, but it works well with pthreads, and language-level concurrency support is on the way.
  4. Swift has good performance and a low memory footprint. Because Swift is widely used on mobile devices, the Swift community takes performance optimization very seriously, and once explicit memory ownership support lands, Swift will be able to replace C++ in many scenarios. Swift is built on LLVM (remember that the father of Swift is also the father of LLVM) and has direct access to LLVM's lower layers, which can generate GPU kernels for Nvidia and AMD graphics cards. Customizing TensorFlow operations in Swift will therefore be another advantage of Swift For TensorFlow in the future.

Of course, the machine learning community has accumulated a great many Python components, so Swift For TensorFlow also provides Python interoperability. For example, the following code shows how to access Python's numpy library from Swift (the equivalent Python appears in the comments):

    import Python

    let np = Python.import("numpy")      // import numpy as np
    let a = np.arange(15).reshape(3, 5)  // a = np.arange(15).reshape(3, 5)
    let b = np.array([6, 7, 8])          // b = np.array([6, 7, 8])

Is Python about to die?

Finally, let’s take a brief look at the future of Python.

Think about what scenarios Python is mainly used for now?

Teaching. Yes, Python is well suited as a teaching language, but being suitable for teaching is far from enough. Older readers may remember the Logo language (steering a turtle to draw pictures) taught in "microcomputer" class, back when computers were still called microcomputers; how many people use it now? The once-popular Pascal was designed for teaching; how many people use it now? Moreover, the choice of teaching language largely follows a language's popularity, not the other way around: one reason MIT's famous introductory computer science and programming course switched from Scheme to Python was that Python was more popular.

Tools. Python is good for writing small tools: its standard library is excellent, the code is not verbose, small tools rarely care about performance, projects stay small, and the lack of compile-time type checking is not a big problem. But this territory is gradually being eroded by Go, whose standard library is also excellent, whose code is also concise, and which performs better and deploys more easily than Python.

Web development. Python's presence in web development owes more to Python's popularity, its many libraries, and easy hiring than to any real suitability for the task. Here Python has too many competitors: the old PHP is still vibrant (PHP 7 fixed many long-criticized defects, and giants like Facebook have built tooling around it); Ruby remains popular, and the popular Python framework Flask actually borrowed its design from Ruby's Sinatra; and high-performance web development increasingly emphasizes high-throughput, non-blocking IO, where Node.js and Go shine and the Java community has Netty and Vert.x (and who can beat Java for libraries and hiring?). Python really has no advantage here.

Scientific computing. Python still has a significant advantage here. But Fortran once dominated scientific computing too; how many people use it now? Moreover, because of Python's performance problems, a large number of Python's scientific computing libraries depend heavily on C or C++ underneath; if that foundation ever migrates elsewhere, Python could fade even faster than Fortran did.

Machine learning. It has to be said that the AI/ML wave has given Python a shot in the arm, since scientific computing alone is a relatively small niche. By contrast, R is also very popular in statistical analysis but has never matched Python's overall popularity, because R is ill-suited to writing tools or developing for the web. Still, the appearance of Swift For TensorFlow shows that Python's position as the mainstream language of machine learning is not secure: its popularity in machine learning owes more to its accumulated base in scientific computing than to any real suitability for expressing machine learning problems.

Therefore, a decline for Python looks quite possible.
