Alternative ideas for hot updating of Node.js Web application code


Background

Anyone who has developed web applications with Node.js has probably been annoyed that modified code only takes effect after the Node.js process is restarted. Developers used to PHP find this especially inconvenient and conclude that PHP really is the best programming language in the world. Restarting the process by hand is tedious, repetitive work, and once the application grows even slightly, the startup time becomes non-negligible.

Of course, no programmer, whatever the language, puts up with this kind of torture for long. The most direct and universal solution is to watch for file modifications and restart the process automatically. There are many mature tools for this, such as the abandoned node-supervisor, the popular PM2, or the lightweight node-dev.

This article offers another approach: with only small modifications, it achieves true zero-restart hot code updates and removes the annoyance of updating code when developing web applications with Node.js.

Overall idea

Speaking of hot code updates, the best-known implementation is Erlang's. Erlang is characterized by high concurrency and distributed programming, and it is mainly used in fields such as securities trading and game servers. These scenarios more or less require that a service can be operated and maintained while it is running, and hot code update is an important part of that, so let's first briefly look at Erlang's approach.

Since I have never used Erlang, the following content is all hearsay. If you want to have an in-depth and accurate understanding of Erlang's code hot update implementation, it is best to consult the official documentation.

Erlang's code loading is managed by a module called code_server. Except for some necessary code at startup, most of the code is loaded by code_server.

When code_server finds that the module code has been updated, it will reload the module. New requests thereafter will be executed using the new module, while existing requests that are still being executed will continue to be executed using the old module.

After the new module is loaded, the old module is labeled old and the new module is labeled current. When the next hot update is performed, Erlang scans for and kills the processes that are still executing the old module, and then continues to update the module according to the same logic.

Not all code in Erlang allows hot updates. For example, basic modules such as kernel, stdlib, and compiler are not allowed to be updated by default.

Node.js also has a component similar to code_server, namely the require system, so Erlang's approach should be worth trying in Node.js as well. From Erlang's approach, we can roughly summarize the key issues in implementing hot code updates in Node.js:

How to update module code

How to use the new module to handle requests

How to release resources of old modules

Then we will analyze these problems one by one.

How to update module code

To solve the problem of updating module code, we need to read the implementation of the Node.js module manager, module.js. A quick read shows that the core logic lies in Module._load. Here is a slightly simplified version of that code.

 // Check the cache for the requested file.
 // 1. If a module already exists in the cache: return its exports object.
 // 2. If the module is native: call `NativeModule.require()` with the
 //    filename and return the result.
 // 3. Otherwise, create a new module for the file and save it to the cache.
 //    Then have it load the file contents before returning its exports
 //    object.
 Module._load = function(request, parent, isMain) {
   var filename = Module._resolveFilename(request, parent);
   var cachedModule = Module._cache[filename];
   if (cachedModule) {
     return cachedModule.exports;
   }
   var module = new Module(filename, parent);
   Module._cache[filename] = module;
   module.load(filename);
   return module.exports;
 };
 require.cache = Module._cache;

The core is Module._cache. As long as a module's cache entry is cleared, the module manager will load the latest code the next time that module is required.

Let's write a small program to verify this:

 // main.js
 function cleanCache(modulePath) {
   var path = require.resolve(modulePath);
   // Delete the entry rather than assigning null, so that the loader
   // sees the module as not cached at all.
   delete require.cache[path];
 }
 setInterval(function () {
   cleanCache('./code.js');
   var code = require('./code.js');
   console.log(code);
 }, 5000);

 // code.js
 module.exports = 'hello world';

Let's run main.js and modify the content of code.js while it is running. The console shows that our code has been successfully updated to the latest version.
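The same experiment can be made self-contained, as a minimal sketch: instead of editing code.js by hand, the script below writes a throwaway module into a temporary directory and overwrites it between two require calls. The temporary file and the helper name `reload` are assumptions made for this illustration only.

```javascript
// Self-contained check that clearing require.cache forces a reload.
// The temp directory and the helper name `reload` are illustrative assumptions.
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hot-reload-'));
const file = path.join(dir, 'code.js');

function reload(modulePath) {
  // Drop the cached entry so the next require() reads the file again.
  delete require.cache[require.resolve(modulePath)];
  return require(modulePath);
}

fs.writeFileSync(file, "module.exports = 'hello world';");
const first = require(file);

// Simulate the developer editing the file.
fs.writeFileSync(file, "module.exports = 'hello again';");
const stale = require(file);  // still served from the cache
const fresh = reload(file);   // cache entry removed, file read again

console.log(first, '|', stale, '|', fresh);
// prints: hello world | hello world | hello again
```

The middle value is the interesting one: without clearing the cache, require keeps returning the old exports even though the file on disk has changed.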


Now that the module manager code update problem has been solved, let's see how we can actually execute the new module in the web application.

How to use the new module to handle requests

In order to better suit everyone's usage habits, we will take Express as an example to explain this issue. In fact, similar ideas can be applied to most web applications.

First of all, if our service looks like the Express demo, with all the code in a single module, we cannot hot-load that module.

 var express = require('express');
 var app = express();
 app.get('/', function (req, res) {
   res.send('hello world');
 });
 app.listen(3000);

To implement hot loading, we need some basic code that is never hot-updated to control the update process, just like the basic modules that Erlang does not allow to be updated. And if operations like app.listen were re-executed, the result would not be much different from restarting the Node.js process. Therefore, we need to isolate the frequently updated business code from the rarely updated basic code.

 // app.js basic code
 var express = require('express');
 var app = express();
 var router = require('./router.js');
 app.use(router);
 app.listen(3000);

 // router.js business code
 var express = require('express');
 var router = express.Router();
 // The middleware loaded here can also be automatically updated
 router.use(express.static('public'));
 router.get('/', function (req, res) {
   res.send('hello world');
 });
 module.exports = router;

Unfortunately, although this change successfully separates out the core code, router.js still cannot be hot-updated. First, there is no trigger mechanism for updates, so the service cannot know when to update the module. Second, app.use keeps holding the old router.js module, so even if the module is updated, requests are still processed by the old module instead of the new one.

To improve it further, we need to make some adjustments to app.js, start file monitoring as a trigger mechanism, and solve the caching problem of app.use through closures.

 // app.js
 var express = require('express');
 var fs = require('fs');
 var app = express();
 var router = require('./router.js');
 app.use(function (req, res, next) {
   // Use the closure feature to get the latest router object to avoid app.use caching the router object
   router(req, res, next);
 });
 app.listen(3000);

 // Listen for file modifications and reload code
 fs.watch(require.resolve('./router.js'), function () {
   cleanCache(require.resolve('./router.js'));
   try {
     router = require('./router.js');
   } catch (ex) {
     // Keep serving with the previous router if the new code fails to load
     console.error('module update failed');
   }
 });

 function cleanCache(modulePath) {
   // Delete the entry rather than assigning null, so that the loader
   // sees the module as not cached at all.
   delete require.cache[modulePath];
 }

If you try to modify router.js again, you will find that our code hot update has taken shape, and new requests will use the latest router.js code. In addition to modifying the return content of router.js, you can also try to modify the routing function, which will also be updated as expected.
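Why the wrapper function matters can be seen without Express at all. In the sketch below, plain functions stand in for routers (no Express API is involved, and all names are made up for the illustration): a function value handed over directly is captured once, while a wrapper looks the variable up again on every call.

```javascript
// Plain functions stand in for Express routers; no Express API is used here.
let router = function oldRouter() { return 'old'; };

// Like app.use(router): the function value is captured once, at registration.
const registeredDirectly = router;

// Like app.use(function (req, res, next) { router(req, res, next); }):
// the `router` variable is looked up again on every call.
const registeredViaClosure = function () { return router(); };

// Simulate a hot update that replaces the router.
router = function newRouter() { return 'new'; };

console.log(registeredDirectly());   // prints: old
console.log(registeredViaClosure()); // prints: new
```

The direct registration keeps pointing at the old function, which is also why the old module could never be garbage-collected in that setup.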

Of course, a complete hot update solution needs further improvements that fit your own project. First, for middleware: middleware that does not need hot updates, or that should not be re-created on every update, can be declared with app.use, while middleware that we want to modify flexibly can be declared with router.use. Second, file monitoring should not cover only the routing file; it must cover every file that needs hot updates. Besides file monitoring, the update can also be triggered by an editor extension that sends a signal to the Node.js process on save, or by accessing a specific URL.
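Covering "every file that needs hot updates" can be approached by clearing the cache for a whole directory rather than a single file. The following is a rough sketch under assumptions: the helper name `purgeCacheUnder`, the exclusion of `node_modules`, and the temporary demo files are all invented for this example and are not part of any framework.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Remove every cached module whose file lives under rootDir
// (third-party code in node_modules is left alone).
// Returns the list of purged file names.
function purgeCacheUnder(rootDir) {
  // require.cache is keyed by resolved real paths, so resolve symlinks first.
  const prefix = fs.realpathSync(rootDir) + path.sep;
  const vendor = path.sep + 'node_modules' + path.sep;
  const purged = [];
  for (const filename of Object.keys(require.cache)) {
    if (filename.startsWith(prefix) && !filename.includes(vendor)) {
      delete require.cache[filename];
      purged.push(filename);
    }
  }
  return purged;
}

// Demo: router.js depends on helper.js; only helper.js changes on disk.
const appDir = fs.mkdtempSync(path.join(os.tmpdir(), 'hot-app-'));
const helperFile = path.join(appDir, 'helper.js');
const routerFile = path.join(appDir, 'router.js');
fs.writeFileSync(helperFile, "module.exports = 'v1';");
fs.writeFileSync(routerFile, "module.exports = require('./helper.js');");

const before = require(routerFile);
fs.writeFileSync(helperFile, "module.exports = 'v2';");
const purged = purgeCacheUnder(appDir);
const after = require(routerFile);

console.log(before, after, purged.length); // prints: v1 v2 2
```

A function like this could be called from an fs.watch callback, from a signal handler, or from an internal URL, whichever trigger suits the project. Clearing only router.js would not have been enough here, because the stale helper.js would still have been served from the cache.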


How to release resources of old modules

To explain how to release the resources of old modules, we need to understand the memory reclamation mechanism of Node.js. This article does not describe it in detail; many articles and books cover it, and interested readers can explore them on their own. In brief, when an object is no longer referenced by any other object, it is marked as reclaimable and its memory is released during the next GC cycle.

So the question is how to ensure that, after the code of a module is updated, no object retains a reference to the old module. First, let's take the code from the section "How to update module code" as an example and see what happens if the old module's resources are not reclaimed. To make the result more obvious, we modify code.js:

 // code.js
 var array = [];
 for (var i = 0; i < 10000; i++) {
   array.push('mem_leak_when_require_cache_clean_test_item_' + i);
 }
 module.exports = array;

 // app.js
 function cleanCache(modulePath) {
   var path = require.resolve(modulePath);
   delete require.cache[path];
 }
 setInterval(function () {
   var code = require('./code.js');
   cleanCache('./code.js');
 }, 10);

OK, we have used a clumsy but effective way to inflate the memory footprint of the code.js module. When we start app.js, memory usage climbs rapidly, and before long Node.js aborts with "process out of memory". Yet if we look at the code of app.js and code.js, we find no references to the old module.

We can quickly locate the problem with the help of profiling tools such as node-heapdump. In module.js, we find that Node.js automatically adds a reference to every module: each new module is pushed onto the children array of its parent.

 function Module(id, parent) {
   this.id = id;
   this.exports = {};
   this.parent = parent;
   if (parent && parent.children) {
     parent.children.push(this);
   }
   this.filename = null;
   this.loaded = false;
   this.children = [];
 }

Therefore, we can adjust the cleanCache function accordingly to remove this reference when the module is updated.

 // app.js
 function cleanCache(modulePath) {
   var module = require.cache[modulePath];
   if (!module) {
     return; // nothing cached for this path
   }
   // remove reference in module.parent
   if (module.parent) {
     var index = module.parent.children.indexOf(module);
     // guard against -1, which would make splice remove the last child
     if (index !== -1) {
       module.parent.children.splice(index, 1);
     }
   }
   delete require.cache[modulePath];
 }
 setInterval(function () {
   var code = require('./code.js');
   cleanCache(require.resolve('./code.js'));
 }, 10);

Execute it again. This time it is much better. The memory will only increase slightly, indicating that the resources occupied by the old module have been correctly released.

With the new cleanCache function, regular use causes no problems, but that does not mean we can sit back and relax. In Node.js, besides the references added by the require system, event listening through EventEmitter is also very common, and EventEmitter usage easily leads to mutual references between modules. So can resources held through EventEmitter be released correctly? The answer is yes.

 // code.js
 var EventEmitter = require('events').EventEmitter;
 var moduleA = new EventEmitter();
 moduleA.on('whatever', function () {
 });

When the code.js module is updated and all references are removed, moduleA will be automatically released as long as it is not referenced by other unreleased modules, including our event listeners inside it.

There is only one abnormal EventEmitter usage that this approach cannot handle: code.js registers a listener on a global object every time it is executed, so listeners keep piling up on that object. Node.js will then quickly warn that too many listeners have been added and that a memory leak is suspected.
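This accumulation is easy to reproduce. In the sketch below, `global.sharedBus` is a made-up stand-in for such a long-lived global object, and the reloaded module is a temporary file written only for the demonstration.

```javascript
const EventEmitter = require('events').EventEmitter;
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical long-lived emitter that outlives every hot update.
global.sharedBus = new EventEmitter();

// A module that subscribes to the global emitter each time it is evaluated.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hot-events-'));
const file = path.join(dir, 'code.js');
fs.writeFileSync(file, "global.sharedBus.on('whatever', function () {});");

const resolved = require.resolve(file);
for (let i = 0; i < 3; i++) {
  require(file);                  // evaluates the module, adding one listener
  delete require.cache[resolved]; // "hot update": forget the module
}

const listeners = global.sharedBus.listenerCount('whatever');
console.log(listeners); // prints: 3
```

Every reload adds another listener, and each listener keeps the closure of its old module alive. With the default limit of 10 listeners per event, Node.js starts printing its possible-memory-leak warning on the eleventh reload. If the reloaded module is the only subscriber, one conceivable workaround is to remove its listeners before reloading, but that has to be arranged by the application itself.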

So far, we can see that as long as we handle the references that Node.js automatically adds for us in the require system, resource recycling of old modules is not a big problem. Although we cannot achieve such fine-grained control as Erlang to scan the remaining old modules in the next hot update, we can solve the problem of releasing old module resources through reasonable avoidance measures.

In web applications there is another reference problem: unreleased modules or core modules may hold references to the modules that need hot updates, as app.use does. The old module's resources then cannot be released, and new requests are not handled by the new module. The solution is to control the entry points through which global variables or references are exposed, and to update those entry points manually during the hot update. The router wrapper in the section "How to use the new module to handle requests" is an example: by controlling this one entry point, whatever other modules router.js references will be released when the entry is released.

Another thing that can prevent resources from being released is operations such as setInterval, which keep the objects they reference alive. However, we rarely use this kind of technique in web applications, so the solution does not take it into account.

Conclusion

So far, we have solved the three major problems of hot code updates in Node.js web applications. However, since Node.js itself lacks an effective mechanism for scanning retained objects, we cannot completely rule out old-module resources that stay unreleased because of things like setInterval. Because of this limitation, the YOG2 framework we provide currently applies this technique mainly during development and debugging, to speed up development through hot updates. Code updates in production still use restarts or PM2's hot reload feature to keep online services stable.

Since hot updates are closely tied to the framework and the business architecture, this article does not provide a general-purpose solution. For reference, here is a brief description of how we use this technique in the YOG2 framework. YOG2 supports splitting the front end and back end into subsystem Apps, so our strategy is to update code at App granularity. Operations like fs.watch have compatibility issues, and alternatives such as fs.watchFile cost more performance, so we rely on YOG2's test machine deployment feature instead: uploading and deploying new code tells the framework that an App's code needs to be updated. Along with the module cache for that App, the route cache and template cache are also updated, which completes the update of all code.

If you are using a framework like Express or Koa, you only need to follow the methods in this article and adapt the main route to your business needs to apply this technique.
