Exploration and practice of Ctrip Hotel's unified cloud mobile phone platform


About the Author

The hotel wireless performance R&D team develops the basic capability platforms of the hotel wireless team, such as the Cloud Touch platform (cloud mobile phones), the content operation platform, and the automated testing process. By abstracting recurring day-to-day work into general solutions, the team improves the overall efficiency of the businesses the platform supports.

1. Background

Many departments and teams within Ctrip must complete all functional tests of new App versions and new sites during the R&D phase, and then, in the pre-release phase, carry out unrestricted acceptance from the customer's perspective (such as competitive product comparison and localization experience). For released versions, our customer service staff also need production resources that give them the same view as the customer, for example when assisting customers or training new employees. This poses a major challenge to the existing test environments in our R&D process: in terms of both operating experience and resource alignment, they struggle to meet these requirements.

  • For example, in order to meet the overseas acceptance requirements of Trip.com's new site before it goes online, it is necessary to support the ability to release the new functions of the new site to targeted groups in advance.

  • For example, in order to ensure that customer service staff can fully understand the customer's incoming questions when assisting the customer, an App operating platform is needed that supports maintaining a consistent perspective between employees and customers.

2. Full-Scene Construction

After evaluating similar demands and the possible impact of the platform's capabilities, we defined the user groups we expect the system (hereinafter referred to as Cloud Touch) to serve, as shown in the following figure:

Take the on-site acceptance scenario as an example. With the Cloud Touch platform, the devices used by acceptance staff can be managed centrally, and the cloud platform provides a single remote operation portal. Wherever our employees are in the world, they can conveniently use the centrally maintained devices. At the same time, the RC (release candidate) package of the new site is installed on Cloud Touch devices in advance, so the version to be accepted is already deployed.

Take the customer service assistance scenario as an example. The customer service workbench gives employees a single entry point to Cloud Touch. During a conversation with a customer, the employee can see which App version the customer is using and quickly select a matching preset real device from the device pool to reproduce and identify the scenario.

(Schematic diagram above)

3. Technical solutions based on Cloud Touch

3.1 Core Platform Design

Based on the above analysis, the platform needs to allocate and manage device resources across different application scenarios, dispatch requests from different regions centrally, and provide remote control of local devices with real-time image synchronization, giving an interactive experience similar to a remote desktop. Building on these integration capabilities, it can also provide unified preset parameters and environment configurations for different business scenarios, further standardizing the various tasks.
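The text does not detail how devices are allocated, so the following Python is only a minimal sketch of one possible policy. `Device`, `DevicePool`, and the scenario tags (such as "customer_service") are hypothetical names: each device is held by at most one user, and a request prefers an idle device reserved for its scenario before falling back to an untagged one.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Device:
    serial: str                                  # device UDID / serial number
    platform: str                                # "ios" or "android"
    tags: Set[str] = field(default_factory=set)  # hypothetical scenario tags
    holder: Optional[str] = None                 # user holding the device, None if idle


class DevicePool:
    """Hypothetical allocator: one user per device, scenario-reserved devices first."""

    def __init__(self, devices):
        self._devices = {d.serial: d for d in devices}
        self._lock = threading.Lock()  # acquire/release must be atomic under concurrency

    def acquire(self, user: str, platform: str, scenario: str) -> Optional[Device]:
        with self._lock:
            # Idle devices of the right platform that are reserved for this
            # scenario or not reserved for any scenario at all.
            idle = [d for d in self._devices.values()
                    if d.holder is None and d.platform == platform
                    and (scenario in d.tags or not d.tags)]
            if not idle:
                return None
            # False sorts before True, so devices tagged for this scenario win.
            chosen = min(idle, key=lambda d: scenario not in d.tags)
            chosen.holder = user
            return chosen

    def release(self, user: str, serial: str) -> bool:
        with self._lock:
            device = self._devices.get(serial)
            if device is None or device.holder != user:
                return False  # only the current holder may release a device
            device.holder = None
            return True
```

A production pool would also need lease timeouts and health checks so that a crashed session does not hold a device indefinitely.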

3.2 Equipment pool design

We have a large number of customer service seats, as well as test and acceptance staff on the R&D lines. A large enough device pool is the hardware prerequisite, but using these devices effectively and coordinating people and devices across application scenarios requires allocation strategies designed around the core scenarios. The main core processes are as follows:

3.3 Design and implementation of remote device control

After achieving platformization and unified distribution of equipment, the core of the technology lies in how to select and implement an end-to-end remote control solution.

Because each operating system uses a different integration technology, we take iOS as the example here. WebDriverAgent (WDA) is an iOS mobile testing framework that Facebook introduced at the 17th SeleniumConf. WebDriverAgent implements a WebDriver server on the iOS side using the classic client-server (C/S) architecture: the client sends a request and the server returns a response. Through this server, we can control iOS devices remotely.

WDAClient: a client of WebDriverAgent. For example, facebook-wda is a Python client library for WDA that communicates with WebDriverAgent by constructing HTTP requests directly.

WDAServer : The machine that runs the WDA App and implements the WebDriver communication protocol.

Session: the server maintains a session for each client. The first request the client sends is '/session/sessionId/url'; the server opens the address given in the request, replaces the sessionId placeholder with a real value, and returns it to the client. Every later request from the client to the server carries this session value.

WebElement: an object in the WebDriver API that represents a DOM element on a page.

JsonWireProtocol: the web service protocol WebDriver uses to communicate with a remote server; all interaction with the remote server takes place through HTTP requests.

Mobile JSON Wire Protocol Specification : Mobile automation protocol.

(These basic technical descriptions are quoted from the official WDA documentation. For more information, see the archived Facebook WebDriverAgent project on GitHub.)
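To make the request/response and session flow concrete, here is a minimal Python sketch that builds, without sending, a W3C WebDriver-style session-creation request using only the standard library. It assumes WDA listens on its default port 8100 on localhost (for example via USB port forwarding) and that an empty `capabilities` object is acceptable; `create_session_request` and `session_url` are illustrative helpers, not part of facebook-wda.

```python
import json
import urllib.request

# Assumption: WDA is reachable on its default port 8100 (e.g. forwarded over USB).
WDA_BASE = "http://127.0.0.1:8100"


def create_session_request(base: str = WDA_BASE) -> urllib.request.Request:
    """Build (but do not send) the POST /session request that opens a session."""
    body = json.dumps({"capabilities": {}}).encode("utf-8")
    return urllib.request.Request(
        base + "/session",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def session_url(session_id: str, path: str, base: str = WDA_BASE) -> str:
    """Later requests are scoped to the session ID the server returned."""
    return f"{base}/session/{session_id}{path}"
```

Passing the request to `urllib.request.urlopen` against a running WDA returns JSON that includes the session ID; check your WDA version for exactly where the ID appears in the response.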

3.3.1 Instruction Set Adaptation

The client can receive many different types of instructions to complete different actions, mainly including the following:

(1) Basic command communication format (iOS/Android share the same format, but the processing is slightly different. The following uses iOS as an example):

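    {
      "serial": "00008030-000D48A40291802E",  // iOS device UDID
      "type": "M_TOUCH",                      // command type (enum value)
      "message": {
        "action": 0,                          // mouse or key: 0 = pressed, 1 = released
        "keycodeType": "ascii",               // the keyboard event carries an ASCII code
        "keyCode": 60,                        // which key was pressed; non-ASCII codes trigger the corresponding system key
        "position": {
          "x": 687,                           // x pixel coordinate of the mouse click
          "y": 1116                           // y pixel coordinate of the mouse click
        }
      }
    }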
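The comments in the format above are annotations, so the sketch below assumes the messages on the wire are plain JSON with the same field names. It parses one message into a typed Python object; `Command` and `parse_command` are hypothetical names, and the optional fields are simply `None` when a message (for example a key event) omits them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    serial: str                  # device UDID
    cmd_type: str                # e.g. "M_TOUCH"
    action: int                  # 0 = pressed, 1 = released
    keycode_type: Optional[str]  # "ascii" for character input
    key_code: Optional[int]      # key code, if any
    x: Optional[int]             # pixel coordinates, present for mouse events
    y: Optional[int]


def parse_command(raw: str) -> Command:
    # Assumes real messages are plain JSON; the comments above are annotations only.
    data = json.loads(raw)
    msg = data.get("message", {})
    pos = msg.get("position") or {}
    return Command(
        serial=data["serial"],
        cmd_type=data["type"],
        action=int(msg.get("action", 0)),
        keycode_type=msg.get("keycodeType"),
        key_code=msg.get("keyCode"),
        x=pos.get("x"),
        y=pos.get("y"),
    )
```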

(2) Basic instructions: mouse events (click/slide operations)

  • Based on the resolution reported by the device and the position of the user's operation on the screen, the front-end page calculates the mouse pixel position x, y and assembles the mouse event command.
  • When the client receives an action=0 command (the mouse is pressed), it records the press coordinates and the time of the command.
  • When the client receives an action=1 command (the mouse is released), it records the release coordinates and the time of the command.
  • The client converts the command's pixel coordinates into UI operation coordinates using the device scale (the ratio between iOS device pixels and UIKit points), which gives the start and end points of the gesture. The time between press and release is used as the execution time, and the WDA command is assembled.
  • The WDA request URL is /wda/swipe. Depending on the start point, end point, execution time, and how frequently commands are triggered, it can produce a click, long press, double click, or swipe.
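The press/release pairing and pixel-to-point conversion above can be sketched in Python as follows. The thresholds `tap_radius` (in points) and `long_press_s` are hypothetical values chosen for illustration, and the returned dictionary mirrors the start point, end point, and duration the text describes rather than the exact request body of the team's /wda/swipe call.

```python
from dataclasses import dataclass


@dataclass
class PointerEvent:
    action: int  # 0 = pressed, 1 = released (as in the command format)
    x: int       # device pixel coordinates computed by the front end
    y: int
    t: float     # time the client received the command, in seconds


def assemble_gesture(press: PointerEvent, release: PointerEvent, scale: float,
                     tap_radius: float = 10.0, long_press_s: float = 0.5) -> dict:
    """Turn a press/release pair into one gesture expressed in UIKit points."""
    # UIKit works in points: pixels / scale (e.g. 3.0 on a 3x Retina iPhone).
    fx, fy = press.x / scale, press.y / scale
    tx, ty = release.x / scale, release.y / scale
    duration = max(release.t - press.t, 0.0)  # press-to-release time = execution time
    distance = ((tx - fx) ** 2 + (ty - fy) ** 2) ** 0.5
    if distance <= tap_radius:
        kind = "long_press" if duration >= long_press_s else "tap"
    else:
        kind = "swipe"
    return {"kind": kind, "fromX": fx, "fromY": fy,
            "toX": tx, "toY": ty, "duration": duration}
```

A double click would be recognized one level up, as two taps arriving within a short interval.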

(3) Basic instructions: key events

  • The front end records the keys the user presses, converts them to ASCII codes, and assembles keyboard input events; holding a key down generates commands continuously. System keys (power, home, menu) that the user clicks on the page are also converted into keyboard input events.
  • When the client receives action=0, an ASCII character triggers a character input event, and a system key is turned into the corresponding command.
  • Character input events: by default, the /wda/keys interface uses a synchronous snapshot mechanism that keeps input in order but is very slow, averaging about one second per character. A cloud phone needs faster response, so we removed the WDA snapshot mechanism and added a queue in the client that merges characters arriving within a short period into one string. A single /wda/keys call then inputs several characters, so input responds in real time.
  • Power key: request /wda/locked to get the current lock state, then call /wda/lock or /wda/unlock to lock or unlock the screen.
  • Home key: request /wda/home to return to the home screen.
  • Menu key (app switcher page): WDA provides no dedicated interface. We assemble an upward swipe and send it to /wda/dragfromtoforduration to simulate swiping up into the menu page. Note: /wda/swipe has no effect here and cannot be used.
  • When the client receives action=1, the user has released the key, and no action is taken.
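A minimal Python sketch of the key handling above, under stated assumptions: characters arriving within a hypothetical 0.2 s window are merged into one string, /wda/keys receives the text as a list of strings under `value` (as in upstream WDA), and the menu-key swipe coordinates are placeholders for a 390 x 844-point screen. Requests are returned as (method, path, body) tuples with the `/session/{id}` prefix omitted; sending them is left out.

```python
from typing import Iterable, List, Optional, Tuple

WdaRequest = Tuple[str, str, Optional[dict]]  # (HTTP method, WDA path, JSON body)


def merge_keystrokes(events: Iterable[Tuple[str, float]],
                     window_s: float = 0.2) -> List[str]:
    """Group (char, timestamp) events that arrive within window_s of each other."""
    batches: List[str] = []
    current: List[str] = []
    last_t = 0.0
    for ch, t in events:
        if current and t - last_t > window_s:
            batches.append("".join(current))  # gap too long: close the batch
            current = []
        current.append(ch)
        last_t = t
    if current:
        batches.append("".join(current))
    return batches


def keys_request(text: str) -> WdaRequest:
    # One /wda/keys call for the whole merged string.
    return ("POST", "/wda/keys", {"value": list(text)})


def system_key_request(key: str, locked: bool = False,
                       width: float = 390.0, height: float = 844.0) -> WdaRequest:
    """Map a system key to a WDA call; width/height are placeholder sizes in points."""
    if key == "home":
        return ("POST", "/wda/home", None)
    if key == "power":
        # `locked` is the result of an earlier GET /wda/locked.
        return ("POST", "/wda/unlock" if locked else "/wda/lock", None)
    if key == "menu":
        # Hypothetical gesture: swipe up from the bottom edge to mid-screen.
        return ("POST", "/wda/dragfromtoforduration",
                {"fromX": width / 2, "fromY": height - 1,
                 "toX": width / 2, "toY": height / 2, "duration": 0.5})
    raise ValueError(f"unsupported system key: {key}")
```

A live client would flush the current batch on a timer rather than waiting for the next keystroke to arrive.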

(4) Complex script instructions

  • Beyond the basic operations above, the client can accept further command inputs and invoke UI automation scripts, which carry out more complex instructions and enable smarter control.
  • Given the app type to launch, a user account and password, a page deeplink, and so on, the script logs the user in and then jumps directly to the target page.
  • Given an app download address, version number, and so on, the script uninstalls and reinstalls the app and handles pop-up windows and similar prompts.

3.4 Design and implementation of remote screen synchronization

For image synchronization, we rely on the well-known ffmpeg, an open-source, cross-platform audio and video tool used for recording, conversion, streaming, and many other operations. We capture frames, process them with ffmpeg, transcode them to H.264, and push the encoded stream to a web live-streaming service. Currently, 30 seconds of video is about 30 MB before transcoding and only about 3 MB after H.264 transcoding. The image stream is currently set to 20 frames per second.

3.4.1 Screen capture

iOS device screen capture process:

(1) WDA mjpegServer

WDA comes with mjpegServer, which will continuously call the screenshot API, compress the screenshot data, assemble it into the mjpeg data stream format and send it to the port of the screen stream.

(2) Screen capture speed/compression quality parameters

WDA mjpegServer can set the screenshot speed and compression quality through parameters. According to the server performance and usage scenarios, adjust FBMjpegServerScreenshotQuality and FBMjpegServerFramerate to get the best effect.

The human eye perceives motion as smooth at around 24 frames per second, so we set FBMjpegServerFramerate to 24 so that users do not notice any lag (the choice of frame rate is explained in item (4) of 3.4.2).

    static NSUInteger FBMjpegScalingFactor = 100;          // Screenshot scaling factor, default 100; usually left unchanged
    static NSUInteger FBMjpegServerScreenshotQuality = 25; // Screenshot compression quality, range 1-100, default 25; higher values give better image quality
    static NSUInteger FBMjpegServerFramerate = 24;         // Screenshot output rate, i.e. the frame rate; default 10

(3) Client screen acquisition

When the user starts using it, a screen initialization command will be generated and sent to the Client.

The client can obtain a continuous mjpeg image stream by requesting the image stream port through GET.

The stream consists of a series of mjpeg images separated by --BoundaryString, and each image can be saved separately as a jpeg file.
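One simple way to cut this stream into individual jpeg images is to scan for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers instead of parsing the multipart headers. The Python sketch below does that across arbitrarily split network chunks. It assumes each frame is a plain JPEG without an embedded thumbnail, since a thumbnail would carry its own markers; `MjpegSplitter` is a hypothetical name.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


class MjpegSplitter:
    """Extract complete JPEG frames from chunks of an MJPEG HTTP body."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buf += chunk
        frames = []
        while True:
            start = self._buf.find(SOI)
            if start < 0:
                # Only boundary/header text so far; keep the last byte in case
                # it is the first half of a marker split across chunks.
                del self._buf[:-1]
                break
            end = self._buf.find(EOI, start + 2)
            if end < 0:
                del self._buf[:start]  # frame incomplete: wait for more data
                break
            frames.append(bytes(self._buf[start:end + 2]))
            del self._buf[:end + 2]
        return frames
```

Each returned item is a complete JPEG that could be written to disk as-is.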

3.4.2 Streaming Media Processing

iOS screen stream to video stream process:

As described above, the client receives jpeg images one by one by sending GET requests to the image stream port. mjpeg uses only intra-frame coding, so the data volume is very large. Pushing the image stream directly to the server would demand very high bandwidth on the user's side, so the stream must be converted to H.264, which uses inter-frame coding.

(1) Client requests the image stream port and captures the image frame by frame

Request the picture stream port through ffmpeg and grab each jpeg picture through the decoder.

(2) H.264 encoding

Pass each captured jpeg image to the ffmpeg encoder, set the parameters, encode it in h.264 and output it to standard output.

Supplement: The frame rate settings of the decoder and encoder need to be slightly higher than the screenshot speed set by WDA to ensure that the picture response is always real-time.
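In the text, ffmpeg does both the decoding and the H.264 encoding. The following Python sketch only assembles a possible ffmpeg command line for that step. The option values (30 fps, a 4.5 Mbps cap, one keyframe every 60 frames, `ultrafast`/`zerolatency`) are assumptions based on the numbers in 3.4.2, not the team's actual settings, and whether the raw `mjpeg` demuxer or ffmpeg's multipart `mpjpeg` demuxer reads the WDA stream more reliably should be checked against your ffmpeg build.

```python
from typing import List


def build_ffmpeg_args(mjpeg_url: str, fps: int = 30, max_kbps: int = 4500,
                      gop: int = 60) -> List[str]:
    """ffmpeg argv: MJPEG over HTTP in, low-latency H.264 on standard output."""
    return [
        "ffmpeg",
        "-f", "mjpeg",               # read the input as a sequence of JPEG frames
        "-framerate", str(fps),      # input rate, a little above WDA's 24 fps
        "-i", mjpeg_url,
        "-an",                       # no audio
        "-c:v", "libx264",
        "-preset", "ultrafast",      # cheapest encoding: trades size for CPU time
        "-tune", "zerolatency",      # no frame lookahead, for interactive use
        "-pix_fmt", "yuv420p",       # widely decodable chroma format
        "-r", str(fps),              # output frame rate
        "-g", str(gop),              # keyframe interval in frames (hypothetical)
        "-maxrate", f"{max_kbps}k",  # bitrate cap, e.g. 4.5 Mbps
        "-bufsize", f"{max_kbps}k",  # roughly one second of rate-control buffer
        "-f", "h264",                # raw Annex B H.264 elementary stream
        "pipe:1",                    # write to standard output
    ]
```

The list can be passed to `subprocess.Popen(args, stdout=subprocess.PIPE)` and the H.264 bytes read from `proc.stdout`. The MJPEG address in the usage below assumes WDA's default MJPEG port 9100.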

(3) Pushing the stream to the streaming server

We used the streaming server provided by the multimedia group of the framework architecture R&D department of the platform R&D center. By introducing the JAR package provided by the framework team, we can easily push the data to the server.

Each frame of the ffmpeg encoder's standard output is sent to the streaming server using the device's primary key on the platform as a unique identifier.

After receiving the data, the company's streaming server generates a playback address, similar to a live-streaming room, based on the unique identifier. The front end displays the phone screen by opening that address.
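The text does not say how ffmpeg's standard output is divided into frames before it is pushed, and the framework team's JAR is internal. As an assumption, if the encoder writes a raw Annex B H.264 stream, the bytes can be cut into NAL units at the 00 00 01 start codes, as in this Python sketch. `AnnexBSplitter` and `nal_type` are hypothetical names, and handing each unit to the streaming client is left out.

```python
START = b"\x00\x00\x01"  # Annex B start code (a 4-byte code is 0x00 + this)


def nal_type(unit: bytes) -> int:
    return unit[0] & 0x1F  # e.g. 5 = IDR slice, 7 = SPS, 8 = PPS


class AnnexBSplitter:
    """Cut an H.264 Annex B byte stream (e.g. ffmpeg's stdout) into NAL units."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        self._buf += chunk
        units = []
        pos = self._buf.find(START)
        if pos < 0:
            return units
        while True:
            nxt = self._buf.find(START, pos + 3)
            if nxt < 0:
                break
            # A NAL unit never ends in 0x00, so stripping trailing zeros removes
            # the leading zero byte of a following 4-byte start code.
            unit = self._buf[pos + 3:nxt].rstrip(b"\x00")
            if unit:
                units.append(unit)
            pos = nxt
        self._buf = self._buf[pos:]  # keep the last, possibly incomplete unit
        return units

    def flush(self) -> list:
        pos = self._buf.find(START)
        unit = self._buf[pos + 3:].rstrip(b"\x00") if pos >= 0 else b""
        self._buf = b""
        return [unit] if unit else []
```

The NAL type separates keyframes (type 5) and parameter sets (types 7 and 8) from other units, which matters when a viewer joins mid-stream and needs a keyframe to start decoding.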

(4) Streaming bitrate

We need to select the appropriate frame rate and bit rate to achieve a balance between video smoothness and clarity:

Taking the bit rate upper limit as 4.5 Mbps as an example, the peak network speed required by the user end is about 550 KB/s.

Required bandwidth (KB/s) ≈ Maximum streaming bitrate (bps)/8/1024.
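The bandwidth formula is easy to check numerically. This small Python sketch applies it to the 4.5 Mbps cap and, purely as an illustration, estimates how many peak-rate streams would fit through the roughly 7.5 MB/s Wi-Fi link mentioned below, assuming 1 MB = 1024 KB.

```python
def required_bandwidth_kb_s(bitrate_bps: float) -> float:
    """Required bandwidth (KB/s) ~= bitrate (bps) / 8 / 1024."""
    return bitrate_bps / 8 / 1024  # bits -> bytes -> kilobytes


def streams_per_link(link_kb_s: float, bitrate_bps: float) -> int:
    """How many streams running at the bitrate cap fit into one link (worst case)."""
    return int(link_kb_s // required_bandwidth_kb_s(bitrate_bps))


print(round(required_bandwidth_kb_s(4_500_000), 1))  # 549.3, i.e. about 550 KB/s
print(streams_per_link(7.5 * 1024, 4_500_000))       # 13
```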

Because users do not operate very quickly, actual bandwidth usage is lower: screen changes caused by typical operations need about 150-200 KB/s, and a static screen needs only 5-40 KB/s.

Taking all of this into account, we added appropriate keyframes on top of WDA's screenshot rate of 24, set the client streaming frame rate to 30 frames/s, and capped the bitrate at 4.5 Mbps. The actual bandwidth used is about 350 KB/s, and the picture is smooth and clear, without corrupted frames.

The maximum download speed of our Wi-Fi is about 7.5 MB/s, so neither the streaming bitrate nor the bandwidth is a bottleneck. The main bottleneck is how efficiently ffmpeg converts the image stream into a video stream: by our calculation, single-threaded ffmpeg transcoding in the Java client runs at about 40 frames per second, which can be improved through further optimization.

4. Data Collection

As a basic platform that employees in related roles rely on for their daily work, its stability must be observable in every dimension. The platform must monitor the health of day-to-day operation and also collect enough operating data for the platform R&D team to analyze and drive subsequent iterations.

Platform stability: improve user-perceived stability using monitoring data and logs across multiple dimensions;

Usage detection: evaluate how much users rely on the platform for their work, and how later platform iterations affect them.

5. Practice Summary

In automated testing, at Ctrip and elsewhere, many UI automation solutions use similar technologies or even the same technical foundation. The WDA framework, an iOS mobile testing framework released by Facebook, is one example.

Likewise, once our team had implemented the first technical features, we focused on promoting them in testing scenarios. However, Ctrip's business scope is very broad: besides development and testing scenarios, we also have content verification scenarios. In particular, because we are at the forefront of internationalization, a large number of overseas employees take part in many acceptance steps.

Synchronizing or training colleagues in different countries on application versions, parameter configuration, environment initialization, and resource preparation is labor-intensive and costly, and the results are poor. Through in-depth analysis and evolution of our technology and platform, we found that this technology has a wide range of applications. Once a basic technology platform is in place, scenarios, personnel, devices, and configurations are easy to integrate. Much of the communication cost disappears, underutilization of acceptance devices is largely solved, and common problems are found and fixed far more efficiently.

In our subsequent work, we will continue to optimize the following aspects based on the current experience:

  • Support installing packages concurrently in simulator scenarios
  • Support multiple scenarios on a single device

Ultimately, we want the platform experience to fully replace operating a physical device, so that our potential users find using the platform more convenient and efficient than doing the same tasks on their own phones.
