A step-by-step guide to building a Word-to-HTML converter in PHP

[Original article from 51CTO.com] In some scenarios, the content a user needs to display is neither manually entered plain text nor manually edited rich text, but HTML generated from an uploaded Word file and then shown in the app. In other words, the uploaded Word file has to be converted into an HTML file that the app can display. How should we handle such a requirement?

Business Scenario

A friend asked me to implement the following workflow. [Figure: workflow from Word upload in the admin back end to HTML display in the mobile app] First, a Word file is uploaded through the admin back end. A back-end script then converts it into an HTML file and saves that file in a directory, and the mobile app displays the HTML file directly. The generated HTML should comply with the HTML5 standard so that the app can render it as cleanly as possible.

Main solutions currently available

Many third-party libraries and programs can convert Word files to HTML. Two of the most commonly used are Apache OpenOffice and LibreOffice. Their biggest advantage is that they are cross-platform: both provide versions for Windows, Linux, and Mac OS, so code written against them can be ported at minimal cost. Here I chose LibreOffice.

Environment Introduction

The local environment for this development is as follows:

OS: Windows 10

PHP: 7.1 or above

MySQL: 5.6 or later

WEB SERVER: Apache 2.4

PHP Framework: LV Framework

IDE: PhpStorm

Server environment introduction:

OS: Ubuntu

PHP: *** version

MySQL: *** version

WEB SERVER: Nginx

Install LibreOffice Environment

Since my local environment is Windows, I only need to download and install the Windows package. It is an exe file, so installation is simple: keep clicking Next, as with any ordinary software.

Convert using the command line

The conversion itself is done by soffice.exe, which is located under the LibreOffice installation directory. [Screenshot: path to soffice.exe on my machine]

Next, I create a new directory to test the conversion. [Screenshots: the new directory, the file to convert, the conversion command, and the conversion result]

At first, the directory contains only one file, 20170818.docx. We will now generate an HTML file in this directory.

After running the conversion command, an HTML file named 20170818.html appears in the directory. The command has the form: "soffice.exe --convert-to html --outdir <directory to save the HTML file> <file to convert>".
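As a minimal sketch, the same command line could be assembled in PHP as follows. The function name buildConvertCommand and the sample paths are hypothetical. The --headless flag is an assumption added here so that soffice runs without opening a window; it is not part of the command shown above.

```php
<?php
// Hypothetical helper: builds the soffice command line described above.
function buildConvertCommand(string $soffice, string $docPath, string $outDir): string
{
    return implode(' ', [
        escapeshellarg($soffice),      // path to soffice / soffice.exe
        '--headless',                  // assumption: run without a GUI
        '--convert-to', 'html',        // target format
        '--outdir', escapeshellarg($outDir),
        escapeshellarg($docPath),      // the Word file to convert
    ]);
}

// Sample values only; the exact quoting in the result depends on the platform.
echo buildConvertCommand('soffice', '20170818.docx', 'html_out'), PHP_EOL;
```

Passing every path through escapeshellarg keeps file names that contain spaces or shell metacharacters from breaking the command, which matters because the file names come from user uploads.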

Convert using PHP code

We have verified that the command line can convert a Word file to HTML. Since the back end is written in PHP, we need to call soffice.exe from PHP. PHP functions that can run external programs include shell_exec, exec, system, and passthru. [Code screenshot: the LV code I used to convert the file to HTML]
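The original snippet is not reproduced in this text. As a minimal sketch, and not the author's LV code, calling an external command with exec and checking the result might look like this. The helper names runCommand and expectedHtmlPath are hypothetical.

```php
<?php
// Hypothetical helper: runs a prepared command line and reports what happened.
function runCommand(string $command): array
{
    $output = [];
    $exitCode = 0;
    // exec() appends each output line to $output and stores the exit status in $exitCode.
    // "2>&1" also captures error messages written to stderr.
    exec($command . ' 2>&1', $output, $exitCode);

    return ['ok' => $exitCode === 0, 'exitCode' => $exitCode, 'output' => $output];
}

// Hypothetical helper: the HTML file we expect soffice to write for a given input,
// e.g. 20170818.docx becomes 20170818.html inside the output directory.
function expectedHtmlPath(string $docPath, string $outDir): string
{
    return rtrim($outDir, '/\\') . DIRECTORY_SEPARATOR
        . pathinfo($docPath, PATHINFO_FILENAME) . '.html';
}
```

In use, $command would be the soffice command line described earlier. After runCommand returns, it seems safer to check both $result['ok'] and is_file(expectedHtmlPath($docPath, $outDir)) before telling the user the conversion succeeded, and to log $result['output'] otherwise.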

One more problem to solve

Although the Word file uploaded in the back end is now converted to HTML and saved by PHP, one serious problem remains: the generated HTML is not responsive. It displays poorly when opened in the app: a horizontal scroll bar can appear, text may start at the lower-right corner of an image, and so on. To fix this, I read the content of the generated HTML file and add the necessary HTML tags and CSS properties to it. [Code screenshot: the post-processing code]
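The author's post-processing code is not reproduced in this text. A minimal sketch of the idea, assuming that a viewport meta tag plus a few CSS rules are enough for the documents at hand, could look like this. The function name makeHtmlResponsive and the specific CSS rules are illustrative assumptions, not the author's actual fix.

```php
<?php
// Hypothetical helper: injects a viewport tag and basic responsive CSS into converted HTML.
function makeHtmlResponsive(string $html): string
{
    $inject = '<meta name="viewport" content="width=device-width, initial-scale=1">'
        . '<style>'
        . 'body{margin:0;padding:12px;word-wrap:break-word;}'   // wrap long words
        . 'img,table{max-width:100%;height:auto;}'               // keep media inside the screen
        . '</style>';

    // Insert just before </head> when the document has one (case-insensitive search).
    $pos = stripos($html, '</head>');
    if ($pos === false) {
        return $inject . $html;   // no head section: fall back to prepending
    }

    // A length of 0 makes substr_replace insert without removing anything.
    return substr_replace($html, $inject, $pos, 0);
}
```

To apply it to a saved file, read the file with file_get_contents, pass the content through makeHtmlResponsive, and write it back with file_put_contents. Real Word documents may need more rules, for example for fixed-width tables or absolutely positioned frames.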

One more thing to note: images in the Word file are embedded in the converted HTML as base64-encoded data.

A potential performance issue

Converting Word to HTML can take a long time. If there are many Word files, or many users convert files at the same time, it is not advisable to run the conversion within the upload request. Instead, hand the conversion to a separate host or a separate process: once the upload succeeds, the back end only needs to add a conversion task to a message queue.
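The article does not name a specific message queue. To illustrate the pattern only, the sketch below uses a directory of JSON job files as a stand-in for a real queue. The function names enqueueConversion and processQueue and the job format are hypothetical.

```php
<?php
// Hypothetical stand-in for a message queue: one JSON file per conversion job.
function enqueueConversion(string $queueDir, string $docPath): string
{
    if (!is_dir($queueDir) && !mkdir($queueDir, 0775, true) && !is_dir($queueDir)) {
        throw new RuntimeException("Cannot create queue directory: $queueDir");
    }
    $jobFile = $queueDir . '/' . uniqid('job_', true) . '.json';
    $payload = json_encode(['doc' => $docPath, 'queued_at' => time()]);
    if (file_put_contents($jobFile, $payload, LOCK_EX) === false) {
        throw new RuntimeException("Cannot write job file: $jobFile");
    }
    return $jobFile;
}

// Meant to run in a separate worker process (for example from cron or a CLI loop).
// $convert is a callable that receives the path of the Word file to convert.
function processQueue(string $queueDir, callable $convert): int
{
    $done = 0;
    $jobs = glob($queueDir . '/job_*.json') ?: [];
    sort($jobs);   // uniqid() is time-based, so this is roughly oldest first
    foreach ($jobs as $jobFile) {
        $job = json_decode((string) file_get_contents($jobFile), true);
        if (is_array($job) && isset($job['doc'])) {
            $convert($job['doc']);
            $done++;
        }
        unlink($jobFile);   // remove the job whether or not it was valid
    }
    return $done;
}
```

The upload handler would call enqueueConversion and return immediately, while the worker calls processQueue with a callable that performs the actual conversion. A production setup would more likely use a dedicated queue service and would need retry handling for failed jobs, which this sketch omits.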

Some additional notes

Since the conversion may take a long time and the uploaded Word file may be large, some PHP configuration options need to be adjusted, such as the maximum script execution time (max_execution_time), the maximum upload file size (upload_max_filesize), and the maximum POST size (post_max_size).
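As a rough sketch, the script below raises the execution time limit at runtime and reports upload limits that are too small for an expected file size. The helper names and the numbers (300 seconds, 20 MB) are hypothetical. Note that upload_max_filesize and post_max_size cannot be changed with ini_set at runtime, so they have to be set in php.ini or the server configuration; the code only checks them.

```php
<?php
// Hypothetical helper: converts a php.ini size such as "8M" or "512K" to bytes.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value;                    // (int) "8M" gives 8
    switch (strtoupper(substr($value, -1))) {  // last character is the unit, if any
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;              // plain number of bytes
    }
}

// Hypothetical helper: lists upload-related limits smaller than the expected upload size.
function findTooSmallLimits(int $expectedBytes): array
{
    $tooSmall = [];
    foreach (['upload_max_filesize', 'post_max_size'] as $name) {
        $raw = (string) ini_get($name);
        $limit = iniSizeToBytes($raw);
        // Assumption: a value of 0 is treated here as "no limit".
        if ($limit > 0 && $limit < $expectedBytes) {
            $tooSmall[$name] = $raw;
        }
    }
    return $tooSmall;
}

// The execution time limit, unlike the upload limits, can be raised at runtime.
set_time_limit(300);   // hypothetical value, in seconds

print_r(findTooSmallLimits(20 * 1024 * 1024));   // hypothetical 20 MB upload
```

Since a POST request carries the uploaded file plus the other form fields, post_max_size should be somewhat larger than upload_max_filesize.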

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
