Recently I needed to generate Word doc and docx files in a project. After searching on Baidu and Google, I found that the mainstream implementation in Java is the Apache POI component. There is at least one other implementation besides POI, but I have not studied it; interested readers can look into it.
About POI
You can visit the official Apache POI website for detailed information.
Let's get to the point! Since only the doc and docx components are used in the project, the following only introduces these two.
1. How to use POI components in Android Studio
From the POI official website, it seems that the IntelliJ IDE is not supported yet, as shown in the figure below, so here we download the jar packages directly and import them into the project.
Under Overview -> Components on the official website, you can see that doc and docx files correspond to the components HWPF and XWPF respectively, and that HWPF and XWPF correspond to the poi-scratchpad and poi-ooxml packages.
Download
Go to the Apache download page and select the latest version, as shown below. Click "The latest beta release is Apache POI 3.16-beta2" to jump to the entry for poi-bin-3.16-beta2-20170202.tar.gz, then click poi-bin-3.16-beta2-20170202.tar.gz and choose a mirror to download it.
Note: On Linux, choose the .tar.gz archive. On Windows, choose the .zip archive.
Unzip
Decompress the downloaded archive and you will get the following files.
Import
Readers who are not familiar with importing jars can take a look at the Android Studio import jar package tutorial.
1. doc
For doc files, put the jar packages from the lib folder, together with poi-3.16-beta2.jar and poi-scratchpad-3.16-beta2.jar, into the libs directory of the Android project. (My project showed no problems even when junit-4.12.jar and log4j-1.2.17.jar from the lib folder were left out, so the fewer jars the better.)
2. docx
For docx, you need to import the jar packages from the lib folder, poi-3.16-beta2.jar, poi-ooxml-3.16-beta2.jar, poi-ooxml-schemas-3.16-beta2.jar, and the packages in ooxml-lib. Because I kept getting the warning "Ignoring InnerClasses attribute for an anonymous inner class", because doc basically met my needs, and because importing so many jars increases the size of the APK, I did not implement docx support. Interested readers can study it.
2. Realize the reading and writing of doc files
The HWPF module in Apache POI is used specifically to read and generate doc files. In HWPF, an HWPFDocument represents a Word doc document. Before looking at the code, it helps to understand several concepts in HWPFDocument: Range, Section, Paragraph, CharacterRun, and Table.
Note: Section, Paragraph, CharacterRun, and Table all inherit from Range.
Note before reading and writing: The HWPFDocument class provided by Apache POI can only read and write standard .doc files. If you produce a "doc" file by changing the extension of another file, or create an empty file and simply name it .doc, the following error appears:
Invalid header signature; read 0x7267617266202E31, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
DOC Read
In daily applications we rarely read information from Word files; more often we write content into them. There are two main ways to read data from a Word doc file with POI: (a) through WordExtractor and (b) through HWPFDocument. Internally, WordExtractor itself obtains its information through HWPFDocument.
Read using WordExtractor
When using WordExtractor to read a file, we can only read the text content of the file and some document-level properties. The properties of the document content itself cannot be read this way.
If you want to read the properties of the document content, you need to use HWPFDocument instead. The following is an example of using WordExtractor to read a file:
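A minimal sketch of such a read, assuming the POI 3.16 HWPF jars described above are on the classpath. The class name DocTextReader, the method describe, and the command-line path are hypothetical names chosen for illustration. It collects two document-level properties (title and author, when summary information is present) followed by the plain text.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical helper class; the names are illustrative, not part of POI.
final class DocTextReader {

    // Returns document-level properties plus the plain text of a .doc stream.
    // Input that is not a genuine Word .doc file is rejected with an exception.
    static String describe(InputStream in) throws IOException {
        try (WordExtractor extractor = new WordExtractor(in)) {
            StringBuilder sb = new StringBuilder();
            // Summary information may be absent, so guard against null.
            SummaryInformation info = extractor.getSummaryInformation();
            if (info != null) {
                sb.append("Title: ").append(info.getTitle()).append('\n');
                sb.append("Author: ").append(info.getAuthor()).append('\n');
            }
            // getText() returns the whole document text as one string.
            sb.append(extractor.getText());
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing .doc file supplied by the caller.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(describe(in));
        }
    }
}
```

WordExtractor also offers getParagraphText() if the text is needed paragraph by paragraph rather than as a single string.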
Reading with HWPFDocument
HWPFDocument represents the Word document itself, and it is more powerful than WordExtractor. Through it we can read the tables, lists, and other elements in the document, and we can also add, modify, and delete content. These changes, however, are only stored in the HWPFDocument object: we change the in-memory HWPFDocument, not the file on disk. To make the changes take effect, call the write method of HWPFDocument to output the modified document to an output stream. This can be an output stream to the original file, to a new file (equivalent to Save As), or any other output stream. The following is an example of reading a file through HWPFDocument:
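A minimal sketch of reading through HWPFDocument, again assuming the POI 3.16 HWPF jars are available. The class DocStructureReader and its methods are hypothetical. It walks the paragraphs of the document's Range and prints a few properties of each CharacterRun, which is the kind of content-level information WordExtractor does not expose. The stripMarks helper assumes that paragraph text ends with a carriage return and table-cell text with the character U+0007, as is usual in the binary .doc format.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class; the names are illustrative, not part of POI.
final class DocStructureReader {

    // Removes trailing paragraph marks (carriage return) and
    // table-cell marks (character code 7) from raw paragraph text.
    static String stripMarks(String raw) {
        int end = raw.length();
        while (end > 0) {
            char c = raw.charAt(end - 1);
            if (c != '\r' && c != (char) 7) {
                break;
            }
            end--;
        }
        return raw.substring(0, end);
    }

    // Prints every paragraph and the font properties of its character runs.
    static void dump(InputStream in) throws IOException {
        HWPFDocument doc = new HWPFDocument(in);
        Range range = doc.getRange();
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph p = range.getParagraph(i);
            String where = p.isInTable() ? " [in table]" : "";
            System.out.println("Paragraph " + i + where + ": " + stripMarks(p.text()));
            for (int j = 0; j < p.numCharacterRuns(); j++) {
                CharacterRun run = p.getCharacterRun(j);
                System.out.println("  run " + j
                        + ": font=" + run.getFontName()
                        + ", size(half-points)=" + run.getFontSize()
                        + ", bold=" + run.isBold());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing .doc file supplied by the caller.
        try (InputStream in = new FileInputStream(args[0])) {
            dump(in);
        }
    }
}
```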
DOC write
Writing files using HWPFDocument
When using POI to write a Word doc file, we must already have a doc file, because writing goes through HWPFDocument and an HWPFDocument is always built from an existing doc file. The usual practice is therefore to prepare a blank doc file on disk, create an HWPFDocument from it, add content to the HWPFDocument, and then write it out to another doc file. This is equivalent to using POI to generate a Word doc file.
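Because HWPFDocument only accepts genuine OLE2 files, the blank file has to be saved by a word processor rather than created by renaming a text file. The sketch below is one way to check a candidate file before handing it to POI, using only the standard library. The class Ole2Check and its methods are hypothetical. The eight signature bytes are the value 0xE11AB1A1E011CFD0 from the error message quoted earlier, stored in little-endian order. The "read" value in that message, 0x7267617266202E31, decodes the same way to the ASCII text "1. fragr", which shows that the rejected file was plain text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class; it uses only the Java standard library.
final class Ole2Check {

    // OLE2 header signature in file order. Read as a little-endian
    // 64-bit number, these bytes give 0xE11AB1A1E011CFD0.
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes start with the OLE2 signature.
    static boolean hasOle2Signature(byte[] head) {
        if (head.length < SIGNATURE.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, SIGNATURE.length), SIGNATURE);
    }

    // Reads the first eight bytes of the stream and checks them.
    // The bytes are consumed, so open a fresh stream for HWPFDocument afterwards.
    static boolean hasOle2Signature(InputStream in) throws IOException {
        byte[] head = new byte[SIGNATURE.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before eight bytes were available
            }
            read += n;
        }
        return hasOle2Signature(head);
    }
}
```

A matching signature only shows that the file is an OLE2 container; it does not prove that the container holds a Word document, so HWPFDocument can still reject it.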
In practice, however, the Word files we generate usually belong to a certain type whose format is fixed and in which only some fields differ. We therefore do not have to build the entire content through HWPFDocument. Instead, we first create a Word document on disk that contains the full content of the file to be generated, with the variable parts written as placeholders of the form "${paramName}". To generate a file from concrete data, we obtain an HWPFDocument from this template, call the replaceText() method of Range to replace each placeholder with its value, and then write the HWPFDocument to a new output stream. This approach is the more common one in practice, because it reduces our workload and keeps the formatting of the document clear. Let's work through an example based on this method. Suppose we have a template like this:
We then use this file as a template, replace the variables in it with the relevant data, and output the result to another doc file. The specific steps are as follows:
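The steps can be sketched as follows, assuming the POI 3.16 HWPF jars are on the classpath and that the template is a genuine .doc file containing placeholders such as ${name}. The class DocTemplateFiller, its methods, and the placeholder names in main are hypothetical. Only HWPFDocument, Range.replaceText() and write() come from POI.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class; the names are illustrative, not part of POI.
final class DocTemplateFiller {

    // Wraps a parameter name in the ${...} form used in the template.
    static String placeholder(String name) {
        return "${" + name + "}";
    }

    // Loads the template, replaces each placeholder with its value,
    // and writes the result to 'out'. The template itself is not modified.
    static void fill(InputStream template, Map<String, String> values, OutputStream out)
            throws IOException {
        HWPFDocument doc = new HWPFDocument(template);
        Range range = doc.getRange();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            range.replaceText(placeholder(entry.getKey()), entry.getValue());
        }
        doc.write(out);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: template .doc path, args[1]: output .doc path (caller-supplied).
        Map<String, String> values = new HashMap<String, String>();
        values.put("name", "Example Name");       // illustrative data
        values.put("reportDate", "2017-02-02");   // illustrative data
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            fill(in, values, out);
        }
    }
}
```

Writing to a separate output file keeps the template reusable for the next set of values.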
3. Realize the reading and writing of docx files
POI reads and writes Word docx files through the XWPF module, whose core class is XWPFDocument. An XWPFDocument represents a docx document and can be used both to read and to write docx files. XWPFDocument mainly contains the following objects: XWPFParagraph (a paragraph), XWPFRun (a run of text with the same properties), XWPFTable (a table), XWPFTableRow (a table row), and XWPFTableCell (a table cell).
Unlike HWPFDocument, XWPFDocument can also create a new docx file directly, without a template. For details, please refer to the article "POI reading and writing docx files" by another author.
4. Conclusion
Suggestions and corrections of any errors in this article are welcome. Thank you for your support.