How to read and write Word doc/docx and PDF files in Android?


Recently, I needed to generate Word doc and docx files in a project. After searching on Baidu and Google, I found that the mainstream Java implementation is the Apache POI component. There is also another implementation besides POI, but I haven't studied it; interested readers can look into it.

For more about POI, see the official Apache POI website.

Let’s get to the point!

Since the project only uses the doc and docx components, this article covers only those two.

1. How to use POI components in Android Studio

According to the POI official website, IntelliJ IDEA does not appear to be supported yet (as shown in the figure below), so here we download the jar files directly and import them into the project.

On the official website, under Overview > Components, you can see that doc and docx files are handled by the HWPF and XWPF components respectively, and that HWPF and XWPF are packaged in poi-scratchpad and poi-ooxml.

Download

Go to the Apache POI download page and select the latest version, as shown below. At the time of writing, the latest beta release was Apache POI 3.16-beta2; clicking it leads to poi-bin-3.16-beta2-20170202.tar.gz. Click that file, choose a mirror, and the download starts.

Note: On Linux, choose the .tar.gz archive; on Windows, choose the .zip archive.

Unzip

Extract the downloaded archive and you will get the following files.

Import

Readers who are not sure how to import jar files can refer to the Android Studio import jar package tutorial.

1. doc: For doc files, put poi-3.16-beta2.jar and poi-scratchpad-3.16-beta2.jar into the libs directory of the Android project. (My project ran without any problems even without junit-4.12.jar and log4j-1.2.17.jar from the lib folder, so I left them out; the fewer jars, the better.)

2. docx: For docx, you need to import poi-3.16-beta2.jar, poi-ooxml-3.16-beta2.jar, poi-ooxml-schemas-3.16-beta2.jar, and the jars in the ooxml-lib folder. I kept getting the error "Warning: Ignoring InnerClasses attribute for an anonymous inner class", and since doc support already met my needs and importing so many jars would increase the APK size, I did not implement docx. Interested readers can look into it.

2. Reading and writing doc files

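The HWPF module in Apache POI is dedicated to reading and generating doc files. In HWPF, an HWPFDocument represents a Word doc document. Before looking at the code, it helps to understand several concepts used by HWPFDocument:

Range: a range of text in the document; the whole document is also a Range (doc.getRange()).
Section: a section of the document; each section carries its own page setup, such as margins and page size.
Paragraph: a paragraph.
CharacterRun: a run of text that shares the same character formatting.
Table: a table.

Note: Section, Paragraph, CharacterRun, and Table all inherit from Range.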

Note before reading and writing: the HWPFDocument class provided by Apache POI can only read and write genuine .doc files. If you create a "doc" file by renaming another file's extension, or by simply creating an empty file with a .doc name, you will get the error "Your file appears not to be a valid OLE2 document":

Invalid header signature; read 0x7267617266202E31, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
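The two hex values in this message are the first 8 bytes of the file, printed as a little-endian long. The sketch below is plain Java with no POI dependency, and the class and method names are made up for illustration. It checks a header against the OLE2 signature and decodes the "read" value from the message above back into ASCII. That value turns out to be "1. fragr", so the "doc" file in this example was really a plain text file. A check like this could be run before handing a file to HWPFDocument.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Ole2HeaderCheck {
    // The 8-byte signature at the start of every OLE2 compound file, e.g. a genuine .doc
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the header starts with the OLE2 signature
    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    // Reads the first 8 bytes as a little-endian long, which is how the POI message prints them
    static long headerAsLong(byte[] header) {
        return ByteBuffer.wrap(header, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Inverse of headerAsLong: turns a "read 0x..." value back into the file's first 8 bytes
    static String headerLongAsAscii(long value) {
        byte[] bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(isOle2(OLE2_SIGNATURE));                    // true
        System.out.printf("0x%016X%n", headerAsLong(OLE2_SIGNATURE));  // 0xE11AB1A1E011CFD0
        System.out.println(headerLongAsAscii(0x7267617266202E31L));    // 1. fragr
    }
}
```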

DOC Read

There are two ways to read data from a Word doc file with POI: (a) through WordExtractor and (b) through HWPFDocument. Internally, WordExtractor still obtains its information through HWPFDocument. (In everyday applications we more often write content into Word files than read from them.)

Reading with WordExtractor

With WordExtractor we can only read the text of the file and some document-level properties; we cannot read the formatting properties of the content itself. For those, use HWPFDocument instead. The following example reads a file with WordExtractor:

    // Read a doc file through WordExtractor
    import android.os.Environment;
    import android.util.Log;

    import org.apache.poi.hpsf.DocumentSummaryInformation;
    import org.apache.poi.hpsf.SummaryInformation;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class WordExtractorTest {

        private static final String TAG = "WordExtractorTest";
        private final String PATH = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "test.doc";

        private void log(Object o) {
            Log.d(TAG, String.valueOf(o));
        }

        public void testReadByExtractor() throws Exception {
            InputStream is = new FileInputStream(PATH);
            try {
                WordExtractor extractor = new WordExtractor(is);
                // Output all the text of the Word document
                log(extractor.getText());
                log(extractor.getTextFromPieces());
                // Output the header content
                log("Header: " + extractor.getHeaderText());
                // Output the footer content
                log("Footer: " + extractor.getFooterText());
                // Output the document metadata, including the author, modification time, etc.
                log(extractor.getMetadataTextExtractor().getText());
                // Output the text of each paragraph
                String[] paraTexts = extractor.getParagraphText();
                for (int i = 0; i < paraTexts.length; i++) {
                    log("Paragraph " + (i + 1) + ": " + paraTexts[i]);
                }
                // Output the summary information (author, title, ...)
                printInfo(extractor.getSummaryInformation());
                // Output the document summary information (category, company, ...)
                printInfo(extractor.getDocSummaryInformation());
            } finally {
                closeStream(is);
            }
        }

        /**
         * Output SummaryInformation
         * @param info
         */
        private void printInfo(SummaryInformation info) {
            log(info.getAuthor());     // Author
            log(info.getCharCount());  // Character count
            log(info.getPageCount());  // Number of pages
            log(info.getTitle());      // Title
            log(info.getSubject());    // Subject
        }

        /**
         * Output DocumentSummaryInformation
         * @param info
         */
        private void printInfo(DocumentSummaryInformation info) {
            log(info.getCategory());   // Category
            log(info.getCompany());    // Company
        }

        /**
         * Close the input stream
         * @param is
         */
        private void closeStream(InputStream is) {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

Reading with HWPFDocument

HWPFDocument represents the whole Word document and is more powerful than WordExtractor. With it we can read tables, lists, and so on, and we can also add, modify, and delete content. However, these changes are made only to the HWPFDocument in memory, not to the file on disk. To make them take effect, call HWPFDocument's write method to output the modified document to an output stream: the stream of the original file, the stream of a new file (equivalent to Save As), or any other output stream. The following example reads a file through HWPFDocument:

    // Read a doc file through HWPFDocument
    import android.os.Environment;
    import android.util.Log;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Bookmark;
    import org.apache.poi.hwpf.usermodel.Bookmarks;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.apache.poi.hwpf.usermodel.Section;
    import org.apache.poi.hwpf.usermodel.Table;
    import org.apache.poi.hwpf.usermodel.TableCell;
    import org.apache.poi.hwpf.usermodel.TableIterator;
    import org.apache.poi.hwpf.usermodel.TableRow;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class HWPFDocumentTest {

        private static final String TAG = "HWPFDocumentTest";
        private final String PATH = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "test.doc";

        private void log(Object o) {
            Log.d(TAG, String.valueOf(o));
        }

        public void testReadByDoc() throws Exception {
            InputStream is = new FileInputStream(PATH);
            try {
                HWPFDocument doc = new HWPFDocument(is);
                // Output bookmark information
                this.printInfo(doc.getBookmarks());
                // Output the full text
                log(doc.getDocumentText());
                Range range = doc.getRange();
                // Read the whole range
                this.printInfo(range);
                // Read tables
                this.readTable(range);
                // Read lists
                this.readList(range);
            } finally {
                this.closeStream(is);
            }
        }

        /**
         * Close the input stream
         * @param is
         */
        private void closeStream(InputStream is) {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        /**
         * Output bookmark information
         * @param bookmarks
         */
        private void printInfo(Bookmarks bookmarks) {
            int count = bookmarks.getBookmarksCount();
            log("Number of bookmarks: " + count);
            Bookmark bookmark;
            for (int i = 0; i < count; i++) {
                bookmark = bookmarks.getBookmark(i);
                log("Bookmark " + (i + 1) + " name: " + bookmark.getName());
                log("Start position: " + bookmark.getStart());
                log("End position: " + bookmark.getEnd());
            }
        }

        /**
         * Read tables.
         * Each carriage return marks the end of a paragraph, so in a table every cell
         * contains at least one paragraph and every row ends with a paragraph.
         * @param range
         */
        private void readTable(Range range) {
            // Iterate over the tables within the range
            TableIterator tableIter = new TableIterator(range);
            Table table;
            TableRow row;
            TableCell cell;
            while (tableIter.hasNext()) {
                table = tableIter.next();
                int rowNum = table.numRows();
                for (int j = 0; j < rowNum; j++) {
                    row = table.getRow(j);
                    int cellNum = row.numCells();
                    for (int k = 0; k < cellNum; k++) {
                        cell = row.getCell(k);
                        // Output the cell text
                        log(cell.text().trim());
                    }
                }
            }
        }

        /**
         * Read list paragraphs
         * @param range
         */
        private void readList(Range range) {
            int num = range.numParagraphs();
            Paragraph para;
            for (int i = 0; i < num; i++) {
                para = range.getParagraph(i);
                if (para.isInList()) {
                    log("list: " + para.text());
                }
            }
        }

        /**
         * Output the paragraphs and sections of a Range
         * @param range
         */
        private void printInfo(Range range) {
            // Get the number of paragraphs
            int paraNum = range.numParagraphs();
            log(paraNum);
            for (int i = 0; i < paraNum; i++) {
                log("Paragraph " + (i + 1) + ": " + range.getParagraph(i).text());
                if (i == (paraNum - 1)) {
                    // Append text after the last paragraph (in memory only)
                    this.insertInfo(range.getParagraph(i));
                }
            }
            int secNum = range.numSections();
            log(secNum);
            Section section;
            for (int i = 0; i < secNum; i++) {
                section = range.getSection(i);
                log(section.getMarginLeft());
                log(section.getMarginRight());
                log(section.getMarginTop());
                log(section.getMarginBottom());
                log(section.getPageHeight());
                log(section.text());
            }
        }

        /**
         * Insert content into the Range; this only changes the document in memory
         * @param range
         */
        private void insertInfo(Range range) {
            range.insertAfter("Hello");
        }
    }
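Two details of readTable() above are easier to see on plain strings. In the raw text that HWPF returns, Word's binary format typically ends a paragraph with a carriage return ('\r') and ends a table cell or row with the character U+0007. That is why the code calls trim() on cell.text(): String.trim() removes every character up to U+0020, which includes both marks. The helper class below is hypothetical and only illustrates this on ordinary strings; it does not call POI.

```java
class HwpfTextMarks {
    // Paragraph mark in Word's binary text stream
    static final char PARAGRAPH_MARK = '\r';
    // End-of-cell / end-of-row mark in Word tables
    static final char CELL_MARK = '\u0007';

    // Splits raw document text into paragraphs; split() drops the trailing empty string
    static String[] splitParagraphs(String rawText) {
        return rawText.split(String.valueOf(PARAGRAPH_MARK));
    }

    // String.trim() removes all chars <= U+0020, so both marks (and plain spaces) are stripped
    static String cleanCellText(String rawCellText) {
        return rawCellText.trim();
    }
}
```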

DOC Write

Writing files using HWPFDocument

When using POI to write a Word doc file, we must start from an existing doc file, because an HWPFDocument is always created from one. The usual practice is to prepare a blank doc file on disk, create an HWPFDocument from it, add content to the HWPFDocument, and then write it to another doc file. The result is a Word doc file generated with POI.

    // Write strings into a Word doc file
    InputStream is = new FileInputStream(PATH);
    HWPFDocument doc = new HWPFDocument(is);
    // Get the Range covering the whole document
    Range range = doc.getRange();
    for (int i = 0; i < 100; i++) {
        if (i % 2 == 0) {
            // Insert a string at the end of the document
            range.insertAfter("Hello " + i + "\n");
        } else {
            // Insert a string at the beginning of the document
            range.insertBefore(" Bye " + i + "\n");
        }
    }
    // Write to the original file
    OutputStream os = new FileOutputStream(PATH);
    // Or write to another file instead:
    // OutputStream os = new FileOutputStream(otherPath);
    doc.write(os);
    this.closeStream(is);
    this.closeStream(os);
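Note that new FileOutputStream(PATH) truncates the original file immediately, so if doc.write(os) fails halfway, the original document is lost. A safer variant, sketched below under the assumption that java.nio.file is available (Android API 26 or later), writes to a temporary file in the same directory and only then replaces the target. StreamWriter and SafeSave are hypothetical names, not POI classes. Because HWPFDocument.write(OutputStream) throws IOException, the method reference doc::write fits StreamWriter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SafeSave {
    // Anything that can serialize itself to a stream; HWPFDocument::write has this shape
    interface StreamWriter {
        void writeTo(OutputStream os) throws IOException;
    }

    // Writes to a temp file next to the target, then moves it over the target.
    // If writing fails, the target is left untouched and the temp file is deleted.
    static void save(Path target, StreamWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "save-", ".tmp");
        try {
            try (OutputStream os = Files.newOutputStream(tmp)) {
                writer.writeTo(os);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up after a failed write
            Files.deleteIfExists(tmp);
        }
    }
}
```

With POI on the classpath, the call would look like SafeSave.save(Paths.get(PATH), doc::write).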

In practice, however, the Word files we generate usually belong to a fixed type: the layout is always the same and only some fields change. So we don't need to build the whole document with HWPFDocument. Instead, we first create a Word document on disk whose content is the document we want to generate, with each variable part replaced by a placeholder such as "${paramName}". To generate a file for a particular set of data, we load an HWPFDocument from this template, call Range's replaceText() method to replace each placeholder with its value, and write the HWPFDocument to a new output stream. This approach is common in practice because it reduces the amount of code and keeps the document's formatting easy to control. Let's build an example based on this approach.

Suppose we have a template like this:

We then use this file as a template, replace the variables in it with relevant data, and then output the replaced document to another doc file. The specific steps are as follows:

    import android.os.Environment;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.junit.Test;

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class HWPFTemplateTest {

        /**
         * Use a doc document as a template, replace its placeholders,
         * and write the result into the target document.
         * @throws Exception
         */
        @Test
        public void testTemplateWrite() throws Exception {
            String templatePath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "template.doc";
            String targetPath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "target.doc";
            InputStream is = new FileInputStream(templatePath);
            HWPFDocument doc = new HWPFDocument(is);
            Range range = doc.getRange();
            // Replace ${reportDate} in the range with the current date
            range.replaceText("${reportDate}", new SimpleDateFormat("yyyy-MM-dd").format(new Date()));
            range.replaceText("${appleAmt}", "100.00");
            range.replaceText("${bananaAmt}", "200.00");
            range.replaceText("${totalAmt}", "300.00");
            OutputStream os = new FileOutputStream(targetPath);
            // Write the document to the output stream
            doc.write(os);
            this.closeStream(os);
            this.closeStream(is);
        }

        /**
         * Close the input stream
         * @param is
         */
        private void closeStream(InputStream is) {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        /**
         * Close the output stream
         * @param os
         */
        private void closeStream(OutputStream os) {
            if (os != null) {
                try {
                    os.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
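As a template grows, it becomes easy to miss a placeholder: if a name is misspelled, replaceText() finds nothing to replace and the raw ${...} text ends up in the output. The hypothetical helper below (plain Java regex, not part of POI) lists the placeholders in a text and reports those that have no value. The assumption is that you would run it on doc.getRange().text() with your own map of values before calling replaceText().

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplatePlaceholders {
    // Matches ${name} and captures name (letters, digits, underscore)
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Placeholder names in order of first appearance, without duplicates
    static Set<String> find(String text) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Placeholders that appear in the text but have no entry in values
    static Set<String> missing(String text, Map<String, String> values) {
        Set<String> result = find(text);
        result.removeAll(values.keySet());
        return result;
    }
}
```

Once missing(...) comes back empty, each name from find(...) can be passed to range.replaceText("${" + name + "}", values.get(name)), so the list of replacements is driven by the template itself rather than hard-coded.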

3. Reading and writing docx files

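POI reads and writes Word docx files through the XWPF module, whose core class is XWPFDocument. An XWPFDocument represents a docx document and can be used both to read and to write docx files. XWPFDocument mainly contains the following objects:

XWPFParagraph: a paragraph.
XWPFRun: a run of text that shares the same formatting.
XWPFTable: a table, made up of XWPFTableRow rows and XWPFTableCell cells.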

In addition, unlike HWPFDocument, XWPFDocument can create a new docx file directly, without needing a template.

For details, please refer to the article on reading and writing docx files with POI written by another author.
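Because the two formats need different classes (HWPFDocument for .doc, XWPFDocument for .docx), an app that accepts both may prefer to decide by content rather than by file extension, since, as the "not a valid OLE2 document" error earlier shows, the extension can be wrong. A .doc file starts with the 8-byte OLE2 signature, while a .docx file is a ZIP package and starts with the ZIP local-file-header bytes "PK\3\4". The sketch below is plain Java with hypothetical names; it only classifies the header and leaves opening the file to POI. It is a heuristic: a password-protected .docx, for example, is stored inside an OLE2 container and would be classified as OLE2.

```java
import java.util.Arrays;

class WordFormatSniffer {
    enum Kind { OLE2_DOC, ZIP_DOCX, UNKNOWN }

    // OLE2 compound file signature (genuine .doc files)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4" (.docx packages)
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    // True if data begins with prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies the first bytes of a file; OLE2_DOC suggests HWPF, ZIP_DOCX suggests XWPF
    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.OLE2_DOC;
        if (startsWith(header, ZIP)) return Kind.ZIP_DOCX;
        return Kind.UNKNOWN;
    }
}
```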

4. Conclusion

We welcome your suggestions and corrections to any errors that may exist in this article. Thank you for your support.
