Tuesday, January 9, 2018

How to Remove Unused Hidden Images From LibreOffice 5 Writer Documents

As a game designer, I've spent years writing. My application of choice has been LibreOffice, which is surprisingly versatile. But, despite the massive community behind it, the software is not perfect.

Digital Crud

One issue I've noticed over the years is that my Writer documents keep growing in file size. As I added and took away images (for both backgrounds and inline objects), the images didn't actually go away. They disappeared from view, sure, but they were still embedded somewhere inside the document.

Cleaning the Crud

The process below will show you how to perform some "digital surgery" on an ODT document and remove unwanted graphics that could be taking up valuable storage and bandwidth.

1. Back Up Your Files

Because we're going to be altering the document in a roundabout way, it's very easy to mess up, which will result in LibreOffice branding your file "corrupt" and refusing to open it. So, back up everything important in case you have a misstep.

If that happens to you, as far as I know, your only option is to start over with a fresh copy of the file.
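If you'd rather script the backup than copy the file by hand, here is a minimal Python sketch. The function name back_up and the ".bak" suffix are my own choices for illustration; nothing in LibreOffice requires them.

```python
import shutil
from pathlib import Path


def back_up(document: str) -> Path:
    """Copy e.g. report.odt to report.odt.bak in the same folder."""
    source = Path(document)
    backup = source.with_name(source.name + ".bak")
    # copy2 copies the contents and, where the platform allows, the timestamps
    shutil.copy2(source, backup)
    return backup
```

Calling back_up("report.odt") leaves the original untouched and returns the path of the copy.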




2. Rename the Extension to .zip

An ODT file is actually just a collection of other files (a ZIP archive). If you rename the ODT file with a ".zip" extension (e.g. "document.zip"), you can use a zip program such as WinRAR to open it up.
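As an aside, the rename is mostly for the benefit of GUI zip programs. A script can usually read the ODT as an archive directly. Here is a small sketch using Python's standard zipfile module; the function name list_entries is hypothetical.

```python
import zipfile


def list_entries(odt_path: str) -> list:
    """Return the names of all files stored inside the ODT archive."""
    # zipfile looks at the file's contents, not its extension,
    # so "document.odt" opens without being renamed first.
    with zipfile.ZipFile(odt_path) as archive:
        return archive.namelist()
```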


3. Find Unwanted Images

Inside, you'll find several directories (folders), XML files, and other important documents. You'll also find a "Pictures" directory. Open that folder and inside you'll find every image embedded in your Writer document.

If your goal is to remove images that are no longer used but still embedded, you'll need to go through each image and decide whether it should stay or go. But don't delete anything just yet! Instead, when you find a file you don't want anymore, right-click on the file and choose "Rename". This will open a prompt with the filename, which you can easily copy and use in the following steps. Copy the names verbatim, because any misspelling will result in an error.
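If the Pictures folder is crowded, a listing sorted by size can help you spot the heaviest candidates first. This is only a sketch (list_pictures is a name I made up); it tells you what is embedded, not whether an image is still in use, so you still have to judge each one yourself.

```python
import zipfile


def list_pictures(odt_path: str) -> list:
    """Return (name, size_in_bytes) for each file in Pictures/, largest first."""
    with zipfile.ZipFile(odt_path) as archive:
        pictures = [
            (info.filename, info.file_size)
            for info in archive.infolist()
            # skip the directory entry itself if the archive stores one
            if info.filename.startswith("Pictures/") and not info.is_dir()
        ]
    return sorted(pictures, key=lambda item: item[1], reverse=True)
```

Printing the returned names also gives you the exact spelling to copy for the later steps.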


4. Open styles.xml and manifest.xml

If you simply delete the images and call it a day, LibreOffice will consider your document corrupt. So, we also have to remove any mention of the files ever being there.

Open styles.xml (in the top level of the archive) and manifest.xml (found in the META-INF directory). XML files are just text files, so opening them in a text editor should suffice. Be aware that these files may be quite large (several megabytes), which may require a robust text editor to handle them properly. I prefer using Geany because it's open source, free, and has a robust "find" feature.




5. Find the image entries and remove them

In both documents, press CTRL+F to invoke the find feature. Paste the filename of the image you wish to remove into the text field, and click "Find" or "Next" to locate where the image is mentioned.


There may be multiple entries of the image, so be sure to search the entirety of both documents.
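To double-check that you haven't missed an entry, you can count how often a filename appears in each of the two files. The sketch below assumes the XML is UTF-8 encoded (which is what I'd expect from LibreOffice, but treat it as an assumption); count_mentions is a hypothetical helper name.

```python
import zipfile


def count_mentions(odt_path, image_name,
                   members=("styles.xml", "META-INF/manifest.xml")):
    """Return {member: how many times image_name appears in that member}."""
    counts = {}
    with zipfile.ZipFile(odt_path) as archive:
        for member in members:
            # Assumption: the XML files are stored as UTF-8 text
            text = archive.read(member).decode("utf-8")
            counts[member] = text.count(image_name)
    return counts
```

A count of zero in both files after your edits suggests every mention of that image is gone from them.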

In the styles.xml: The image is mentioned in an XML <draw:fill-image/> element. If you don't know or understand XML, that's okay. Simply locate and delete the following:

<draw:fill-image draw:name="ImageName" draw:display-name="ImageName" xlink:href="Pictures/FileName" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>

Where ImageName and FileName will be specific to your document. The exact code may be slightly different from mine, but the main point is to remove everything between "<draw:fill-image" and "/>" (including those two markers themselves).

Remove it all, then save the file. Your zip program should ask if you want to update the styles.xml file in the archive. Say Yes.
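The same deletion can be sketched in Python on the text of styles.xml. This assumes the element is self-closing and that no attribute value contains a ">" character, as in the example above; remove_fill_image is a hypothetical name, and a regular expression is a shortcut here, not a real XML parser.

```python
import re


def remove_fill_image(styles_xml: str, file_name: str) -> str:
    """Delete every self-closing <draw:fill-image .../> that points at
    Pictures/<file_name> and return the remaining text."""
    pattern = re.compile(
        r'<draw:fill-image\b[^>]*xlink:href="Pictures/'
        + re.escape(file_name)   # match the filename literally, dots included
        + r'"[^>]*/>'
    )
    return pattern.sub("", styles_xml)
```

Elements that point at other images are left alone, because the pattern never crosses a ">" while looking for the filename.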




In the manifest.xml: The image is mentioned in an XML <manifest:file-entry/> element. Remove the following code:

<manifest:file-entry manifest:full-path="Pictures/FileName" manifest:media-type="image/FileExtension"/>

Where FileExtension is the image's type, which usually matches its extension (e.g. png, gif; a .jpg image is normally listed as jpeg). Remove it all, then save the file. Your zip program should ask if you want to update the manifest.xml file in the archive. Say Yes.
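For the manifest, a similar sketch works on the file's text. It also swallows the indentation and line break around the entry so no blank line is left behind. As before, it assumes a self-closing element, and remove_manifest_entry is a name I invented for illustration.

```python
import re


def remove_manifest_entry(manifest_xml: str, file_name: str) -> str:
    """Delete the <manifest:file-entry .../> for Pictures/<file_name>,
    along with its leading indentation and trailing newline."""
    pattern = re.compile(
        r'[ \t]*<manifest:file-entry\b[^>]*manifest:full-path="Pictures/'
        + re.escape(file_name)
        + r'"[^>]*/>\r?\n?'
    )
    return pattern.sub("", manifest_xml)
```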


6. Delete the Image File

Now that we've deleted all references to the file, it's safe to remove the burdensome image. Permanently delete the image from the archive.
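If you are scripting this instead of using a zip program, note that Python's zipfile module (as far as I know) has no way to delete an entry in place, so the usual workaround is to write a new archive that skips the unwanted files. The sketch below does that and keeps the original entry order and compression settings, since the ODF format expects the "mimetype" entry to come first and be stored uncompressed. The function name and parameters are hypothetical.

```python
import zipfile


def rewrite_without(source_odt, target_odt, drop, replacements=None):
    """Copy source_odt to target_odt, skipping entry names in `drop`.

    `replacements` optionally maps an entry name (e.g. "styles.xml")
    to its edited text, which is written in place of the original.
    """
    replacements = replacements or {}
    with zipfile.ZipFile(source_odt) as src, \
            zipfile.ZipFile(target_odt, "w") as dst:
        for info in src.infolist():          # preserves the original order
            if info.filename in drop:
                continue                     # this is the actual "delete"
            data = replacements.get(info.filename)
            if data is None:
                data = src.read(info.filename)
            # reuse each entry's own compression method (stored vs. deflated)
            dst.writestr(info.filename, data, compress_type=info.compress_type)
```

A call might look like rewrite_without("document.odt", "document-clean.odt", {"Pictures/FileName"}, {"styles.xml": edited_styles}), where edited_styles is the text you cleaned up in step 5. Writing to a new file means the original stays intact if anything goes wrong.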


7. Repeat Steps 3 Through 6 For All Unwanted Images

The above steps should be repeated if you have multiple images you wish to delete. Once finished, proceed to step 8.

8. Rename the File Back to an .odt Extension and Open the Document

If the surgery went well, LibreOffice should open up the document normally and be none the wiser about what happened, and the file should be a whole lot lighter (size-wise).
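Before opening the result in LibreOffice, you can run a rough sanity check: confirm the archive itself is readable and that the manifest no longer lists a picture that is missing from the archive. This is a sketch with a hypothetical function name, and passing it does not guarantee LibreOffice will accept the file.

```python
import re
import zipfile


def check_package(odt_path):
    """Return (first_corrupt_entry_or_None, manifest picture paths
    that are listed but not present in the archive)."""
    with zipfile.ZipFile(odt_path) as archive:
        corrupt = archive.testzip()          # None when every entry reads cleanly
        names = set(archive.namelist())
        # Assumption: the manifest is UTF-8 text
        manifest = archive.read("META-INF/manifest.xml").decode("utf-8")
    listed = re.findall(r'manifest:full-path="(Pictures/[^"]+)"', manifest)
    missing = [path for path in listed if path not in names]
    return corrupt, missing
```

A result of (None, []) means the archive reads cleanly and every picture the manifest mentions is actually there.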