Any Microsoft Word (2010) experts here?

statsman

I have a VBA macro that I use to clean up converted documents in Word. One of the first steps in the macro converts all whitespace (space, non-breaking space, tab) to a single space for each occurrence (multiple whitespace characters merge down to a single space). I'm using Find/Replace, where the Find text is ^w and the Replace text is " ". This has worked fine for several years now.

Today, I modified the clean-up macro to add a couple more steps, and I went about running the macro for all of the documents impacted by this. Actually, I'm doing a subset at a time as there are several Find/Replace steps in the macro. All of the documents should have single character spacing throughout from having previously run this macro.

What's confusing is that the whitespace step above shouldn't change the document, and by appearances it hasn't. But the size of the documents has increased by 30-40%. The character counts and pages are nearly the same (I say "nearly" because of the added steps which could merge a few words/symbols).

As a test, I modified the macro to comment out the whitespace step, and the file sizes barely changed, which is what I would have expected. Running the macro with just the whitespace step increases the file sizes. Maybe even stranger than all of this is that any conversion of the updated Word document into PDF for archiving also results in a large PDF file.

Is there something about the above Find/Replace whitespace step that has a gotcha in it with respect to its use?
 
Are the documents older, and being saved with a newer version of word?

It might be adding in some new compatibility features which could affect file size.
 
Not at all. Documents were originally created/modified with Word 2010, and it's what I am still using at home.
 
My guess is that you're encountering one of the conditions listed in the second answer here:

https://stackoverflow.com/questions/32966167/ms-word-2007-increased-file-size-after-removing-content

(Allow Fast Saves or Save Version on Close)

(The first answer at the link above seems completely silly to me.)
The first answer does provide some information. I tried saving as RTF and back to DOCX, but the file stayed at the same, larger size. I then saved the document as HTM and back to DOCX, and in that case the file size is close to the original size (there were some edits, so I would expect a small difference).

None of the situations in the second answer apply to me. There isn't an Allow Fast Save option in Word 2010. Most of the documents have been edited under Word 2010 for at least the last three years. Fonts are not embedded. Version saving is also off.

But there has to be something about the global replace of a space with a space (" ") that is causing the file size to jump 25-40%. Baffling. But I will admit, I have only run this macro on a file initially before a lot of editing and massaging, then never again, until now.

My solution for now is to go back to the original document, run the macro without the global replace of white space with a space, and use that copy. I'll just have to remember to run that part of the macro once and only once on a document. Probably split it out into its own macro and add it to my batch script for new files.

Edit: Copying the DOCX into a file that I renamed with the ZIP extension, I took a look at the document.xml file, which is what has expanded a great deal in size (more so in uncompressed format). In the original document, there are mostly complete sentences with some coding between each sentence or paragraph (i.e. </w:t></w:r>...). In the modified document, this coding exists between each word. I did replace each white space with a space - unnecessarily, but still. That's where the size increase is coming from. In the modified document, document.xml is almost 9 times as large, compressing down to 1.6 times as large. That's about the extent of my knowledge in trying to solve this.
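The same check can be repeated without unzipping by hand. Below is a minimal sketch in Python (not something used in this thread; the function name docx_run_stats is made up for illustration). It assumes the main body lives at word/document.xml inside the DOCX package, which is the usual location, and it reports that part's uncompressed size, its compressed size, and a rough count of run elements.

```python
import re
import sys
import zipfile


def docx_run_stats(source):
    """Return size and run statistics for word/document.xml in a DOCX.

    `source` is a file path or a file-like object; a DOCX is a ZIP archive.
    """
    with zipfile.ZipFile(source) as package:
        info = package.getinfo("word/document.xml")
        xml = package.read("word/document.xml").decode("utf-8")

    # Count opening run tags: <w:r> or <w:r attr="...">.
    # The pattern deliberately skips <w:rPr>, <w:rFonts> and similar tags.
    runs = len(re.findall(r"<w:r[ >]", xml))
    return {
        "uncompressed_bytes": info.file_size,
        "compressed_bytes": info.compress_size,
        "runs": runs,
    }


if __name__ == "__main__":
    # Usage: python docx_stats.py original.docx modified.docx
    for path in sys.argv[1:]:
        print(path, docx_run_stats(path))
```

Running it on the original and the modified copy side by side should show whether the run count, and not the text itself, is what grew.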
 
Add a step to Save As. This eliminates any cached information in the document.
 
If you search for info on </w:t></w:r> you'll find info about runs. It sounds like something has inserted such information between each word. This means there is something wrong with your VBA step that searches and replaces, or another line is interacting in an unexpected way.
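To get a feel for why extra runs cost so much, here is a small illustrative sketch in Python. It is not what Word literally writes: the helper names are hypothetical, real runs usually also carry formatting properties (w:rPr) that make the overhead larger, and putting every space in its own run is only an assumed layout. It simply compares the markup length of one sentence stored as a single run against the same sentence stored as one run per word and per space.

```python
import re
from xml.sax.saxutils import escape


def make_run(text):
    """Wrap text in a bare WordprocessingML run (no formatting properties)."""
    return '<w:r><w:t xml:space="preserve">' + escape(text) + "</w:t></w:r>"


def single_run(sentence):
    """Whole sentence stored in one run."""
    return make_run(sentence)


def per_word_runs(sentence):
    """Every word and every space stored in its own run (assumed layout)."""
    pieces = [p for p in re.split(r"( )", sentence) if p]
    return "".join(make_run(p) for p in pieces)


sentence = "Replacing a space with a space should be harmless"
compact = single_run(sentence)
fragmented = per_word_runs(sentence)

# Print both lengths and the growth factor for this one sentence.
print(len(compact), len(fragmented), round(len(fragmented) / len(compact), 1))
```

The repeated tags are highly redundant, which would be consistent with the uncompressed XML growing far more than the zipped file does.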

Dissecting the XML doc was a nice find, BTW.
 
Problem is your quotes around the space. The macro is inserting the space character as a literal space.
 
I am not sure that's the problem (the quotes). Or maybe it is. I can perform the same Find/Replace in the Word menu, and I get the same file size increase. I don't think I ever noticed this before because (a) it's one of the first steps I perform when cleaning up a newly converted document, and (b) I rarely if ever repeat this step. Here's the macro code snippet:

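With Selection.Find
    ' ^w matches any stretch of white space (spaces, non-breaking spaces, tabs)
    .Text = "^w"
    .Replacement.Text = " "
    ' Replace every match in the document with a single space
    .Execute Replace:=wdReplaceAll
End With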

I just think replacing all white space (which should be a single blank space in the documents by now) with a single blank space is inserting a ton of XML code into the document.

I also think it was sloppy on my part to just rerun the macro for the few added steps. I probably should have placed those new steps both in the main macro for any future conversion and in a temporary macro to be run separately on the existing documents. Doing the latter results in a file that barely changes size (at most a few hundred bytes for a several hundred thousand byte document).
 