EDN Admin
I am trying to write code that extracts the objects embedded in a Word document and saves each one to a separate file. I found code that works for embedded Word and Excel documents: it activates the embedded object and then calls Word.Document.SaveAs.
Here is the code I have for extracting the embedded Word documents. For now I only process objects whose ProgID is Word.Document.8 (Word 97-2003 documents).
```vb
' At the top of the form's code file (assumes a reference to the Word interop assembly):
Imports Word = Microsoft.Office.Interop.Word

Private Sub processFileButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles processFileButton.Click
    Dim oWord As Word.Application = Nothing
    Dim oDoc As Word.Document = Nothing
    Dim wordDocument As Word.Document
    Dim outputFileName As String
    Dim i As Integer

    If fileNameTextBox.Text.Trim().Length = 0 Then
        MessageBox.Show("File name is required", "Error", MessageBoxButtons.OK)
        Exit Sub
    End If

    If MessageBox.Show("Press OK to process file: " & fileNameTextBox.Text, "Ready to process file?", MessageBoxButtons.OKCancel, MessageBoxIcon.Question, MessageBoxDefaultButton.Button1) = Windows.Forms.DialogResult.Cancel Then
        MessageBox.Show("Process cancelled", "Process cancelled.", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
        Exit Sub
    End If

    Try
        ' Start Word (hidden) and open the source document.
        oWord = New Word.Application()
        oWord.Visible = False
        oDoc = oWord.Documents.Open(fileNameTextBox.Text)

        ' Process each embedded object; i numbers the output files.
        i = 0
        For Each inl As Word.InlineShape In oDoc.InlineShapes
            i += 1
            ' Pictures and other non-OLE inline shapes have no OLEFormat, so skip them.
            If inl.Type <> Word.WdInlineShapeType.wdInlineShapeEmbeddedOLEObject Then Continue For

            If inl.OLEFormat.ProgID = "Word.Document.8" Then
                outputFileName = fileNameTextBox.Text & "-embed-word" & i & ".doc"
                inl.OLEFormat.Activate()
                wordDocument = CType(inl.OLEFormat.Object, Word.Document)
                wordDocument.SaveAs(outputFileName)
                wordDocument.Close()
            End If
        Next
    Finally
        ' Always close the source document without saving and quit Word,
        ' so no hidden Word process is left running if something fails.
        If oDoc IsNot Nothing Then oDoc.Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
        If oWord IsNot Nothing Then oWord.Quit()
    End Try

    MessageBox.Show("File: " & fileNameTextBox.Text & " processed.", "File Processed", MessageBoxButtons.OK, MessageBoxIcon.Information)
End Sub
```
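If more embedded types than Word.Document.8 need to be saved, one option is to look up the output extension from the ProgID instead of hard-coding ".doc". The helper below is a hypothetical sketch (EmbeddedProgIds and OutputExtensionFor are illustrative names, not part of any library). It only covers the standard Word and Excel ProgIDs and returns Nothing for anything else, including "Package". It assumes the caller still chooses the right save call per type: for an Excel object, OLEFormat.Object is expected to be an Excel workbook rather than a Word.Document, so Word's SaveAs would not apply to it.

```vb
Imports System

' Hypothetical helper: maps an embedded object's ProgID to the extension for the saved copy.
Public Module EmbeddedProgIds

    ' Returns Nothing for ProgIDs this sketch does not handle, including "Package".
    Public Function OutputExtensionFor(progId As String) As String
        Select Case progId
            Case "Word.Document.8"      ' Word 97-2003 document
                Return ".doc"
            Case "Word.Document.12"     ' Word 2007 or later document
                Return ".docx"
            Case "Excel.Sheet.8"        ' Excel 97-2003 workbook
                Return ".xls"
            Case "Excel.Sheet.12"       ' Excel 2007 or later workbook
                Return ".xlsx"
            Case Else
                Return Nothing
        End Select
    End Function

End Module
```

In the For Each loop, the ProgID comparison could then become a call to OutputExtensionFor(inl.OLEFormat.ProgID), skipping the object when the result is Nothing and otherwise appending the returned extension to outputFileName.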
The problem I now have is extracting other types of files, such as PDF files. Adobe has an SDK for PDF files (which I haven't gotten to work yet), but there is another type of embedding that I don't know how to handle. If someone embeds an object that has no registered handler, the object is embedded with OLEFormat.ProgID = "Package". This also happens to PDF documents if the PC running Word does not have Adobe installed.
I've seen other posts that mention having to get the data from the OleNative stream of the OLE object, but I haven't seen any code that does that.
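For "Package" objects, the packaged file is stored in a stream named "\x01Ole10Native" (the name begins with the character 0x01). In a .doc file that stream sits inside the ObjectPool storage, in one sub-storage per embedded object. Getting at it requires a compound-file (structured storage) reader, such as the Win32 StgOpenStorage API or the OpenMcdf library, which this sketch does not include. Once the stream's bytes are in hand, they can be parsed roughly as below. As far as I know the Packager payload is not formally documented, so the layout used here (total size, header, label, original path, type marker, temp path, data length, data) is the commonly reported reverse-engineered one; treat it as an assumption and check it against real files. PackagedFile and ParseOle10Native are hypothetical names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

' Hypothetical container for the fields read from a Package object's \x01Ole10Native stream.
Public Class PackagedFile
    Public Property Label As String          ' name shown under the icon in Word
    Public Property OriginalPath As String   ' path of the file when it was embedded
    Public Property TempPath As String       ' temporary path recorded by the Packager
    Public Property Data As Byte()           ' the embedded file's bytes
End Class

Public Module Ole10NativeParser

    ' Reads a zero-terminated string in the system ANSI code page
    ' (Encoding.Default is that code page on .NET Framework).
    Private Function ReadCString(reader As BinaryReader) As String
        Dim bytes As New List(Of Byte)()
        Dim b As Byte = reader.ReadByte()
        While b <> 0
            bytes.Add(b)
            b = reader.ReadByte()
        End While
        Return Encoding.Default.GetString(bytes.ToArray())
    End Function

    ' Parses the bytes of a \x01Ole10Native stream, assuming the commonly reported Packager layout.
    ' All integers are little-endian, which is what BinaryReader reads.
    Public Function ParseOle10Native(nativeData As Byte()) As PackagedFile
        Using reader As New BinaryReader(New MemoryStream(nativeData))
            reader.ReadUInt32()                          ' size of everything that follows
            reader.ReadUInt16()                          ' header, normally 2
            Dim result As New PackagedFile()
            result.Label = ReadCString(reader)
            result.OriginalPath = ReadCString(reader)
            reader.ReadUInt32()                          ' type marker, normally &H30000 (embedded file)
            Dim tempPathLength As Integer = CInt(reader.ReadUInt32())   ' includes the terminating zero
            result.TempPath = Encoding.Default.GetString(reader.ReadBytes(tempPathLength)).TrimEnd(ChrW(0))
            Dim dataLength As Integer = CInt(reader.ReadUInt32())
            result.Data = reader.ReadBytes(dataLength)
            If result.Data.Length <> dataLength Then
                Throw New InvalidDataException("The Ole10Native stream is shorter than its declared data length.")
            End If
            Return result
        End Using
    End Function

End Module
```

Writing result.Data to a file named after result.Label (for example with File.WriteAllBytes) would then recover the original file, such as the PDF.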
Does anyone have Visual Basic code that will extract the OleNative stream of an OLE object embedded in a Word 2003 (.doc) file? Also, would there be a difference for embedded documents created by other versions of Word (i.e., will it work for .docx files)?
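For .docx files the container is different: a .docx is a ZIP package, and Word normally stores each embedded object as its own part under word/embeddings/ (typically an oleObjectN.bin part for OLE objects, including "Package" objects, or a .docx/.xlsx part for embedded Office 2007+ documents). The sketch below copies those parts out with System.IO.Compression.ZipArchive (.NET Framework 4.5 or later). An oleObjectN.bin part is itself a compound file, so a Package's data would still need its \x01Ole10Native stream read and parsed. Strictly, part locations are given by the relationships in word/_rels/document.xml.rels; this sketch simply assumes the usual word/embeddings/ folder. DocxEmbeddings and ExtractEmbeddings are hypothetical names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression

Public Module DocxEmbeddings

    ' Copies every part under word/embeddings/ in a .docx package to outputFolder
    ' and returns the file names it wrote. Only the usual folder is checked; the
    ' relationships in word/_rels/document.xml.rels are not read.
    Public Function ExtractEmbeddings(docxStream As Stream, outputFolder As String) As List(Of String)
        Dim written As New List(Of String)()
        Directory.CreateDirectory(outputFolder)
        ' leaveOpen = True so the caller still owns docxStream.
        Using archive As New ZipArchive(docxStream, ZipArchiveMode.Read, True)
            For Each entry As ZipArchiveEntry In archive.Entries
                ' Directory entries have an empty Name, so skip them.
                If entry.FullName.StartsWith("word/embeddings/", StringComparison.OrdinalIgnoreCase) AndAlso entry.Name.Length > 0 Then
                    Dim target As String = Path.Combine(outputFolder, entry.Name)
                    Using source As Stream = entry.Open(), destination As FileStream = File.Create(target)
                        source.CopyTo(destination)
                    End Using
                    written.Add(entry.Name)
                End If
            Next
        End Using
        Return written
    End Function

End Module
```

Passing File.OpenRead on the .docx and an output folder would leave one file per embedded part in that folder; unlike the Word automation approach, this does not require Word to be installed.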
Please let me know if this should be posted in a different forum.