XML Conversion ??

feurich

Well-known member | Joined Oct 21, 2003 | 168 messages | Location: Holland
Hi there,

Can anyone help me out with this one? I need to read an XML file and convert it into another XML file with fewer tags. The original XML file describes single-page TIFF files, and I need to convert those into multipage TIFF files. That part is not the problem. The problem is reading the original XML file, which has any number of image file (<FILE>) tags, and converting it into an XML file with only one such tag. The original XML file also contains metadata about the documents and relational information between the single-page TIFF files, and that information needs to be preserved.

My thought was to read the XML file into a DataSet, merge the single-page TIFFs into a multipage TIFF, and then export the data in the DataSet to a new XML file. But how do I remove the single-page TIFF entries from the DataSet and replace them with a single entry for the multipage TIFF?

Any help is useful.

Cire
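For comparison, here is a minimal sketch of the remove-and-replace step. It is written in Python with the standard-library xml.etree.ElementTree instead of a .NET DataSet, so it is a swapped-in technique rather than the poster's plan. It assumes the multipage TIFF has already been produced and only handles the XML side: every FILE/TYPE pair of one DOCUMENT is removed and a single pair is appended, while the other metadata stays intact. The function name replace_file_entries is hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_file_entries(document: ET.Element, new_file: str, new_type: str = "TIF") -> None:
    """Replace every FILE/TYPE child of a DOCUMENT element with a single pair.

    All other children (BRON, INDEXEERDATUM, DOCINDEXWAARDEn, ...) are left
    untouched, so the document metadata is preserved.
    """
    # Collect first, then remove: removing children while iterating over the
    # element itself would skip some of them.
    for child in [c for c in document if c.tag in ("FILE", "TYPE")]:
        document.remove(child)
    # Append the single entry for the merged multipage TIFF at the end.
    ET.SubElement(document, "FILE").text = new_file
    ET.SubElement(document, "TYPE").text = new_type


if __name__ == "__main__":
    doc = ET.fromstring(
        "<DOCUMENT><BRON>7875</BRON>"
        "<FILE>SinglePage1.TIF</FILE><TYPE>TIF</TYPE>"
        "<FILE>SingelPage2.TIF</FILE><TYPE>TIF</TYPE></DOCUMENT>"
    )
    replace_file_entries(doc, "MultiPage.TIF")
    print(ET.tostring(doc, encoding="unicode"))
    # prints <DOCUMENT><BRON>7875</BRON><FILE>MultiPage.TIF</FILE><TYPE>TIF</TYPE></DOCUMENT>
```

Because the new pair is appended, it ends up after the other children, which is where FILE and TYPE sit in the sample files.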
 
Original XML file:

<DOCUMENTS>
<VERSION>2.1</VERSION>
<LICENTIEHOUDER>Hummingbird</LICENTIEHOUDER>
<XTN>Bulkimport</XTN>
<ARCHIEFNAAM>Hummingbird archief</ARCHIEFNAAM>
<DOCINDEXNAAM1>Organisatie</DOCINDEXNAAM1>
<DOCINDEXNAAM2>Persoon extern</DOCINDEXNAAM2>
<DOCINDEXNAAM3>Intern</DOCINDEXNAAM3>
<DOCINDEXNAAM4>Project</DOCINDEXNAAM4>
<DOCINDEXNAAM5>Soort</DOCINDEXNAAM5>
<DOCINDEXNAAM6>Onderwerp</DOCINDEXNAAM6>
<DOCINDEXNAAM7>Trefwoorden</DOCINDEXNAAM7>
<DOCINDEXNAAM8>Documentdatum</DOCINDEXNAAM8>
<DOCUMENT>
<BRON>2.xx conversie Original DocId: 7875</BRON>
<INDEXEERDATUM>2001-03-07</INDEXEERDATUM>
<DOCINDEXWAARDE1>Index01</DOCINDEXWAARDE1>
<DOCINDEXWAARDE2>Index02</DOCINDEXWAARDE2>
<DOCINDEXWAARDE3>Index03</DOCINDEXWAARDE3>
<DOCINDEXWAARDE4>Index04</DOCINDEXWAARDE4>
<DOCINDEXWAARDE5>Index05</DOCINDEXWAARDE5>
<DOCINDEXWAARDE6>Index06</DOCINDEXWAARDE6>
<DOCINDEXWAARDE7>2001-03-07</DOCINDEXWAARDE7>
<FILE>SinglePage1.TIF</FILE>
<TYPE>TIF</TYPE>
<FILE>SingelPage2.TIF</FILE>
<TYPE>TIF</TYPE>
</DOCUMENT>
</DOCUMENTS>
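In the file above, FILE and TYPE are flat siblings inside DOCUMENT rather than nested pairs. The sketch below lists the single-page files of each DOCUMENT in page order, which is the input a TIFF merge step would need. It uses Python's standard-library xml.etree.ElementTree (not the .NET classes discussed in this thread) and assumes the i-th TYPE belongs to the i-th FILE. The names file_pairs and SAMPLE are hypothetical; SAMPLE is a trimmed copy of the original file.

```python
import xml.etree.ElementTree as ET


def file_pairs(document: ET.Element) -> list[tuple[str, str]]:
    """Return (file name, type) pairs for one DOCUMENT element.

    FILE and TYPE are flat siblings, so the i-th TYPE is paired with the
    i-th FILE by position.
    """
    files = [(f.text or "").strip() for f in document.findall("FILE")]
    types = [(t.text or "").strip() for t in document.findall("TYPE")]
    if len(files) != len(types):
        raise ValueError(f"{len(files)} FILE vs {len(types)} TYPE elements")
    return list(zip(files, types))


# Trimmed copy of the original file; only the elements used here are kept.
SAMPLE = """<DOCUMENTS>
<DOCUMENT>
<BRON>2.xx conversie Original DocId: 7875</BRON>
<FILE>SinglePage1.TIF</FILE>
<TYPE>TIF</TYPE>
<FILE>SingelPage2.TIF</FILE>
<TYPE>TIF</TYPE>
</DOCUMENT>
</DOCUMENTS>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for doc in root.findall("DOCUMENT"):
        # The single-page TIFFs to hand to the merge step, in page order.
        print(file_pairs(doc))
        # prints [('SinglePage1.TIF', 'TIF'), ('SingelPage2.TIF', 'TIF')]
```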

Converted XML file:


<DOCUMENTS>
<VERSION>2.1</VERSION>
<LICENTIEHOUDER>Hummingbird</LICENTIEHOUDER>
<XTN>Bulkimport</XTN>
<ARCHIEFNAAM>SinglePage Tiff archief</ARCHIEFNAAM>
<DOCINDEXNAAM1>Organisatie</DOCINDEXNAAM1>
<DOCINDEXNAAM2>Persoon extern</DOCINDEXNAAM2>
<DOCINDEXNAAM3>Intern</DOCINDEXNAAM3>
<DOCINDEXNAAM4>Project</DOCINDEXNAAM4>
<DOCINDEXNAAM5>Soort</DOCINDEXNAAM5>
<DOCINDEXNAAM6>Onderwerp</DOCINDEXNAAM6>
<DOCINDEXNAAM7>Trefwoorden</DOCINDEXNAAM7>
<DOCINDEXNAAM8>Documentdatum</DOCINDEXNAAM8>
<DOCUMENT>
<BRON>conversie Original DocId: 7875</BRON>
<INDEXEERDATUM>2001-03-07</INDEXEERDATUM>
<DOCINDEXWAARDE1>Index01</DOCINDEXWAARDE1>
<DOCINDEXWAARDE2>Index02</DOCINDEXWAARDE2>
<DOCINDEXWAARDE3>Index03</DOCINDEXWAARDE3>
<DOCINDEXWAARDE4>Index04</DOCINDEXWAARDE4>
<DOCINDEXWAARDE5>Index05</DOCINDEXWAARDE5>
<DOCINDEXWAARDE6>Index06</DOCINDEXWAARDE6>
<DOCINDEXWAARDE7>2001-03-07</DOCINDEXWAARDE7>
<FILE>MultiPage.TIF</FILE>
<TYPE>TIF</TYPE>
</DOCUMENT>
</DOCUMENTS>

Note the <FILE> tags.

Sorry for the bad markup. I couldn't find the XML markup tag. :-)
 
If by "ignore" you mean remove, then yes. If you mean leave them in the XML and just not use them, then no.
The converted XML file must contain only one <FILE> tag and one <TYPE> tag.
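A quick way to confirm that a converted file meets this requirement is to count the elements in each DOCUMENT. This is a minimal Python sketch using the standard-library xml.etree.ElementTree; it assumes the rule applies per DOCUMENT element, and the function name check_single_file_entry is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_single_file_entry(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means every DOCUMENT
    has exactly one FILE and exactly one TYPE element."""
    problems = []
    root = ET.fromstring(xml_text)
    for index, doc in enumerate(root.findall("DOCUMENT"), start=1):
        n_file = len(doc.findall("FILE"))
        n_type = len(doc.findall("TYPE"))
        if n_file != 1 or n_type != 1:
            problems.append(f"DOCUMENT {index}: {n_file} FILE, {n_type} TYPE")
    return problems


if __name__ == "__main__":
    converted = (
        "<DOCUMENTS><DOCUMENT>"
        "<FILE>MultiPage.TIF</FILE><TYPE>TIF</TYPE>"
        "</DOCUMENT></DOCUMENTS>"
    )
    print(check_single_file_entry(converted))  # prints []
```

Running the same check on the original file would report each DOCUMENT that still lists several single-page TIFFs.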
 
A stylesheet similar to
Code:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
	<DOCUMENTS>
		<VERSION>
			<xsl:value-of select="DOCUMENTS/VERSION"/>
		</VERSION>
		<LICENTIEHOUDER>
			<xsl:value-of select="DOCUMENTS/LICENTIEHOUDER"/>
		</LICENTIEHOUDER>
		<XTN>
			<xsl:value-of select="DOCUMENTS/XTN"/>
		</XTN>
		<ARCHIEFNAAM>
			<xsl:value-of select="DOCUMENTS/ARCHIEFNAAM"/>
		</ARCHIEFNAAM>
		<DOCINDEXNAAM1>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM1"/>
		</DOCINDEXNAAM1>
		<DOCINDEXNAAM2>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM2"/>
		</DOCINDEXNAAM2>
		<DOCINDEXNAAM3>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM3"/>
		</DOCINDEXNAAM3>
		<DOCINDEXNAAM4>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM4"/>
		</DOCINDEXNAAM4>
		<DOCINDEXNAAM5>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM5"/>
		</DOCINDEXNAAM5>
		<DOCINDEXNAAM6>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM6"/>
		</DOCINDEXNAAM6>
		<DOCINDEXNAAM7>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM7"/>
		</DOCINDEXNAAM7>
		<DOCINDEXNAAM8>
			<xsl:value-of select="DOCUMENTS/DOCINDEXNAAM8"/>
		</DOCINDEXNAAM8>
		<!-- one output DOCUMENT per input DOCUMENT; the paths below are relative to the current DOCUMENT -->
		<xsl:for-each select="DOCUMENTS/DOCUMENT">
			<DOCUMENT>
				<BRON>
					<xsl:value-of select="BRON"/>
				</BRON>
				<INDEXEERDATUM>
					<xsl:value-of select="INDEXEERDATUM"/>
				</INDEXEERDATUM>
				<DOCINDEXWAARDE1>
					<xsl:value-of select="DOCINDEXWAARDE1"/>
				</DOCINDEXWAARDE1>
				<DOCINDEXWAARDE2>
					<xsl:value-of select="DOCINDEXWAARDE2"/>
				</DOCINDEXWAARDE2>
				<DOCINDEXWAARDE3>
					<xsl:value-of select="DOCINDEXWAARDE3"/>
				</DOCINDEXWAARDE3>
				<DOCINDEXWAARDE4>
					<xsl:value-of select="DOCINDEXWAARDE4"/>
				</DOCINDEXWAARDE4>
				<DOCINDEXWAARDE5>
					<xsl:value-of select="DOCINDEXWAARDE5"/>
				</DOCINDEXWAARDE5>
				<DOCINDEXWAARDE6>
					<xsl:value-of select="DOCINDEXWAARDE6"/>
				</DOCINDEXWAARDE6>
				<DOCINDEXWAARDE7>
					<xsl:value-of select="DOCINDEXWAARDE7"/>
				</DOCINDEXWAARDE7>
				<xsl:choose>
					<xsl:when test="count(FILE) &gt; 1">
						<!-- several single-page TIFFs: emit one entry for the merged multipage TIFF -->
						<!-- placeholder name from the sample output; each document needs its own file name -->
						<FILE>MultiPage.TIF</FILE>
						<TYPE>TIF</TYPE>
					</xsl:when>
					<xsl:otherwise>
						<!-- zero or one file: copy the existing entry unchanged -->
						<FILE>
							<xsl:value-of select="FILE"/>
						</FILE>
						<TYPE>
							<xsl:value-of select="TYPE"/>
						</TYPE>
					</xsl:otherwise>
				</xsl:choose>
			</DOCUMENT>
		</xsl:for-each>
	</DOCUMENTS>
</xsl:template>
</xsl:stylesheet>
should do the trick (I haven't really tested it). If you need to do this from code, have a look at the System.Xml.Xsl.XslTransform class: it lets you load an XML document and a stylesheet and save the resulting document wherever you like. (On .NET 2.0 and later, XslTransform is obsolete and System.Xml.Xsl.XslCompiledTransform replaces it.)
 
OK, if I understand this correctly:

1) Create an XslTransform object
2) Load the stylesheet
3) Create an XPathDocument
4) Load the XML data into the XPathDocument
5) Transform the data

What I don't understand is this: I can load the original XML data into an XPathDocument, and the transform exports it to another XML file with the same structure as the stylesheet. But how does the stylesheet know which tags need to be merged? Or am I getting it completely wrong? :confused:
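The stylesheet does not discover the tags to merge on its own. Its template writes every output element explicitly, and the xsl:choose names FILE and TYPE and tests how many FILE elements there are. The sketch below spells out that same decision in Python with the standard-library xml.etree.ElementTree (a stand-in for the XSLT, not a .NET API), applied to every DOCUMENT in the file. The function convert and the multipage_name callback, which picks the merged file's name, are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def convert(xml_text: str, multipage_name: Callable[[ET.Element], str]) -> str:
    """Apply the stylesheet's FILE/TYPE rule to every DOCUMENT element.

    If a DOCUMENT has more than one FILE, all FILE/TYPE children are replaced
    by a single pair; otherwise the DOCUMENT is left unchanged. Header fields,
    BRON and the DOCINDEXWAARDEn elements pass through as-is.
    """
    root = ET.fromstring(xml_text)
    for doc in root.findall("DOCUMENT"):
        # Same test as count(FILE) > 1 in the xsl:when.
        if len(doc.findall("FILE")) > 1:
            for child in [c for c in doc if c.tag in ("FILE", "TYPE")]:
                doc.remove(child)
            ET.SubElement(doc, "FILE").text = multipage_name(doc)
            ET.SubElement(doc, "TYPE").text = "TIF"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = (
        "<DOCUMENTS><VERSION>2.1</VERSION>"
        "<DOCUMENT><BRON>7875</BRON>"
        "<FILE>SinglePage1.TIF</FILE><TYPE>TIF</TYPE>"
        "<FILE>SingelPage2.TIF</FILE><TYPE>TIF</TYPE></DOCUMENT>"
        "</DOCUMENTS>"
    )
    print(convert(source, lambda doc: "MultiPage.TIF"))
```

Passing the name in as a callback keeps the XML rule separate from however the merged TIFFs end up being named.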
 
I've just updated the code in my original post; the previous version didn't generate the multipage TIF entry correctly, so you might want to try the new version.
 