xml into sql server - advice needed please!

jgm

New member
Joined
Mar 20, 2006
Messages
2
Hi,

Im new to all of this - vb, .net, xml, etc. In fact the last programming I did was about 10 years ago. So please forgive any noobie-ness.

Im using vb in vs 2005, and I want to read in xml files, validate them against a schema, and insert the data into a sql server express database.

The files are quite large - 20-50MB, and Im told could get as large as 100MB+

Heres my questions:

Should I load the xml into a xmlDataDocument / dataset? Will it work ok with such large files? If I do this, Am I correct to think Ill need to create a new dataset schema which will define the relationships between the tables?

Or should I step through the xml node by node, and insert one row at a time into the database? Obviously I realise this is not very OO - there must be a better way...?

The xml data has no key / id columns. The only unique identifier would be a combination of columns. If I take the second approach, I understand I could retrieve the last row ID from the database table and increment it for each new row I insert. If I use a dataset, how can I do this?


Sorry if these are basic questions, like I said - Im new to this. Any advice or links to examples will be very appreciated.

Thanks!!

JM

Heres the existing tables & schema. (I cannot change the schema, although I think it could probably be improved)

orders table
OrderID - Key
OrderField1
OrderField2
OrderField3
OrderDateTime
OrderField4

OrderItems table
ItemID - Key
OrderID - FK
ItemField1
ItemField2
ItemField3
ItemField4

Code:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
	<xs:element name="orders">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="order_details" maxOccurs="unbounded">
					<xs:complexType>
						<xs:sequence>
							<xs:element name="orderfield1" type="xs:integer"/>
							<xs:element name="orderfield2" type="xs:string"/>
							<xs:element name="orderfield3" type="xs:string"/>
							<xs:element name="orderdate">
								<xs:complexType>
									<xs:sequence>
										<xs:element name="year" type="xs:integer"/>
										<xs:element name="month" type="xs:integer"/>
										<xs:element name="day" type="xs:integer"/>
										<xs:element name="hour" type="xs:integer"/>
										<xs:element name="minute" type="xs:integer"/>
										<xs:element name="second" type="xs:integer"/>
									</xs:sequence>
								</xs:complexType>
							</xs:element>
							<xs:element name="orderfield4">
								<xs:simpleType>
									<xs:restriction base="xs:string">
										<xs:enumeration value="option1"/>
										<xs:enumeration value="option2"/>
										<xs:enumeration value="option3"/>
									</xs:restriction>
								</xs:simpleType>
							</xs:element>
							<xs:element name="order_items" maxOccurs="unbounded">
								<xs:complexType>
									<xs:sequence>
										<xs:element name="itemfield1" type="xs:string"/>
										<xs:element name="itemfield2" type="xs:string"/>
										<xs:element name="itemfield3" type="xs:string"/>
										<xs:element name="itemfield4" type="xs:string"/>
									</xs:sequence>
								</xs:complexType>
							</xs:element>
						</xs:sequence>
					</xs:complexType>
				</xs:element>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
</xs:schema>
 
If you can load the XML into a DataSet / DataTable then you could always take advantage of the System.Data.SqlClient.SqlBulkCopy class to make the actual inserting easier.

If the documents get too large (really depends on available RAM) then an alternate solution may be required.

Also if the existing XML already has a defined schema you can use that within VS - otherwise you will benefit from creating a schema / strongly typed DataSet within VS.
 
Thanks for your comments. Ill check out the SqlBulkCopy class.

Id rather go the dataset way - it seems the correct approach and less work. Im just not sure how to maintain the relationships between the tables. Should I derive a dataset from the tables or from the schema?


Also if the existing XML already has a defined schema you can use that within VS - otherwise you will benefit from creating a schema / strongly typed DataSet within VS.

Is this just a matter of taking the existing schema and adding codegen & msdata attributes to define the relationships etc?

What about adding ID / key fields? All the examples Ive seen have a unique ID element in each record in the xml already.

Thanks again,

JM
 
Last edited by a moderator:
Back
Top