File compare

talahaski

Hi, I am really, really new to VB .NET 2003, to the point where I don't know much syntax yet, but I'm learning by working through some problems and looking at examples. Anyway, I was hoping somebody could help me by actually working through the full code with me on how to do this:

I have a directory with many files. Some of these files are duplicates of each other, with the only difference being the name of the file itself. I would like to loop through all the files and, if a file is unique, move it into a separate directory. So when done, the second directory will contain one copy of each file without any duplicates.

There are a few other things I want to do, but any help getting me started would be great. One of them is to rename every file placed into the unique directory with a sequential number, counting up from 1. So the first file would be 1.dat, the second 2.dat...

I know this is asking for a lot, feel free to limit your response.

Oh, I forgot, these files are binary files, NOT text files so a text compare will not work.
 
The forum is not inactive; some people simply don't know the answer or don't notice a particular thread :).

You could generate hashes of the files using the cryptography classes, which make it easy to compute a hash from an IO stream. All you would then have to do is compare the results. This method is used a lot to check downloaded files for corruption (mostly by the open source community).
An example:
Code:
   ' Create the object that will compute the hash for you
   Dim h As New Security.Cryptography.MD5CryptoServiceProvider
   ' ComputeHash takes an IO stream as an argument, just what you need
   Dim fs As New IO.FileStream("path to the file", IO.FileMode.Open, IO.FileAccess.Read)
   ' res is the array of bytes that stores the produced hash
   Dim res As Byte() = h.ComputeHash(fs)
   ' Close the stream so the file is not left locked
   fs.Close()
   ' Convert the bytes to a hex string so you can easily compare them later
   ' (ASCII decoding would turn every byte above 127 into "?")
   MessageBox.Show(BitConverter.ToString(res))
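This should work for you, as it is extremely unlikely that two different files produce the same hash by accident. Collisions are possible in principle though, and people have shown ways to construct them for MD5 on purpose, so if it really matters you can confirm a match with a byte-by-byte compare :).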
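
To make the comparison step concrete, here is a minimal sketch of comparing two files by hash and then confirming the match byte by byte. The module and function names (HashCompareSketch, GetFileHash, SameContents, IsDuplicate) are made up for illustration and are not from the framework. It uses MD5.Create(), the factory method for the framework's MD5 implementation, and avoids anything newer than VB .NET 2003.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashCompareSketch

    ' Returns the MD5 hash of a file as a hex string such as "0A-1B-...".
    ' The stream is closed in Finally so the file is never left locked.
    Function GetFileHash(ByVal filePath As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Try
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            fs.Close()
        End Try
    End Function

    ' Byte-by-byte check, meant only to confirm a hash match.
    Function SameContents(ByVal pathA As String, ByVal pathB As String) As Boolean
        Dim a As New FileStream(pathA, FileMode.Open, FileAccess.Read)
        Dim b As FileStream = Nothing
        Try
            b = New FileStream(pathB, FileMode.Open, FileAccess.Read)
            If a.Length <> b.Length Then Return False
            Dim x As Integer
            Do
                x = a.ReadByte()
                If x <> b.ReadByte() Then Return False
            Loop While x <> -1   ' ReadByte returns -1 at end of file
            Return True
        Finally
            a.Close()
            If Not b Is Nothing Then b.Close()
        End Try
    End Function

    ' Hypothetical helper: hashes must match first, then contents are confirmed.
    Function IsDuplicate(ByVal pathA As String, ByVal pathB As String) As Boolean
        Return GetFileHash(pathA) = GetFileHash(pathB) AndAlso SameContents(pathA, pathB)
    End Function

End Module
```

Comparing the hex strings with = works because they are ordinary strings. Comparing the two Byte() arrays with = will not compile in VB, which is one more reason to convert the hash to a string first.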
 
So what you're saying is I would need to create this hash for my primary file, then loop through every file in the folder, create a hash for each of them, and compare the hash values?

get primary file

Dim FolderToSearch as string="c:\temp"

ofd.show()
Dim PrimaryHash As New Security.Cryptography.MD5CryptoServiceProvider

Dim PrimaryHashBytes As Byte() = PrimaryHash.ComputeHash(New IO.FileStream(ofd.filename, IO.FileMode.Open))

Loop for each file in folder FolderToSearch -- Not sure yet how to do this loop

Dim TempHash As New Security.Cryptography.MD5CryptoServiceProvider
Dim TempHashBytes As Byte() = TempHash.ComputeHash(New IO.FileStream(NextFileName, IO.FileMode.Open))

If PrimaryHashBytes=TempHashBytes then
messagebox("Files are the same")
end if


Does this appear correct? Can you help with the loop through the folder please.

Also, what kind of clean-up do I need to perform? If the folder has a lot of files, I'm guessing opening all these files and hashing them is going to create a lot of overhead.
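
For reference, one possible shape for the folder loop and the clean-up, as a sketch only. It assumes Directory.GetFiles for the loop, a Hashtable of hashes already seen (so each file is hashed once instead of being compared against a single primary file), and File.Copy rather than a move so the source folder stays untouched. The names UniqueFilesSketch, GetFileHash and CopyUniqueFiles are hypothetical.

```vbnet
Imports System
Imports System.Collections
Imports System.IO
Imports System.Security.Cryptography

Module UniqueFilesSketch

    ' Hash one file and close the stream straight away. Closing each
    ' stream as soon as it is hashed is the main clean-up needed.
    Function GetFileHash(ByVal filePath As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Try
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            fs.Close()
        End Try
    End Function

    ' Copies one file per distinct hash into uniqueFolder as 1.dat, 2.dat, ...
    ' Returns the number of unique files found.
    Function CopyUniqueFiles(ByVal folderToSearch As String, ByVal uniqueFolder As String) As Integer
        Dim seen As New Hashtable   ' hash string -> first file that produced it
        Dim counter As Integer = 0

        Directory.CreateDirectory(uniqueFolder)

        For Each fileName As String In Directory.GetFiles(folderToSearch)
            Dim hash As String = GetFileHash(fileName)
            If Not seen.ContainsKey(hash) Then
                seen.Add(hash, fileName)
                counter += 1
                ' File.Copy throws if the target exists, so start with an empty folder
                File.Copy(fileName, Path.Combine(uniqueFolder, counter.ToString() & ".dat"))
            End If
        Next

        Return counter
    End Function

End Module
```

A call would look like CopyUniqueFiles("c:\temp", "c:\temp\unique"). GetFiles does not descend into subfolders, so the unique folder can sit inside the searched one. Swapping File.Copy for File.Move gives the move behaviour asked for originally. This sketch treats a matching hash as a duplicate without a byte-by-byte confirmation, and the numbering follows whatever order GetFiles returns.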
 
