Efficient binary file writing

CryoEnix (Well-known member; joined Jan 11, 2003; Wrexham, Wales)
Hey all!

I'm currently working on a file-downloading program, and am brainstorming the best way to write the files to disk. Here's a hypothetical:

I have a file to download; let's say, for argument's sake, it's exactly one gig in size. Before I begin the download, I use a FileStream to pad out the target file with a gig's worth of zeros, to create the initial file.

Now, say I run through the download, and after every FileStream.Write() that amounts to 2 MB or so, I flush the file to actually write this data to disk. Will this flush simply replace the zeros in question with the new binary data (the GOOD scenario), or will the entire one-gig file be rewritten in its entirety (the BAD scenario)?

Any further insight into efficiently writing potentially large files would help - cheers in advance!
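For concreteness, here's the kind of seek-and-overwrite I have in mind, shrunk down to a few KB (the file name and sizes are just stand-ins for the real one-gig case):

```csharp
using System;
using System.IO;

class SeekOverwriteDemo
{
    static void Main()
    {
        // Pre-create a small "padded" file of zeros (stand-in for the 1 GB file).
        File.WriteAllBytes("test.bin", new byte[4096]);

        // Reopen it and overwrite a 16-byte region at offset 1024.
        using (var fs = new FileStream("test.bin", FileMode.Open, FileAccess.Write))
        {
            fs.Seek(1024, SeekOrigin.Begin);
            byte[] chunk = new byte[16];
            for (int i = 0; i < chunk.Length; i++) chunk[i] = 0xFF;
            fs.Write(chunk, 0, chunk.Length);
        }

        // Only the targeted region changed; the rest of the file is still zeros.
        byte[] result = File.ReadAllBytes("test.bin");
        Console.WriteLine(result.Length);  // 4096 - file size unchanged
        Console.WriteLine(result[0]);      // 0    - untouched
        Console.WriteLine(result[1024]);   // 255  - overwritten
    }
}
```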
 
Are you currently doing something that is experiencing performance issues, or is this just something you are thinking about?

The FileStream class is itself a buffered stream, so you may not need to worry about buffering the data yourself. If you want to create the file and just add data to it as you go, you might consider creating a sparse file, although this option is only available on NTFS volumes.
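As a sketch of how you could preallocate without hand-writing a gig of zeros, FileStream.SetLength extends the file to the target size in one call (the file name and size here are made up):

```csharp
using System;
using System.IO;

class Preallocate
{
    static void Main()
    {
        const long targetSize = 10L * 1024 * 1024; // 10 MB stand-in for the 1 GB file

        using (var fs = new FileStream("download.partial", FileMode.Create, FileAccess.Write))
        {
            // Logically extends the file to targetSize; unwritten regions read back as zeros.
            fs.SetLength(targetSize);
        }

        Console.WriteLine(new FileInfo("download.partial").Length); // 10485760
    }
}
```

On NTFS you could go further and mark the file sparse (DeviceIoControl with FSCTL_SET_SPARSE) so the unwritten runs occupy no physical disk space, at the cost of some P/Invoke plumbing.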
 
You could always use Performance Monitor (Start -> Run -> perfmon) to do some testing and see whether writing a 2 MB file uses similar disk I/O to updating 2 MB of your 1 GB file.
 
You think that overwriting data is faster than writing data? Why?

I don't see the point in filling up a file beforehand.

Also, what happens if the download gets interrupted? How do you tell where the download left off? What if the actual file content contains streams of zeros?

I would say that having mechanisms to handle interruptions and failures is the most important part. Disk access performance is almost outside of your control, since you are using .NET.


To answer your original question: using StreamWriter, the area of the file not written to was preserved.

When you get some results, please share the knowledge.

Code:
namespace FileOverwriteTest
{
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            // Create the file and fill it with 30,000 NUL characters.
            using (var fs = new FileStream("test", FileMode.OpenOrCreate, FileAccess.Write))
            using (var writer = new StreamWriter(fs))
            {
                writer.Write(new string('\0', 30000));
            }

            // Reopen it and overwrite only the first 1,000 characters;
            // the remaining 29,000 characters are left untouched.
            using (var bs = new FileStream("test", FileMode.Open, FileAccess.Write))
            using (var lwriter = new StreamWriter(bs))
            {
                lwriter.Write(new string('\x01', 1000));
            }
        }
    }
}
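The interruption-handling concern raised above could be sketched like this: commit the last fully flushed offset to a sidecar file, and on restart resume from there rather than scanning for zeros (all file names here are hypothetical):

```csharp
using System;
using System.IO;

class ResumableWriter
{
    const string DataFile = "download.partial";  // hypothetical names
    const string StateFile = "download.offset";

    static long LoadOffset() =>
        File.Exists(StateFile) ? long.Parse(File.ReadAllText(StateFile)) : 0;

    static void WriteChunk(byte[] chunk, long offset)
    {
        using (var fs = new FileStream(DataFile, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            fs.Write(chunk, 0, chunk.Length);
            fs.Flush(true); // flush through to disk before committing the offset
        }
        File.WriteAllText(StateFile, (offset + chunk.Length).ToString());
    }

    static void Main()
    {
        long offset = LoadOffset();
        byte[] chunk = new byte[1024]; // pretend this came off the network
        WriteChunk(chunk, offset);
        Console.WriteLine(LoadOffset()); // advances by 1024 each run
    }
}
```

The key ordering is data first, offset second: if the process dies between the two writes, the worst case is re-downloading one chunk, never a corrupt-but-committed region.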
 
It can make sense to write zeros initially to prevent disk fragmentation, but I agree that there are other things to take into consideration.

BitTorrent uses HTTP to transfer the piece hashes for the file, and then the application writes zeros to the file and downloads the parts necessary based on the hashes.
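The piece-hash idea can be sketched as follows: hash each fixed-size piece as it arrives and compare it against the expected digest before accepting it (the piece size and data here are made up; BitTorrent itself uses SHA-1 per piece):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

class PieceCheck
{
    // Returns true if the downloaded piece matches its expected SHA-1 digest.
    static bool PieceIsValid(byte[] piece, byte[] expectedHash)
    {
        using (var sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(piece).SequenceEqual(expectedHash);
        }
    }

    static void Main()
    {
        byte[] piece = new byte[256 * 1024]; // one 256 KB piece, all zeros for the demo
        using (var sha1 = SHA1.Create())
        {
            byte[] expected = sha1.ComputeHash(piece); // digest the metadata would supply
            Console.WriteLine(PieceIsValid(piece, expected)); // True
        }
    }
}
```

This also answers the "what if the real content contains streams of zeros?" objection: you never infer progress from the bytes themselves, only from which pieces have verified.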
 
Guys, thanks for all your responses. It's all hypothetical at the moment; I was thinking about how Azureus (a BitTorrent client) initially builds an output file of zeros before writing to it.

I'll take your points into consideration and let you know how it goes - cheers to all of you!
 