Windows 7 Question about NTFS and fragmentation of MFT records

  • Thread starter: LGS (Guest)
Context

For a small file stored on an NTFS partition, the locations of the clusters in use by the file are stored in the same MFT record as the rest of the information about the file (name, size, last modified, etc.). However, that list of locations can grow very large if the file becomes badly fragmented. For example: instead of storing 1 "data run" that starts at LCN #300,000 and runs for 1,000 clusters (which NTFS can store very efficiently), you could (theoretically) have 1,000 data runs, each 1 cluster long, scattered all over the disk.
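For anyone who wants to reproduce the measurement, here is a rough, untested sketch that dumps a file's data runs with FSCTL_GET_RETRIEVAL_POINTERS (the buffer size and the loop over ERROR_MORE_DATA are my own choices). Note that this gives the extent list as one flat array; it does not tell you which MFT record each run lives in.

```cpp
// Rough sketch (untested): dump a file's data runs via FSCTL_GET_RETRIEVAL_POINTERS.
// A resident file (data stored inside the MFT record itself) fails with ERROR_HANDLE_EOF.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    if (argc < 2) { printf("usage: runs <file>\n"); return 1; }

    HANDLE h = CreateFileA(argv[1], FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) { printf("open failed: %lu\n", GetLastError()); return 1; }

    STARTING_VCN_INPUT_BUFFER in = {};   // start at VCN 0
    std::vector<BYTE> buf(64 * 1024);    // room for a few thousand extents per call
    DWORD total = 0;

    for (;;) {
        DWORD bytes = 0;
        BOOL ok = DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS,
                                  &in, sizeof(in),
                                  buf.data(), (DWORD)buf.size(), &bytes, NULL);
        if (!ok && GetLastError() != ERROR_MORE_DATA) {
            printf("FSCTL_GET_RETRIEVAL_POINTERS failed: %lu\n", GetLastError());
            break;
        }

        RETRIEVAL_POINTERS_BUFFER* rp = (RETRIEVAL_POINTERS_BUFFER*)buf.data();
        for (DWORD i = 0; i < rp->ExtentCount; ++i) {
            LONGLONG startVcn = (i == 0) ? rp->StartingVcn.QuadPart
                                         : rp->Extents[i - 1].NextVcn.QuadPart;
            printf("run %lu: VCN %lld -> LCN %lld, %lld clusters\n",
                   total + i, startVcn, rp->Extents[i].Lcn.QuadPart,
                   rp->Extents[i].NextVcn.QuadPart - startVcn);
        }
        total += rp->ExtentCount;

        if (ok) break;                                              // got the whole list
        in.StartingVcn = rp->Extents[rp->ExtentCount - 1].NextVcn;  // continue from here
    }

    printf("total data runs: %lu\n", total);
    CloseHandle(h);
    return 0;
}
```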

Now, as the number of data runs climbs, eventually they won't all fit in that record alongside the rest of the data. When that happens, NTFS allocates a second record in the MFT and has the "base" record point to it (via an ATTRIBUTE_LIST). As the file/fragmentation grows even larger, one extra record might not be enough, so NTFS allocates even more records. With the default MFT record size (1 KB), I'm seeing a max of ~200 data runs per record.
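(If you want to look at the raw records yourself: the sketch below reads a file's base MFT record with FSCTL_GET_NTFS_FILE_RECORD. It is untested as written, it hard-codes the C: volume and an example file path, it needs admin rights, and the actual attribute walking, including following the ATTRIBUTE_LIST out to the extension records, is omitted.)

```cpp
// Rough sketch (untested): read a file's base MFT record off the volume with
// FSCTL_GET_NTFS_FILE_RECORD. Parsing the attributes in the returned record is omitted.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>
#include <vector>

int main()
{
    const char* filePath = "C:\\Windows\\inf\\setupapi.dev.log";  // example file

    // 1. Get the file's 64-bit file reference number (MFT record number + sequence).
    HANDLE hFile = CreateFileA(filePath, FILE_READ_ATTRIBUTES,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    BY_HANDLE_FILE_INFORMATION info = {};
    GetFileInformationByHandle(hFile, &info);
    CloseHandle(hFile);

    NTFS_FILE_RECORD_INPUT_BUFFER in = {};
    in.FileReferenceNumber.QuadPart =
        ((LONGLONG)info.nFileIndexHigh << 32) | info.nFileIndexLow;

    // 2. Open the volume (admin required) and ask for the raw record.
    HANDLE hVol = CreateFileA("\\\\.\\C:", GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                              OPEN_EXISTING, 0, NULL);
    if (hVol == INVALID_HANDLE_VALUE) return 1;

    // Record size (BytesPerFileRecordSegment) tells us how big a buffer we need.
    NTFS_VOLUME_DATA_BUFFER vol = {};
    DWORD bytes = 0;
    DeviceIoControl(hVol, FSCTL_GET_NTFS_VOLUME_DATA, NULL, 0,
                    &vol, sizeof(vol), &bytes, NULL);

    std::vector<BYTE> out(sizeof(NTFS_FILE_RECORD_OUTPUT_BUFFER) +
                          vol.BytesPerFileRecordSegment);
    if (!DeviceIoControl(hVol, FSCTL_GET_NTFS_FILE_RECORD,
                         &in, sizeof(in),
                         out.data(), (DWORD)out.size(), &bytes, NULL)) {
        printf("FSCTL_GET_NTFS_FILE_RECORD failed: %lu\n", GetLastError());
        CloseHandle(hVol);
        return 1;
    }

    NTFS_FILE_RECORD_OUTPUT_BUFFER* rec = (NTFS_FILE_RECORD_OUTPUT_BUFFER*)out.data();
    // An in-use record starts with the ASCII signature "FILE".
    printf("record %lld, %lu bytes (starts with \"%.4s\")\n",
           rec->FileReferenceNumber.QuadPart, rec->FileRecordLength,
           (const char*)rec->FileRecordBuffer);

    CloseHandle(hVol);
    return 0;
}
```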

The problem

I'm seeing files that have multiple records allocated to hold data runs (which I would expect). But instead of the ~200 data runs per record I'm expecting, each of these MFT records holds only a single data run. In an extreme example, I've got a file with 637 MFT records allocated, all with exactly 1 data run on them. So instead of taking up 4 records in the MFT, it's using 637. Which means that when I walk the file, not only do I read each page of the file's data, NTFS also has to do an additional 637 reads just to find out where that data is. Ouch.

My questions

  1. What causes this to happen to some files and not others? And even to some parts of a file and not others (I've got a file that has 6 records with 1 data run apiece, and another 7 records that are completely full).
  2. (More importantly) What API can I use to "defrag" these 637 records back to the 4 it should take?

Things that don't work

  • Using FSCTL_MOVE_FILE to defrag the file will move the clusters that hold the file data next to each other, but it will NOT cause the MFT records to coalesce (the call is sketched after this list). Intentionally fragmenting and then defragmenting the file data doesn't work either.
  • "fsutil repair initiate" on an affected file does not cause the records to coalesce. Presumably the associated DeviceIoControl won't help either.
  • Presumably copying the file, deleting the original, and renaming the copy would work. But this is not a practical solution. I need to be able to tell NTFS to clean up the file's records without copying gigabytes of data around.
  • FSCTL_INITIATE_FILE_METADATA_OPTIMIZATION sounds like it might do what I need (from the name), but unfortunately it is only supported on W10 and is totally undocumented. I need a solution that works on W7 and up; documentation would also be nice.
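For completeness, here is roughly the FSCTL_MOVE_FILE call from the first bullet. This is only a sketch: the file path, the target LCN of 300,000, and the run length are made-up placeholders, the MoveRun helper is my own, finding a free target LCN (e.g. via FSCTL_GET_VOLUME_BITMAP) is not shown, and the access rights/share modes are guesses that may need adjusting.

```cpp
// Sketch (untested): move one run of a file's clusters with FSCTL_MOVE_FILE.
// Admin rights required; the target LCN must already be known to be free.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>

// Hypothetical helper: move 'clusters' clusters of the file, starting at
// 'startVcn' within the file, to 'targetLcn' on the volume.
bool MoveRun(HANDLE hVolume, HANDLE hFile,
             LONGLONG startVcn, LONGLONG targetLcn, DWORD clusters)
{
    MOVE_FILE_DATA mfd = {};
    mfd.FileHandle = hFile;
    mfd.StartingVcn.QuadPart = startVcn;   // which part of the file to move
    mfd.StartingLcn.QuadPart = targetLcn;  // where on the volume to put it
    mfd.ClusterCount = clusters;

    DWORD bytes = 0;
    return DeviceIoControl(hVolume, FSCTL_MOVE_FILE,
                           &mfd, sizeof(mfd), NULL, 0, &bytes, NULL) != FALSE;
}

int main()
{
    // Volume and file handles; generous share modes because the file may be in use.
    HANDLE hVolume = CreateFileA("\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
    HANDLE hFile = CreateFileA("C:\\some\\fragmented.file", FILE_READ_ATTRIBUTES,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
    if (hVolume == INVALID_HANDLE_VALUE || hFile == INVALID_HANDLE_VALUE) return 1;

    // Example: move the file's first 1,000 clusters to LCN 300,000 (assumed free).
    if (!MoveRun(hVolume, hFile, 0, 300000, 1000))
        printf("FSCTL_MOVE_FILE failed: %lu\n", GetLastError());

    CloseHandle(hFile);
    CloseHandle(hVolume);
    return 0;
}
```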

Tidbits

  • I'm seeing this behavior on 2 W7 machines and a W8 machine.
  • The more use the computer has seen, the more affected files there are.
  • Oddly, c:\Windows\inf\setupapi.dev.log shows the problem on all three machines.
  • One of the machines has an SSD, the others do not.
  • The files are neither compressed nor sparse.
