
I have a 900GB ext4 partition on a (magnetic) hard drive that has no defects and no bad sectors. The partition is completely empty except for an empty lost+found directory. The partition was formatted using default parameters except that I set the number of reserved filesystem blocks to 1%.

I downloaded the ~900MB file xubuntu-15.04-desktop-amd64.iso to the partition's mount point directory using wget. When the download was finished, I found that the file was split into four fragments:

filefrag -v /media/emma/red/xubuntu-15.04-desktop-amd64.iso
Filesystem type is: ef53
File size of /media/emma/red/xubuntu-15.04-desktop-amd64.iso is 1009778688 (246528 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   32767:      34816..     67583:  32768:            
   1:    32768..   63487:      67584..     98303:  30720:            
   2:    63488..   96255:     100352..    133119:  32768:      98304:
   3:    96256..  126975:     133120..    163839:  30720:            
   4:   126976..  159743:     165888..    198655:  32768:     163840:
   5:   159744..  190463:     198656..    229375:  30720:            
   6:   190464..  223231:     231424..    264191:  32768:     229376:
   7:   223232..  246527:     264192..    287487:  23296:             eof
/media/emma/red/xubuntu-15.04-desktop-amd64.iso: 4 extents found

Thinking this might be related to wget somehow, I removed the ISO file from the partition, making it empty again, then I copied the ~700MB file v1.mp4 to the partition using cp. This file was fragmented too. It was split into three fragments:

filefrag -v /media/emma/red/v1.mp4
Filesystem type is: ef53
File size of /media/emma/red/v1.mp4 is 737904458 (180153 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   32767:      34816..     67583:  32768:            
   1:    32768..   63487:      67584..     98303:  30720:            
   2:    63488..   96255:     100352..    133119:  32768:      98304:
   3:    96256..  126975:     133120..    163839:  30720:            
   4:   126976..  159743:     165888..    198655:  32768:     163840:
   5:   159744..  180152:     198656..    219064:  20409:             eof
/media/emma/red/v1.mp4: 3 extents found

Why is this happening? And is there a way to prevent it from happening? I thought ext4 was meant to be resistant to fragmentation. Instead I find that it immediately fragments a solitary file when all the rest of the volume is unused. This seems to be worse than both FAT32 and NTFS.

EmmaV
  • I'm trying to imagine under what circumstances this could possibly matter, and I'm coming up empty. – Greg Hewgill May 18 '15 at 01:53
  • @GregHewgill: It mattered because I thought it was abnormal. Now I know that it's normal, it doesn't matter. – EmmaV May 18 '15 at 03:00

2 Answers


3 or 4 fragments in a 900 MB file is very good. Fragmentation becomes a problem when a file of that size has more like 100+ fragments. It isn't uncommon for FAT or NTFS to fragment such a file into several hundred pieces.

You generally won't see better than that, at least on older ext4 filesystems, because the maximum size of a block group is 128 MiB, so every 128 MiB the contiguous space is broken by a few blocks holding the allocation bitmaps and inode tables for the next block group. A more recent ext4 feature called flex_bg packs several block groups' worth of these tables together (typically 16 groups), leaving longer runs of allocatable blocks; but depending on your distribution and the version of e2fsprogs used to format the filesystem, this option may not have been enabled.

You can run `tune2fs -l` to check which features were enabled when your filesystem was formatted.
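For example (a sketch; `/dev/sdb1` is a hypothetical device path, substitute your own partition), the check might look like:

```shell
# Hypothetical device path -- substitute your own partition (see `df`).
DEV=/dev/sdb1

# Print the feature list and the flex_bg group size; if "flex_bg" appears
# in the features line, several groups' bitmaps and inode tables are
# packed together, leaving longer contiguous runs of data blocks.
sudo tune2fs -l "$DEV" | grep -E '^(Filesystem features|Flex block group size)'
```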

psusi
  • Very interesting. I assumed all the inode tables etc. were at the start of the volume. – EmmaV May 18 '15 at 03:02
  • @EmmaV distributing them across the disk, relatively close to the data they refer to, results in shorter seeks and faster disk access :) – hobbs May 18 '15 at 16:22
  • Are you sure NTFS produces such large numbers of fragments? It has a free-space bitmap and a different allocation algorithm, so it doesn't produce as many fragments as the FAT family – phuclv Mar 15 '22 at 02:58
  • @phuclv, I haven't seen any evidence that they use a better allocation strategy. It isn't so much an issue of the filesystem itself (in other words, NTFS having an allocation bitmap doesn't matter), but rather that the allocation strategy Microsoft's filesystem drivers have always used is terrible. Rather than improve their allocation strategy, they bought a defrag program from the developer that created it, bundled it with Windows, and these days even set it to run periodically by default. If you run it manually you can see how much fragmentation there is. – psusi Mar 31 '22 at 19:35
  • Yes, I used various defragmenters regularly since Windows 9x and even watched them run for hours. I haven't needed to care about them for years, because NTFS also uses extents like ext4 and never produces hundreds of fragments unless the disk is severely full and there's no space for the MFT to grow, in which case ext4 has the same limitation – phuclv Apr 01 '22 at 01:33
  • @phuclv, I've always seen relatively small files with hundreds or thousands of fragments for no apparent reason on NTFS volumes that have never been anywhere near full, going back to NT 4.0 (or was it 3.51?) when you had to buy Diskeeper yourself because MS hadn't yet bought it and included it with Windows. – psusi Apr 01 '22 at 14:58
  • Obviously the NTFS version in 4.0 or 3.5 isn't the same as the one nowadays. That era may even have had 4KB MFT entries instead of the modern small MFT record, and lacked lots of new features. NTFS has supported extents since long before ext4. Have you ever run `fsutil` to check the real number of fragments, or just seen the number from some buggy implementation? How long is each fragment run? – phuclv Apr 01 '22 at 16:27
  • @phuclv, the format of the filesystem hasn't changed over the years. MFT records have always been 1KB. They may have finally improved the allocation algorithm the driver uses lately, but I doubt it; I haven't run Windows in several years now, so I don't know. Whether the allocations are stored sequentially or as an extent doesn't really matter; even ext2 using triple indirect blocks circa 1995 was much better about keeping fragmentation levels low than Windows with NTFS or FAT. I still maintain the ancient `e2defrag` program just for fun and got it working on ext4. – psusi Apr 01 '22 at 17:29
  • Then you know nothing about NTFS. In older NTFS versions the MFT record was 4KB long, and they didn't have transactions, variable compression algorithms, or the new features in reparse points... Each new version of Windows adds some new features to NTFS. NTFS in Windows 10 is very different from NTFS in XP, let alone NT 4; for example, OneDrive can't run on old NTFS – phuclv Apr 01 '22 at 17:34
  • @phuclv, according to my copy of Inside NTFS by Helen Custer, published in 1994 by Microsoft Press, they were 1KB long and there were transactions. I also used hex editors to manipulate them back then for myself. Blocks were allocated 4kb at a time, but the MFT subdivided them into 1KB records. Reparse points are a new feature but they just added a new attribute to the extensible list of attributes the original format allowed for. An older implementation wouldn't know what to do with them but would otherwise not have a problem mounting the filesystem and accessing regular files. – psusi Apr 01 '22 at 17:53
  • @psusi NTFS 1.0 had 4KB MFT records and MS only changed to 1KB later to save space. And it's possible to change the MFT record size in modern NTFS with some tools. There's a length field where positive values mean the record size is a multiple of the block size and negative values mean the block is divided into multiple MFT records – phuclv Apr 02 '22 at 01:08

I can't truly answer but I think this might help:

Notice how each extent is, at most, 32768 blocks in size (a power of two; that should raise a flag that something structural is going on, and also give you a hint of what to look for).

Also worth noting, those physical offsets between extents are pretty close to each other.

From: Ext4 Disk Layout

An ext4 file system is split into a series of block groups. To reduce performance difficulties due to fragmentation, the block allocator tries very hard to keep each file's blocks within the same group, thereby reducing seek times. The size of a block group is specified in sb.s_blocks_per_group blocks, though it can also be calculated as 8 * block_size_in_bytes. With the default block size of 4KiB, each group will contain 32,768 blocks, for a length of 128MiB.
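That arithmetic can be checked in plain shell, and it matches the 32768-block extents in the filefrag output above:

```shell
# Recompute the block-group geometry quoted above.
BLOCK_SIZE=4096                        # default ext4 block size in bytes
BLOCKS_PER_GROUP=$((8 * BLOCK_SIZE))   # one bitmap block holds 8*4096 bits, one bit per block
GROUP_BYTES=$((BLOCKS_PER_GROUP * BLOCK_SIZE))
echo "${BLOCKS_PER_GROUP} blocks per group"          # 32768 blocks
echo "$((GROUP_BYTES / 1024 / 1024)) MiB per group"  # 128 MiB
```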

And further down:

The first tool that ext4 uses to combat fragmentation is the multi-block allocator. When a file is first created, the block allocator speculatively allocates 8KiB of disk space to the file [...] A second related trick that ext4 uses is delayed allocation. Under this scheme, when a file needs more blocks to absorb file writes, the filesystem defers deciding the exact placement on the disk until all the dirty buffers are being written out to disk. By not committing to a particular placement until it's absolutely necessary (the commit timeout is hit, or sync() is called, or the kernel runs out of memory), the hope is that the filesystem can make better location decisions.

So I'd say the allocator only cares about data locality within the block group (those 32K blocks), but not about block groups being contiguous to each other.
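A related sketch (my suggestion, not part of the quoted documentation): if you tell the filesystem the final size up front with fallocate(1), which ext4 supports natively, the allocator gets to make one placement decision instead of growing the file incrementally. The file name here is hypothetical.

```shell
# Sketch: preallocate the full file size in one call, then inspect the layout.
# big.iso is a hypothetical name; run this on an ext4 mount.
fallocate -l 900M big.iso   # ext4 allocates all the extents up front
filefrag big.iso            # typically reports only a handful of extents
```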

outlyer
  • The first quote you gave answers my question. – EmmaV May 18 '15 at 03:06
  • 1
    Each *extent* has a maximum of 32k blocks because that is the maximum length an extent descriptor can cover. Extents are not fragments. If you notice several of the extents' physical blocks immediately follow those of the previous extent, and so do not constitute a fragment ( 6 extents vs 3 fragments ). – psusi May 18 '15 at 13:05