6

I imagine that adding n xattrs of length l to f files and d directories may incur costs in:

  • storage
  • path resolution time / access time?
  • iteration over directories (e.g. a recursive find over a freshly rebooted, not-yet-cached filesystem)?

What are those costs? For example, would tagging all files significantly impact storage and performance? What are the critical values below which the cost is negligible, and above which it starts hammering the filesystem?

For such an analysis, it would obviously help to consider the limits of xattrs: how many xattrs, and how big, can we put on different filesystems?

(Feel free to include details about filesystems other than ext4 and btrfs if you find it handy. Thank you.)

Satō Katsura
Grzegorz Wierzowiecki
  • Did you write and run some *simple* benchmarks? – Basile Starynkevitch Sep 04 '17 at 18:52
  • Benchmarks sound like a good suggestion to get an initial idea cheaply! In the end, though, I was more interested in deeper insights, e.g. pointers to filesystem data structures, referring to places like [Ext4 Disk Layout](https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout) or [Btrfs design](https://btrfs.wiki.kernel.org/index.php/Btrfs_design). – Grzegorz Wierzowiecki Sep 05 '17 at 10:59
  • BTW, it is probably unimportant (except for pathological cases): for "small" files, all the data and metadata sits in the page cache; for "big" files, the disk IO (even with SSDs) is a big bottleneck. – Basile Starynkevitch Sep 05 '17 at 11:01
  • @BasileStarynkevitch that's why I stated in the question the assumption that the system has nothing in cache: we have either just rebooted or freshly inserted a hard drive. Such analyses (even if the cases are not typical workloads) are valuable to me, as I tend to use machines in non-typical ways ;), so costs do not always amortize as assumed for typical workloads. – Grzegorz Wierzowiecki Sep 05 '17 at 11:09
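
A simple benchmark along the lines suggested in the comments could be sketched as below. This is only a rough starting point, not a rigorous measurement: the function names (`tag_files`, `time_walk`) and the `user.tagN` attribute names are made up for illustration, it relies on Python's Linux-only `os.setxattr` / `os.listxattr` / `os.getxattr`, and it measures a warm-cache walk. For the cold-cache case asked about, caches would have to be dropped (e.g. as root via `/proc/sys/vm/drop_caches`) or the filesystem remounted between tagging and walking, which is not done here.

```python
import os
import time


def tag_files(root, f, n, l):
    """Create f empty files under root and try to set n user xattrs of
    l bytes on each. Returns the number of xattrs actually stored."""
    if not hasattr(os, "setxattr"):  # xattr calls are Linux-only in Python
        return 0
    stored = 0
    value = b"x" * l
    for i in range(f):
        path = os.path.join(root, f"file{i}")
        open(path, "w").close()
        for j in range(n):
            try:
                os.setxattr(path, f"user.tag{j}", value)
                stored += 1
            except OSError:
                # e.g. filesystem without user xattr support, or no space
                # left for more attributes: stop tagging this file
                break
    return stored


def time_walk(root, read_xattrs):
    """Walk root, lstat every file, optionally read all its xattrs.
    Returns (number of files seen, elapsed seconds)."""
    start = time.perf_counter()
    seen = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            os.lstat(path)
            if read_xattrs and hasattr(os, "listxattr"):
                try:
                    for attr in os.listxattr(path):
                        os.getxattr(path, attr)
                except OSError:
                    pass  # unreadable attribute or unsupported filesystem
            seen += 1
    return seen, time.perf_counter() - start
```

Calling `tag_files(root, f, n, l)` on a scratch directory of the filesystem under test and then comparing `time_walk(root, False)` with `time_walk(root, True)` gives a first impression of the extra cost of reading the xattrs; comparing disk usage before and after tagging gives a first impression of the storage cost.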

1 Answer

4

For ext4 (I can't speak for Btrfs), small xattrs are stored directly in the inode and do not affect path resolution or directory iteration performance.

The amount of space available for "small" xattrs depends on the size the inodes were formatted with. Newer ext4 filesystems use a default inode size of 512 bytes, while older ext4 filesystems used 256 bytes; about 192 bytes of that are taken by the inode itself and the xattr header. The rest can be used for xattrs, though typically some of it is already occupied by xattrs for SELinux and possibly others ("getfattr -d -m - -e hex /path/to/file" will dump all xattrs on an inode). Any xattrs that do not fit into this space are stored in an external block or, if they are larger than 4KB and you have a new kernel (4.18ish or newer), they can be stored in an external inode.
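
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 192-byte overhead is the approximate figure given above; the 16-byte per-entry header and the 4-byte padding of names and values are assumptions about the on-disk entry layout, and the function names are purely illustrative, so treat the results as estimates rather than exact limits.

```python
# Approximate figure from the text above: inode body plus xattr header.
INODE_OVERHEAD = 192
# Assumptions (not taken from the text): per-entry header size and padding.
ENTRY_HEADER = 16
ALIGN = 4


def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def in_inode_space(inode_size):
    """Estimated bytes left for xattrs inside an inode of the given size."""
    return max(inode_size - INODE_OVERHEAD, 0)


def xattr_cost(name, value):
    """Estimated bytes one xattr occupies (name and value given as bytes)."""
    return ENTRY_HEADER + round_up(len(name)) + round_up(len(value))


def fits_in_inode(inode_size, xattrs):
    """Return (fits, bytes_needed) for a dict mapping name -> value."""
    needed = sum(xattr_cost(n, v) for n, v in xattrs.items())
    return needed <= in_inode_space(inode_size), needed
```

Under these assumptions, a 256-byte inode leaves roughly 64 bytes and a 512-byte inode roughly 320 bytes, before subtracting whatever SELinux or other existing xattrs already occupy.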

It is possible to change the inode size at format time with the "mke2fs -I <size>" option to provide more space for xattrs if xattr performance is important for your workload (e.g. Samba).

LustreOne