While looking through some filesystems to see what consumes disk space, I found a directory called GNUSparseFile.0. I can't easily find out whether its contents are used. Could it be a temporary leftover from untarring? The OS is FreeBSD, so it may have been untarred with BSD tar.
-
I don't understand a word of it, but the GNU tar manual has a section on it: https://www.gnu.org/software/tar/manual/html_node/Sparse-Recovery.html – Paul_Pedant Apr 23 '23 at 16:51
-
I read that and also understood nothing. :) – filo Apr 23 '23 at 16:54
-
filo, do you know what a sparse file is? (in general, not in the context of GNU tar) – Marcus Müller Apr 23 '23 at 17:12
-
I get the idea of sparse files. The problem I am facing is a cleanup of a web server. I don't think that anybody before me used clever ways to store and serve sparse files. I suspect that this directory may be a leftover from using a wrong option during creation or extraction of the tarball, or from a Linux/BSD tar mismatch. – filo Apr 23 '23 at 17:16
1 Answer
I've been bitten by this before.
You're spot on. GNU tar's manual tries to sell you a lie:
They claim to have found a portable way to archive sparse files, without expanding the sparse regions into zeros. You can extract the archive correctly with any implementation of tar.
Now, the tar format has no option to specify something like "the region from A to B is zeros, to be represented as a hole on extraction". So instead of adding something to the file header to solve that (not much space there, to be honest) and letting an unpacker notify the user if it can't unpack, or integrating the compression into their tar tool rather than piping the data through gzip or similar, they invented a new data structure: a container that holds first a map of the file's sparse regions and then the non-sparse rest of that file.
Then they archive such a "pseudo-sparse container" when you use GNU tar and tell it to preserve sparsity.
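To make the container idea concrete, here is a minimal Python sketch of how such a container could be expanded by hand. It assumes the layout the GNU tar manual describes for its PAX sparse format version 1.0: newline-delimited decimal numbers (an entry count, then offset/size pairs for the data blocks), padded to a 512-byte boundary and followed by the data itself. Whether your GNUSparseFile.0 contents use that version is an assumption you'd have to check. The names `parse_sparse_map` and `expand` are made up for illustration, and `real_size` has to come from somewhere else (GNU tar records it in an extended header, not in the container).

```python
BLOCK = 512  # tar block size; the map is assumed to be padded to this boundary


def parse_sparse_map(container: bytes):
    """Return ([(offset, size), ...], data_offset) for an assumed 1.0-style map."""
    pos = 0

    def read_number() -> int:
        nonlocal pos
        end = container.index(b"\n", pos)  # numbers are newline-delimited
        value = int(container[pos:end])
        pos = end + 1
        return value

    count = read_number()
    entries = [(read_number(), read_number()) for _ in range(count)]
    # The packed data starts at the next block boundary after the map.
    data_offset = -(-pos // BLOCK) * BLOCK
    return entries, data_offset


def expand(container: bytes, real_size: int) -> bytes:
    """Rebuild the original contents in memory, with holes filled by zeros."""
    entries, src = parse_sparse_map(container)
    out = bytearray(real_size)  # starts out as all zeros
    for offset, size in entries:
        out[offset:offset + size] = container[src:src + size]
        src += size
    return bytes(out)


# Hand-built toy container: two data blocks, "abc" at offset 0 and "de" at offset 10.
toy_map = b"2\n0\n3\n10\n2\n"
toy = toy_map.ljust(BLOCK, b"\0") + b"abcde"
restored = expand(toy, real_size=12)
```

This builds the result in memory only to keep the sketch short; for a real file you'd want to seek in the output file instead of writing zeros, so the holes stay holes. On real data, try it on a copy first.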
Of course, upon extraction, unless you use the same implementation, you get the container – and not the sparse file. Which, to say the least, might be slightly surprising to the user and breaks the idea of "portability" (namely, the data that went in is the data that came out, on any machine with any tar). Also note how badly the interface to xsparse is designed here; you're supposed to know the original sparse file name, and will have to look it up manually :/.
Probably someone unpacked a GNU tar archive containing such a sparse-file container, then later cleaned up the files they expected to be in that archive (possibly using a file list that reached them separately from the tar archive itself). They expected the original file, which the cleanup then presumably could not find, and not the sparse container. To them the container didn't look like something they had unpacked, so they ignored it.
So, that's what you got here: something that came from a GNU tar-created TAR, and which never got finally unpacked from its container.