FILE size limitation according to Robert Love's textbook

Question

Robert Love's Linux System Programming (2007, O'Reilly) says this in the first paragraph (Chapter 1, page 10):

The file position’s maximum value is bounded only by the size of the C type used to store it, which is 64-bits in contemporary Linux.

But in the next paragraph he says:

A file may be empty (have a length of zero), and thus contain no valid bytes. The maximum file length, as with the maximum file position, is bounded only by limits on the sizes of the C types that the Linux kernel uses to manage files.

I know this might be very, very basic, but is he saying that the file size is limited by the FILE data type or the int data type?

goldilocks · Accepted Answer · 2014-07-17T10:41:22.777


He's saying it's bound by a 64-bit type, which has a maximum value of (2 ^ 64) - 1 unsigned, or (2 ^ 63) - 1 signed (1 bit holds the sign, +/-).

The type is not FILE; it's the type the implementation uses to track the offset into the file, namely off_t, which on 64-bit Linux is a typedef for a signed 64-bit type.¹ (2 ^ 63) - 1 = 9223372036854775807. If a terabyte is 1000 ^ 4 bytes, that's ~9.2 million TB. Presumably a signed type is used so that it can hold -1 (for errors, etc.) or a negative relative offset.
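To check this on a particular system, something like the following sketch could be used. The helper names `signed_max_for_size` and `print_off_t_limit` are made up for illustration, and the arithmetic assumes a two's-complement representation with no padding bits. On a 32-bit glibc build, off_t may only be 32 bits wide unless the program is compiled with `-D_FILE_OFFSET_BITS=64`.

```c
#include <limits.h>     /* CHAR_BIT */
#include <stdint.h>     /* intmax_t, uintmax_t, UINTMAX_MAX */
#include <stdio.h>
#include <sys/types.h>  /* off_t (POSIX) */

/* Hypothetical helper: largest value of a signed integer type that is
 * `bytes` wide, assuming two's complement and no padding bits.
 * Only valid for 1 <= bytes <= sizeof(intmax_t). */
intmax_t signed_max_for_size(size_t bytes)
{
    unsigned total_bits = (unsigned)(sizeof(uintmax_t) * CHAR_BIT);
    unsigned type_bits  = (unsigned)(bytes * CHAR_BIT);

    /* Keep the low (type_bits - 1) bits set: that is 2^(type_bits - 1) - 1. */
    return (intmax_t)(UINTMAX_MAX >> (total_bits - type_bits + 1));
}

/* Print how wide off_t is in this build and the largest offset it can hold. */
void print_off_t_limit(void)
{
    printf("sizeof(off_t) = %zu bytes\n", sizeof(off_t));
    printf("largest file offset = %jd\n", signed_max_for_size(sizeof(off_t)));
}
```

Calling `print_off_t_limit()` from `main()` on a typical 64-bit Linux build should report an 8-byte off_t and the 9223372036854775807 figure given above.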

Functions like fseek() and ftell() use a signed long, which on 64-bit GNU systems is also 64 bits wide.
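As a rough illustration of why the offset is signed, here is a minimal sketch that seeks backward from the end of a file. The helper name `last_byte_of` is hypothetical; only standard C library calls are used.

```c
#include <stdio.h>

/* Hypothetical helper: return the last byte of the file at `path`,
 * or -1 if the file cannot be opened, is empty, or cannot be read. */
int last_byte_of(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int result = -1;

    /* A negative offset relative to SEEK_END moves backward from the end
     * of the file, so the offset parameter has to be a signed type. */
    if (fseek(fp, -1L, SEEK_END) == 0) {
        long pos = ftell(fp);       /* position as a long; -1L on error */
        if (pos >= 0) {
            int c = fgetc(fp);      /* EOF on failure */
            if (c != EOF)
                result = c;
        }
    }

    fclose(fp);
    return result;
}
```

POSIX also provides fseeko() and ftello(), which take and return off_t instead of long, so they are not tied to the width of long.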

¹ See types.h and typesizes.h in /usr/include/bits.

edited Jul 17 '14 at 10:41

answered Jul 16 '14 at 12:38



It is bound by type `off_t`. This will be `typedef`ed to int64 for the next few years. However, history tells us that this will change; it used to be 32-bit. So always use the correct type, `off_t`, or your program will become obsolete. (Not `size_t`: on a 32-bit system `size_t` is 32-bit, `off_t` is not.) – ctrl-alt-delor Jul 16 '14 at 13:56
@richard Thanks for the clarification -- corrected. – goldilocks Jul 16 '14 at 14:28
The range of an N-bit integer type without special tricks is `0 .. (2^N - 1)` for unsigned, and `-(2^(N-1)) .. +(2^(N-1) - 1)` for signed, all inclusive, given a two's-complement architecture (not guaranteed with C). Both allow 2^N discrete values, but the range is shifted. Hence, an unsigned 16-bit integer is 0..65535 inclusive (2^16 = 65536), and a signed 16-bit integer is -32768..32767 inclusive (2^15 = 32768). For longer integers, just use larger powers of two. – user Jul 16 '14 at 14:36
As for fseek() in particular, note that it can seek from the current position or file end, hence the requirement to be able to take a negative offset. – user Jul 16 '14 at 14:39
@MichaelKjörling Right -- I forgot the zero. So 9223372036854775807. Added -1 to the range formula. – goldilocks Jul 16 '14 at 15:38
@goldilocks That's what comments are for :) – user Jul 16 '14 at 17:32
