Preface
I just solved this problem myself with Python. It's simple in theory, but in practice, it actually requires quite a bit of code to do correctly. I wanted to share my work here so others don't have to figure this out by themselves.
The Simple (Bad) Way
The simplest method (posted previously by letmaik) is to load the file into memory as a string of bytes, use Python's .rstrip() to remove trailing null bytes from the bytestring, then save that bytestring over the original file.
def strip_file_blank_space(filename):
    # Strips null bytes at the end of a file, and returns the new file size
    # This will process the file all at once, entirely in memory
    # Open the file for reading bytes (then close)
    with open(filename, "rb") as f:
        # Read all of the data into memory
        data = f.read()
    # Strip trailing null bytes from the data in-memory
    data = data.rstrip(b"\x00")
    # Open the file for writing bytes (then close)
    with open(filename, "wb") as f:
        # Write the data from memory back to disk
        f.write(data)
    # Return the new file size
    return len(data)

new_size = strip_file_blank_space("file.bin")
This will probably work most of the time, as long as the file is smaller than the available system memory. But with larger files (32+ GB) or on systems with little RAM (like a Raspberry Pi), the process will either grind the machine to a halt through swapping, or it will be killed by the operating system's out-of-memory handler.
The Difficult (Correct) Way
The only way around the limited-memory problem is to load one small block of data at a time, process it, release it from memory, and then repeat on the next block until the whole file is processed. Normally you can do this in Python with very compact code, but because we need to process blocks from the end of the file, moving backward, it takes a bit more work.
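To make the backward pass easier to picture, here is a small standalone generator (my own sketch, separate from the full solution) that yields fixed-size blocks starting from the end of the file and moving toward the beginning:

```python
import os

def read_blocks_backward(filename, block_size=1024 * 1024):
    # Yield blocks of the file from the end toward the beginning.
    # The first (possibly partial) block of the file is yielded last.
    with open(filename, "rb") as f:
        position = os.fstat(f.fileno()).st_size
        while position > 0:
            start = max(0, position - block_size)
            f.seek(start)
            yield f.read(position - start)
            position = start
```

Only one block is ever in memory at a time; each iteration seeks backward, reads, and yields. The full function below does essentially this, plus the bookkeeping to track where the real data ends and to rewrite the file.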
I've done the work for you. Here it is:
import os
import shutil
import tempfile
import warnings

def strip_file_blank_space(filename, block_size=1024 * 1024):
    # Strips null bytes at the end of a file, and returns the new file size
    # This will process the file in blocks, to conserve memory (default = 1 MiB)
    file_end_loc = None  # Used if the file is larger than the block size
    simple_data = None   # Used if the file fits within a single block
    # Open the source file for reading
    with open(filename, "rb") as f:
        # Get original file size
        filesize = os.fstat(f.fileno()).st_size
        # Test if file size is less than (or equal to) the block size
        if filesize <= block_size:
            # Load data to do a normal rstrip all in-memory
            simple_data = f.read()
        # If the file is larger than the specified block size
        else:
            # Compute number of whole blocks (remainder at beginning processed separately)
            num_whole_blocks = filesize // block_size
            # Compute number of remaining bytes
            num_bytes_partial_block = filesize - (num_whole_blocks * block_size)
            # Go through each block, looking for the location where the zeros end
            for block in range(num_whole_blocks):
                # Set file position, relative to the end of the file
                current_position = filesize - ((block + 1) * block_size)
                f.seek(current_position)
                # Read current block
                block_data = f.read(block_size)
                # Strip current block from right side
                block_data = block_data.rstrip(b"\x00")
                # Test if the block data was all zeros
                if len(block_data) == 0:
                    # Move on to next block
                    continue
                # If it was not all zeros
                else:
                    # Find the location in the file where the real data ends
                    blocks_not_processed = num_whole_blocks - (block + 1)
                    file_end_loc = num_bytes_partial_block + (blocks_not_processed * block_size) + len(block_data)
                    break
            # Test if the end location was not found in the full blocks loop
            if file_end_loc is None:
                # Read partial block at the beginning of the file
                f.seek(0)
                partial_block_data = f.read(num_bytes_partial_block)
                # Strip from the right side
                partial_block_data = partial_block_data.rstrip(b"\x00")
                # Test if this block (and therefore the entire file) is zeros
                if len(partial_block_data) == 0:
                    # Warn about the empty file
                    warnings.warn("File was all zeros and will be replaced with an empty file")
                # Set the location where the real data ends
                file_end_loc = len(partial_block_data)
    # If we are doing a normal strip:
    if simple_data is not None:
        # Strip right trailing null bytes
        simple_data = simple_data.rstrip(b"\x00")
        # Directly replace file
        with open(filename, "wb") as f:
            f.write(simple_data)
        # Return the new file size
        return len(simple_data)
    # If we are doing a block-by-block copy and replace
    else:
        # Create temporary file (not auto-deleted; we will move it into place ourselves)
        temp_file = tempfile.NamedTemporaryFile(mode="wb", delete=False)
        # Open the source file for reading
        with open(filename, "rb") as f:
            # Test if data is smaller than (or equal to) the block size
            if file_end_loc <= block_size:
                # Do a direct copy
                f.seek(0)
                data = f.read(file_end_loc)
                temp_file.write(data)
                temp_file.close()
            # If the data is larger than the block size
            else:
                # Find number of whole blocks to copy
                num_whole_blocks_copy = file_end_loc // block_size
                # Find partial block data size (at the end of the file this time)
                num_bytes_partial_block_copy = file_end_loc - (num_whole_blocks_copy * block_size)
                # Copy whole blocks
                f.seek(0)
                for block in range(num_whole_blocks_copy):
                    # Read block data (automatically moves position)
                    block_data = f.read(block_size)
                    # Write block to temp file
                    temp_file.write(block_data)
                # Test for any partial block data
                if num_bytes_partial_block_copy > 0:
                    # Read remaining data
                    partial_block_data = f.read(num_bytes_partial_block_copy)
                    # Write remaining data to temp file
                    temp_file.write(partial_block_data)
                # Close temp file
                temp_file.close()
        # Delete original file
        os.remove(filename)
        # Replace original with temporary file
        shutil.move(temp_file.name, filename)
        # Return the new file size
        return file_end_loc

new_size = strip_file_blank_space("file.bin")  # Defaults to 1 MiB blocks
As you can see, it takes many more lines of code, but if you're reading this, then those are lines you don't have to write now! You're welcome. :)
I've tested this function using 4+ GB files on a Raspberry Pi with 1 GB of RAM, and the process never used more than 50 MB of memory in total. It took a while to process, but it worked flawlessly.
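One design note: because we only ever remove bytes from the end, the copy-and-replace step isn't strictly necessary. If you're comfortable modifying the file in place, the same backward scan can be followed by a truncate() call, which avoids the temporary file and the second full read of the data. Here is a minimal sketch of that variant (my own, hypothetical name, not the function above):

```python
import os

def strip_trailing_nulls_inplace(filename, block_size=1024 * 1024):
    # Scan backward in blocks to find where the real data ends,
    # then truncate the file in place (no temp file, no full copy).
    with open(filename, "r+b") as f:
        end = os.fstat(f.fileno()).st_size
        while end > 0:
            start = max(0, end - block_size)
            f.seek(start)
            block = f.read(end - start).rstrip(b"\x00")
            # If this block held real data, we've found where it ends
            if block:
                end = start + len(block)
                break
            # Otherwise the block was all zeros; keep scanning backward
            end = start
        f.truncate(end)
        return end
```

The trade-off is that truncation modifies the original file directly instead of building a fresh copy, but it preserves the file's permissions and ownership, and it never needs extra disk space for a duplicate.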
Conclusion
When programming, be mindful of how much data you load into memory at any given time. Keep in mind the largest file size you might be working with, and the lower limits of the memory available to you.
I hope this helps someone down the line!