Skip first 3 bytes of a file

Question

I am using AIX 6.1 ksh shell.

I want to use a one-liner to do something like this:

cat A_FILE | skip-first-3-bytes-of-the-file

I want to skip the first 3 bytes of the first line; is there a way to do this?

Jonathan Leffler · Accepted Answer · 2017-02-03T23:40:03.197

28

Old school — you could use dd:

dd if=A_FILE bs=1 skip=3

The input file is A_FILE, the block size is 1 byte, and dd skips the first 3 'blocks' (bytes). (With some variants of dd, such as GNU dd, you could use bs=1c here, and alternatives like bs=1k to read in blocks of 1 kilobyte in other circumstances. The dd on AIX does not support this, it seems; the BSD (macOS Sierra) variant doesn't support c but does support k, m, g, etc.)

There are other ways to achieve the same result, too:

sed '1s/^...//' A_FILE

This works if there are 3 or more characters on the first line; a shorter first line is left unchanged.

tail -c +4 A_FILE

And you could use Perl, Python and so on too.
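
As a rough illustration of the Python route, here is a minimal sketch that drops the first few bytes of a binary stream and copies the rest unchanged. The function name `skip_bytes` and the in-memory demo data are made up for this example; in a real pipeline one would import sys and pass `sys.stdin.buffer` and `sys.stdout.buffer` instead.

```python
import io
import shutil


def skip_bytes(src, dst, count=3):
    """Discard up to `count` bytes from src, then copy the rest to dst."""
    remaining = count
    while remaining > 0:
        chunk = src.read(remaining)
        if not chunk:  # input is shorter than `count` bytes
            return
        remaining -= len(chunk)
    shutil.copyfileobj(src, dst)


# Demonstration on in-memory data (hypothetical input). In a pipeline such as
# `cat A_FILE | python3 script.py`, call
# skip_bytes(sys.stdin.buffer, sys.stdout.buffer) instead.
demo_out = io.BytesIO()
skip_bytes(io.BytesIO(b"abcdef\n"), demo_out)
print(demo_out.getvalue())  # prints b'def\n'
```

Working on the binary streams avoids any decoding, so this skips bytes, not characters.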

edited Feb 03 '17 at 23:40

answered Oct 24 '12 at 15:38

Jonathan Leffler

Thanks for your help. Both the sed and the tail commands work in AIX 6.1. For the dd command, it should be `dd if=A_FILE bs=1 skip=3` in AIX 6.1 – Alvin SIU Oct 25 '12 at 13:55
You may want to use standard input, as in `cat A_FILE | tail -c +4`, with GNU tail. – MUY Belgium Nov 08 '13 at 07:57

score 24 · Answer 2 · answered Oct 24 '12 at 15:29

Instead of using cat, you can use tail like so:

tail -c +4 FILE

This will print out the entire file except for the first 3 bytes. Consult man tail for more information.

answered Oct 24 '12 at 15:29

squiguy

Don't know about AIX, but on Solaris you must use `/usr/xpg4/bin/tail`, at least on my machine. Good tip nonetheless! – BellevueBob Oct 24 '12 at 19:34
@BobDuell It's hard to post something that is compatible with every OS. – squiguy Oct 24 '12 at 20:10
Yes, it works in AIX 6.1 – Alvin SIU Oct 25 '12 at 13:54
@AlvinSIU Good to know. Glad I could help. – squiguy Oct 25 '12 at 15:43
Thank you, this is a much better choice for working with large files with a tiny amount of garbage at the beginning. I used `dd` over an ssh connection to get a file image and I needed to remove the "[sudo] password for X:" at the beginning of the resulting file. – Compholio Jun 19 '22 at 23:11

score 1 · Answer 3 · answered Feb 04 '17 at 03:59

If Python is available on the system, a small Python script can use the seek() function to start reading at the nth byte, like so:

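#!/usr/bin/env python3
import shutil
import sys

# Usage: skip_bytes.py FILE OFFSET
with open(sys.argv[1], 'rb') as fd:
    # Move to the given byte offset, then copy the rest verbatim as bytes
    # (no decoding or stripping, so binary data and whitespace survive).
    fd.seek(int(sys.argv[2]))
    shutil.copyfileobj(fd, sys.stdout.buffer)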

And usage would be like so:

$ ./skip_bytes.py input.txt 3

Note that byte offsets start at 0 (the first byte is at offset 0), so specifying 3 positions the read at the 4th byte (offset 3), which skips the first 3 bytes.
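
To make the zero-based offset concrete, here is a tiny sketch using an in-memory buffer as a stand-in for a file opened in 'rb' mode (the sample bytes are invented for illustration). It also shows why `seek(3)` corresponds to `tail -c +4`, which counts bytes from 1.

```python
import io

data = io.BytesIO(b"ABCDEFG")  # stand-in for open(path, 'rb')
data.seek(3)                   # skip the bytes at offsets 0, 1 and 2
rest = data.read()             # everything from the 4th byte ("D") onward
print(rest)                    # prints b'DEFG'
```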

score 0 · Answer 4 · answered Feb 04 '16 at 03:42

I recently needed to do something similar. I was helping with a field support issue and needed to let a technician see real-time plots as they were making changes. The data is in a binary log that grows throughout the day. I have software that can parse and plot the data from logs, but it is currently not real time. What I did was capture the size of the log before I started processing the data, then go into a loop that processes the data and, on each pass, creates a new file containing the bytes of the log that had not yet been processed.

#!/usr/bin/env bash

# I named this little script hackjob.sh
# The purpose of this is to process an input file and load the results into
# a database. The file is constantly being updated, so this runs in a loop
# and every pass it creates a new temp file with bytes that have not yet been
# processed.  It runs about 15 seconds behind real time so it's
# pseudo real time.  This will eventually be replaced by a real time
# queue based version, but this does work and surprisingly well actually.

set -x

# Current date in YYYYMMDD format
DATE=$(date +%Y%m%d)

INPUT_PATH=/path/to/my/data
IFILE1=${INPUT_PATH}/${DATE}_my_input_file.dat

OUTPUT_PATH=/tmp
OFILE1=${OUTPUT_PATH}/${DATE}_my_input_file.dat

# Capture the size of the original file
SIZE1=$(ls -l "${IFILE1}" | awk '{print $5}')

# Copy the original file to /tmp
cp "${IFILE1}" "${OFILE1}"

while :
do
    sleep 5

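    # process_my_data.py ${OFILE1}
    rm "${OFILE1}"
    # Capture the new size first, so bytes appended while dd runs are not
    # skipped on the next pass
    NEW_SIZE=$(ls -l "${IFILE1}" | awk '{print $5}')
    # Copy only the bytes between the old size and the new size to OFILE1
    dd skip="${SIZE1}" count=$((NEW_SIZE - SIZE1)) bs=1 if="${IFILE1}" of="${OFILE1}"
    # Update the amount of data already processed
    SIZE1=${NEW_SIZE}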

    echo

    # Refresh the date (note: IFILE1 and OFILE1 are not recomputed from it)
    DATE=$(date +%Y%m%d)

done
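
The same idea, reading only what was appended since the last pass, can be sketched in Python by remembering a byte offset instead of re-copying with dd. This is only an illustration under assumptions: the helper name `read_new_bytes` and the temporary file standing in for the growing log are hypothetical and are not part of the script above.

```python
import os
import tempfile


def read_new_bytes(path, offset):
    """Return (data, new_offset): the bytes found in `path` after `offset`."""
    with open(path, "rb") as fd:
        fd.seek(offset)
        data = fd.read()
    # Advance by what was actually read, not by a separately measured size,
    # so bytes appended in between are picked up on the next call.
    return data, offset + len(data)


# Demonstration with a temporary file standing in for the growing log.
with tempfile.NamedTemporaryFile(delete=False) as log:
    log.write(b"first batch;")
log_path = log.name

chunk1, pos = read_new_bytes(log_path, 0)

with open(log_path, "ab") as log:
    log.write(b"second batch;")

chunk2, pos = read_new_bytes(log_path, pos)
os.remove(log_path)
print(chunk1, chunk2)
```

Each call returns only the bytes added since the previous offset, so nothing is processed twice.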

If only because I'm in that kind of mood, and don't like coding against the output of `ls`: have you considered using `stat -c'%s' "${IFILE}"` instead of that `ls|awk` combo? That is, assuming GNU coreutils... – jimbobmcgee Oct 26 '16 at 18:51
