Inserting zero values into a table

Question

I am trying to turn the last column of my data into an NxN table, but the lines for zero values are missing. I think I can build the table with awk/xargs, but first I need to insert the missing zeros. The first two columns are just identifiers from the original raw data files. In this case the first column goes from 1 to 2, the second column goes from 1 to 5, and the last column is the actual data, where the missing zeros need to be inserted. The identifiers always have the same length, as they correspond to a row and a column. In practice there are thousands of lines of data; the example below is a reduced version, and a solution that works for it should also work for the true data set.

Edit: To clarify, by 1000's I mean that the first column would range from 1 into the thousands and the second would also range from 1 into the thousands. The lines I want to add, with a zero value in the third column, are missing. However, I think that if it can be done for the example below, then it can be done for a larger file.

data set

Expected data set

I tried using a suggestion from here, written in Python (credit: heemayl):

with open('test.sum') as f:
    check = 0
    for line in f:
        if int(line.split()[1]) == check + 1:
            check = int(line.split()[1])
            print line.rstrip()
        else:
            check = int(line.split()[1])
            print int(line.split()[1]) - 1, '\t0'
            print line.rstrip()
    print int(line.split()[1]) + 1, '\t0'
    print int(line.split()[1]) + 2, '\t0'

But it appears to add a zero between my rows where 1 and 2 meet (in the first column), and I also can't seem to get it to work on 3 columns. I'm open to awk or any simpler ideas, however!

Many thanks for any help!

How can we know when a line is missing a value? Should we be looking at the second column? You need to tell us these things, don't expect us to guess from your data. If we do need to look at the 2nd column, how can we know if everything is OK? Should there always be 5 values for each value in the 1st column? Can there be more? Fewer? Please [edit] your question and clarify what you need. — terdon, Aug 21 '19 at 10:35
The first two columns are just identifiers in the original raw data files. In this case the first column goes from 1 to 2, the second column goes 1 to 5 and the last column is the actual data where missing zeros are needed to be inserted. The identifiers are always the same length as it corresponds to a row and column. In practice there are actually 1000's of lines of data but the above example is a reduced example. Thanks! — f4r7, Aug 21 '19 at 10:48
Please edit your question to clarify. If you have thousands, how can the first column only take two values? I think you want to ensure that each value in column one has 5 rows, so you would add a new row for any column one value that doesn't have 5. But that doesn't make sense if you can only have 2 values in column 1. — terdon, Aug 21 '19 at 11:04

score 0 · Accepted Answer · answered Aug 21 '19 at 14:06

Probably not the most performant for 1000x1000 lines, but it will do the job and should be fairly easy to understand:

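# Loop over every (row, column) pair; adjust the ranges to match the real data.
for i in {1..2}; do
    for j in {1..5}; do
        # Print the existing line, or a zero-valued line if the pair is missing.
        grep -E "^$i[[:blank:]]+$j[[:blank:]]" file || printf '%s\t%s\t0\n' "$i" "$j"
    done
done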
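
Since the loop above runs grep once per cell, a single-pass alternative may be worth considering for the 1000x1000 case. The following is only a minimal sketch in Python 3, not part of the original answer. It assumes whitespace-separated input with three columns and identifiers running from 1 up to known maximums; the function name `fill_missing`, its parameters, and the sample lines are hypothetical and made up for illustration.

```python
def fill_missing(lines, n_rows, n_cols):
    """Yield one 'row<TAB>col<TAB>value' line per cell, using 0 for missing cells."""
    values = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Key each value by its (row, column) identifier pair.
        values[(int(fields[0]), int(fields[1]))] = fields[2]
    for i in range(1, n_rows + 1):
        for j in range(1, n_cols + 1):
            # Fall back to '0' when the input had no line for this pair.
            yield f"{i}\t{j}\t{values.get((i, j), '0')}"


# Made-up sample lines, only to show the call; real data would come from
# an open file object, e.g. fill_missing(open('file'), 2, 5).
sample = ["1 1 0.5", "1 3 0.2", "2 2 0.7"]
for out in fill_missing(sample, 2, 3):
    print(out)
```

This reads the input once and keeps the values in a dictionary, so the cost grows with the number of cells rather than with the number of cells times the file length.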

pLumo

Thank you that worked great! – f4r7 Aug 21 '19 at 17:45
