This is similar to "Shuffle two parallel text files".
I have:
- two large CSV files with parallel lines (they represent 'before' and 'after' states for particular items). The fields are sometimes strings, sometimes numbers.
- a sufficiently long random data file to use with shuf when I want to get a matching random sample.
I thought of:
shuf -n10 --random-source="random.csv" "file1"
shuf -n10 --random-source="random.csv" "file2"
but the two samples no longer correspond line by line.
However, if I prefix each line with its line number, the samples do match:
shuf -n10 --random-source="random.csv" <(cat -n "file1")
shuf -n10 --random-source="random.csv" <(cat -n "file2")
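If the point of the `cat -n` prefix is only to get matching samples, the number column can be stripped again afterwards. A minimal sketch, assuming GNU coreutils; `numbered_sample` is a hypothetical helper name. It writes `cat -n FILE | shuf ...` instead of `shuf ... <(cat -n FILE)`. In both forms shuf reads from a pipe, so the sketch assumes they behave the same, and the plain pipe also runs under POSIX sh.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils; numbered_sample is a hypothetical helper.
# Usage: numbered_sample FILE N RANDOM_SOURCE
numbered_sample() {
  local file=$1 n=$2 src=$3
  # GNU `cat -n` prints the line number, a TAB, then the original line.
  # shuf reads the numbered lines from a pipe, as with <(cat -n ...).
  # `cut -f2-` keeps everything after the first TAB, i.e. the original line.
  cat -n "$file" | shuf -n "$n" --random-source="$src" | cut -f2-
}

# Hypothetical usage with the names from the question:
# numbered_sample file1 10 random.csv > sample1.csv
# numbered_sample file2 10 random.csv > sample2.csv
```

To check that the two samples line up, drop the final `cut` and compare the number column of both outputs.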
Can someone explain why?
Here is a sample of random.csv:
0.293076138
0.446732207
0.552989654
0.16141527
0.099383023
...
Here is a snippet from the two files:
VA,DEFAULT,72.8027,11.9534.....
VA,DEFAULT,61.8356,11.9342....
VA,DEFAULT,61.8356,....
Note that the first two fields are identical in most of the rows in both files. Maybe this is the issue? I don't know shuf well enough.
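One way to get matching samples that does not rely on two separate shuf runs staying in step is to glue the files together line by line, shuffle once, and split the result. A minimal sketch, assuming GNU coreutils and that neither CSV contains a TAB character (the default separator for `paste` and `cut`); `sample_pairs` and the output file names are hypothetical.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils and TAB-free input lines.
# sample_pairs is a hypothetical helper.
# Usage: sample_pairs N RANDOM_SOURCE BEFORE AFTER OUT_BEFORE OUT_AFTER
sample_pairs() {
  local n=$1 src=$2 before=$3 after=$4 out_before=$5 out_after=$6
  local tmp
  tmp=$(mktemp) || return 1
  # Line i of $before and line i of $after end up on the same line,
  # joined by a TAB, so a single shuf call can never separate them.
  paste "$before" "$after" | shuf -n "$n" --random-source="$src" > "$tmp"
  # Split the sampled pairs back into the two original columns.
  cut -f1 "$tmp" > "$out_before"
  cut -f2 "$tmp" > "$out_after"
  rm -f "$tmp"
}

# Hypothetical usage with the names from the question:
# sample_pairs 10 random.csv file1 file2 sample1.csv sample2.csv
```

Here `--random-source` only makes the draw reproducible; the pairing itself no longer depends on it.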