I'd like to randomly select one line from each group of a given number of lines. For example, here's my input:

8 blue
8 red
8 yellow
8 orange
3 pink
3 white
3 cyan
3 purple
1 magenta
1 black
1 green
1 brown

and, with one line randomly selected from every four lines, my output would be:

8 orange
3 pink
1 green

The best I've come up with is:

awk '!(NR%4){a=NR+4};NR<=a|"shuf -n 1"'

but it doesn't work.

αғsнιη
mtherk16

2 Answers

With the GNU implementation of the split command:

split -l 4 --filter='shuf -n1' inputfile
  • -l N - put N lines/records per output file
  • --filter=COMMAND - write to shell COMMAND; file name is $FILE
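
To try this on the question's data, here is a small sketch. It assumes GNU split and GNU shuf are installed; the temporary file and the `picked` variable are only there for illustration and are not part of the original answer.

```shell
# Write the sample input from the question to a temporary file.
input=$(mktemp)
printf '%s\n' '8 blue' '8 red' '8 yellow' '8 orange' \
              '3 pink' '3 white' '3 cyan' '3 purple' \
              '1 magenta' '1 black' '1 green' '1 brown' > "$input"

# split feeds each 4-line chunk to its own "shuf -n1" process;
# the filter's output goes to stdout, so no chunk files are created.
picked=$(split -l 4 --filter='shuf -n1' "$input")
printf '%s\n' "$picked"

rm -f "$input"
```

Each run should print three lines, one from each group of four, and the choice within each group changes from run to run.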
Stéphane Chazelas
RomanPerekhrest

To select p=1 line out of every n=4 lines at random:

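awk -v n=4 -v p=1 '
  # remaining = how many lines are still to be picked in the current block
  BEGIN {srand(); remaining = p}
  # past the end of a block: start a new one and restart the line counter
  NR > n {remaining = p; NR = 1}
  # pick this line with probability remaining / (lines left in the block)
  rand()*(n + 1 - NR) < remaining {
    print; remaining--
  }' < your-file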
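
As a quick check of how n and p interact, here is a sketch that keeps p=2 lines out of every n=4. The input generated with `seq` and the `sample` variable are made up for the demonstration.

```shell
# Illustrative input: the numbers 1..20, i.e. five complete blocks of 4 lines.
sample=$(seq 20 | awk -v n=4 -v p=2 '
  BEGIN {srand(); remaining = p}
  NR > n {remaining = p; NR = 1}
  rand()*(n + 1 - NR) < remaining {
    print; remaining--
  }')

# Prints 10 numbers in input order: exactly 2 from each block of 4.
printf '%s\n' "$sample"
```

Once the number of lines left in a block equals `remaining`, the test is always true, so every complete block yields exactly p lines. A trailing block shorter than n lines may yield fewer.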

To have awk invoke GNU shuf every 4 lines, you'd need:

awk -v cmd="shuf -n 1" '{print | cmd}; NR % 4 == 0 {close(cmd)}'

But that means running one sh and one shuf command for every 4 lines of the file, which is a lot less efficient.
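
To illustrate why the close(cmd) call matters, the sketch below compares the pipeline with and without it. It assumes GNU shuf and `seq` are available; the variable names are only for illustration.

```shell
# Without close(): one shuf process receives all 12 lines and prints a single line.
without_close=$(seq 12 | awk -v cmd="shuf -n 1" '{print | cmd}' | wc -l)

# With close() every 4 lines: awk ends the current shuf and starts a new one
# for the next group, so each group of 4 contributes one line.
with_close=$(seq 12 | awk -v cmd="shuf -n 1" '{print | cmd}; NR % 4 == 0 {close(cmd)}' | wc -l)

echo "without close(): $without_close line(s); with close(): $with_close line(s)"
```

The second pipeline starts three shuf processes for 12 input lines. That per-group cost is what makes this approach slower on large files.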

Stéphane Chazelas