After processing some data, I get out a file that has a certain number of data points in it --- one per line. I need to pass these data points on to another tool that will do more number crunching --- in the tool, I need to set the "batch size" for a given run:
./gen_data.sh > data.txt
./process_data.sh < data.txt > parsed.bin
./crunch_data.sh --total=$(wc -l < data.txt) --batch_size=N --infile=parsed.bin
A batch size N that's too low will take prohibitively long to process; one that's too high will give me low-quality output. The batch size must evenly divide the number of data points, M=$(wc -l < data.txt). Values of N around M/10 seem to work well. It's not a big deal if the batch size does something strange in degenerate cases (e.g. N=M when M is prime --- that case almost certainly won't occur, so I'm not worried about it).
Is there a slick way to do this with shell tools? I know I can get the prime factors of M with factor. In Python I might write something like:
def batch_size(M, prime_factors):
    # prime_factors: M's prime factors in ascending order (as factor prints them)
    total_portion = 1
    for factor in prime_factors:
        total_portion *= factor
        if total_portion > 10:
            return M // total_portion  # a divisor of M, just under M/10
    return 1  # degenerate case: M is 10 or smaller
And now I'd have a divisor of M that's a bit smaller than a tenth of it, depending on how many small factors M has.
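Translating that loop into shell is where I get stuck. My rough attempt, assuming GNU coreutils' factor (whose output looks like "120: 2 2 2 3 5") and bash arithmetic, would be something like:

```shell
#!/usr/bin/env bash
# Pick a divisor of M a bit under M/10 by peeling off small
# prime factors until the ratio M/N exceeds 10.
batch_size() {
    local M=$1 N=$1 f
    # factor prints "M: p1 p2 ..."; cut drops the "M:" prefix
    for f in $(factor "$M" | cut -d: -f2); do
        N=$(( N / f ))
        if (( M / N > 10 )); then
            break
        fi
    done
    echo "$N"
}
```

I'd then wire it into the pipeline as --batch_size="$(batch_size "$(wc -l < data.txt)")", but I don't know if there's something tidier.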
I'm not sure how I'd do this as a shell script, or what tools I could use to make it easier. Can this be done nicely? Am I better off just passing the list of factors to a tiny Python script and doing the logic there?
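For the tiny-Python-script route, what I picture is piping factor's output straight into an inline script --- this is just a sketch of that idea, with M hard-coded where I'd really use $(wc -l < data.txt):

```shell
M=120   # stand-in for $(wc -l < data.txt)
N=$(factor "$M" | python3 -c '
import sys
toks = sys.stdin.read().split()   # e.g. ["120:", "2", "2", "2", "3", "5"]
M = int(toks[0].rstrip(":"))
portion = 1
for f in map(int, toks[1:]):
    portion *= f
    if portion > 10:
        break
print(M // portion)
')
echo "$N"
```

That works, but it feels like cheating compared to a pure-shell answer.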