After processing some data, I get out a file that has a certain number of data points in it --- one per line. I need to pass these data points on to another tool that will do more number crunching --- in the tool, I need to set the "batch size" for a given run:
./gen_data.sh > data.txt
./process_data.sh < data.txt > parsed.bin
./crunch_data.sh --total=$(wc -l < data.txt) --batch_size=N --infile=parsed.bin
A batch size N that's too low will take prohibitively long to process; one that's too high will give me low-quality output. The batch size must evenly divide the number of data points, M=$(wc -l < data.txt). Values of N around M/10 seem to work well. It's not a big deal if the batch size does something strange in degenerate cases (e.g. N=M when M is prime --- that case almost certainly won't occur, so I'm not worried about it).
Is there a slick way to do this with shell tools? I know I can get the prime factors of M with factor. In Python I might write something like:
def batch_size(M, prime_factors):
    # prime_factors: M's prime factors in ascending order (as factor prints them)
    total_portion = 1
    for factor in prime_factors:
        total_portion *= factor
        if total_portion > 10:
            return M // total_portion  # a divisor of M, just under M/10
    return 1  # degenerate case: M is 10 or smaller
And now I'd have a divisor of M that's a bit smaller than a tenth of it, depending on how many small factors M has.
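Translating that loop into shell is where I get stuck. My rough attempt, assuming GNU coreutils' factor (whose output looks like "120: 2 2 2 3 5") and bash arithmetic, would be something like:

```shell
#!/usr/bin/env bash
# Pick a divisor of M a bit under M/10 by peeling off small
# prime factors until the ratio M/N exceeds 10.
batch_size() {
    local M=$1 N=$1 f
    # factor prints "M: p1 p2 ..."; cut drops the "M:" prefix
    for f in $(factor "$M" | cut -d: -f2); do
        N=$(( N / f ))
        if (( M / N > 10 )); then
            break
        fi
    done
    echo "$N"
}
```

I'd then wire it into the pipeline as --batch_size="$(batch_size "$(wc -l < data.txt)")", but I don't know if there's something tidier.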
I'm not sure how I'd do this as a shell script, or what tools I could use to make it easier. Can this be done nicely? Am I better off just passing the list of factors to a tiny Python script and doing the logic there?
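For the tiny-Python-script route, what I picture is piping factor's output straight into an inline script --- this is just a sketch of that idea, with M hard-coded where I'd really use $(wc -l < data.txt):

```shell
M=120   # stand-in for $(wc -l < data.txt)
N=$(factor "$M" | python3 -c '
import sys
toks = sys.stdin.read().split()   # e.g. ["120:", "2", "2", "2", "3", "5"]
M = int(toks[0].rstrip(":"))
portion = 1
for f in map(int, toks[1:]):
    portion *= f
    if portion > 10:
        break
print(M // portion)
')
echo "$N"
```

That works, but it feels like cheating compared to a pure-shell answer.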