
Say I have an executable script `process_image` that performs actions on a Base64-encoded image. I am storing every image in a file `images_file`, line by line: every line of `images_file` is a Base64-encoded image. Some of the lines are very long, so the following fails with `xargs: argument line too long`:

`cat images_file | xargs -L1 process_image`

I wanted to modify `process_image` to read the entire output of `cat images_file` from stdin and then loop over each line using a simple `while` loop, but my colleagues have advised against this approach. Does `xargs -L1` internally use the same mechanism as a `while` loop? How would using `xargs` be more desirable than using a `while` loop? What is the maximum argument length that `xargs` can handle, and is there any way to overcome this while maintaining the `cat <file> | xargs -L1 <executable_script>` approach?
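For context, the rejected loop approach can be sketched so the difference is visible: `xargs -L1` places each line into the child process's argv, where the kernel's argument-size limits apply, while a `while read` loop sends each line down a pipe, which has no such limit. A minimal runnable sketch, with a trivial stand-in for `process_image` (the real script is not shown in the question):

```shell
# Stand-in for the real process_image so the sketch runs anywhere;
# it simply reports how many bytes it received on stdin.
process_image() { wc -c; }

# Two short sample "images" (the real lines would be much longer).
printf 'aGVsbG8=\nd29ybGQh\n' > images_file

# Each line is piped to the command, never placed on a command line,
# so its length is not constrained by the kernel's argv limits.
while IFS= read -r line; do
    printf '%s\n' "$line" | process_image
done < images_file
```

Note this only works if `process_image` is (or can be) written to read the image from stdin; as the comments point out, that change is the real fix.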

sriganesh
  • Can the `process_image` script or program read the image data from stdin, or can you give it a command-line argument that specifies a file to read the image from (for example `-f /path/to/image-file`)? – Sotto Voce Jul 14 '22 at 07:27
  • The `images_file` is being generated in runtime by a previous process. I could find many workarounds to this, but I am not being allowed to. I want to know if I can specifically use `xargs -L1` to allow an argument whose length exceeds the default allowed limit. – sriganesh Jul 14 '22 at 07:32
  • @sriganesh Why not run a simple shell loop to circumvent this problem, instead of using `xargs`? Try dumping the value of ARG_MAX, using `getconf ARG_MAX` – Inian Jul 14 '22 at 07:37
  • Linux has a hard-coded length limit for a single command-line argument, and you might be crashing into that. See [What defines the maximum size for a command single argument?](https://unix.stackexchange.com/questions/120642/what-defines-the-maximum-size-for-a-command-single-argument) The solution in that case, such as it is, would be to switch to another Unix without that limit. Or to pass data like that using files or pipes instead of command-line arguments... – ilkkachu Jul 14 '22 at 07:38
  • I asked about `process_image` rather than `images_file`, but if you're set on xargs, okay. I don't see anything in the xargs man page that suggests `-L1` would help you, but I do see `-s somelargenumber` might help. (although the man page says the default value is ARG_MAX - 4096, so there doesn't seem to be a lot of space to gain) – Sotto Voce Jul 14 '22 at 07:41
  • @sriganesh has your colleague given a reason for advising against having `process_image` loop over input? – muru Jul 14 '22 at 07:53
  • Do you use the "_Trailing blanks cause an input line to be logically continued on the next input line_" part of `-L`? If not, a simple loop reading one line at a time might suffice – roaima Jul 14 '22 at 08:25
  • @SottoVoce I am not allowed to create intermediate files for storing the generated images and using the `process_image` script to read these files from a location. I want to know whether `xargs -L1` anyway uses the same mechanism as `while`. My boss insists that I do not use `for` or `while` loops. So if `xargs` also uses loops, why not use a `while` loop? Either that, or I want to be able to use `xargs` to allow a size larger than `ARG_MAX`. I am a fresher with no prior experience in shell scripting and I am working under many heavy design constraints (no loops, no intermediate files). – sriganesh Jul 14 '22 at 08:37
  • @sriganesh, what OS are you on? What size are the command line arguments you need to pass? (the images, the lines in your file) – ilkkachu Jul 14 '22 at 08:38
  • There are many people who have this idea that while loops are somehow bad, but they are wrong. This is classic [cargo cult programming](https://en.wikipedia.org/wiki/Cargo_cult_programming): shell loops are a bad tool to process text files, so then people think they're a bad tool in general. There is nothing wrong with using a shell loop if it is the right tool for the job. – terdon Jul 14 '22 at 08:40
  • Shell loops are just fine if what you're doing is calling other programs with the data you loop over. But, if the person forbidding them is your _boss_, the technical arguments might not matter and you get to do what they want anyway. (The question says "colleagues", the comment says "boss", those are slightly different.) – ilkkachu Jul 14 '22 at 08:42
  • @ilkkachu Ubuntu Focal64. They are images between 10-50 kB being encoded in base 64. – sriganesh Jul 14 '22 at 08:43
  • @sriganesh, hmm, curious. As far as I understand, the Linux limit is 128 kB, which should be enough for a 50 kB image, even if that was before the base64 encoding. Can you check what happens if you take the largest image you can find, put that _alone_ in a file, and then run `xargs` over that? Check to see if it runs, and what the exact length of the file was. – ilkkachu Jul 14 '22 at 08:46
  • Presumably this is the same boss/colleague who [doesn’t believe that `&` truly runs processes in parallel](https://unix.stackexchange.com/q/709690/86440)… – Stephen Kitt Jul 14 '22 at 10:05
  • Why does your `process_image` program take its image as a command-line argument? That would be reasonable for very small amounts of data, but is absurd when you're talking about many kilobytes or more of data, and likely to run into ARG_MAX command-line length limits (especially on systems that have smaller ARG_MAX than Linux does). Instead, your program should take its data from a file or files, with the argument being the filename(s) containing the image(s), or from stdin. **That** is what you need to fix. – cas Jul 14 '22 at 11:27
  • @sriganesh I wasn't going to suggest that you write the image data to files on disk, but rather use techniques like `echo -n "$imagedata" | process_image` or `process_image <<<"$imagedata" ` or perhaps even `process_data -f <( echo -n "$imagedata" )` to give the data to the command without creating oversize command-line arguments (which my first and third examples still do). But they depend on `process_image` accepting file data on stdin or using an argument for passing a file path on the command line, which is still not known in this discussion. – Sotto Voce Jul 14 '22 at 13:56
  • "takes stdin ... using xargs" makes no sense. xargs' job is to convert its input (stdin or a file) and put it onto the command line as arguments for another program. That program processes command-line arguments, it does not process stdin. As I said earlier, that is the thing you need to fix. It is the source of your problems, and they will not go away until you rewrite your `process_image` program so that it reads its data from stdin (or from a file) - taking bulk data from command line args is beyond absurd, it is insane. – cas Jul 14 '22 at 16:16

0 Answers