
I'm having a little trouble understanding what's happening in a Perl script that uses pax.

Background: Feeding a .tgz file to pax and unpacking to get a folder full of files.

This is the thing I don't understand:

pax -r -z -s '/.*\\//directory\\//p' -f $input_path/$tgz

Where $input_path is a scalar variable in Perl containing a path and $tgz is another scalar variable containing the name of the .tgz file.

So -r for reading makes sense, -z for unzipping is fine. The -s and -f flags confuse me. I get the following error: pax: Invalid replacement string option /.*\\//directory\\//p.

How I think the flags work:

-f: this doesn't seem to be the problem. As I understand it, this is just where to put the files.

-s: string replacement to modify the names of the files contained in the .tgz file.

Can anyone demystify the /.*\\//directory\\//p part? I don't really get what's going on with all the escaping backslashes, and the p must do something, but I have no idea what.

Christopher
  • The syntax is `-s /old/new/[gp]`, where `old` is a basic regular expression (cf. `re_format(7)`). The `p` flag makes `pax` print each successful substitution to `stderr`. However, `.*\\//directory\\ ` is not a valid BRE. It's hard to tell whether the problem comes from the original script or from your attempt at obfuscation. You also claim you found this in a `perl` script, which doesn't make sense, either. – Satō Katsura Oct 05 '16 at 08:29
  • @SatoKatsura So it's, `old` = `.` (here, so what does the wildcard do? all that is here?) and `new` = `directory`? I'm not sure what you meant by [gp] or BRE. In the `perl` script it is enclosed within back-ticks and then `;`. – Christopher Oct 05 '16 at 08:38
  • If you use the `pax` provided by `star`, you get this error message: `pax: Bad substitute option 'd'.` which is a result from incorrect backslash quoting. – schily Jun 12 '20 at 09:55

1 Answer


Pax parses /.*\\//directory\\//p as:

  • / is the separator character.
  • .*\\ is the regular expression, matching any string ending with a backslash (backslash quotes the next character).
  • / separates the regular expression from the replacement text.
  • / ends the replacement text, which is empty here.
  • directory\\//p is trailing garbage.

Evidently, you meant to use the backslashes to protect the slashes so that they're part of the regex rather than separators. For a shell script, there are extra backslashes in there (but they may be due to the fact that this is happening in a perl script, more on this later). There's also something wrong with the slashes. If you want to remove any/leading/prefix/up/to/directory from the paths, then it should be

pax -r -z -s '/.*\/directory\///p' -f "$input_path/$tgz"

It would be easier to read with a different separator. Then you wouldn't need to escape slashes.

pax -r -z -s '!.*/directory/!!p' -f "$input_path/$tgz"
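
To see what this substitution does to member names without touching a real archive, here is a minimal sketch that uses sed as a stand-in for pax -s (both take a basic regular expression in the ed-style substitution form). The function name rewrite_names and the sample paths are made up for illustration; they are not from the original script.

```shell
# rewrite_names is a hypothetical helper: it applies the same expression
# that the pax command above passes to -s, with '!' as the separator so
# the slashes need no escaping.
rewrite_names() {
    sed 's!.*/directory/!!'
}

# Hypothetical member names standing in for the archive's contents.
printf '%s\n' \
    'any/leading/prefix/directory/file.txt' \
    'other/directory/sub/data.csv' \
    'unrelated/readme' | rewrite_names
# prints:
#   file.txt
#   sub/data.csv
#   unrelated/readme
```

The .* is greedy, so everything up to the last /directory/ is stripped. Names that do not contain /directory/ pass through unchanged.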

All this assumes that the command is a shell command. You mention a Perl script; Perl would add its own layer of quoting, so what to write depends on how the string is inserted in the Perl script. The use of $input_path/$tgz is definitely problematic because it's interpolating a string into a shell script, so that string will be parsed as a shell snippet instead of a file name.
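
The interpolation problem can be reproduced in plain shell. This is only a sketch: count_args and the file name are hypothetical, and eval stands in for what happens when Perl hands a single command string to the shell.

```shell
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() {
    printf '%s\n' "$#"
}

file='my archive.tgz'   # hypothetical file name containing a space

# Pasted into the command text: the shell re-parses the name and splits it.
eval "count_args -f $file"    # prints 3

# Passed as one quoted argument: the name stays whole.
count_args -f "$file"         # prints 2
```

A space is the mildest case; a name containing characters such as ; or $( ) would be executed as shell code in the first form.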

If the shell command is between double quotes or backticks, then the backslashes do need doubling. There's still the problem of the misplaced slash. Here's a way to write this in Perl:

my $quoted_file_name = quotemeta("$input_path/$tgz");
system("pax -r -z -s '!.*/directory/!!p' -f $quoted_file_name");

If you're using system, you should use the list form instead: it does not invoke an intermediate shell, so the quoting issues go away.

system('pax', '-r', '-z', '-s', '!.*/directory/!!p', '-f', "$input_path/$tgz");
Stéphane Chazelas
Gilles 'SO- stop being evil'