36

I have a few Python scripts lying around, and I'm working on rewriting them. I have the same problem with all of them.

It's not obvious to me how to write the programs so that they behave like proper Unix tools, where this

$ cat characters | progname

and this

$ progname characters

should produce the same output.

The closest thing I could find to that in Python was the fileinput library. Unfortunately, I don't really see how to rewrite my Python scripts, all of which look like this:

#!/usr/bin/env python
# coding=UTF-8

import re
import sys

# Keep only whitespace, word characters and basic punctuation.
regexnl = re.compile(r'[^\s\w.,?!:;-]')

for file in sys.argv[1:]:
    with open(file) as f:
        fs = f.read()
    rstuff = regexnl.sub('', fs)
    print(rstuff)

The fileinput library reads stdin when no filenames are given on the command line, and reads the named files otherwise. But it iterates over single lines.

import fileinput
for line in fileinput.input():
    process(line)

I really don't get that. I guess if you're dealing with small files, or if you're not doing much to the files, this may seem obvious. But, for my purposes, this makes it much slower than simply opening the entire file and reading it into a string, as above.
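
For reference, the lines that fileinput yields can be joined back into a single string, which keeps the stdin-or-files behaviour while still handing the whole input to one regex call. This is only a minimal sketch: the helper name `slurp` is made up for illustration, and it assumes Python 3.2 or later, where `fileinput.input()` can be used in a `with` statement.

```python
import fileinput


def slurp(paths=None):
    """Return all input as one string: the named files, or stdin if none."""
    # With files=None, fileinput falls back to sys.argv[1:], and to stdin
    # when that list is empty as well.
    with fileinput.input(files=paths) as lines:
        return "".join(lines)
```

Calling `slurp()` with no argument in a script would then read either the files named on the command line or piped input.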

Currently I run the script above like

$ pythonscript textfilename1 > textfilename2

But I want to be able to run it (and its brethren) in pipes, like

$ grep pattern textfile1 | pythonscript | pythonscript | pythonscript > textfile2
JJoao
ixtmixilix

6 Answers

17

Check if a filename is given as an argument, or else read from sys.stdin.

Something like this:

import sys

# sys.argv[0] is the script name, so a filename is present only if len > 1.
if len(sys.argv) > 1:
    f = open(sys.argv[1])
else:
    f = sys.stdin

It's similar to Mikel's answer except it uses the sys module. I figure if they have it in there it must be for a reason...
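
Applied to the script in the question, the same idea might look roughly like the sketch below. It reads every file named on the command line, or all of stdin when none is given, and runs the question's regex over each text as a whole. The names `read_texts`, `main` and `UNWANTED`, and the optional `stdin`/`stdout` parameters, are illustrative assumptions and not part of the original script.

```python
import re
import sys

# Same character filter as in the question: drop everything except
# whitespace, word characters and basic punctuation.
UNWANTED = re.compile(r"[^\s\w.,?!:;-]")


def read_texts(paths, stdin=None):
    """Yield the full text of each path, or of stdin when no paths are given."""
    if not paths:
        yield (sys.stdin if stdin is None else stdin).read()
        return
    for path in paths:
        with open(path) as f:
            yield f.read()


def main(argv=None, stdin=None, stdout=None):
    """Filter each input text and write the result to stdout."""
    out = sys.stdout if stdout is None else stdout
    paths = sys.argv[1:] if argv is None else argv
    for text in read_texts(paths, stdin):
        out.write(UNWANTED.sub("", text))
```

With `main()` called at the bottom of the script, both `progname characters` and `cat characters | progname` should go through the same filter.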

laconbass
rahmu
  • What if two file names are specified on the command line? – Mikel Sep 04 '12 at 21:28
  • 4
    Oh absolutely! I didn't bother showing it because it was already shown in your answer. At some point you have to trust the user to decide what she needs. But feel free to edit if you believe this is best. My point is only to replace `open("/dev/stdin")` with `sys.stdin`. – rahmu Sep 04 '12 at 21:40
  • 3
    you may want to check `if len(sys.argv)>1:` instead of `if sys.argv[1]:` otherwise you get an index out of range error – Yibo Yang Sep 21 '16 at 02:53
11

Why not just

files = sys.argv[1:]
if not files:
    files = ["/dev/stdin"]

for file in files:
    f = open(file)
    ...
Mikel
  • 14
    `sys.stdin` should be used instead as it's more portable than hardcoded path to file. – Piotr Dobrogost Feb 03 '15 at 10:26
  • `sys.stdin` should be used instead, as Piotr says – smci Nov 11 '15 at 04:27
  • But `sys.stdin` is a file, and it's already open, and must not be closed. Impossible to handle _just like_ a file argument without jumping through hoops. – alexis Jan 11 '19 at 14:14
  • @alexis Sure, if you want to close `f`, or want to use a context manager, you need something more complex. See my new answer as an alternative. – Mikel Jan 12 '19 at 05:17
9

My preferred way of doing it turns out to be... (and this is taken from a nice little Linux blog called Harbinger's Hollow)

#!/usr/bin/env python

import argparse, sys

parser = argparse.ArgumentParser()
parser.add_argument('filename', nargs='?')
args = parser.parse_args()
if args.filename:
    with open(args.filename) as f:
        string = f.read()
elif not sys.stdin.isatty():
    string = sys.stdin.read()
else:
    # No file and nothing piped in: show usage and stop.
    parser.print_help()
    sys.exit(1)

The reason why I liked this best is that, as the blogger says, it just outputs a silly message if accidentally called without input. It also slots so nicely into all of my existing Python scripts that I have modified them all to include it.
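
If you would rather have the script read from the terminal when no input is piped in, one possible variation is to drop the `isatty` test and treat a missing filename, or the conventional `-`, as stdin. This is a sketch only: the function name `read_input` and its `argv`/`stdin` parameters are hypothetical, added so the logic can be called from other code.

```python
import argparse
import sys


def read_input(argv=None, stdin=None):
    """Return the text of the optional filename argument, else of stdin."""
    stdin = sys.stdin if stdin is None else stdin
    parser = argparse.ArgumentParser()
    # "-" is used here as the default and as the usual shorthand for stdin.
    parser.add_argument("filename", nargs="?", default="-")
    args = parser.parse_args(argv)
    if args.filename == "-":
        return stdin.read()
    with open(args.filename) as f:
        return f.read()
```

In a script, `string = read_input()` would then replace the `if`/`elif`/`else` block, at the cost of losing the help message on an accidental bare call.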

ixtmixilix
  • 3
    Sometimes you do want to enter the input interactively from a tty; checking `isatty` and bailing out does not conform to the philosophy of Unix filters. – musiphil Sep 24 '13 at 07:58
  • 1
    Apart from the `isatty` wart, this covers useful and important ground not found in the other answers, so it gets my upvote. – tripleee Sep 23 '15 at 05:53
3
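import sys

files = sys.argv[1:]

for f in files or [sys.stdin]:
    if hasattr(f, "read"):
        # Already an open stream (sys.stdin); the Python 2 "file" type is gone in Python 3.
        txt = f.read()
    else:
        # A filename from the command line.
        with open(f) as fh:
            txt = fh.read()

    process(txt)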
Mikel
JJoao
  • This is how I would have written it, if `/dev/stdin` were unavailable on all my systems. – Mikel Jun 20 '18 at 16:12
0

I am using this solution and it works like a charm. Actually, I am using it in a script called unaccent that lowercases and removes accents from a given string:

argument = sys.argv[1:] if len(sys.argv) > 1 else sys.stdin.read()

I guess the first time I saw this solution was here.

SergioAraujo
0

If your system doesn't have /dev/stdin, or you want a more general solution, you could try something more complicated like:

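import sys

class Stdin(object):
    # Delegate everything else (read, readline, ...) to the real sys.stdin.
    def __getattr__(self, attr):
        return getattr(sys.stdin, attr)

    def __enter__(self):
        return self

    # Required by "with"; deliberately leaves sys.stdin open.
    def __exit__(self, *exc_info):
        return False

def myopen(path):
    if path == "-":
        return Stdin()
    return open(path)

for n in sys.argv[1:] or ["-"]:
    with myopen(n) as f:
        ...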
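
The wrapper class exists so that `sys.stdin` can be used in a `with` statement without being closed when the block ends. On Python 3.7 or later, `contextlib.nullcontext` from the standard library can play the same role. The following is a minimal sketch under that assumption; the helper name `open_or_stdin` is made up for illustration.

```python
import contextlib
import sys


def open_or_stdin(path):
    """Return a context manager yielding sys.stdin for "-", else the opened file."""
    if path == "-":
        # nullcontext hands back sys.stdin and does nothing on exit,
        # so stdin stays open for whatever runs next.
        return contextlib.nullcontext(sys.stdin)
    return open(path)
```

The loop would then read `with open_or_stdin(n) as f:`, with regular files still closed automatically at the end of each block.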
Mikel
  • Why do you move the file pointer on exit? Bad idea. If input was redirected from a file, the next program will read it again. (And if stdin is a terminal, seek usually does nothing, right?) Just leave it alone. – alexis Jan 14 '19 at 13:50
  • Yeah, done. I just thought it was cute to use `-` multiple times. :) – Mikel Jan 14 '19 at 20:42
  • Could someone explain what the `Stdin` class does? – abalter Mar 05 '22 at 04:53