
Is the output of comm guaranteed to be sorted? In my simple examples it is, and that makes sense given how I think comm works. However, I need to run comm on very large files, and I'm worried that comm might do some black magic for very large files.

Also, can someone point me to the source of comm? I've never been able to find the source for such scripts.

Thanks

Jeff
  • Asking 2 questions at once is not a good idea, so the second one never got answered. Here it goes. See https://unix.stackexchange.com/questions/366015/how-do-i-look-at-the-source-code-for-a-command for 2 possible answers. There are several implementations of `comm`. A commonly used one is from GNU, and its source code is at http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/comm.c?h=v8.29&id=27b2b19aa8d8b30b8cb4198b2f4b54568e10a35e (newest released version at the time of writing) – Uwe Geuder Apr 13 '18 at 18:17

1 Answer


Yes, if your input lines are ordered according to the collating sequence of the current locale. From the STDOUT section of the POSIX comm documentation:

If the input files were ordered according to the collating sequence of the current locale, the lines written shall be in the collating sequence of the original lines.

So if you guarantee that your input is sorted, the comm output is guaranteed to be sorted, too.

POSIX also specifies that if your input is not ordered according to the collating sequence of the current locale, the comm output is unspecified.

If you have GNU comm, you can use the --check-order option so that unsorted input causes a fatal error.
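
As a rough illustration of the points above, here is a minimal sketch. The file names and the scratch directory are made up for the example. It assumes that sort and comm run under the same locale (LC_ALL=C here), and the last step assumes GNU comm, since --check-order is a GNU extension.

```shell
# Hypothetical scratch files, created only for this illustration.
tmpdir=$(mktemp -d)
printf '%s\n' banana apple cherry > "$tmpdir/a.txt"
printf '%s\n' cherry banana date > "$tmpdir/b.txt"

# Sort and compare under the same locale, so that sort and comm
# agree on the collating sequence.
export LC_ALL=C
sort "$tmpdir/a.txt" > "$tmpdir/a.sorted"
sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# -12 suppresses columns 1 and 2, leaving only lines common to both files.
common=$(comm -12 "$tmpdir/a.sorted" "$tmpdir/b.sorted")
printf '%s\n' "$common"

# sort -c exits 0 only if its input is already sorted.
printf '%s\n' "$common" | sort -c && echo "output is sorted"

# GNU comm only: feed the unsorted a.txt and let --check-order reject it.
if comm --check-order "$tmpdir/a.txt" "$tmpdir/b.sorted" >/dev/null 2>&1; then
  echo "inputs were in order"
else
  echo "comm rejected the input"
fi

rm -rf "$tmpdir"
```

Sorting with the same LC_ALL that comm later runs under is the important part: a file sorted in one locale may not count as ordered in another.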

cuonglm