36

Very new to UNIX but not new to programming. Using Terminal on a MacBook. To manage and search word lists for crossword construction, I'm trying to get handy with the grep command and its variations. It seems pretty straightforward, but I'm getting hung up early on what I thought should be a simple case.

When I enter

grep "^COW" masternospaces.txt

I get what I want: a list of all the words starting with COW.

But when I enter

grep "COW$" masternospaces.txt

I expect to get a list of words ending with COW (there are many such words), and nothing is returned at all.

The file is a plain text file, with every line just a word (or a word phrase with no spaces) in all caps.

Any idea what could be happening here?

Gilles 'SO- stop being evil'
DTalvacchio
  • What is the origin of the masternospaces.txt file? Is it possible it has Windows-style line terminations (CR-LF) instead of Unix-style LFs? – steeldriver Dec 29 '14 at 19:05
  • Not sure, but are you looking for a list of *words* or a list of *lines...*? – mikeserv Dec 29 '14 at 19:06
  • steeldriver-- Something like that was my first thought. Wasn't sure how to inspect what was happening there, or what even the possibilities were. Assumed that an end return was an end return. That file is a massive compendium from a few sources. I'm not even sure which one would be considered the original file. And it's been through at least three word processors on both PC and Mac machines. What might be the best way to see what kind of terminations it's using? – DTalvacchio Dec 29 '14 at 19:15
  • mikeserv-- In this .txt file, every line is just a word (or a phrase with no spaces between words, so again a "word"). So I am searching for lines, I suppose . . . just that each line has only one of what I'm considering a word for crossword purposes. – DTalvacchio Dec 29 '14 at 19:17
  • To see if it has CR-LF endings, there are many ways but `cat -net masternospaces.txt` is one easy one (the CRs will show up as `^M`). BTW it sounds like a *word anchor* might do equally well in your application e.g. `COW\>` – steeldriver Dec 29 '14 at 19:21
  • Ok, the wording was just unclear. Then you should have a look at *exactly* what you're working with: do `sed -n '/COW/l' file` to get an unequivocal representation of every line matching COW. There must be some character following COW. – mikeserv Dec 29 '14 at 19:21
  • @steeldriver-- Running the cat -net command yielded the entire set of lines, each with a $ at the end. Does that tell you something? – DTalvacchio Dec 29 '14 at 19:39
  • @mikeserv-- The sed -n command did list all the lines with COW, sometimes COW appearing at the end but not always. All entries had a $ at the end. – DTalvacchio Dec 29 '14 at 19:42
  • By the way @steeldriver, the word anchor idea did work. Got a list of words ending in COW. But I'd still like to resolve this larger issue regarding the line endings, as I'm sure it will come up somehow. – DTalvacchio Dec 29 '14 at 19:57
  • What does `grep -w '.*COW'` return? – muru Dec 29 '14 at 20:13
  • @muru -- Exactly what I want: all the lines ending in COW. Does something about that explain why $ isn't working? – DTalvacchio Dec 29 '14 at 20:19
  • I can see that I have a lot to learn about these various line terminations, and how to convert from whatever this file is to one with Unix style terminations. Anybody have a link to a clear document which could help me understand (1) how to determine what terminations are in this file and (2) how to convert? – DTalvacchio Dec 29 '14 at 20:43
  • Converting is often done using the `dos2unix` tool, and the [`file` command](http://stackoverflow.com/questions/5346523/how-to-find-a-windows-end-of-line-eol-character) is capable of detecting which line endings are being used. You can also manually inspect a line using `od`, for example. – muru Dec 29 '14 at 20:53
  • `grep -w` is equivalent to using word anchors, I hadn't noticed steeldriver's comment or your response when posting that. – muru Dec 29 '14 at 20:59
  • You can use `hexdump` to check exactly how your line endings are formatted. I suggest you use my favorite format: `hexdump -e '"%08_ad (0x%08_ax) "8/1 "%02x "" "8/1 "%02x "' -e '" "8/1 "%_p""|"8/1 "%_p""\n"' masternospaces.txt`. With the output, check the line endings: `0a` -> `LF`, `0d` -> `CR`. – user43791 Dec 29 '14 at 21:59
  • So `sed -n '/COW/l'` printed lines that looked like `...COW$`? There was no space or anything at all between the `$` and the `W`? If that is the case then it is almost definitely a problem with your `grep` command or something. If you had a CR/LF file it would look like: `COW\r$` where the `\r`eturn is marked for each. That is the easiest way to check for that kind of stuff, usually. Maybe you could copy/paste some of the `sed` output into the question, if you don't mind? – mikeserv Dec 29 '14 at 22:47

6 Answers

40

As @steeldriver mentioned, the problem is likely caused by a line ending style different from the one grep expects.
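To see how a stray carriage return produces exactly this symptom, here is a minimal sketch. The file name sample.txt and the words in it are made up for illustration, and it assumes the real file uses CR-LF endings, which has not been confirmed at this point:

```shell
# Build a small sample file with Windows-style (CRLF) line endings.
# sample.txt and its contents are hypothetical stand-ins for masternospaces.txt.
tmp=$(mktemp -d)
printf 'COWBOY\r\nMOOCOW\r\nSEACOW\r\n' > "$tmp/sample.txt"

# ^COW matches: nothing precedes the first character of the line.
starts=$(grep -c '^COW' "$tmp/sample.txt" || true)

# COW$ does not match: each line really ends in "COW\r", so W is not last.
ends=$(grep -c 'COW$' "$tmp/sample.txt" || true)

echo "lines starting with COW: $starts"
echo "lines ending with COW:   $ends"
rm -rf "$tmp"
```

The start anchor is unaffected because the extra byte sits at the end of each line, which matches what the question describes.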

To check the line endings

You can use hexdump to check exactly how your line endings are formatted. I suggest you use my favorite format:

hexdump -e '"%08_ad (0x%08_ax)    "8/1 "%02x ""   "8/1 "%02x "' -e '"    "8/1 "%_p""|"8/1 "%_p""\n"' masternospaces.txt

With the output, check the line endings: 0a -> LF, 0d -> CR. A very quick example would give something like this:

$ hexdump -e '"%08_ad (0x%08_ax)    "8/1 "%02x ""   "8/1 "%02x "' -e '"    "8/1 "%_p""|"8/1 "%_p""\n"' masternospaces.txt
00000000 (0x00000000)    4e 6f 20 43 4f 57 20 65   6e 64 69 6e 67 0d 0a 45    No COW e|nding..E
00000016 (0x00000010)    6e 64 69 6e 67 20 69 6e   20 43 4f 57 0d 0a          nding in| COW..

Note the line endings in DOS format: 0d 0a.
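If the hexdump format string is more than you need, `od -c` gives a similar view with less typing. This is only a sketch on made-up sample input, not output from the real file:

```shell
# od -c prints every byte, showing a carriage return as \r and a line feed as \n.
# Only the first line is inspected here; the sample input is hypothetical.
dump=$(printf 'MOOCOW\r\nSEACOW\r\n' | head -n 1 | od -c)
echo "$dump"

# Count the lines that end in a carriage return (0 means plain Unix endings).
cr=$(printf '\r')
crlf_lines=$(printf 'MOOCOW\r\nSEACOW\r\nCOWBOY\n' | grep -c "$cr\$" || true)
echo "lines ending in CR: $crlf_lines"
```

On a real file you would pass the file name to `head` and `grep` instead of piping in `printf` output.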

To change the line endings

You can see here or here for various methods of changing line endings using various tools, but for a one-time thing, you could always use vi/vim:

vim masternospaces.txt
:set fileformat=unix
:wq
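A non-interactive alternative, in case you would rather not open an editor, is to strip the carriage returns with `tr`. This is a sketch, not part of the original suggestion; crlf.txt and unix.txt are hypothetical names, and writing to a new file keeps the original untouched:

```shell
# Strip every carriage return and write the result to a new file.
# crlf.txt and unix.txt are hypothetical names; note that tr removes ALL
# CR bytes, not only those at line ends, which is fine for a word list.
tmp=$(mktemp -d)
printf 'COWBOY\r\nMOOCOW\r\n' > "$tmp/crlf.txt"
tr -d '\r' < "$tmp/crlf.txt" > "$tmp/unix.txt"

# After conversion the plain end-of-line anchor works.
matches=$(grep 'COW$' "$tmp/unix.txt")
echo "$matches"
rm -rf "$tmp"
```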

To grep without changing anything

If you just want grep to match no matter the line ending, you could allow for optional control characters before the end of the line, like this:

grep 'COW[[:cntrl:]]*$' masternospaces.txt

If a blank line is shown, you can check that you indeed matched something by using the -v option of cat:

grep 'COW[[:cntrl:]]*$' masternospaces.txt | cat -v
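A quick way to convince yourself that this pattern handles both ending styles is to feed it mixed input. In the sketch below the sample words are invented; the outer pair of brackets is the bracket expression and `[:cntrl:]` is the class name inside it, which is why the brackets are doubled:

```shell
# Mixed input: one CRLF line ending in COW, one LF line ending in COW,
# and one CRLF line that does not end in COW (hypothetical sample words).
sample=$(printf 'MOOCOW\r\nSEACOW\nCOWBOY\r\n')

# [[:cntrl:]]* allows zero or more control characters (such as CR) before
# the end of the line, so both ending styles match.
count=$(printf '%s\n' "$sample" | grep -c 'COW[[:cntrl:]]*$' || true)
echo "matched lines: $count"
```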

My personal favorite

You could also both grep and standardize the output using sed:

sed -n '/COW^M*$/{;s/^M//g;p;};' masternospaces.txt

where ^M is obtained by typing Ctrl-V Ctrl-M on your keyboard.
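If typing a literal ^M is awkward (for example in a script), one option is to hold the carriage return in a shell variable. This is a sketch of the same idea on invented sample input, with the variable name `cr` being my own choice:

```shell
# Hold a literal carriage return in a variable instead of typing Ctrl-V Ctrl-M.
cr=$(printf '\r')

# Print lines ending in COW (with or without a trailing CR), stripping the CR.
# The sample words are hypothetical.
cleaned=$(printf 'COWBOY\r\nMOOCOW\r\nSEACOW\n' |
  sed -n "/COW${cr}*\$/{
    s/${cr}//g
    p
  }")
echo "$cleaned"
```

The commands inside the braces are put on separate lines because that form is accepted by both GNU and BSD sed.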

Hope this helps!

user43791
  • That is all extremely helpful. Am out of time today but will look through all of this closely tomorrow and see what's what. If in the meantime any of you has a link to your favorite Unix command reference guide so that I can teach myself a little about how things are working, I'd appreciate it. I've been picking up pieces here and there but have yet to find one source that is my go-to for explanations. Thanks everyone and will check in tomorrow with a hopefully successful update. --D – DTalvacchio Dec 29 '14 at 23:54
  • It's too bad this post doesn't have closure, for me at least. I cannot, for the life of me, figure out how to match the end of the line. If I do a hex dump, I can't find a nice line ending like your example above. I am not familiar with working with hex so I may not be reading it right. I also tried the `[[:cntrl:]]` @user43791 suggested and it's still not matching anything for me. This makes no sense. I'm using GNU grep 2.20 and parsing output from nDPI which was written to a text file – harperville Feb 22 '16 at 15:40
  • @harperville If you `cat -v yourfile.ext`, what do you see? – user43791 Mar 01 '16 at 20:13
  • Well, nothing too exciting or unexpected. Just the contents as I would expect to see them. Anything specific you're looking for? I can't paste the output here but I just see the contents. Regular ol' "ASCII English text" according to `file`. – harperville Mar 03 '16 at 01:29
  • @harperville No extra "^M" at the end of each line? Could you paste the first few lines of hex? – user43791 Mar 03 '16 at 12:14
  • Hex dump of file: http://pastebin.com/x7NHd8t7 `cat -v` of same file: http://pastebin.com/gmmuaYmY – harperville Mar 04 '16 at 16:35
  • Joined just to up vote this. So helpful to have those special characters. Was pulling my hair out wondering why `\n` wasn't being recognized! – Minnow Jan 25 '19 at 22:58
  • Why do you use double square brackets in `[[:cntrl:]]`? According to the [manual page](https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html), it only requires a single pair of bracket. – HelloGoodbye Feb 21 '19 at 12:31
  • Under "**To grep without changing anything**", a much simpler way (imho) is to use `dos2unix` to anchor CRLF lines with `$`. Worth noting that the recommended `[[:cntrl:]]` also didn't work for me. – nivk Jul 17 '19 at 23:07
  • The command line tool `dos2unix` does the trick. – ingyhere Dec 19 '20 at 06:13
7

Another way to remove the \r before the grep:

... | dos2unix | egrep 'COW$' | ...

I like that it's very clear since I don't remember things like [[:cntrl:]] for long.

Javier
1

Although you can use 'standard' RegEx syntax with grep (as in @user43791's answer), grep also has other identifiers to signify the input boundaries.

The matchers for the start and end of the whole line are \` (backtick) (instead of ^) and \' (apostrophe) (instead of $).

So for your original command, you would use: grep "COW\'" masternospaces.txt

Side note: It's also important to note that in grep's default (basic) syntax, ? and + are treated literally unless you escape them as \? and \+ to give them their usual RegEx meaning. With grep -E (extended syntax) they work unescaped.
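The difference is easy to check on a few invented lines. This sketch assumes GNU grep, where `\?` in the basic syntax is an extension; other implementations may behave differently:

```shell
# Hypothetical sample: three lines, one containing a literal question mark.
words=$(printf 'COW\nCOWS\nCOW?\n')

# Basic syntax (default): a bare ? is literal, so only the line "COW?" matches.
literal=$(printf '%s\n' "$words" | grep -c 'COW?' || true)

# Basic syntax with \? (a GNU extension): the S becomes optional.
basic=$(printf '%s\n' "$words" | grep -c 'COWS\?$' || true)

# Extended syntax (-E): ? is an operator without the backslash.
extended=$(printf '%s\n' "$words" | grep -cE 'COWS?$' || true)

echo "$literal $basic $extended"
```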

Source: grep regular expression syntax

samthecodingman
-3

Try this command: grep "COW"$ masternospaces.txt

BlueManCZ
Nir
  • 1
-4

"COW$" when bash set pararameter for grep , it was interpreted as 'COW' where treat "$" as "", becase $ is a escape simbol. when nothing was fellowed by $, it is interpreted as empty string by bash shell so, you should use grep 'COW$' masternospaces.txt instead.

  • Since there's no valid expansion of `$`, it would be left alone by bash and used by grep. See for yourself: `echo "COW$"` -- the `$` will still be there. – Jeff Schaller Nov 30 '17 at 17:37
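The point made in that comment can be checked directly. A small sketch, with the variable names `double` and `single` chosen only for illustration:

```shell
# Both quoting styles hand grep the same four-character pattern,
# because a $ not followed by a valid name is left alone by the shell.
double="COW$"
single='COW$'
printf '%s\n' "$double" "$single"

if [ "$double" = "$single" ]; then
  echo "identical"
fi
```

So the quoting style is not what prevents the original command from matching.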
-5

In BSD grep you need to escape "$" and enclose your string in double quotes:

"COW\$"
  • Um... no. The `$` will not be special to the shell, because the stuff after it is not a valid shell variable name. Using single quotes around static strings is a better idea, but will make no difference here. – Kusalananda Jun 28 '18 at 07:06