
I have 5000 text files in a directory, and each file name starts with the prefix OG00 (e.g. OG0017774.log).

The .log files that contain an asterisk (*) anywhere in their contents need to be copied into a new directory.

Contents of one such file:

cat OG0017774.log

M0~b904dbe442e0eb658d229076cacb9ef6 M1~9edeedcb1f4f315c4689bacd8075222f 0.000035**
M0~b904dbe442e0eb658d229076cacb9ef6 M2~aeba83b564ee32e0ef1a8321c8d930f4 0.000671**
M0~b904dbe442e0eb658d229076cacb9ef6 M3~006a376da2fba16185ce424bf4cba983 0.000055**
M0~b904dbe442e0eb658d229076cacb9ef6 M4~e564dbfbbbe8d1f7d9d8c8e4da202943 0.000015**
M0~b904dbe442e0eb658d229076cacb9ef6 M5~2abe603e8fee2fcb08b7fb818957e0aa 0.000006**

Suggestions appreciated.

I tried the code below, but it copies all the files in the current directory to the new directory.

I only want to copy the files that contain a * somewhere inside them.

#!/bin/bash
KEYWORD_PATTERN='*'
find . -type f |
while read FNAME
do
    if grep -Ew -q "$KEYWORD_PATTERN" $FNAME
    then
        KEYWORD=$(grep -Ew -o "$KEYWORD_PATTERN" $FNAME)
        cp -r $FNAME keywords/$KEYWORD
    fi

done
sunnykevin
  • What is your question? Your script does not do anything since all it does is output the `mv` command. Do you get error messages too? Do you also want to give the files some special name in the `keywords` directory? If so, what should happen if you have name collisions? How should the new name be chosen? What's the reasoning behind using `find` if you _know_ all files match `OG00*`? – Kusalananda Aug 01 '22 at 20:54
  • I updated the post. If a text file contains a * anywhere in it, I'd like to copy it into a new directory. – sunnykevin Aug 01 '22 at 20:58
  • `*` by itself is an invalid extended regex, and even though I tried to figure it out, I'm not even sure what `grep -Ew '*'` does on my Debian. With `-o`, it doesn't print anything anyway. So that might be a problem. I'm also not exactly sure what you expect to get with the `$(grep -Ew -o "$KEYWORD_PATTERN" $FNAME)`, as if you're trying to match against a literal `*`, the matched string would always be just `*`. And if `$KEYWORD` _is_ `*` then you really better start quoting the rest of those variable expansions. – ilkkachu Aug 01 '22 at 21:25
  • see https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters and https://unix.stackexchange.com/questions/68694/when-is-double-quoting-necessary – ilkkachu Aug 01 '22 at 21:26
  • You mean any .log file that contains even a single `*` on any line shall be copied? – Paul_Pedant Aug 01 '22 at 22:44
  • Matching special characters is frequently difficult. I would count the `*` characters using `tr` (which is ignorant of patterns) and `wc`: `tr -cd '*' < "${fname}" | wc -c`. Then just test for a non-zero count. Also note that uppercase variable names are likely to clash with those in your environment. – Paul_Pedant Aug 01 '22 at 22:51
  • Yes you're right. – sunnykevin Aug 01 '22 at 22:54
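Putting these comments together, the question's loop could look like this sketch: every expansion is quoted, variable names are lowercase, and the presence of a literal * is tested by counting with tr and wc as Paul_Pedant suggests, so no regex is involved. The function name copy_starred, the OG00*.log name filter and the use of -maxdepth 1 (supported by GNU and BSD find, but not POSIX) are assumptions. Like the original, it reads newline-delimited names, so it assumes no file name contains a newline.

```shell
#!/bin/bash
# Hypothetical helper: copy_starred SRC_DIR DEST_DIR
# Copies each OG00*.log file directly inside SRC_DIR that contains at least
# one literal '*' into DEST_DIR. Assumes file names contain no newlines.
copy_starred() {
    local src=$1 dest=$2 fname count
    mkdir -p -- "$dest" || return

    # -maxdepth 1 (GNU/BSD find) keeps find out of subdirectories,
    # including DEST_DIR when it lives inside SRC_DIR.
    find "$src" -maxdepth 1 -type f -name 'OG00*.log' |
    while IFS= read -r fname; do
        # tr deletes every character except '*'; wc -c counts what is left.
        count=$(tr -cd '*' < "$fname" | wc -c)
        if [ "$count" -gt 0 ]; then
            cp -- "$fname" "$dest/"
        fi
    done
}
```

Usage, from the directory holding the logs: copy_starred . keywords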

2 Answers


What about something like this?

for i in OG00*; do
    if grep -q -F '*' "$i"; then
        cp "$i" ../keywords/
    fi
done
r_31415

With GNU tools:

grep -rFlZ --include='OG00*' '*' . |
  xargs -r0 cp -t ../keywords

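grep searches recursively (-r) inside the current directory (.) for the Fixed string (-F) *, only in files whose names start with OG00 (--include), and lists (-l) the files with at least one match, Zero-delimited (-Z). xargs takes that output, splits it on NUL bytes (-0) and passes the names as arguments to cp, which copies them into the directory given with -t. The -r of xargs skips running cp altogether when nothing matched.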
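To see why the names are passed NUL-delimited, here is a small demo, assuming GNU grep, xargs and cp, that wraps the same pipeline in a hypothetical function copy_starred_gnu and runs it on a scratch directory containing file names with a space and a newline in them. Both files are copied intact, which a newline-delimited pipeline could not guarantee.

```shell
#!/bin/bash
# Hypothetical wrapper around the GNU pipeline above; run it from inside
# the directory holding the OG00* files and pass the destination directory.
copy_starred_gnu() {
    grep -rFlZ --include='OG00*' '*' . |
      xargs -r0 cp -t "$1"
}

# Scratch directory with awkward but legal file names (bash $'...' quoting).
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/keywords"
printf 'x 0.000035**\n' > "$demo/src/OG00 with space.log"
printf 'x 0.000671**\n' > "$demo/src/"$'OG00 with\nnewline.log'
printf 'no asterisk\n'  > "$demo/src/OG0000001.log"

# Copies the two files containing '*'; OG0000001.log is left alone.
(cd "$demo/src" && copy_starred_gnu ../keywords)
```

The destination sits outside the searched tree on purpose: with grep -r, a keywords directory inside . would itself be searched on later runs.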

The POSIX equivalent would be:

find . -name 'OG00*' -type f -exec grep -qF '*' {} ';' \
   -exec sh -c 'exec cp "$@" ../keywords' sh {} +

Though that means running one grep per file, so it would be significantly less efficient.

To match a * with grep, the options are:

  • grep -F '*': a fixed-string match. It is the easiest, and the one to use if you only need to search for fixed strings.
  • grep '*': in basic regular expressions, a * at the beginning of the pattern matches a literal *. grep 'a*', though, would match any sequence of 0 or more a's, and you'd need grep 'a\*' or grep 'a[*]' to match a literal a*.
  • grep -E '\*' / grep -E '[*]'. With extended regexps, a * at the start of the pattern is not portable (POSIX leaves its behaviour undefined), so that * needs to be escaped or bracketed. The same goes for grep -P or grep -X where supported.
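These options can be checked side by side with a throwaway script. This sketch feeds grep -c two made-up lines, only the first of which contains a * (and the string a*), and the counts in the comments are what a POSIX-conforming grep such as GNU grep prints. The leading-* ERE form is left out on purpose, since its behaviour is not portable.

```shell
#!/bin/sh
# Two made-up sample lines: only the first contains '*' (and "a*").
sample='x a* 0.000035**
plain line'

# grep -c prints how many of the two lines each pattern matches.
printf '%s\n' "$sample" | grep -c -F '*'    # fixed string: 1
printf '%s\n' "$sample" | grep -c '*'       # BRE, leading * is literal: 1
printf '%s\n' "$sample" | grep -c 'a*'      # zero or more a's, matches any line: 2
printf '%s\n' "$sample" | grep -c 'a\*'     # literal "a*": 1
printf '%s\n' "$sample" | grep -c 'a[*]'    # literal "a*" via bracket: 1
printf '%s\n' "$sample" | grep -c -E '\*'   # ERE, escaped: 1
printf '%s\n' "$sample" | grep -c -E '[*]'  # ERE, bracket expression: 1
```

The 'a*' line is the trap from the question: a pattern that can match the empty string matches every line.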

You may also want to read:

About some of the common mistakes in your code.

Stéphane Chazelas