
I would like to extract the numeric part of the file names that begin with "hsli" and end with ".h5" in Bash on Ubuntu 14.04.1 64-bit LTS. My ls -l hsli* output is as follows:

-rwxrwxrwx 1 ongun ongun 31392 Feb 26 13:04 hsli0.03.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 13:44 hsli0.042.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 14:24 hsli0.054.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 15:03 hsli0.066.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 15:42 hsli0.078.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 16:22 hsli0.09.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:02 hsli0.102.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:36 hsli0.114.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:58 hsli0.126.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 18:20 hsli0.138.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 18:42 hsli0.15.h5

They are already in ascending order, and after a bit of manipulation I can get the name of the first file. The command and its output are:

$ ls -l hsli* | head -1 | rev | cut -f 1 -d " " | rev 
hsli0.03.h5

Now my aim is to extract 0.03 from this. How can I do so? I am not familiar with regular expressions, and this seems like a hard case since there are two dots in the file name.

Gilles 'SO- stop being evil'
Vesnog
  • `ls hsli* | head -1 | sed 's/[0-9]*\.[0-9]*//'` or even `ls hsli* | head -1 | sed 's/[0-9.]\+/./'` – Costas Feb 26 '15 at 19:56
  • Will try it @Costas thanks. The command got rid of the digits altogether and the output is `hsli.h5`. – Vesnog Feb 26 '15 at 19:59
  • If there are definitely no `\n`ewlines in the filenames, do: `\ls -d ./hsli* | cut -d. -f3` for the whole list - add `head -n1` to the end. @Costas - you can drop `head` if you just add a `;q` to the tail of your command. – mikeserv Feb 26 '15 at 20:01
  • Of course, without `ls`, you can do: `set -- hsli*; set -- "${1#*.}"; echo "${1%.*}"` – mikeserv Feb 26 '15 at 20:03
  • @mikeserv It gives `03` as the output not `0.03`. Can I manually prepend a dot in the beginning, say with `sed`? – Vesnog Feb 26 '15 at 20:05
  • @Vesnog - ok, so do `printf %.02f\\n ".$(earlier cmd)"` - it's probably better than `echo` anyway. Or for the second version just `...;echo "0.${1%.*}"`. Oh, and maybe add a `-s` switch to `cut` so you only work with filenames that definitely contain the right amount of `.` dots. – mikeserv Feb 26 '15 at 20:07
  • @mike Okay the second version worked like a charm but I could not get the first one to work. – Vesnog Feb 26 '15 at 20:14
  • @Vesnog - well, for the second one, you might want to do a `test` first before the `echo` *(in case the filename you search for doesn't exist or doesn't have the right number of dots)*. I'll do an answer. – mikeserv Feb 26 '15 at 20:17
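
The parameter-expansion idea from the comments above can be sketched as a small function. This is only an illustration: the name `hsli_number` is made up here, and it assumes the file name has the form `hsli<number>.h5`. Because the prefix and the suffix are stripped as literal strings, the dot inside the number is left alone.

```shell
# hsli_number is a hypothetical helper name, not something from the thread.
# It strips the literal prefix "hsli" and the literal suffix ".h5" from one file name.
hsli_number() {
    set -- "${1#hsli}"    # remove "hsli" from the front of the first argument
    set -- "${1%.h5}"     # remove ".h5" from the end of what is left
    printf '%s\n' "$1"
}

hsli_number hsli0.03.h5   # prints 0.03
```

To apply it to the first file in a directory without `ls`, something like `set -- hsli*.h5; hsli_number "$1"` should work, provided at least one file matches.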

2 Answers


Without ls, since you're just populating its list with shell globs anyway, you can cut out the middleman like this:

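glob_hsli() (
    # Runs in a subshell, so the variables and the exit stay local to the call.
    for h5 in hsli*.h5; do
        n=${h5#hsli}; n=${n%.h5}            # strip the prefix and the suffix
        case $n in
            (''|*[!0-9.]*|*.*.*) ;;         # skip: empty, non-numeric, or more than one dot
            (*) printf '%s\n' "$n"; exit ;; # first match: print it and stop
        esac
    done
    echo 'No Match Found!' >&2              # nothing matched: report on stderr
    exit 1
)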

Call it without arguments and it will glob your hsli* files and only print the 1st occurring middle *.string.h5 part in the results for the current directory, or it will return with error and a meaningful error message printed to stderr if it cannot do so.

mikeserv
  • Should I save this as a separate file in the same directory? I have never done this before. – Vesnog Feb 26 '15 at 20:34
  • @Vesnog - You can if you like - you can then source it like `. ./filename.fn` *(or whatever you name it)*. Or you can copy/paste it into your command-line. After doing either thing you'd just call it like `glob_hsli`. I think I got it ironed out to handle all outside cases now, as well. It will recurse if it needs to get a match for `*.*.h5` without also matching `*.*.*.h5` - but it will quit as soon as it can regardless. With your above dataset, one iteration should be all it takes. – mikeserv Feb 26 '15 at 20:37
  • Thanks once again. While we are at it, I have another file that has a line with a word like `reso=35`; how can I extract 35 here? I tried `glob_hsli` and it returns `0. 03.h5` – Vesnog Feb 26 '15 at 20:43
  • @Vesnog - `sed '/\n/P;//!s/reso=\([0-9]\{1,\}\)/\n\1\n/;D'` – mikeserv Feb 26 '15 at 20:47
  • Thanks that works for the `reso=35`. For the first part I think I will go with the solution in your comment since I am sure that the `hsli` files exist in that directory. However, I cannot figure out how to use it for the maximum number since it does not use `ls`. – Vesnog Feb 26 '15 at 21:01
  • For example I have hsli0.15.h5 also in the same directory and would like to extract `0.15`, if you look at the `ls` output in my original post it might illustrate my question better. – Vesnog Feb 26 '15 at 21:11
  • I will try it now. What do you think about the `0.15`, by the way? – Vesnog Feb 26 '15 at 21:25
  • @Vesnog - oh! I get it. I do think it should glob `.15` in order *before* `.03`, but if not we can explicitly test for that. It's not so hard. – mikeserv Feb 26 '15 at 21:28
  • It works like a charm now; with the latest version it prints `0.03`. I did not understand your last argument, though. – Vesnog Feb 26 '15 at 21:34
  • The latest version does not provide any output. – Vesnog Feb 26 '15 at 21:44
  • Yes you are right. – Vesnog Feb 26 '15 at 21:55
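
For the `reso=35` side question in the comments above, a shorter `sed` form is also possible. This is a minimal sketch, assuming each line of interest contains a `reso=<digits>` token; `extract_reso` is a hypothetical name, and the function reads its text from standard input.

```shell
# extract_reso is a hypothetical helper; it reads text on stdin.
# -n suppresses the default output, and the p flag prints only the lines
# on which the substitution succeeded, reduced to the captured digits.
extract_reso() {
    sed -n 's/.*reso=\([0-9][0-9]*\).*/\1/p'
}

printf 'foo\nsome text reso=35 more\n' | extract_reso   # prints 35
```

Since the leading `.*` is greedy, a line with several `reso=` tokens would yield the digits of the last one.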

Bash makes it relatively easy to apply a transformation like stripping prefixes and suffixes to elements of an array.

shopt -s nullglob                  # if there are no matches, produce an empty list
versions=(hsli*.h5)                # list matches
versions=("${versions[@]#hsli}")   # strip prefix
versions=("${versions[@]%.h5}")    # strip suffix
printf '%s\n' "${versions[@]}"     # print one version per line
for v in "${versions[@]}"; do      # execute a command on each version
  somecommand "$v"
done

Note that the versions (if that's what they are) are sorted in lexicographic order, so e.g. 0.9 comes after 0.10. If you want a numerical order and you have recent enough versions of GNU coreutils, you can use sort -V to sort 0.9 before 0.10. Given that your file names don't contain whitespace or globbing characters, you can sort them with

versions=($(printf '%s\n' "${versions[@]}" | sort -V))
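
If the numbers are decimal fractions rather than version strings, `sort -n` may be the better fit, because version sort treats the digits after the dot as an integer, so it would place 0.15 before 0.042. A sketch under that assumption follows; `min_max` is a hypothetical name, the sample values are taken from the file names in the question, and `LC_ALL=C` is set so that `.` is treated as the decimal point.

```shell
# min_max is a hypothetical helper: it prints the smallest and the largest
# of its arguments, compared as decimal numbers.
min_max() {
    sorted=$(printf '%s\n' "$@" | LC_ALL=C sort -n)   # one number per line, ascending
    smallest=$(printf '%s\n' "$sorted" | head -n 1)
    largest=$(printf '%s\n' "$sorted" | tail -n 1)
    printf '%s %s\n' "$smallest" "$largest"
}

min_max 0.15 0.03 0.102 0.09   # prints: 0.03 0.15
```

With the array from above it could be called as `min_max "${versions[@]}"`.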
Gilles 'SO- stop being evil'
  • Thanks, your help is much appreciated. I was too confused and got the job done with `ls -l hsli* | tail -1 | rev | cut -f 1 -d " " | rev | sed -e 's/[a-z]*//' -e 's/.h5//'` and the same command with `head` in the second pipe to get the first file's number. The numbers correspond to frequencies in an FDTD (Finite Difference Time Domain) analysis. I would like to learn much more about `sed`, `awk` and regexes when I find the time, by the way. Where can I start? – Vesnog Feb 26 '15 at 23:20