3

I have over 100 files named like

x.assembled.forward.fastq.gz
x(n).unassembled.reverse.fastq.gz

The problem is that the pipelines I am working with do not accept dots in the file name, so I have to change all of them (except those in the .fastq.gz extension) to _, like this:

x_assembled_forward.fastq.gz
x(n)_unassembled_reverse.fastq.gz

I thought it would be possible using the simple command:

mv *.assembled.*.fastq.gz  *_assembled_*.fastq.gz

.... apparently not! :D

How can I do that?

ctrl-alt-delor
Moomin
  • `x_assembled_forward.fastq.gz` still has `.` in it. Did you mean `x_assembled_forward_fastq_gz`? – muru Aug 05 '19 at 08:05
  • Personally I'd use symlinks instead of changing the names. Also, out of curiosity, what's the pipeline that can't deal with the dots? – Sparhawk Aug 05 '19 at 08:16
  • @Sparhawk Some bioinformatics genome assembly pipeline that attaches some sort of semantic meaning to stuff in the filename given either dots or underscore delimiters? – Kusalananda Aug 05 '19 at 08:18
  • What system are you using? You had tagged with Linux, what distribution is this? – terdon Aug 05 '19 at 08:24
  • .fastq.gz is the format of the file it shouldn't change – Moomin Aug 05 '19 at 09:15
  • yes it's a genomic pipeline called ipyrad and other software for post processing analyses – Moomin Aug 05 '19 at 09:16
  • I use PuTTY on Windows to connect to a supercomputer cluster system – Moomin Aug 05 '19 at 09:16
  • Welcome to the site! A little tip: you can notify people in comments by writing an "@" in front of their username. It even helps you with an autocomplete function once you typed a few characters. You can notify one person per comment and the person who wrote the post (question or answer) is always notified. If you have any questions about the site take the [tour], visit the [help] or check out [meta]. Have fun! – Secespitus Aug 05 '19 at 09:19
  • Possible duplicate of [How do I change the extension of multiple files?](https://unix.stackexchange.com/questions/19654/how-do-i-change-the-extension-of-multiple-files) – Gilles 'SO- stop being evil' Aug 05 '19 at 20:25
  • @Gilles I don't think this is a dupe. The OP here wants to _keep_ the extension, but replace all but the last two `.` with `_`. The last two, however, shouldn't be changed so the answers of the dupe don't really apply. – terdon Aug 06 '19 at 08:14
  • @terdon It's a renaming at a different position, but still the same kind of renaming: replacing a substring. – Gilles 'SO- stop being evil' Aug 06 '19 at 08:29
  • @Gilles agreed, but most of the answers only cover changing an extension and would require significant tweaking to apply here. – terdon Aug 06 '19 at 08:41
  • Possible duplicate of [Rename files in directory](https://unix.stackexchange.com/questions/98070/rename-files-in-directory) – jsbillings Aug 11 '19 at 19:59

3 Answers

8

If you have perl-rename installed (called rename on Debian, Ubuntu and other Debian-derived systems), you can do:

rename -n 's/\./_/g; s/_fastq_gz/.fastq.gz/' *fastq.gz

That will first replace all . with _ and then replace the final _fastq_gz with .fastq.gz.

The -n causes it to only print the changes it would do, without actually renaming the files. Once you're sure this does what you want, remove the -n to actually rename them:

rename  's/\./_/g; s/_fastq_gz/.fastq.gz/' *fastq.gz
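If perl-rename is not available, the two substitutions can still be checked on sample names with sed. This is only a rough sketch for previewing the result: the function name `preview_name` is made up for this illustration, and the `$` anchor on the second expression is an addition that restricts it to the end of the name.

```shell
# Hypothetical helper: print the name a file would get, without touching any file.
preview_name() {
    # Same two steps as the rename expression: every dot becomes an underscore,
    # then the trailing _fastq_gz is turned back into .fastq.gz.
    printf '%s\n' "$1" | sed 's/\./_/g; s/_fastq_gz$/.fastq.gz/'
}

preview_name 'x.assembled.forward.fastq.gz'        # prints x_assembled_forward.fastq.gz
preview_name 'x(n).unassembled.reverse.fastq.gz'   # prints x(n)_unassembled_reverse.fastq.gz
```

Nothing is renamed here; the function only prints what the new name would be.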
terdon
  • I don't have that software – Moomin Aug 05 '19 at 16:42
  • @Masse that's why I asked you to please tell us what operating system you are using. If Linux, tell me what distribution and I can tell you how to install the tool. If you cannot install software (and if you're sure it isn't already present, it may be called `perl-rename`), then [use Kusalananda's approach](https://unix.stackexchange.com/a/533923/22222). – terdon Aug 05 '19 at 17:04
5

mv either takes a single file and moves or renames it, or it takes a number of files or directories and moves them into a directory. You can't rename multiple files with a single mv call.

Instead:

for name in *.*.fastq.gz; do
    newname=${name%.fastq.gz}         # remove filename suffix
    newname=${newname//./_}.fastq.gz  # replace dots with underscores and add suffix

    mv -i -- "$name" "$newname"
done

This iterates over all compressed FASTQ files in the current directory whose names contain at least one dot besides those in the filename suffix. It removes the known suffix (whose dots must not be replaced), substitutes all dots in the remaining part with underscores, and re-attaches the suffix.

The ${newname//./_} substitution works in bash (and in ksh and zsh), but it is not part of POSIX sh, so it may fail if the script runs under /bin/sh.
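If only a plain POSIX sh is available, the same loop can be sketched with tr doing the dot replacement instead of the shell. The function name `rename_dots_posix` and its directory argument are made up for this example, and it assumes the .fastq.gz suffix from the question.

```shell
# Hypothetical helper: rename the matching files in the given directory
# using only POSIX sh features (tr instead of the ${var//./_} expansion).
rename_dots_posix() (
    # The function body is a subshell, so the cd does not affect the caller.
    cd -- "$1" || exit 1
    for name in *.*.fastq.gz; do
        [ -e "$name" ] || continue      # skip the literal pattern if nothing matched
        stem=${name%.fastq.gz}          # remove filename suffix (POSIX expansion)
        newname=$(printf '%s' "$stem" | tr . _).fastq.gz
        mv -i -- "$name" "$newname"
    done
)

# Usage, from the directory holding the files:
#   rename_dots_posix .
```

The trade-off is one extra tr process per file, which should not matter for a few hundred files.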

mv -i is then used to rename the file (will ask for confirmation if the new name already exists). The double dash (--) is used just in case any of the names start with a dash (these would potentially be taken as sets of options to mv and the double dash prevents this).

Kusalananda
2

Why it does not work

The * is expanded by the shell, before the command is run. It matches existing files. mv has no pattern matching abilities.
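To see what mv actually receives, printf can stand in for it and print one argument per line. The sketch below is only an illustration: the function name `show_mv_args` is made up, it works in a throwaway temporary directory, and it assumes default shell options (no nullglob or failglob in bash).

```shell
# Hypothetical demo: show the arguments mv would get for the command in the
# question, in a scratch directory holding the two example file names.
show_mv_args() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch 'x.assembled.forward.fastq.gz' 'x(n).unassembled.reverse.fastq.gz'
    # printf replaces mv here and prints each argument on its own line.
    printf '%s\n' *.assembled.*.fastq.gz *_assembled_*.fastq.gz
    cd / && rm -rf "$dir"
)

show_mv_args
```

The first pattern is replaced by the one existing name that matches it, x.assembled.forward.fastq.gz. The second pattern matches nothing, so it is passed through unchanged as the literal string *_assembled_*.fastq.gz, and mv would treat that as the target name.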

Solutions

mmv

This command works most like the way you are trying to do it. It is not as powerful as rename, but it is simpler.

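e.g. mmv '*.assembled.*.fastq.gz' '#1_assembled_#2.fastq.gz'

Here #1 and #2 stand for the text matched by the first and second *. The unassembled files need a second call with .unassembled. and _unassembled_ in the two patterns.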

ctrl-alt-delor