62

I am trying to write a bash shell function that will allow me to remove duplicate copies of directories from my PATH environment variable.

I was told that it is possible to achieve this with a one-line command using awk, but I cannot figure out how to do it. Does anybody know how?

codeforester
Johnny Williem

19 Answers

48

If you don't already have duplicates in the PATH and you only want to add directories if they are not already there, you can do it easily with the shell alone.

for x in /path/to/add …; do
  case ":$PATH:" in
    *":$x:"*) :;; # already there
    *) PATH="$x:$PATH";;
  esac
done

And here's a shell snippet that removes duplicates from $PATH. It goes through the entries one by one, and copies those that haven't been seen yet.

if [ -n "$PATH" ]; then
  old_PATH=$PATH:; PATH=
  while [ -n "$old_PATH" ]; do
    x=${old_PATH%%:*}       # the first remaining entry
    case $PATH: in
      *:"$x":*) ;;          # already there
      *) PATH=$PATH:$x;;    # not there yet
    esac
    old_PATH=${old_PATH#*:}
  done
  PATH=${PATH#:}
  unset old_PATH x
fi
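
Since the question asks for a function, the loop above can be wrapped in one. This is only a minimal sketch in POSIX sh: the name dedup_path and its interface (take a colon-separated list as an argument, print the deduplicated list) are assumptions made for illustration and are not part of the original answer.

```shell
# dedup_path LIST: print the colon-separated LIST with later duplicates dropped.
# Hypothetical helper; same loop as above, but it works on its argument
# instead of modifying PATH directly.
dedup_path() {
  _rest=$1:              # trailing colon so every entry ends with ':'
  _out=
  while [ -n "$_rest" ]; do
    _x=${_rest%%:*}      # the first remaining entry
    case $_out: in
      *:"$_x":*) ;;              # already kept
      *) _out=$_out:$_x ;;       # first occurrence, keep it
    esac
    _rest=${_rest#*:}
  done
  printf '%s\n' "${_out#:}"      # drop the leading colon added by the loop
  unset _rest _out _x
}
```

It would be used as PATH=$(dedup_path "$PATH"). Passing the list as an argument also lets the same function handle other colon-separated variables.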
Tom Hale
Gilles 'SO- stop being evil'
  • It would be better to iterate over the items in $PATH in reverse, because the later ones are usually the most recently added and may hold the up-to-date value. – Eric Sep 03 '16 at 06:48
  • @EricWang I don't understand your reasoning. PATH elements are traversed from front to back, so when there are duplicates, the second duplicate is effectively ignored. Iterating from back to front would change the order. – Gilles 'SO- stop being evil' Sep 03 '16 at 10:50
  • @Gilles When you have a duplicated entry in PATH, it was probably added like this: `PATH=$PATH:x=b`, while the x in the original PATH might have the value a. When iterating in order, the new value is ignored; when iterating in reverse order, the new value takes effect. – Eric Sep 03 '16 at 14:38
  • @EricWang In that case, the added value has no effect so should be ignored. By going backwards, you're making the added value come before. If the added value had been supposed to go before, it would have been added as `PATH=x:$PATH`. – Gilles 'SO- stop being evil' Sep 03 '16 at 15:42
  • @Gilles When you append something, that means either it's not there yet or you want to override the old value, so you need to make the newly added entry visible. And by convention it is usually appended this way: `PATH=$PATH:...`, not `PATH=...:$PATH`. Thus it's more proper to iterate in reverse order. Your way would also work if people appended the other way around. – Eric Sep 03 '16 at 16:13
  • I almost passed over this answer because it starts with an "add only if not already there" method, which I wouldn't want to use since it loses the important property of *where* in PATH I'm inserting the new entry (at the beginning, if I want it to win over everything else, or at the end if I want it to lose over everything else). But then you show an excellent shell-only way to remove dups; *that* is the valuable part of this answer. – Don Hatch Nov 24 '17 at 20:58
  • @DonHatch When you add-only-if-not-already-there, you can choose where to insert. Ok, I only show inserting at the beginning, but it's trivial to change the code to insert at the end. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:04
  • @Gilles The problem is that if the entry is already in $PATH, then your first method won't change $PATH. I am suggesting that in that case it would be better to move the entry to the beginning (if overriding other entries is indeed what is desired). A nice way to accomplish that is to prepend the entry as usual, and then use your second function to remove dups. – Don Hatch Nov 24 '17 at 21:20
  • @DonHatch My own `.profile` is even more complicated than that (it has complex stuff to sort both existing and added entries), but not everyone needs the complexity. I generally prefer to present possibilities in order of increasing complexity. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:28
  • @Gilles Certainly, but how about refraining from presenting the first possibility at all? It's an accident waiting to happen. E.g. say my original .bashrc prepends ~/bin because I want my ~/bin/cat to win over /usr/bin/cat, then I notice my path is growing so I use your first version to prevent that, without thinking about it deeply enough. Now my setup is broken in a non-obvious way. I think your answer could be improved if you would refrain from presenting the error-prone first method at all-- or, if you are attached to keeping it for some reason, at least point out that it's error prone. – Don Hatch Nov 24 '17 at 21:48
  • @DonHatch I want to keep it because it serves the needs of most people. I do point out that it assumes that there are no duplicates at the beginning, what more do you want? The order of addition is a different issue which is not mentioned in the question and not solved by the duplicate removal code. – Gilles 'SO- stop being evil' Nov 24 '17 at 21:54
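
The point raised in the comments above (prepending an entry that is already present should move it to the front rather than do nothing) could be sketched as follows. The function name path_move_to_front is hypothetical, and the approach shown (drop every existing copy of the directory, then put it first) is one illustration of the idea, not code from the answer.

```shell
# path_move_to_front DIR LIST: print LIST with DIR first and no other copy of DIR.
# Hypothetical helper for illustration; duplicates of *other* entries are left alone.
path_move_to_front() {
  # With an empty list there is nothing to filter; avoid emitting a trailing colon.
  [ -n "$2" ] || { printf '%s\n' "$1"; return 0; }
  _dir=$1
  _rest=$2:              # trailing colon so every entry ends with ':'
  _out=$_dir
  while [ -n "$_rest" ]; do
    _x=${_rest%%:*}      # the first remaining entry
    _rest=${_rest#*:}
    [ "$_x" = "$_dir" ] && continue   # skip old copies of DIR
    _out=$_out:$_x
  done
  printf '%s\n' "$_out"
  unset _dir _rest _out _x
}
```

For example, PATH=$(path_move_to_front "$HOME/bin" "$PATH") makes ~/bin win over everything else whether or not it was already listed.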
38

Here's an intelligible one-liner solution that does all the right things: removes duplicates, preserves the ordering of paths, and doesn't add a colon at the end. So it should give you a deduplicated PATH that gives exactly the same behavior as the original:

PATH="$(perl -e 'print join(":", grep { not $seen{$_}++ } split(/:/, $ENV{PATH}))')"

It simply splits on colon (split(/:/, $ENV{PATH})), uses grep { not $seen{$_}++ } to filter out any repeated instances of paths except for the first occurrence, and then joins the remaining ones back together separated by colons and prints the result (print join(":", ...)).

If you want some more structure around it, as well as the ability to deduplicate other variables as well, try this snippet, which I'm currently using in my own config:

# Deduplicate path variables
get_var () {
    eval 'printf "%s\n" "${'"$1"'}"'
}
set_var () {
    eval "$1=\"\$2\""
}
dedup_pathvar () {
    pathvar_name="$1"
    pathvar_value="$(get_var "$pathvar_name")"
    deduped_path="$(perl -e 'print join(":",grep { not $seen{$_}++ } split(/:/, $ARGV[0]))' "$pathvar_value")"
    set_var "$pathvar_name" "$deduped_path"
}
dedup_pathvar PATH
dedup_pathvar MANPATH

That code will deduplicate both PATH and MANPATH, and you can easily call dedup_pathvar on other variables that hold colon-separated lists of paths (e.g. PYTHONPATH).

Stéphane Chazelas
Ryan C. Thompson
  • For some reason I had to add a `chomp` to remove a trailing newline. This worked for me: `perl -ne 'chomp; print join(":", grep { !$seen{$_}++ } split(/:/))' <<<"$PATH"` – Håkon Hægland Dec 28 '14 at 19:05
  • I can't get this fix to persist. It does clean duplicates from `$PATH`, but if I open a new Ubuntu WSL2 command prompt window, my `$PATH` is back to having duplicates. How can I make this permanent? – Kyle Vassella Dec 24 '20 at 15:39
  • @KyleVassella Did you add this code to your shell startup file? – Ryan C. Thompson Dec 25 '20 at 03:44
20

Here's a sleek one:

printf %s "$PATH" | awk -v RS=: -v ORS=: '!arr[$0]++'

Longer (to see how it works):

printf %s "$PATH" | awk -v RS=: -v ORS=: '{ if (!arr[$0]++) { print $0 } }'

Ok, since you're new to linux, here is how to actually set PATH without a trailing ":"

PATH=`printf %s "$PATH" | awk -v RS=: '{ if (!arr[$0]++) {printf("%s%s",!ln++?"":":",$0)}}'`

btw make sure to NOT have directories containing ":" in your PATH, otherwise it is gonna be messed up.
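
The trailing ":" that the third command avoids matters because an empty PATH entry is treated as the current directory. A quick check for empty entries (a leading colon, a trailing colon, or two adjacent colons) might look like the sketch below; has_empty_entry is a hypothetical name used only for this illustration.

```shell
# has_empty_entry LIST: succeed if the colon-separated LIST contains an empty
# entry, i.e. it starts with ':', ends with ':', or contains '::'.
# Hypothetical helper, not part of the answer.
has_empty_entry() {
  case $1 in
    :*|*:|*::*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn before exporting a PATH that would include the current directory.
if has_empty_entry "$PATH"; then
  printf '%s\n' 'warning: PATH contains an empty entry' >&2
fi
```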

some credit to:

SherylHohman
akostadinov
  • -1 this doesn't work. I still see duplicates in my path. – dogbane Jun 14 '12 at 07:34
  • @dogbane: It removes duplicates for me. However it has a subtle problem. The output has a : on the end which, if set as your $PATH, means the current directory is added to the path. This has security implications on a multi-user machine. – camh Jun 14 '12 at 07:42
  • @dogbane, it works and I edited post to have a one line command without the trailing : – akostadinov Jun 14 '12 at 07:59
  • @dogbane your solution has a trailing : in the output – akostadinov Jun 14 '12 at 08:12
  • hmm, your third command works, but the first two do not work unless I use `echo -n`. Your commands don't seem to work with "here strings" e.g. try: `awk -v RS=: -v ORS=: '!arr[$0]++' <<< ".:/foo/bin:/bar/bin:/foo/bin"` – dogbane Jun 14 '12 at 08:32
  • so which one will actually give me the desired result? – Johnny Williem Jun 14 '12 at 09:01
  • @dogbane, right, initially I didn't notice the extra line and when I wrote the third command I forgot to update the other two. wrt <<< it adds a new line at end like echo without -n. It is a bash extension though so not portable and does not provide any advantages over piping for this task. Johnny Williem, use the third command that starts with PATH= – akostadinov Jun 14 '12 at 10:31
  • Note that `echo -n` outputs `-n` in Unix-compliant `echo` implementations. The standard way to output a $string without the trailing newline character is `printf %s "$string"`, hence Gilles' edit. Generally [you can't use `echo` for arbitrary data](/q/65803) – Stéphane Chazelas Sep 05 '16 at 13:47
  • @StéphaneChazelas, ok, old UNIXes. Btw the new line was confusing `awk` so last entry was not deduplicated. Thanks to Gilles for catching that (and fixing portability). – akostadinov Sep 05 '16 at 14:12
  • @akostadinov, not only _old_. That's the Unix requirement as in the latest version of the Unix specification (from 2013, same goes for the 2016 specification which is going out shortly). For instance `/bin/sh` on OS/X is based on `bash` and `echo -n` outputs `-n` like the Unix specification requires (POSIX leaves the behaviour unspecified for `echo -n`) – Stéphane Chazelas Sep 05 '16 at 15:07
  • extremely sweet! I just love one-liners... – MoVod Jun 01 '19 at 11:58
  • Problem I ran into: duplicates with and without trailing slashes, "/foo/bar:/foo/bar/", will not be removed, although they are equivalent within the PATH variable. – Christian Herenz Dec 10 '19 at 19:03
  • @ChristianHerenz, maybe `awk` can also split on `/:` and `:` at the same time, maybe with regular expression/pattern. Not sure ATM but might be a good thing to explore if you want to improve current solution. – akostadinov Dec 12 '19 at 08:25
  • Why do you use printf rather than echo? – einpoklum Feb 12 '20 at 14:14
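
The trailing-slash issue raised in the comments above (/foo/bar and /foo/bar/ counted as different entries) could be handled by normalizing each entry before comparing it. The sketch below does this inside awk; dedup_path_slashes is a hypothetical name, and treating entries as equal once trailing slashes are stripped is an assumption about what counts as "the same" directory (symlinks and ".." components are not resolved).

```shell
# dedup_path_slashes LIST: deduplicate a colon-separated LIST, treating entries
# that differ only in trailing slashes as the same. Entries are printed in
# their normalized form. Hypothetical helper for illustration.
dedup_path_slashes() {
  printf %s "$1" | awk -v RS=: '
    {
      key = $0
      if (key ~ /^\/+$/) key = "/"      # keep the root directory as "/"
      else sub(/\/+$/, "", key)         # drop trailing slashes
      if (!seen[key]++) {               # first time this normalized entry appears
        printf "%s%s", sep, key
        sep = ":"                       # colon goes before every later entry
      }
    }'
}
```

As with the commands above, the result has no trailing colon, so it can be assigned directly: PATH=$(dedup_path_slashes "$PATH").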
9

Here is an AWK one liner.

$ PATH=$(printf %s "$PATH" \
     | awk -vRS=: -vORS= '!a[$0]++ {if (NR>1) printf(":"); printf("%s", $0) }' )

where:

  • printf %s "$PATH" prints the content of $PATH without a trailing newline
  • RS=: changes the input record delimiter character (default is newline)
  • ORS= changes the output record delimiter to the empty string
  • a is the name of an implicitly created array
  • $0 references the current record
  • a[$0] is an associative array dereference
  • ++ is the post-increment operator
  • !a[$0]++ guards the right hand side, i.e. it makes sure that the current record is only printed if it wasn't printed before
  • NR is the current record number, starting with 1

That means that AWK is used to split the PATH content along the : delimiter characters and to filter out duplicate entries without modifying the order.
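
To see what the a[$0]++ expression evaluates to for each record, it can help to print it next to the record. The snippet below is a small illustration on a made-up list; show_seen_counts is a hypothetical name, and -v RS=: is written with a space because some awk implementations require it.

```shell
# show_seen_counts LIST: for each entry of the colon-separated LIST, print how
# often it had been seen before, followed by the entry itself.
# Hypothetical helper for illustration.
show_seen_counts() {
  printf %s "$1" | awk -v RS=: '{ print a[$0]++, $0 }'
}

show_seen_counts '/bin:/usr/bin:/bin'
# prints:
# 0 /bin
# 0 /usr/bin
# 1 /bin
```

Entries with a count of 0 are first occurrences; negating the value with ! turns exactly those into a true pattern, which is why only they are printed by the one-liner.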

Since AWK associative arrays are implemented as hash tables the runtime is linear (i.e. in O(n)).

Note that we don't need to look for quoted : characters, because shells provide no quoting mechanism that would allow a directory with : in its name in the PATH variable.

Awk + paste

The above can be simplified with paste:

$ PATH=$(printf %s "$PATH" | awk -vRS=: '!a[$0]++' | paste -s -d:)

The paste command is used to intersperse the awk output with colons. This simplifies the awk action to printing (which is the default action).

Python

The same as a Python two-liner:

$ PATH=$(python3 -c 'import os; from collections import OrderedDict; \
    l=os.environ["PATH"].split(":"); print(":".join(OrderedDict.fromkeys(l)))' )
maxschlepzig
  • ok, but does this remove dupes from an existing colon delimited string, or does it prevent dupes from being added to a string? – Alexander Mills Dec 10 '16 at 09:59
  • looks like the former – Alexander Mills Dec 10 '16 at 10:00
  • @AlexanderMills, well, the OP just asked about removing duplicates so this is what the awk call does. – maxschlepzig Dec 10 '16 at 18:59
  • The `paste` command doesn't work for me unless I add a trailing `-` to use STDIN. – wisbucky Apr 24 '17 at 21:11
  • @wisbucky, hm, does your paste print some error message? I tested it with 'paste (GNU coreutils) 8.25'. – maxschlepzig Apr 24 '17 at 21:27
  • It prints `usage: paste [-s] [-d delimiters] file ...`. This is on mac, which I think uses BSD not GNU versions. – wisbucky Apr 24 '17 at 21:47
  • Also, I need to add spaces after the `-v` or else I get an error. `-v RS=: -v ORS=`. Just different flavors of `awk` syntax. – wisbucky Apr 24 '17 at 21:56
  • For those that don't understand the `!a[$0]++` part, what's going on is that 1) `a[$0]++` is creating an associative array with the `path` as the `key`, and the incrementing `count` as the `value`. The first time a unique path is seen, the `value` will be initialized to `0` and incremented to `1`. The second time a path is seen, the value will be incremented to `2`, etc. To see this clearly, run this command: `printf %s "$PATH" | awk -v RS=: '{print a[$0]++, $0 }'` – wisbucky Apr 25 '17 at 23:15
  • 2) In `awk`, the statement before the `{action}` is a `pattern`. If `pattern` is `TRUE`, then execute the `{action}`. Any nonzero number is `TRUE`, `0` is `FALSE`. The first time a path is seen, the `value` of `a[$0]` is `0` (remember, we are post-incrementing), which evaluates to `FALSE`. The negated value `!` is `TRUE`. Therefore, it executes the `{action}`, which is to print the path. All subsequent occurrences of the same `path` will have `value` > 0, so they evaluate to `TRUE`, and the negated values are `FALSE`. Therefore, the `{action}` is not executed. – wisbucky Apr 25 '17 at 23:16
6

As long as we are adding non-awk oneliners:

PATH=$(zsh -fc "typeset -TU P=$PATH p; echo \$P")

(Could be as simple as PATH=$(zsh -fc 'typeset -U path; echo $PATH') but zsh always reads at least one zshenv configuration file, which can modify PATH.)

It uses two nice zsh features:

  • scalars tied to arrays (typeset -T)
  • and arrays that autoremove duplicate values (typeset -U).
4

As others have demonstrated, it is possible in one line using awk, sed, perl, zsh, or bash; which one you pick depends on your tolerance for long lines and on readability. Here's a bash function that

  • removes duplicates
  • preserves order
  • allows spaces in directory names
  • allows you to specify the delimiter (defaults to ':')
  • can be used with other variables, not just PATH
  • works in bash versions < 4, important if you use OS X which for licensing issues does not ship bash version 4

bash function

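remove_dups() {
    local D=${2:-:} path= dir=
    while IFS= read -r -d "$D" dir; do
        # Compare literally: in an unquoted regex, characters such as "." or "+"
        # in a directory name would be treated as regex operators.
        [[ $path$D == *"$D$dir$D"* ]] || path+="$D$dir"
    done <<< "$1$D"
    printf %s "${path#$D}"
}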

usage

To remove dups from PATH

PATH=$(remove_dups "$PATH")

If path+="$D$dir" above is replaced with path="$path$D$dir", then this function also deduplicates entries correctly in zsh (in addition to bash). Without this change, a space will be inserted before every colon when this function is used in zsh.

amdn
4

There has been a similar discussion about this here.

I take a bit of a different approach. Instead of just accepting the PATH that is set from all the different initialization files that get installed, I prefer using getconf to identify the system path and place it first, then add my preferred path order, then use awk to remove any duplicates. This may or may not really speed up command execution (and in theory be more secure), but it gives me warm fuzzies.

# I am entering my preferred PATH order here because it gets set,
# appended, reset, appended again and ends up in such a jumbled order.
# The duplicates get removed, preserving my preferred order.
#
PATH=$(command -p getconf PATH):/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:$PATH
# Remove duplicates
PATH="$(printf "%s" "${PATH}" | /usr/bin/awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}')"
export PATH

[~]$ echo $PATH
/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/lib64/ccache:/usr/games:/home/me/bin
George M
  • This is very dangerous because you add a trailing `:` to the `PATH` (i.e. an empty string entry), because then the current working directory is part of your `PATH`. – maxschlepzig Apr 13 '14 at 10:05
2

Recent bash versions (>= 4) also support associative arrays, so you can also use a bash 'one-liner' for this:

PATH=$(IFS=:; set -f; declare -A a; NR=0; for i in $PATH; do NR=$((NR+1)); \
       if [ \! ${a[$i]+_} ]; then if [ $NR -gt 1 ]; then echo -n ':'; fi; \
                                  echo -n $i; a[$i]=1; fi; done)

where:

  • IFS changes the input field separator to :
  • declare -A declares an associative array
  • ${a[$i]+_} is a parameter expansion meaning: _ is substituted if and only if a[$i] is set. This is similar to ${parameter:+word} which also tests for not-null. Thus, in the following evaluation of the conditional, the expression _ (i.e. a single character string) evaluates to true (this is equivalent to -n _) - while an empty expression evaluates to false.
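
The difference between the + and :+ forms of the expansion is easy to check on ordinary variables. The snippet below is a small sketch; the variable names missing, empty, and filled are made up for the demonstration, and plain variables are used instead of the associative array element from the answer so that it runs in any POSIX shell.

```shell
# Hypothetical demo variables, not taken from the answer.
unset missing
empty=
filled=/usr/bin

# ${var+_} expands to "_" when var is set, even if its value is empty.
plus_form="${missing+_}|${empty+_}|${filled+_}"

# ${var:+_} expands to "_" only when var is set and non-empty.
colon_form="${missing:+_}|${empty:+_}|${filled:+_}"

printf '%s\n%s\n' "$plus_form" "$colon_form"
# prints:
# |_|_
# ||_
```

In the one-liner, the + form is what allows the test to tell "never seen" apart from "seen", regardless of the value stored in the array.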
Gilles 'SO- stop being evil'
maxschlepzig
  • +1: nice script style, but can you explain the particular syntax: `${a[$i]+_}` by editing your answer and adding one bullet. The rest is perfectly understandable but you lost me there. Thank you. – Cbhihe Jul 12 '16 at 11:28
  • @Cbhihe, I've added a bullet point that addresses this expansion. – maxschlepzig Jul 12 '16 at 19:23
  • Thank you very much. Very interesting. I did not think that was possible with arrays (non-strings)... – Cbhihe Jul 12 '16 at 20:52
2

Also sed (here using GNU sed syntax) can do the job:

MYPATH=$(printf '%s\n' "$MYPATH" | sed ':b;s/:\([^:]*\)\(:.*\):\1/:\1\2/;tb')

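This one works only when the first entry is not itself duplicated later (for example when the first path is ., as in dogbane's example), because the pattern requires a colon in front of the entry it matches.

In the general case you need to add yet another s command that handles the first entry:

MYPATH=$(printf '%s\n' "$MYPATH" | sed ':b;s/:\([^:]*\)\(:.*\):\1/:\1\2/;tb;s/^\([^:]*\)\(:.*\):\1/\1\2/')

It works even on input such as: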

$ echo "/bin:.:/foo/bar/bin:/usr/bin:/foo/bar/bin:/foo/bar/bin:/bar/bin:/usr/bin:/bin" \
| sed ':b;s/:\([^:]*\)\(:.*\):\1/:\1\2/;tb;s/^\([^:]*\)\(:.*\):\1/\1\2/'

/bin:.:/foo/bar/bin:/usr/bin:/bar/bin
Stéphane Chazelas
rush
2
PATH=`perl -e 'print join ":", grep {!$h{$_}++} split ":", $ENV{PATH}'`
export PATH

This uses perl and has several benefits:

  1. It removes duplicates
  2. It keeps the original order
  3. It keeps the earliest appearance (/usr/bin:/sbin:/usr/bin will result in /usr/bin:/sbin)
vol7ron
2
PATH=`awk -F: '{for (i=1;i<=NF;i++) if (!x[$i]++) { printf("%s%s", sep, $i); sep=":" }}' <<< "$PATH"`

Explanation of awk code:

  1. Separate the input by colons.
  2. Record each path entry in an associative array for fast duplicate look-up.
  3. Print each entry the first time it is seen, with a colon before every entry except the first, so no trailing colon is left.

In addition to being terse, this one-liner is fast: awk's associative arrays are typically implemented as hash tables, so each look-up takes amortized O(1) time.

based on Removing duplicate $PATH entries

Gilles 'SO- stop being evil'
Leftium
1

The right way of removing duplicates suggested by @ghm1014 and @rush (in the comment) using sort:

PATH=`echo -e ${PATH//:/'\n'} | awk '{printf("%d|%s\n", NR, $0)}' | sort -t '|' -k 2 -u | sort -t '|' -k 1 -g | cut -f2 -d'|'`; export PATH=${PATH//[$'\n']/:}
Varp
1

Since people resorted to suggesting perl one-liners, I'll also throw a coin into the pot. This one is in ruby (and way less cryptic than everything else, IMO):

PATH="$(ruby -e 'puts ENV["PATH"].split(":").uniq.join(":")')"
Dan S.
1

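This is my version. Note that it relies on sort -u, so the entries come out sorted rather than in their original order: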

path_no_dup () 
{ 
    local IFS=: p=();

    while read -r; do
        p+=("$REPLY");
    done < <(sort -u <(read -ra arr <<< "$1" && printf '%s\n' "${arr[@]}"));

    # Do whatever you like with "${p[*]}"
    echo "${p[*]}"
}

Usage: path_no_dup "$PATH"

Sample output:

rany$ v='a:a:a:b:b:b:c:c:c:a:a:a:b:c:a'; path_no_dup "$v"
a:b:c
rany$
Rany Albeg Wein
0

A solution - not one that is as elegant as those that change the *RS variables, but perhaps reasonably clear:

PATH=`awk 'BEGIN {np="";n=split(ENVIRON["PATH"],p,":"); for(x=1;x<=n;x++) {  pe=p[x]; if(e[pe] != "") continue; e[pe] = pe; if(np != "") np=np ":"; np=np pe}} END { print np }' /dev/null`

The entire program works in the BEGIN and END blocks. It pulls your PATH variable from the environment, splitting it into units. It then iterates over the resulting array p (which is created in order by split()). The array e is an associative array that is used to determine whether or not we've seen the current path element (e.g. /usr/local/bin) before; if not, the element is appended to np, with logic to append a colon to np first if there is already text in np. The END block simply echoes np. This could be further simplified by adding the -F: flag, eliminating the third argument to split() (as it defaults to FS), and changing np = np ":" to np = np FS, giving us:

awk -F: 'BEGIN {np="";n=split(ENVIRON["PATH"],p); for(x=1;x<=n;x++) {  pe=p[x]; if(e[pe] != "") continue; e[pe] = pe; if(np != "") np=np FS; np=np pe}} END { print np }' /dev/null

Naïvely, I believed that for(element in array) would preserve order, but it doesn’t, so my original solution doesn’t work, as folks would get upset if someone suddenly scrambled the order of their $PATH:

awk 'BEGIN {np="";split(ENVIRON["PATH"],p,":"); for(x in p) { pe=p[x]; if(e[pe] != "") continue; e[pe] = pe; if(np != "") np=np ":"; np=np pe}} END { print np }' /dev/null
Andrew Beals
0
export PATH=$(printf %s "$PATH" | awk -v RS=':' '(!a[$0]++){if(b++)printf("%s",RS);printf("%s",$0)}')

Only the first occurrence is kept and relative order is well maintained.

Cyker
0

Use awk to split the path on :, then loop over each field and store it in an array. If you come across a field which is already in the array, that means you have seen it before, so don't print it.

Here is an example:

$ MYPATH=.:/foo/bar/bin:/usr/bin:/foo/bar/bin
$ awk -F: '{for(i=1;i<=NF;i++) if(!($i in arr)){arr[$i];printf "%s%s", s, $i;s=":"}}' <<< "$MYPATH"
.:/foo/bar/bin:/usr/bin

(Updated to remove the trailing :.)

dogbane
0

(To explain @Michał Politowski 's answer:)

for zsh:

PATH=$(zsh --no-rcs -c "P_scaler=$PATH ; typeset -T P_scaler  P_array ;  typeset -U P_array ; echo \$P_scaler")
Good Pen
-1

I would do it just with basic tools such as tr, sort and uniq:

NEW_PATH=`echo $PATH | tr ':' '\n' | sort | uniq | tr '\n' ':'`

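If there is nothing special or weird in your path it should work, with two caveats: sort reorders the entries, so the original search order is lost, and the final tr turns the last newline into a trailing : on the result.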

ghm1014