36

If I have a string that looks like this:

"this_is_the_string"

Inside a bash script, I would like to convert it to PascalCase (i.e. UpperCamelCase), so that it looks like this:

"ThisIsTheString"

I found that converting to lowerCamelCase can be done like this:

echo "this_is_the_string" | sed -r 's/([a-z]+)_([a-z])([a-z]+)/\1\U\2\L\3/'

Unfortunately I am not familiar enough with regexes to modify this.

pestophagous
user1135541
  • (1) This doesn’t really matter, as far as this question (and the answers presented so far) are concerned, but, FYI, `\U\2` inserts the found text from the second group, converted to ALL CAPS.  Compare to `\u\2`, which inserts the text in Sentence case, with only the first character capitalized.  (2) All of the examples given below will translate “this_is_a_string” to “ThisIsAString” — which is what you asked for, but is slightly hard to read.  You might want to revise your requirements for the special case of a one-letter word (substring).  … (Cont’d) – Scott - Слава Україні Apr 14 '15 at 19:58
  • (Cont’d) …  (3) Do you have only one such string per line?  And is it always the first (or the _only_) text on the line?  If you have a string that’s not at the beginning of the line, the below answers will convert it to lowerCamelCase.  To fix, take Janis’s answer and change `(^|_)` to `(\<|_)`. – Scott - Слава Україні Apr 14 '15 at 19:58
  • 1
    inverse: http://stackoverflow.com/questions/28795479/awk-sed-script-to-convert-a-file-from-camelcase-to-underscores – Ciro Santilli OurBigBook.com Feb 01 '16 at 17:06

8 Answers

56
$ echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g'            
ThisIsTheString

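The substitution works like this:
(^|_) matches the start of the string or an underscore - first group
([a-z]) matches a single lower case letter - second group
\U\2 replaces the whole match with the second group in upper case, so the underscore is dropped
g applies this to every match on the line.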
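
One caveat: `\U` in the replacement is a GNU sed extension, so the one-liner needs GNU sed. As a rough sketch of the same idea for systems without it, POSIX awk can split the string on underscores and capitalize the first character of each field. The function name `to_pascal_awk` is made up for illustration. Note that this variant drops every underscore, including ones not followed by a lowercase letter, which the sed version leaves in place.

```shell
# Hypothetical helper (name is illustrative): snake_case -> PascalCase via POSIX awk.
to_pascal_awk() {
    printf '%s\n' "$1" | awk -F_ '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Uppercase the first character of the field, keep the rest as-is.
            out = out toupper(substr($i, 1, 1)) substr($i, 2)
        }
        print out
    }'
}

to_pascal_awk "this_is_the_string"   # prints ThisIsTheString
```

Inside a script, the result can be captured with command substitution, e.g. `name=$(to_pascal_awk "$input")`.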

Janis
13

Here's a Perl way:

$ echo "this_is_the_string" | perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
ThisIsTheString

It can deal with strings of arbitrary length:

$ echo "here_is_another_larger_string_with_more_parts" | 
    perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
HereIsAnotherLargerStringWithMoreParts

It matches any character (.) that comes after either the start of the string or an underscore ((^|_)) and replaces the match with its upper-case version (uc($&)). $& is a special variable that contains whatever was just matched; here that includes the underscore, which uc() leaves unchanged. The e at the end of s///ge allows the use of expressions (the uc() function in this case) within the substitution, and the g makes it replace all occurrences in the line. The second substitution then removes the underscores.

terdon
13

Since you're using bash, if you stored your string in a variable you could also do it shell-only:

uscore="this_is_the_string_to_be_converted"
arr=(${uscore//_/ })
printf %s "${arr[@]^}"
ThisIsTheStringToBeConverted

${uscore//_/ } replaces all _ with space, (....) splits the string into an array, ${arr[@]^} converts the first letter of each element to upper case and then printf %s .. prints all elements one after another.
You can store the camel-cased string into another variable:

printf -v ccase %s "${arr[@]^}"

and use/reuse it later, e.g.:

printf '%s\n' "$ccase"
ThisIsTheStringToBeConverted
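
One thing to watch with `arr=(${uscore//_/ })`: the unquoted expansion is subject to word splitting and pathname expansion, so input containing spaces or glob characters such as `*` can produce surprising array elements. Here is a minimal sketch that splits on `_` with `read -a` instead, assuming bash 4 or later for `${...^}`. The function name `snake_to_pascal` is my own, not something from the answer.

```shell
# Hypothetical helper: split on "_" without relying on unquoted expansion.
snake_to_pascal() {
    local -a parts
    local out
    # IFS=_ applies only to this read; -r keeps backslashes, -a fills an array.
    IFS=_ read -r -a parts <<< "$1"
    # ${parts[@]^} uppercases the first character of every element (bash 4+).
    printf -v out '%s' "${parts[@]^}"
    printf '%s\n' "$out"
}

ccase=$(snake_to_pascal "this_is_the_string_to_be_converted")
printf '%s\n' "$ccase"   # prints ThisIsTheStringToBeConverted
```

Only the first line of the input is read, which should be fine for a single identifier.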

Or, with zsh:

uscore="this_is_the_string_to_be_converted"
arr=(${(s:_:)uscore})
printf %s "${(C)arr}"
ThisIsTheStringToBeConverted

(${(s:_:)uscore}) splits the string on _ into an array, (C) capitalizes the first letter of each element and printf %s ... prints all elements one after another.
To store it in another variable you could use (j::) to join the elements:

ccase=${(j::)${(C)arr}}

and use/reuse it later:

printf %s\\n $ccase
ThisIsTheStringToBeConverted
don_crissti
  • 1
    This seems a great solution, but unfortunately doesn't work on mac whose bash version is stuck at 3.2.57 because of license issues. – wlnirvana Aug 05 '20 at 13:51
  • @wlnirvana, AFAIK macOS has always come with `zsh` (even used to be `/bin/sh` there and it's the default interactive shell in newer versions I'm told) where it's just `${(j[])${(s[_]C)string}}` or `${${(C)string}//_}` – Stéphane Chazelas Nov 01 '20 at 16:15
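
For a bash 3.2 that lacks `${var^}`, a loop over plain parameter expansions plus `tr` is one possible fallback. This is a sketch only: the function name is made up, and I'm assuming a `tr` that supports the POSIX character classes. It sticks to features that predate bash 4 and only handles ASCII letters reliably.

```shell
# Hypothetical fallback that avoids ${var^} (bash 4+) and arrays.
snake_to_pascal_compat() {
    local rest="$1" out='' part first
    while [ -n "$rest" ]; do
        part=${rest%%_*}                 # text before the first underscore
        if [ "$part" = "$rest" ]; then
            rest=''                      # no underscore left: this is the last word
        else
            rest=${rest#*_}              # drop the word and its underscore
        fi
        [ -n "$part" ] || continue       # skip empty words (e.g. in "a__b")
        first=${part%"${part#?}"}        # first character of the word
        first=$(printf '%s' "$first" | tr '[:lower:]' '[:upper:]')
        out=$out$first${part#?}          # uppercased first character + the rest
    done
    printf '%s\n' "$out"
}

snake_to_pascal_compat "this_is_the_string"   # prints ThisIsTheString
```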
6

It is not necessary to represent the entire string in a regular expression match -- sed has the /g modifier that allows you to walk over multiple matches and replace each of them:

echo "this_is_the_string" | sed 's/_\([a-z]\)/\U\1/g;s/^\([a-z]\)/\U\1/g'

The first regex, _\([a-z]\), matches each letter that follows an underscore; the second one matches the first letter of the string.

myaut
6

I only put in this answer because it is shorter and simpler than any other so far.

sed -re "s~(^|_)(.)~\U\2~g"

It says: upcase the character following a _ or the start of the line. Non-letters will not be changed, as they have no case.

ctrl-alt-delor
  • 1
    "Everything should be made as simple as possible, but not simpler." – Albert Einstein.  This is not equivalent to the other answers; your answer will convert "FOO_BAR" to "FOOBAR", while the other answers will leave it alone. – Scott - Слава Україні Apr 14 '15 at 21:51
  • @scott Ah yes, I did not think of that. – ctrl-alt-delor Apr 14 '15 at 21:56
  • 1
    @Scott Isn't that the desired behavior? I guess that ideally, it should become `FooBar` but the underscore should be removed as per instructions. As I understand the instructions anyway. – terdon Apr 15 '15 at 10:24
  • @terdon: “Isn’t that the desired behavior?”  (1) I don’t know.  And I don’t believe that we _can_ know what the OP wants unless he tells us; the question is insufficiently explicit.  (2) I occasionally chastise people for making unwarranted assumptions about the potential input from the example(s) presented.  But, considering that the question is about case conversion, I believe it’s valid to extrapolate (from the fact that the example is all lower case) to the assumption that the OP wants to manipulate only lower case strings.  … (Cont’d) – Scott - Слава Україні Apr 16 '15 at 04:33
  • 2
    (Cont’d) …  (3) I think it’s somewhat clear that the spirit of the question is to transform a string so that word breaks indicated by underscores (`_`) are instead indicated by case transitions.  Given that, “FOO_BAR” → “FOOBAR” is clearly wrong (as it discards the word break information), although “FOO_BAR” → “FooBar” may be correct.  (4) Similarly, a mapping that causes collisions seems to be contrary to the spirit of the question.  For example, I believe that an answer that converts “DO_SPORTS” and “DOS_PORTS” to the same target is wrong. – Scott - Слава Україні Apr 16 '15 at 04:34
  • 1
    (Cont’d again) …  (5) In the spirit of not causing collisions, it seems to me that “foo_bar” and “FOO_BAR” should not map to the same thing, so therefore I object to “FOO_BAR” → “FooBar”.  (6) I think the bigger issue is namespaces.  I haven’t programmed in Pascal since Blaise was alive, but in C/C++, by convention, identifiers that are primarily in lower case (to include snake_case and CamelCase) are generally the domain of the compiler, while identifiers in upper case are the domain of the pre-processor.  So that’s why I think that the OP didn’t want ALL_CAPS identifiers to be considered. – Scott - Слава Україні Apr 21 '15 at 05:05
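
If you take the view argued in these comments, that only all-lowercase snake_case names should be converted and things like `FOO_BAR` left alone, one way to sketch that is a guard that runs before whichever conversion you pick. The function name and the exact definition of "lowercase snake_case" used here (lowercase letters and digits, single underscores between words) are my assumptions, not something the OP specified.

```shell
# Hypothetical guard: succeeds only for names like "foo_bar" or "v2_name".
is_lower_snake() {
    case $1 in
        '' | _* | *_ | *__*) return 1 ;;       # empty, or leading/trailing/doubled "_"
        *[![:lower:][:digit:]_]*) return 1 ;;  # contains anything else (e.g. uppercase)
    esac
    return 0
}

for name in foo_bar FOO_BAR; do
    if is_lower_snake "$name"; then
        echo "${name}: convert"
    else
        echo "${name}: leave alone"
    fi
done
```

With this guard, "DO_SPORTS" and "DOS_PORTS" are both left untouched instead of colliding on the same output.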
4

In perl:

$ echo 'alert_beer_core_hemp' | perl -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
AlertBeerCoreHemp

This is also i18n-able:

$ echo 'алерт_беер_коре_хемп' | perl -CIO -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
АлертБеерКореХемп
1

I did it this way:

echo "this_is_the_string" | sed -r 's/(\<|_)([[:alnum:]])/\U\2/g'

and got this result:

ThisIsTheString
0

My choice is:

echo "this_is-the_string-2.0" |  perl -pe 's/(?:^|[^a-z])([a-z0-9])/\u$1/g'

Which results in:

ThisIsTheString20
drAlberT