Convert underscore to PascalCase, i.e. UpperCamelCase

Question

If I have a string that looks like this:

"this_is_the_string"

Inside a bash script, I would like to convert it to PascalCase, i.e. UpperCamelCase, to look like this:

"ThisIsTheString"

I found that converting to lowerCamelCase can be done like this:

echo "this_is_the_string" | sed -r 's/([a-z]+)_([a-z])([a-z]+)/\1\U\2\L\3/'

Unfortunately I am not familiar enough with regexes to modify this.

(1) This doesn’t really matter, as far as this question (and the answers presented so far) are concerned, but, FYI, `\U\2` inserts the found text from the second group, converted to ALL CAPS. Compare to `\u\2`, which inserts the text in Sentence case, with only the first character capitalized. (2) All of the examples given below will translate “this_is_a_string” to “ThisIsAString” — which is what you asked for, but is slightly hard to read. You might want to revise your requirements for the special case of a one-letter word (substring). … (Cont’d) — Scott - Слава Україні, Apr 14 '15 at 19:58
(Cont’d) … (3) Do you have only one such string per line? And is it always the first (or the _only_) text on the line? If you have a string that’s not at the beginning of the line, the below answers will convert it to lowerCamelCase. To fix, take Janis’s answer and change `(^|_)` to `(\<|_)`. — Scott - Слава Україні, Apr 14 '15 at 19:58
inverse: http://stackoverflow.com/questions/28795479/awk-sed-script-to-convert-a-file-from-camelcase-to-underscores — Ciro Santilli OurBigBook.com, Feb 01 '16 at 17:06

Janis · Accepted Answer · 2015-04-14T19:46:58.547

56

$ echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g'
ThisIsTheString

Substitute pattern
(^|_) at the start of the string or after an underscore - first group
([a-z]) single lower case letter - second group
by
\U\2 uppercasing second group (the first group is not reused, so the underscore is dropped)
g globally.
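
To see how the substitution above treats input it was not written for, here is a small sketch. It assumes GNU sed (for -r and \U); the helper names pascal_letters and pascal_alnum, and the character class extended with 0-9, are illustrative choices rather than part of the answer.

```shell
# Assumes GNU sed: -r and \U are GNU extensions.
# pascal_letters and pascal_alnum are hypothetical helper names.

# The pattern as given: only a lowercase letter after ^ or _ is uppercased.
pascal_letters() {
  printf '%s\n' "$1" | sed -r 's/(^|_)([a-z])/\U\2/g'
}

# Variant that also accepts a digit as the first character of a word.
pascal_alnum() {
  printf '%s\n' "$1" | sed -r 's/(^|_)([a-z0-9])/\U\2/g'
}

pascal_letters "this_is_2nd_string"   # prints ThisIs_2ndString
pascal_alnum   "this_is_2nd_string"   # prints ThisIs2ndString
```

With the letters-only class, an underscore followed by a digit does not match at all, so that underscore survives in the output.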

edited Apr 14 '15 at 19:46

answered Apr 14 '15 at 19:09

Janis

Note: `\U` is a GNU extension to POSIX. – Ciro Santilli OurBigBook.com Nov 19 '17 at 10:47
Just a note, you should capture numbers too `sed -r 's/(^|[-_ ]+)([0-9a-z])/\U\2/g'`. So strings like *"this_is_2nd_string"* work too. – pinkeen Jul 01 '19 at 23:43
How can I achieve this with non-GNU sed? – Cameron Hudson Feb 14 '20 at 19:26
Not working well on Mac (`bash --version` reports GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)): `echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g'` prints `UthisUisUtheUstring`. – Nir O. Mar 26 '23 at 15:58
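
For a sed without \U, such as the one that produced the output above, one portable route is awk. This is a minimal sketch using only POSIX awk features (-F, substr, toupper); the function name to_pascal_awk is hypothetical.

```shell
# Portable sketch: no GNU extensions, only POSIX awk.
# to_pascal_awk is a hypothetical helper name.
to_pascal_awk() {
  printf '%s\n' "$1" | awk -F_ '{
    out = ""
    for (i = 1; i <= NF; i++) {
      # Uppercase the first character of each field, keep the rest as-is.
      out = out toupper(substr($i, 1, 1)) substr($i, 2)
    }
    print out
  }'
}

to_pascal_awk "this_is_the_string"   # prints ThisIsTheString
```

Splitting on the underscore means empty fields (from a leading or doubled underscore) simply contribute nothing to the result.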

score 13 · Answer 2 · answered Apr 14 '15 at 19:37

Here's a Perl way:

$ echo "this_is_the_string" | perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
ThisIsTheString

It can deal with strings of arbitrary length:

$ echo "here_is_another_larger_string_with_more_parts" | 
    perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
HereIsAnotherLargerStringWithMoreParts

It will match any character (.) that comes after either the start of the string or an underscore ((^|_)) and replace it with the upper case version of itself (uc($&)). The $& is a special variable that contains whatever was just matched. The e at the end of s///ge allows the use of expressions (the uc() function in this case) within the substitution and the g makes it replace all occurrences in the line. The second substitution removes the underscores.

answered Apr 14 '15 at 19:37

terdon

Speaking of perl, there's also a perl module [String::CamelCase](http://search.cpan.org/~hio/String-CamelCase-0.02/lib/String/CamelCase.pm) that "camelizes" underscored text. – don_crissti Apr 15 '15 at 12:01
@don_crissti ooh, sounds perfect for this. Thanks. – terdon Apr 15 '15 at 12:06
Shorter Perl: `perl -pe 's/(^|_)([a-z])/uc($2)/ge'` – Jan 12 '18 at 22:29
Or: `perl -pe's/_*([^_]*)/\u$1/g'` – Stéphane Chazelas Nov 01 '20 at 15:07
And how do we assign the output to another variable? To call it without echo? (sorry, a newbie) – Rahul Gandhi Jul 29 '21 at 15:12
@RahulGandhi please see [How can I assign the output of a command to a shell variable?](https://unix.stackexchange.com/q/16024) – terdon Jul 29 '21 at 15:14

don_crissti · Answer 3 · 2015-04-16T21:58:53.143

Since you're using bash, if you stored your string in a variable you could also do it shell-only:

uscore="this_is_the_string_to_be_converted"
arr=(${uscore//_/ })
printf %s "${arr[@]^}"
ThisIsTheStringToBeConverted

${uscore//_/ } replaces all _ with space, (....) splits the string into an array, ${arr[@]^} converts the first letter of each element to upper case and then printf %s .. prints all elements one after another.
You can store the camel-cased string into another variable:

printf -v ccase %s "${arr[@]^}"

and use/reuse it later, e.g.:

printf '%s\n' "$ccase"
ThisIsTheStringToBeConverted
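
The bash steps above can also be collected into a function. The sketch below assumes bash 4.0 or later (for the ^ expansion) and swaps the unquoted array assignment for read -ra, so that a word such as * is not expanded against file names; the function name to_pascal is hypothetical.

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or later for the ${var^} expansion.
# to_pascal is a hypothetical helper name.
to_pascal() {
  local IFS=_ parts joined
  # read -ra splits on IFS without pathname expansion, unlike arr=( $var ).
  read -ra parts <<< "$1"
  # printf reuses the %s format for every element, which concatenates them.
  printf -v joined %s "${parts[@]^}"
  printf '%s\n' "$joined"
}

to_pascal "this_is_the_string_to_be_converted"   # prints ThisIsTheStringToBeConverted
```

Because IFS is declared local, the caller's word splitting is unaffected after the function returns.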

Or, with zsh:

uscore="this_is_the_string_to_be_converted"
arr=(${(s:_:)uscore})
printf %s "${(C)arr}"
ThisIsTheStringToBeConverted

(${(s:_:)uscore}) splits the string on _ into an array, (C) capitalizes the first letter of each element and printf %s ... prints all elements one after another.
To store it in another variable you could use (j::) to join the elements:

ccase=${(j::)${(C)arr}}

and use/reuse it later:

printf %s\\n $ccase
ThisIsTheStringToBeConverted

This seems a great solution, but unfortunately doesn't work on mac whose bash version is stuck at 3.2.57 because of license issues. — wlnirvana, Aug 05 '20 at 13:51
@wlnirvana, AFAIK macOS has always come with `zsh` (even used to be `/bin/sh` there and it's the default interactive shell in newer versions I'm told) where it's just `${(j[])${(s[_]C)string}}` or `${${(C)string}//_}` — Stéphane Chazelas, Nov 01 '20 at 16:15
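
For shells without the ${var^} expansion, bash 3.2 among them, the same split-and-capitalize idea can be sketched with POSIX parameter expansion plus tr. This is an illustration under that assumption, not something taken from the answer; to_pascal_sh and its variable names are hypothetical.

```shell
# Works in POSIX sh and bash 3.2; no ${var^} needed.
# to_pascal_sh is a hypothetical helper name. POSIX sh has no local
# variables, so rest, out, word and first are globals here.
to_pascal_sh() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    word=${rest%%_*}            # text before the first underscore
    case $rest in
      *_*) rest=${rest#*_} ;;   # drop that word and its underscore
      *)   rest= ;;             # last word: nothing left afterwards
    esac
    [ -n "$word" ] || continue  # skip empty words from repeated underscores
    # ${word#?} is the word minus its first character; stripping that
    # suffix from the word leaves only the first character.
    first=$(printf '%s' "${word%"${word#?}"}" | tr '[:lower:]' '[:upper:]')
    out=$out$first${word#?}
  done
  printf '%s\n' "$out"
}

to_pascal_sh "this_is_the_string_to_be_converted"   # prints ThisIsTheStringToBeConverted
```

The cost is one tr process per word, which is fine for short identifiers but slower than the array version on long inputs.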

score 6 · Answer 4 · edited Apr 15 '15 at 11:51

It is not necessary to represent the entire string in a regular expression match -- sed has the /g modifier that allows you to walk over multiple matches and replace each of them:

echo "this_is_the_string" | sed 's/_\([a-z]\)/\U\1/g;s/^\([a-z]\)/\U\1/g'

The first regex is _\([a-z]\) -- each letter after an underscore; the second one matches the first letter in a string.

answered Apr 14 '15 at 19:08

myaut

ctrl-alt-delor · Answer 5 · 2015-04-14T21:25:44.733

6

I only put in this answer because it is shorter and simpler than any other so far.

sed -re "s~(^|_)(.)~\U\2~g"

It says: uppercase the character following a _ or the start of the line. Non-letters will not be changed, as they have no case.
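
One consequence of matching any character is that input which is already upper case loses its underscores too. The sketch below compares this pattern with a lowercase-only one; it assumes GNU sed, pins LC_ALL=C so that [a-z] means ASCII lowercase, and uses the hypothetical helper names upcase_any and upcase_lower.

```shell
# Assumes GNU sed (-r and \U). LC_ALL=C keeps [a-z] to ASCII lowercase only.
# upcase_any and upcase_lower are hypothetical helper names.

# Any character after ^ or _ is uppercased, so the underscore always goes.
upcase_any() {
  printf '%s\n' "$1" | LC_ALL=C sed -re 's~(^|_)(.)~\U\2~g'
}

# Only a lowercase letter after ^ or _ matches; other text is left alone.
upcase_lower() {
  printf '%s\n' "$1" | LC_ALL=C sed -r 's/(^|_)([a-z])/\U\2/g'
}

upcase_any   "FOO_BAR"     # prints FOOBAR
upcase_lower "FOO_BAR"     # prints FOO_BAR
upcase_any   "DO_SPORTS"   # prints DOSPORTS
upcase_any   "DOS_PORTS"   # prints DOSPORTS
```

The last two calls show that distinct upper-case inputs can collapse to the same output once the word breaks are removed.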

edited Apr 14 '15 at 21:25

answered Apr 14 '15 at 21:18

ctrl-alt-delor

"Everything should be made as simple as possible, but not simpler." – Albert Einstein. This is not equivalent to the other answers; your answer will convert "FOO_BAR" to "FOOBAR", while the other answers will leave it alone. – Scott - Слава Україні Apr 14 '15 at 21:51
@scott Ah yes, I did not think of that. – ctrl-alt-delor Apr 14 '15 at 21:56
@Scott Isn't that the desired behavior? I guess that ideally, it should become `FooBar` but the underscore should be removed as per instructions. As I understand the instructions anyway. – terdon Apr 15 '15 at 10:24
@terdon: “Isn’t that the desired behavior?” (1) I don’t know. And I don’t believe that we _can_ know what the OP wants unless he tells us; the question is insufficiently explicit. (2) I occasionally chastise people for making unwarranted assumptions about the potential input from the example(s) presented. But, considering that the question is about case conversion, I believe it’s valid to extrapolate (from the fact that the example is all lower case) to the assumption that the OP wants to manipulate only lower case strings. … (Cont’d) – Scott - Слава Україні Apr 16 '15 at 04:33
(Cont’d) … (3) I think it’s somewhat clear that the spirit of the question is to transform a string so that word breaks indicated by underscores (`_`) are instead indicated by case transitions. Given that, “FOO_BAR” → “FOOBAR” is clearly wrong (as it discards the word break information), although “FOO_BAR” → “FooBar” may be correct. (4) Similarly, a mapping that causes collisions seems to be contrary to the spirit of the question. For example, I believe that an answer that converts “DO_SPORTS” and “DOS_PORTS” to the same target is wrong. – Scott - Слава Україні Apr 16 '15 at 04:34
(Cont’d again) … (5) In the spirit of not causing collisions, it seems to me that “foo_bar” and “FOO_BAR” should not map to the same thing, so therefore I object to “FOO_BAR” → “FooBar”. (6) I think the bigger issue is namespaces. I haven’t programmed in Pascal since Blaise was alive, but in C/C++, by convention, identifiers that are primarily in lower case (to include snake_case and CamelCase) are generally the domain of the compiler, while identifiers in upper case are the domain of the pre-processor. So that’s why I think that the OP didn’t want ALL_CAPS identifiers to be considered. – Scott - Слава Україні Apr 21 '15 at 05:05

score 4 · Answer 6 · answered Sep 26 '18 at 21:22

In perl:

$ echo 'alert_beer_core_hemp' | perl -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
AlertBeerCoreHemp

This is also i18n-able:

$ echo 'алерт_беер_коре_хемп' | perl -CIO -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
АлертБеерКореХемп

score 1 · Answer 7 · answered Sep 27 '19 at 19:29

I did it this way:

echo "this_is_the_string" | sed -r 's/(\<|_)([[:alnum:]])/\U\2/g'

and got this result:

ThisIsTheString

answered Sep 27 '19 at 19:29

Fábio Roberto Teodoro

score 0 · Answer 8 · answered Nov 01 '20 at 12:15

My choice is:

echo "this_is-the_string-2.0" | perl -pe 's/(?:^|[^a-z])([a-z0-9])/\u$1/g'

Which results in:

ThisIsTheString20

answered Nov 01 '20 at 12:15

drAlberT

Or `perl -pe 's/([a-z0-9]+)|./\u$1/g'` – Stéphane Chazelas Nov 01 '20 at 15:10
nice, but I find it a bit cryptic in fact – drAlberT Nov 01 '20 at 15:33
