37

if I want to count the lines of code, the trivial thing is

cat *.c *.h | wc -l

But what if I have several subdirectories?

Mark Stewart
Niklas Rosencrantz
  • http://stackoverflow.com/questions/1358540/how-to-count-all-the-lines-of-code-in-a-directory-recursively – 林果皞 Jun 08 '16 at 12:26
  • Off-topic: Why the unnecessary `cat`? `wc -l *.c *.h` does the same thing. – Thomas Padron-McCarthy Jun 08 '16 at 17:50
  • @ThomasPadron-McCarthy No it doesn't. You'd need `wc -l *.c *.h | tail -n 1` to get similar output. – Gilles 'SO- stop being evil' Jun 08 '16 at 22:16
  • Note that some (possibly even most) modern shells (Bash v4, Zsh, probably more) provide a recursive-globbing mechanism using `**`, so you could have used `wc -l **/*.{h,c}` or something similar. Note that in Bash, at least, this option (called `globstar`) is *off* by default. But also note that in this particular case, `cloc` or `SLOCCount` is a much better option. (Also, `ack` may be preferable to `find` for easily finding/listing source files.) – Kyle Strand Jun 08 '16 at 22:31
  • `wc -l` counts lines, not lines of code. 7000 blank lines will still show up in `wc -l` but wouldn't count in a code metric (comments usually don't count either). – coteyr Jun 09 '16 at 08:38

11 Answers

74

The easiest way is to use the tool called cloc. Use it this way:

cloc .

That's it. :-)

Ho1
  • -1 because this program doesn't have any way to recognise lines of code in languages outside of its little, boring brain. It knows about Ada and Pascal and C and C++ and Java and JavaScript and "enterprise" type languages, but it refuses to count the SLOC by just file extension, and is thus completely useless for DSLs, or even languages it just happens to not know about. – cat Jun 09 '16 at 11:39
  • @cat Nothing is perfect, and nothing can fulfill all your past and future demands. – Ho1 Jun 09 '16 at 12:26
  • Well, the programming language which CLOC refuses to acknowledge does indeed fulfill all my past and future demands :) – cat Jun 09 '16 at 12:28
  • @cat according to the CLOC documentation it can read in a language definition file, so there is a way to get it to recognize code in languages it hasn't defined. Plus it's open source, so you can always extend it to make it better! – Centimane Jun 15 '16 at 18:28
44

You should probably use SLOCCount or cloc for this; both are designed specifically for counting lines of source code in a project, regardless of directory structure. Either

sloccount .

or

cloc .

will produce a report on all the source code starting from the current directory.

If you want to use find and wc, GNU wc has a nice --files0-from option:

find . -name '*.[ch]' -print0 | wc --files0-from=- -l

(Thanks to SnakeDoc for the cloc suggestion!)
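
A quick sanity check of the `--files0-from` pipeline on a made-up tree (assumes GNU wc): the null-delimited list copes with whitespace in file names, and wc prints one grand total at the end.

```shell
# Sketch: GNU wc reads a NUL-delimited file list from stdin (-) and
# prints per-file counts plus one final "total" line.
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/one.c"
printf 'c\nd\ne\n' > "$dir/two two.h"   # file name containing a space
find "$dir" -name '*.[ch]' -print0 | wc --files0-from=- -l
rm -r "$dir"
```

The final `total` line shows 5.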

Stephen Kitt
  • +1 for sloccount. Interestingly, running `sloccount /tmp/stackexchange` (created again on May 17 after my most recent reboot) says that the estimated cost to develop the sh, perl, awk, etc. files it found is $11,029, and that doesn't include the one-liners that never made it into a script file. – cas Jun 08 '16 at 11:50
  • Estimating cost based on lines of code? What about all the people employed to re-factor spaghetti into something maintainable? – OrangeDog Jun 08 '16 at 16:08
  • @OrangeDog you could always try to account for that in the overhead; see the [documentation](http://www.dwheeler.com/sloccount/sloccount.html#using-basics) for an explanation of the calculation (with very old salary data) and the parameters you can tweak. – Stephen Kitt Jun 08 '16 at 16:12
  • @StephenKitt> still, the main issue is that it's counting backwards. When cleaning up code, you often end up with fewer lines. Sure, you could try to hand-wave an overhead to incur on the rest of the code to account for the removed lines, but I don't see how it's better than just guessing the whole price in the first place. – spectras Jun 09 '16 at 06:51
15

As the wc command can take multiple arguments, you can just pass all the filenames to wc using the `+` form of the `-exec` action of find:

find . -type f -name '*.[ch]' -exec wc -l {} +

Alternatively, in bash, using the shell option globstar to traverse the directories recursively:

shopt -s globstar
wc -l **/*.[ch]

Most other shells either glob recursively by default (e.g. zsh's `**`) or provide an option similar to globstar.
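
A small made-up demonstration of globstar (assumes bash ≥ 4; the `-O globstar` flag enables the option for a single invocation):

```shell
# Sketch: ** matches files at any depth once globstar is enabled.
dir=$(mktemp -d)
mkdir -p "$dir/src/util"
printf 'x\n'    > "$dir/src/main.c"
printf 'y\nz\n' > "$dir/src/util/helper.h"
( cd "$dir" && bash -O globstar -c 'wc -l **/*.[ch]' )
rm -r "$dir"
```

The final `total` line shows 3.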

heemayl
5

If you are in an environment where you don't have access to cloc etc., I'd suggest:

find -name '*.[ch]' -type f -exec cat '{}' + | grep -c '[^[:space:]]'

Run-through: find searches recursively for all regular files whose names end in either .c or .h and runs cat on them. The output is piped through grep to count all the non-blank lines (the ones that contain at least one non-whitespace character).
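
For instance, on a made-up tree with blank lines scattered through the files, the pipeline counts only the non-blank ones:

```shell
# Sketch: the blank lines in both files are excluded from the count.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'a\n\nb\n' > "$dir/f.c"
printf '\nc\n'    > "$dir/sub/g.h"
find "$dir" -name '*.[ch]' -type f -exec cat '{}' + | grep -c '[^[:space:]]'
rm -r "$dir"
```

This prints 3 (the lines `a`, `b`, and `c`; the two blank lines are skipped).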

Stéphane Chazelas
Kotte
4

You can use find together with xargs and wc:

find . -type f \( -name '*.h' -o -name '*.c' \) | xargs wc -l
Vombat
  • (that assumes file paths don't contain blanks, newlines, single quote, double quote or backslash characters though. It may also output several `total` lines if several `wc`s are being invoked.) – Stéphane Chazelas Jun 09 '16 at 09:16
  • Perhaps the several `wc` commands problem can be addressed by piping `find` to `while read FILENAME; do . . .done` structure. And inside the while loop use `wc -l`. The rest is summing up the total lines into a variable and displaying it. – Sergiy Kolodyazhnyy Jun 09 '16 at 11:14
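
A sketch folding in both caveats (made-up files): `-print0`/`-0` keeps odd file names safe, and piping everything through one `cat` means `wc` prints exactly one number instead of per-batch totals.

```shell
# Sketch: NUL-delimited names survive spaces; a single wc gives one total.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/a b.c"          # file name with a space
printf '3\n'    > "$dir/c.h"
find "$dir" -type f \( -name '*.h' -o -name '*.c' \) -print0 |
  xargs -0 cat | wc -l                  # prints 3
rm -r "$dir"
```

Note the explicit parentheses around the two `-name` tests: without them, `-type f` would bind only to the `*.h` branch.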
4

As has been pointed out in the comments, cat file | wc -l is not equivalent to wc -l file because the former prints only a number whereas the latter prints a number and the filename. Likewise cat * | wc -l will print just a number, whereas wc -l * will print a line of information for each file.

In the spirit of simplicity, let's revisit the question actually asked:

if I want to count the lines of code, the trivial thing is

cat *.c *.h | wc -l

But what if I have several subdirectories?

Firstly, you can simplify even your trivial command to:

cat *.[ch] | wc -l

And finally, the many-subdirectory equivalent is:

find . -name '*.[ch]' -exec cat {} + | wc -l

This could perhaps be improved in many ways, such as restricting the matched files to regular files only (not directories) by adding -type f—but the given find command is the exact recursive equivalent of cat *.[ch].
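
The equivalence (and the reason the glob alone isn't enough) is easy to see on a made-up tree where some files sit in a subdirectory:

```shell
# Sketch: the glob sees only the top level; find descends into sub/.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'a\n'    > "$dir/x.c"
printf 'b\nc\n' > "$dir/sub/y.h"
( cd "$dir" && cat *.[ch] | wc -l )                           # prints 1
( cd "$dir" && find . -name '*.[ch]' -exec cat {} + | wc -l ) # prints 3
rm -r "$dir"
```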

Wildcard
3

Sample using awk:

find . -name '*.[ch]' -exec wc -l {} \; |
  awk '{SUM+=$1}; END { print "Total number of lines: " SUM }'
Stéphane Chazelas
Lambert
  • Use `+` in place of `\;`. – Jonathan Leffler Jun 09 '16 at 13:39
  • @JonathanLeffler Why? – Hastur Jun 09 '16 at 17:56
  • @Hastur: It runs `wc -l` for groups of files, rather like `xargs` does, but it handles odd-ball characters (like spaces) in file names without needing either `xargs` or the (non-standard) `-print0` and `-0` options to `find` and `xargs` respectively. It's a minor optimization. The downside would be that each invocation of `wc` would output a total line count at the end when given multiple files — the `awk` script would have to deal with that. So, it's not a slam-dunk, but very often, using `+` in place of `\;` with `find` is a good idea. – Jonathan Leffler Jun 09 '16 at 18:01
  • @JonathanLeffler Thank you, I agree. My concern, however, was about the length of the parameter string passed to `wc`. If the number of files that will be _found_ is unknown _a priori_, is there a risk of exceeding that limit, or is it somehow handled by find? – Hastur Jun 09 '16 at 19:45
  • @Hastur: `find` groups the files into convenient size bundles, which won't exceed the length limit for the argument list on the platform, allowing for the environment (which comes out of the argument list length — so the length of the argument list plus the length of the environment has to be less than a maximum value). IOW, `find` does the job right, like `xargs` does the job right. – Jonathan Leffler Jun 09 '16 at 19:48
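
Putting these comments into practice on a made-up tree: with `+`, each `wc` batch ends in a `total` line, so the `awk` script has to skip those before summing.

```shell
# Sketch: skip wc's per-batch "total" lines, then sum the per-file counts.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/a.c"
printf '3\n'    > "$dir/b.h"
find "$dir" -name '*.[ch]' -exec wc -l {} + |
  awk '$2 != "total" {sum += $1} END {print "Total number of lines: " sum}'
rm -r "$dir"
```

This prints `Total number of lines: 3`.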
1

An easy command:

find . -name '*.[ch]' | xargs wc -l
malyy
  • (that assumes file paths don't contain blanks, newlines, single quote, double quote or backslash characters though. It may also output several `total` lines if several `wc`s are being invoked.) – Stéphane Chazelas Jun 09 '16 at 09:16
0

If you're on Linux, I recommend my own tool, polyglot. It's dramatically faster than cloc and more featureful than sloccount.

You should be able to build it on BSD as well, though there aren't any provided binaries.

You can invoke it with

poly .
0

A newer alternative to cloc is Loci, available as an NPM package. It counts code similarly to cloc, but is faster at scale. Also, as it's written natively in Node.js, it runs in environments that lack Perl (needed for cloc.pl) or cloc.exe.

It is in its infancy, but you can install it as an NPM CLI tool, or import it as a library into your own project.

It's great for environments where you can install script-based npm packages but are not allowed to use unapproved binaries.

0b1
-2

find . -name \*.[ch] -print | xargs -n 1 wc -l should do the trick. There are several possible variations on that as well, such as using -exec instead of piping the output to wc.

John
  • But `find . -name \*.[ch] -print` doesn't print the contents of the files, only the file names. So I count the number of files instead, don't I? Do I need `xargs`? – Niklas Rosencrantz Jun 08 '16 at 11:35
  • @Programmer400 yes, you'd need `xargs`, and you'd also need to watch for multiple `wc` invocations if you have lots of files; you'd need to look for all the `total` lines and sum them. – Stephen Kitt Jun 08 '16 at 11:49
  • If you just want the total line count, you'd need to do `find . -name \*.[ch] -print0 | xargs -0 cat | wc -l` – fluffy Jun 08 '16 at 22:28
  • Note that this (`find . -name \*.[ch] -print | wc -l`) counts the number of files (unless a file name contains a newline — but that's very unusual) — it does not count the number of lines in the files. – Jonathan Leffler Jun 09 '16 at 18:03