37

if I want to count the lines of code, the trivial thing is

cat *.c *.h | wc -l

But what if I have several subdirectories?

Mark Stewart
Niklas Rosencrantz
  • http://stackoverflow.com/questions/1358540/how-to-count-all-the-lines-of-code-in-a-directory-recursively – 林果皞 Jun 08 '16 at 12:26
  • Off-topic: Why the unnecessary `cat`? `wc -l *.c *.h` does the same thing. – Thomas Padron-McCarthy Jun 08 '16 at 17:50
  • @ThomasPadron-McCarthy No it doesn't. You'd need `wc -l *.c *.h | tail -n 1` to get similar output. – Gilles 'SO- stop being evil' Jun 08 '16 at 22:16
  • Note that some (possibly even most) modern shells (Bash v4, Zsh, probably more) provide a recursive-globbing mechanism using `**`, so you could have used `wc -l **/*.{h,c}` or something similar. Note that in Bash, at least, this option (called `globstar`) is *off* by default. But also note that in this particular case, `cloc` or `SLOCCount` is a much better option. (Also, `ack` may be preferable to `find` for easily finding/listing source files.) – Kyle Strand Jun 08 '16 at 22:31
  • `wc -l` counts lines, not lines of code. 7000 blank lines will still show up in `wc -l` but wouldn't count in a code metric (comments usually don't count either). – coteyr Jun 09 '16 at 08:38

11 Answers

74

The easiest way is to use the tool called cloc. Use it this way:

cloc .

That's it. :-)

Ho1
  • -1 because this program doesn't have any way to recognise lines of code in languages outside of its little, boring brain. It knows about Ada and Pascal and C and C++ and Java and JavaScript and "enterprise" type languages, but it refuses to count the SLOC by just file extension, and is thus completely useless for DSLs, or even languages it just happens to not know about. – cat Jun 09 '16 at 11:39
  • @cat Nothing is perfect, and nothing can fulfill all your past and future demands. – Ho1 Jun 09 '16 at 12:26
  • Well, the programming language which CLOC refuses to acknowledge does indeed fulfill all my past and future demands :) – cat Jun 09 '16 at 12:28
  • @cat according to the CLOC documentation it can read in a language definition file, so there is a way to get it to recognize code in languages it hasn't defined. Plus it's open source, so you can always extend it to make it better! – Centimane Jun 15 '16 at 18:28
44

You should probably use SLOCCount or cloc for this; both are designed specifically for counting lines of source code in a project, regardless of directory structure. Either

sloccount .

or

cloc .

will produce a report on all the source code starting from the current directory.

If you want to use find and wc, GNU wc has a nice --files0-from option:

find . -name '*.[ch]' -print0 | wc --files0-from=- -l

(Thanks to SnakeDoc for the cloc suggestion!)
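
A quick sanity check of the `--files0-from` pipeline on a made-up tree (assumes GNU wc): the null-delimited list copes with whitespace in file names, and wc prints one grand total at the end.

```shell
# Sketch: GNU wc reads a NUL-delimited file list from stdin (-) and
# prints per-file counts plus one final "total" line.
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/one.c"
printf 'c\nd\ne\n' > "$dir/two two.h"   # file name containing a space
find "$dir" -name '*.[ch]' -print0 | wc --files0-from=- -l
rm -r "$dir"
```

The final `total` line shows 5.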

Stephen Kitt
  • +1 for sloccount. Interestingly, running `sloccount /tmp/stackexchange` (created again on May 17 after my most recent reboot) says that the estimated cost to develop the sh, perl, awk, etc. files it found is $11,029, and that doesn't include the one-liners that never made it into a script file. – cas Jun 08 '16 at 11:50
  • Estimating cost based on lines of code? What about all the people employed to re-factor spaghetti into something maintainable? – OrangeDog Jun 08 '16 at 16:08
  • @OrangeDog you could always try to account for that in the overhead; see the [documentation](http://www.dwheeler.com/sloccount/sloccount.html#using-basics) for an explanation of the calculation (with very old salary data) and the parameters you can tweak. – Stephen Kitt Jun 08 '16 at 16:12
  • @StephenKitt> still, the main issue is that it's counting backwards. When cleaning up code, you often end up with fewer lines. Sure, you could try to hand-wave an overhead to incur on the rest of the code to account for the removed lines, but I don't see how it's better than just guessing the whole price in the first place. – spectras Jun 09 '16 at 06:51
15

As the wc command can take multiple arguments, you can just pass all the filenames to wc using the `+` form of the `-exec` action of find:

find . -type f -name '*.[ch]' -exec wc -l {} +

Alternatively, in bash, using the shell option globstar to traverse the directories recursively:

shopt -s globstar
wc -l **/*.[ch]

Most other shells either glob recursively by default (e.g. zsh's `**`) or provide an option similar to globstar.
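
A small made-up demonstration of globstar (assumes bash ≥ 4; the `-O globstar` flag enables the option for a single invocation):

```shell
# Sketch: ** matches files at any depth once globstar is enabled.
dir=$(mktemp -d)
mkdir -p "$dir/src/util"
printf 'x\n'    > "$dir/src/main.c"
printf 'y\nz\n' > "$dir/src/util/helper.h"
( cd "$dir" && bash -O globstar -c 'wc -l **/*.[ch]' )
rm -r "$dir"
```

The final `total` line shows 3.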

heemayl
5

If you are in an environment where you don't have access to cloc etc., I'd suggest:

find -name '*.[ch]' -type f -exec cat '{}' + | grep -c '[^[:space:]]'

Run-through: find searches recursively for all regular files whose names end in either .c or .h and runs cat on them. The output is piped through grep to count all the non-blank lines (the ones that contain at least one non-whitespace character).
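
For instance, on a made-up tree with blank lines scattered through the files, the pipeline counts only the non-blank ones:

```shell
# Sketch: the blank lines in both files are excluded from the count.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'a\n\nb\n' > "$dir/f.c"
printf '\nc\n'    > "$dir/sub/g.h"
find "$dir" -name '*.[ch]' -type f -exec cat '{}' + | grep -c '[^[:space:]]'
rm -r "$dir"
```

This prints 3 (the lines `a`, `b`, and `c`; the two blank lines are skipped).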

Stéphane Chazelas
Kotte
4

You can use find together with xargs and wc:

find . -type f \( -name '*.h' -o -name '*.c' \) | xargs wc -l
Vombat
  • (that assumes file paths don't contain blanks, newlines, single quote, double quote or backslash characters though. It may also output several `total` lines if several `wc`s are being invoked.) – Stéphane Chazelas Jun 09 '16 at 09:16
  • Perhaps the several `wc` commands problem can be addressed by piping `find` to `while read FILENAME; do . . .done` structure. And inside the while loop use `wc -l`. The rest is summing up the total lines into a variable and displaying it. – Sergiy Kolodyazhnyy Jun 09 '16 at 11:14
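
A sketch folding in both caveats (made-up files): `-print0`/`-0` keeps odd file names safe, and piping everything through one `cat` means `wc` prints exactly one number instead of per-batch totals.

```shell
# Sketch: NUL-delimited names survive spaces; a single wc gives one total.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/a b.c"          # file name with a space
printf '3\n'    > "$dir/c.h"
find "$dir" -type f \( -name '*.h' -o -name '*.c' \) -print0 |
  xargs -0 cat | wc -l                  # prints 3
rm -r "$dir"
```

Note the explicit parentheses around the two `-name` tests: without them, `-type f` would bind only to the `*.h` branch.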
4

As has been pointed out in the comments, cat file | wc -l is not equivalent to wc -l file because the former prints only a number whereas the latter prints a number and the filename. Likewise cat * | wc -l will print just a number, whereas wc -l * will print a line of information for each file.

In the spirit of simplicity, let's revisit the question actually asked:

if I want to count the lines of code, the trivial thing is

cat *.c *.h | wc -l

But what if I have several subdirectories?

Firstly, you can simplify even your trivial command to:

cat *.[ch] | wc -l

And finally, the many-subdirectory equivalent is:

find . -name '*.[ch]' -exec cat {} + | wc -l

This could perhaps be improved in many ways, such as restricting the matched files to regular files only (not directories) by adding -type f—but the given find command is the exact recursive equivalent of cat *.[ch].
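
The equivalence (and the reason the glob alone isn't enough) is easy to see on a made-up tree where some files sit in a subdirectory:

```shell
# Sketch: the glob sees only the top level; find descends into sub/.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'a\n'    > "$dir/x.c"
printf 'b\nc\n' > "$dir/sub/y.h"
( cd "$dir" && cat *.[ch] | wc -l )                           # prints 1
( cd "$dir" && find . -name '*.[ch]' -exec cat {} + | wc -l ) # prints 3
rm -r "$dir"
```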

Wildcard
3

Sample using awk:

find . -name '*.[ch]' -exec wc -l {} \; |
  awk '{SUM+=$1}; END { print "Total number of lines: " SUM }'
Stéphane Chazelas
Lambert
  • Use `+` in place of `\;`. – Jonathan Leffler Jun 09 '16 at 13:39
  • @JonathanLeffler Why? – Hastur Jun 09 '16 at 17:56
  • @Hastur: It runs `wc -l` for groups of files, rather like `xargs` does, but it handles odd-ball characters (like spaces) in file names without needing either `xargs` or the (non-standard) `-print0` and `-0` options to `find` and `xargs` respectively. It's a minor optimization. The downside would be that each invocation of `wc` would output a total line count at the end when given multiple files — the `awk` script would have to deal with that. So, it's not a slam-dunk, but very often, using `+` in place of `\;` with `find` is a good idea. – Jonathan Leffler Jun 09 '16 at 18:01
  • @JonathanLeffler Thank you, I agree. My concern, however, was about the length of the parameter string passed to `wc`. If the number of files that will be _found_ is unknown _a priori_, is there a risk of exceeding that limit, or is it somehow handled by find? – Hastur Jun 09 '16 at 19:45
  • @Hastur: `find` groups the files into convenient size bundles, which won't exceed the length limit for the argument list on the platform, allowing for the environment (which comes out of the argument list length — so the length of the argument list plus the length of the environment has to be less than a maximum value). IOW, `find` does the job right, like `xargs` does the job right. – Jonathan Leffler Jun 09 '16 at 19:48
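
Putting these comments into practice on a made-up tree: with `+`, each `wc` batch ends in a `total` line, so the `awk` script has to skip those before summing.

```shell
# Sketch: skip wc's per-batch "total" lines, then sum the per-file counts.
dir=$(mktemp -d)
printf '1\n2\n' > "$dir/a.c"
printf '3\n'    > "$dir/b.h"
find "$dir" -name '*.[ch]' -exec wc -l {} + |
  awk '$2 != "total" {sum += $1} END {print "Total number of lines: " sum}'
rm -r "$dir"
```

This prints `Total number of lines: 3`.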
1

An easy command:

find . -name '*.[ch]' | xargs wc -l
malyy
  • (that assumes file paths don't contain blanks, newlines, single quote, double quote or backslash characters though. It may also output several `total` lines if several `wc`s are being invoked.) – Stéphane Chazelas Jun 09 '16 at 09:16
0

If you're on Linux, I recommend my own tool, polyglot. It's dramatically faster than cloc and more featureful than sloccount.

You should be able to build it on BSD as well, though there aren't any provided binaries.

You can invoke it with

poly .
0

A newer alternative to cloc is Loci, available as an NPM package. It counts code similarly to cloc, but is faster at scale. Also, as it's written natively in Node.js, it runs in environments that lack Perl (needed for cloc.pl) or cloc.exe.

It is in its infancy, but you can install it as an NPM CLI tool, or import it as a library into your own project.

It's great for environments where you can install script-based npm packages but are not allowed to use unapproved binaries.

0b1
-2

find . -name \*.[ch] -print | xargs -n 1 wc -l should do the trick. There are several possible variations on that as well, such as using -exec instead of piping the output to wc.

John
  • But `find . -name \*.[ch] -print` doesn't print the contents of the files, only the file names. So I count the number of files instead, don't I? Do I need `xargs`? – Niklas Rosencrantz Jun 08 '16 at 11:35
  • @Programmer400 yes, you'd need `xargs`, and you'd also need to watch for multiple `wc` invocations if you have lots of files; you'd need to look for all the `total` lines and sum them. – Stephen Kitt Jun 08 '16 at 11:49
  • If you just want the total line count, you'd need to do `find . -name \*.[ch] -print0 | xargs -0 cat | wc -l` – fluffy Jun 08 '16 at 22:28
  • Note that this (`find . -name \*.[ch] -print | wc -l`) counts the number of files (unless a file name contains a newline — but that's very unusual) — it does not count the number of lines in the files. – Jonathan Leffler Jun 09 '16 at 18:03