
I would like to find all declarations of functions that end with "_DB" in a file, and avoid perl and pipes.

For example:

prep_DB();

init_DB(DB *database, char *params[])
{
  open_DB(database);
}

prep_DB() {
  open_DB(database); // open
}

FILE * load_DB(const char * exppath, const char * expfname)
{}

This should match only the second line (the init_DB definition) and the next-to-last line (the load_DB definition). The line prep_DB() { may appear in the output or not.

Currently, the following command finds all function invocations:

grep -E '.*_DB(.*)' file

However, I am having trouble negating the semicolon at the end. The closest thread that came to explaining how it works is this; however, it seems that semicolon is a special character, because the advice there is not working. How can I get around this limitation?

Alex
  • `grep -E '^.*_DB\(.*{$'` should work. – DopeGhoti Aug 23 '17 at 19:53
  • @pfnuesel Yes, and also with `[^\;]` at the end. It is matching all invocations of it. – Alex Aug 23 '17 at 19:53
  • @pfnuesel - that won't work without anchoring it... i.e. `[^;]$` – don_crissti Aug 23 '17 at 19:54
  • Try without `-E`. – pfnuesel Aug 23 '17 at 19:55
  • @don_crissti Why? What if something comes after the `;`? – pfnuesel Aug 23 '17 at 19:55
  • @don_crissti That works; can you post how `grep` knows to ignore the ` {` at the end, as an answer? – Alex Aug 23 '17 at 19:56
  • @DopeGhoti But I won't always have the '{' on the same line as the function invocation... Note that don_crissti answered this already. – Alex Aug 23 '17 at 19:57
  • Try without the `-E` and you won't need the end of line character `$`. Which will not work for e.g. `f1_DB(); f2_DB();`. – pfnuesel Aug 23 '17 at 19:59
  • Declaration, or invocation? Either way, look into standardizing your code conventions. If what you are looking for cannot be described in a regular fashion, a regular expression to try to capture it will be increasingly complex. – DopeGhoti Aug 23 '17 at 19:59
  • @pfnuesel I tried without the '-E' and without the '$', and it works for this particular expression; however, for the line `FILE * load_prep(const char * exppath, const char * expfname)`, running `grep '.*load_prep(.*)[^;]'` doesn't catch it; don_crissti's method does. – Alex Aug 23 '17 at 20:05
  • @pfnuesel - OP clearly says _"negating the semicolon at the end"_ so there's nothing after the `;` ... if there was something after the `;` then the regex would still need anchoring though it would be slightly different e.g. `grep -E '.*_DB([^;]*)$'` – don_crissti Aug 23 '17 at 20:07
  • @don_crissti That's the thing-- your regex actually works, regardless of whether there is something present after it or not: `grep -E '.*_DB(.*)[^;]$' test.sh` `init_DB(DB *database, char *params[])` `prep_DB() {` Can you post an answer explaining why this is so? If not, please say so, so others would be more willing to post their version of it. – Alex Aug 23 '17 at 20:12
  • @pfnuesel I updated the original post, to include the counter-example to your approach. If you can make it work without the '-E', then you can post it as an answer. Looks like no one's posted one yet... – Alex Aug 23 '17 at 20:21
  • @DopeGhoti In this case, I am searching someone else's code for methods-- standardizing their code conventions would take too long, and I wouldn't be allowed to push it anyways. – Alex Aug 23 '17 at 20:23
  • @don_crissti I posted your comment as an answer (just as a FYI for future readers); however, it is not quite correct. It returns results where the semicolon is followed by other characters; see my post. – Alex Aug 23 '17 at 20:30
  • @don_crissti Oops-- thanks. I will take a closer look now. – Alex Aug 23 '17 at 20:35
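
The anchoring point raised in the comments can be checked directly. The following is a minimal sketch, assuming a POSIX-style grep with `-E` support; the two sample lines are taken from the question, and the variable names are only illustrative.

```shell
# Two sample lines: a call ending in a semicolon and a definition without one.
lines='prep_DB();
init_DB(DB *database, char *params[])'

# Unanchored: [^;] may match any non-semicolon character after "_DB",
# for example the "(" in "prep_DB();", so the call still matches.
unanchored=$(printf '%s\n' "$lines" | grep -E '_DB.*[^;]')

# Anchored: [^;]$ requires the last character on the line to be a non-semicolon.
anchored=$(printf '%s\n' "$lines" | grep -E '_DB.*[^;]$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Without the `$`, both lines are reported; with it, only the init_DB line is. This only covers lines where nothing follows the semicolon, which is the case described in the question.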

2 Answers


Assuming this is a C source file called file.c.

Using ctags:

$ ctags file.c

This creates a file called tags:

$ cat tags
init_DB file.c  /^init_DB(DB *database, char *params[])$/
load_DB file.c  /^FILE * load_DB(const char * exppath, const char * /
prep_DB file.c  /^prep_DB() {$/

This may be used with vi or vim to automatically jump to the function definitions.

You may also parse this file with cut and grep:

$ cut -f 1 tags | grep '_DB$'
init_DB
load_DB
prep_DB

On Ubuntu systems, installing ctags will actually install exuberant-ctags which provides a more verbose tags output:

$ cat tags
!_TAG_FILE_FORMAT       2       /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED       1       /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_PROGRAM_AUTHOR    Darren Hiebert  /[email protected]/
!_TAG_PROGRAM_NAME      Exuberant Ctags //
!_TAG_PROGRAM_URL       http://ctags.sourceforge.net    /official site/
!_TAG_PROGRAM_VERSION   5.9~svn20110310 //
init_DB file.c  /^init_DB(DB *database, char *params[])$/;"     f
load_DB file.c  /^FILE * load_DB(const char * exppath, const char * expfname)$/;"       f
prep_DB file.c  /^prep_DB() {$/;"       f

Here we can be sure to only get function definitions with

$ awk '$NF == "f" && $1 ~ /_DB$/ { print $1 }' tags
init_DB
load_DB
prep_DB

The point here is that you're better off using a dedicated C language parser than trying to account for all possible programming styles in an awk script or a regular expression with grep that parses C code.

You can also do

$ ctags -x file.c
init_DB          function      3 file.c           init_DB(DB *database, char *params[])
load_DB          function     12 file.c           FILE * load_DB(const char * exppath, const char * expfname)
prep_DB          function      8 file.c           prep_DB() {

and then parse/filter that in whatever way you need. The number is the line number of the definition. It all comes down to what you mean by "want to find".
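
As one possible way of filtering that listing, here is a hedged sketch using awk. It assumes the column layout shown above (name, kind, line number, file, source text); the sample data is pasted into a variable so the snippet runs without ctags installed, and `list_db_functions` is a hypothetical helper name, not part of ctags.

```shell
# Sample `ctags -x` output copied from the listing above. In real use,
# pipe `ctags -x file.c` into the function instead of using this variable.
ctags_x_output='init_DB          function      3 file.c           init_DB(DB *database, char *params[])
load_DB          function     12 file.c           FILE * load_DB(const char * exppath, const char * expfname)
prep_DB          function      8 file.c           prep_DB() {'

# Keep entries whose kind is "function" and whose name ends in _DB,
# print them as "file:line name", and sort numerically by line number.
list_db_functions() {
  awk '$2 == "function" && $1 ~ /_DB$/ { print $4 ":" $3, $1 }' | sort -t: -k2,2n
}

result=$(printf '%s\n' "$ctags_x_output" | list_db_functions)
printf '%s\n' "$result"
```

The `file:line` form is convenient because many editors and tools accept it as a location, but any other output format could be produced the same way.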

Kusalananda

I decided to post the summary of the 20+-comment discussion above.

One solution was provided by @don_crissti, who suggested the command:

$ grep -E '.*_DB([^;]*)$' <filename>

which produces the output

init_DB(DB *database, char *params[])
prep_DB() {
FILE * load_DB(const char * exppath, const char * expfname)

Based on his suggestion, I came up with the following command to exclude the middle result, if needed:

$ grep -E '_DB\(.*\)$' <filename>
slm
Alex
  • don_crissti's `grep -E` should actually be a mere `grep`: the parentheses in the regex are supposed to match those of the function definitions, they are not supposed to be regex operators. That's why you escaped them in your command (which could be written `grep '_DB(.*)$'`). – xhienne Aug 23 '17 at 21:49
  • @don_crissti's solution actually uses them with the '-E' as they are intended. It was because I didn't realize that 'egrep' required them to be escaped, that I asked this question in the first place. – Alex Aug 23 '17 at 22:43
  • Well, there is no reason to use grouping parentheses here, and doing so would also match lines other than function definitions, like `struct struct_DB {`, that's why `-E` should be omitted. – xhienne Aug 23 '17 at 23:11
  • @xhienne is right; I was quite busy at the time of writing those comments and didn't even bother with that part of the regex to be honest... the main problem was the anchoring as `.*[^;]` would match lines with and without a trailing semicolon. – don_crissti Aug 24 '17 at 11:16
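
The difference discussed in these comments can be reproduced with a short test. This is a sketch under a few assumptions: a POSIX-style grep, the sample file from the question written to a temporary file, and the `struct struct_DB {` line from xhienne's comment appended to it. The third pattern is a variation that was not posted in the thread, added only to show one way to keep the `prep_DB() {` line.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
prep_DB();

init_DB(DB *database, char *params[])
{
  open_DB(database);
}

prep_DB() {
  open_DB(database); // open
}

FILE * load_DB(const char * exppath, const char * expfname)
{}
struct struct_DB {
EOF

# With -E, ( ) form a group, so this means "_DB, then no semicolon up to the
# end of the line". The struct line matches too.
ere_grouping=$(grep -E '.*_DB([^;]*)$' "$sample")

# Without -E, ( and ) are literal characters: the line must end in ")".
bre_literal=$(grep '_DB(.*)$' "$sample")

# Variation (not from the thread): literal "(" only, no semicolon afterwards.
# Keeps "prep_DB() {" but still rejects the struct line.
bre_open_paren=$(grep '_DB([^;]*$' "$sample")

printf -- '-E with grouping:\n%s\n\n' "$ere_grouping"
printf -- 'plain grep, literal parentheses:\n%s\n\n' "$bre_literal"
printf -- 'plain grep, literal "(" only:\n%s\n' "$bre_open_paren"
rm -f "$sample"
```

None of these patterns understands C syntax; they only distinguish lines by their shape, so multi-line signatures or trailing comments would still need the ctags approach from the other answer.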