I am trying to use lookaround assertions in a PCRE2 regex for the Wazuh tool. I need to match strings that are enclosed in double quotes, so I wrote the regex below. However, the XML parser seems to treat the "<" character as the start of a tag and then complains that the regex element is not closed.

<regex type="pcre2">(?<=").*?(?=")</regex>

ERROR: (1226): Error reading XML file 'etc/decoders/local_decoder.xml': XMLERR: Element '=").*?(?=")</regex' not closed. (line 33).

I have tried to escape the < in (?<=") but it doesn't seem to work. Any idea how to escape it so that the element is parsed properly?

Atul

2 Answers

The PCRE2 regular expression syntax allows you to write assertions in different equivalent ways:

  • (?<= is the same as either of these:

    • (*plb:
    • (*positive_lookbehind:
  • (?= is the same as either of these:

    • (*pla:
    • (*positive_lookahead:

(This is from the PCRE2 library's pcre2pattern manual.)

This means that you should be able to rewrite your regular expression without using characters that are special in XML (< in this case) as either the more expressive

<regex type="pcre2">(*positive_lookbehind:").*?(*positive_lookahead:")</regex>

or the terser

<regex type="pcre2">(*plb:").*?(*pla:")</regex>
Kusalananda

< is an XML metacharacter and has to be encoded as &lt; or similar.

Also, > is &gt; and a literal & is &amp;

However, it appears that the tool you are using has a bug in this area, so you will have to figure out a workaround; see https://github.com/wazuh/wazuh/issues/14261
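
As a rough illustration of the encoding rule above, here is a minimal Python sketch using the standard library's XML modules. It only shows how a conforming XML parser treats the escaped and unescaped pattern; it says nothing about how Wazuh's own parser behaves, which is the subject of the linked issue.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The pattern from the question, as it should reach the regex engine.
pattern = '(?<=").*?(?=")'

# escape() replaces the XML metacharacters &, < and > with entities.
escaped = escape(pattern)
element_source = f'<regex type="pcre2">{escaped}</regex>'

# A conforming parser turns &lt; back into < when reading the element text.
decoded = ET.fromstring(element_source).text

# The unescaped pattern is not well-formed XML, so parsing it fails.
try:
    ET.fromstring(f'<regex type="pcre2">{pattern}</regex>')
    raw_parses = True
except ET.ParseError:
    raw_parses = False

print(escaped)                      # the text to put inside the element
print(decoded == pattern, raw_parses)
```

If a tool reports a regex syntax error on the escaped form, that suggests it hands the entity text to the regex engine without decoding it.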

tripleee
  • You mean (?&lt;=").*?(?=") ? – Atul Dec 31 '22 at 07:41
  • Yup, that's how you encode it. I'm not familiar with the tool you are using, though; maybe it has additional quirks or different internal conventions. – tripleee Dec 31 '22 at 07:42
  • The original error seems to have gone now and I have a new one: `ERROR: (1452): Syntax error on regex: '(?<=").*?(?=")'` – Atul Dec 31 '22 at 07:44
  • Sounds like a bug in the tool then. Are you forced to use XML or are there other options? – tripleee Dec 31 '22 at 07:47
  • I am forced to use XML unfortunately; the requirement is to capture all data between a set of double quotes. – Atul Dec 31 '22 at 07:47
  • I found something that replaces the regex pattern. Thank you for helping out! https://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks – Atul Dec 31 '22 at 07:49
  • @Atul It's unclear how the StackOverflow Q/A is helpful as they are trying to match a double-quoted string _including the quotes_, while this question is using look-ahead and look-behind assertions to avoid matching the actual double quotes. – Kusalananda Dec 31 '22 at 09:07
  • If the tool supports grouping, matching on `"([^"]*)"` and extracting group 1 is basically equivalent to the OP's attempt. – tripleee Dec 31 '22 at 11:31
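
The capture-group suggestion in the last comment can be sketched as follows. This is a minimal comparison in Python, whose `re` module is not PCRE2 but supports both constructs used here; the sample log line is made up for illustration.

```python
import re

# Hypothetical input line; not taken from any real Wazuh log.
sample = 'user="alice" action="login"'

# Capture-group approach: match the quotes, keep only group 1.
group_matches = re.findall(r'"([^"]*)"', sample)

# Lookaround approach from the question: the quotes are never consumed.
lookaround_matches = re.findall(r'(?<=").*?(?=")', sample)

print(group_matches)
print(lookaround_matches)
```

In this sketch the group version returns only the quoted values. The lookaround version also returns the text between a closing quote and the next opening quote, because the quotes are not consumed and can serve as a boundary twice. The group pattern also contains no XML metacharacters, so it needs no escaping inside the element.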