
I have recently been working with security logs and want to process them more effectively from the bash shell. I found out that awk's gensub() only stores nine back-references (\\1 to \\9), but I need ten.

This is what I tried:

awk '{print gensub(/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}).+?\sID\s(\[[0-9]{4}\]).+?\sTargetUserName\s=\s(.+?)\sTargetDomainName\s=\s(.+?)\sTargetLogonId\s=\s(.+?)\sLogonType\s=\s([0-9]{1,2})\s(.+?\sWorkstationName\s=\s(.+?)\sLogonGuid\s=\s.+?TransmittedServices\s=\s.+?\sLmPackageName\s=\s.+?KeyLength\s=\s.+?\sProcessId\s=\s.+?\sProcessName\s=\s.+?\sIpAddress\s=\s(.+?)\sIpPort\s\=\s([0-9]{1,}))?.+?$/,"\\5,\\4,\\3,\\2\\6,\\1,\\8,\\9,","g") }'

Sample input (the real logs contain thousands of lines like this):

2017-03-21T02:00:00 kornawesome Security/Microsoft-Windows-Security-Auditing ID [4624] :EventData/Data -> SubjectUserSid = S-1-5-18 SubjectUserName = PRETENDERS$ SubjectDomainName = WORKGROUP SubjectLogonId = 0x00000000000004j7 TargetUserSid = X-12-54-181 TargetUserName = SYSTEMS TargetDomainName = NT AUTHORITY TargetLogonId = 0x00000000000003e7 LogonType = 8 LogonProcessName = Lxxoi   AuthenticationPackageName = Negotiate WorkstationName = - LogonGuid = {00344000-0000-0000-0000-0000000003440} TransmittedServices = - LmPackageName = Stainless KeyLength = 0 ProcessId = 0x0000000000000244 ProcessName = C:/Windows/System32/services.exe IpAddress = 10.0.0.0 IpPort = 10.5.3.2 ImpersonationLevel = %%1122

Is there another way to do this with awk? I would also like to know how to do it with basic bash and an associative array. Please include a friendly explanation, since I am new to this.

KeiTheNoop
  • you could do it in two (or more) gensub operations, each one allowing for another 9. or use perl. you don't want to do this in bash, bash is not a good language for processing text. – cas Oct 21 '19 at 10:04
  • BTW, given that the bulk of the log entry is key/value pairs, you're probably better off using gensub for the first part of the log line, and then `split()`-ing the remainder of the line into an array on `/ = /` (space, equals, space), and then processing the array. – cas Oct 21 '19 at 10:07

2 Answers


A problem with security logs is that some of the text is probably under user control, so using regular expressions to break things apart is problematic. However, you can potentially use more than one expression to break things apart, and this works around the limit of 9 back-references. For example, if all your log entries start with a timestamp, you can peel that off into a variable first.

awk '{t=$1; $1="";
print gensub(/^.+?\sID\s(\[[0-9]{4}\]).+?\sTargetUserName\s=\s(.+?)\sTargetDomainName\s=\s(.+?)\sTargetLogonId\s=\s(.+?)\sLogonType\s=\s([0-9]{1,2})\s(.+?\sWorkstationName\s=\s(.+?)\sLogonGuid\s=\s.+?TransmittedServices\s=\s.+?\sLmPackageName\s=\s.+?KeyLength\s=\s.+?\sProcessId\s=\s.+?\sProcessName\s=\s.+?\sIpAddress\s=\s(.+?)\sIpPort\s\=\s([0-9]{1,}))?.+?$/,"\\4,\\3,\\2,\\1\\5," t ",\\7,\\8,","g") }'

You can also be selective. Since you have WorkstationName\s=\s(.+?)\sLogonGuid as part of your pattern, you could use

awk '{t=$1; $1=""; printf("%s", gensub(/^.+?WorkstationName\s=\s(.+?)\sLogonGuid.*$/,"\\1,","g")); printf("%s,", t)}'

to pull out a field, and this can be repeated.
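
The one-field-at-a-time idea can also be written without gensub at all, which sidesteps the back-reference limit. What follows is only a minimal sketch, assuming every key is a single word followed by " = " and every value runs up to the next " word = " or the end of the line. The function name field is made up for illustration, and only POSIX awk features are used, so it should not need gawk.

```shell
# field KEY: read log lines on stdin and print the value of KEY for each line.
# Hypothetical helper for illustration; not part of the original answer.
field() {
    awk -v key="$1" '{
        pat = " " key " = "                  # key surrounded by its delimiters
        start = index($0, pat)
        if (start == 0) { print ""; next }   # key not present on this line
        rest = substr($0, start + length(pat))
        # the value ends where the next " word = " begins, if there is one
        if (match(rest, / [^ ]+ = /)) rest = substr(rest, 1, RSTART - 1)
        print rest
    }'
}
```

It could be called as field WorkstationName < logfile, once per field of interest. Values with embedded spaces such as NT AUTHORITY survive, but the same caveat about attacker-controlled " = " sequences applies.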

@cas notes in the comments that the data can be viewed in two parts, the stuff before the EventData/Data -> and the stuff after it, and that the stuff after it can be split on " = " (space, equals, space). I would go further: view it as key/value pairs, split on /\s\S+\s=\s/, and use the optional fourth argument to split() (a gawk extension) to capture the separators, which contain the keys. There are a couple of big assumptions in this: that the user doesn't get to put an equals sign into the line, and that each piece of data has a single-word key. Note that the indexes of the keys and values differ by 1, and that the initial part of the line ends up in v[1].

/usr/bin/awk '{
    n=split($0,v,/\s\S+\s=\s/,k)
    printf("There are %d fields\n",n)
    for(i=0;i<n;i++) { printf("%d key \"%s\" value \"%s\"\n",i,k[i],v[i+1]) }
}'

with your sample data gives

There are 22 fields
0 key "" value "2017-03-21T02:00:00 kornawesome Security/Microsoft-Windows-Security-Auditing ID [4624] :EventData/Data ->"
1 key " SubjectUserSid = " value "S-1-5-18"
2 key " SubjectUserName = " value "PRETENDERS$"
3 key " SubjectDomainName = " value "WORKGROUP"
4 key " SubjectLogonId = " value "0x00000000000004j7"
5 key " TargetUserSid = " value "X-12-54-181"
6 key " TargetUserName = " value "SYSTEMS"
7 key " TargetDomainName = " value "NT AUTHORITY"
8 key " TargetLogonId = " value "0x00000000000003e7"
9 key " LogonType = " value "8"
10 key " LogonProcessName = " value "Lxxoi  "
11 key " AuthenticationPackageName = " value "Negotiate"
12 key " WorkstationName = " value "-"
13 key " LogonGuid = " value "{00344000-0000-0000-0000-0000000003440}"
14 key " TransmittedServices = " value "-"
15 key " LmPackageName = " value "Stainless"
16 key " KeyLength = " value "0"
17 key " ProcessId = " value "0x0000000000000244"
18 key " ProcessName = " value "C:/Windows/System32/services.exe"
19 key " IpAddress = " value "10.0.0.0"
20 key " IpPort = " value "10.5.3.2"
21 key " ImpersonationLevel = " value "%%1122"

From here you can go further and create an associative array called, say, data:

for(i=1;i<n;i++) {gsub(/[ =]/,"",k[i]);data[k[i]]=v[i+1]}

and then you can print out things like data["IpPort"] rather than worrying if this is field 20 or 21.
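
For an awk without gawk's four-argument split(), the same associative array can be built with match() in a loop. This is a rough sketch under the same assumptions as above (single-word keys, " = " as the delimiter). The function name kv_csv and the choice of the three printed fields are purely illustrative.

```shell
# kv_csv: read log lines on stdin, print TargetUserName,IpAddress,IpPort per line.
# Hypothetical helper for illustration; uses POSIX awk features only.
kv_csv() {
    awk '{
        split("", data)                      # portable way to empty the array
        rest = $0
        key = ""
        while (match(rest, / [^ ]+ = /)) {
            # text before this separator is the value of the previous key
            if (key != "") data[key] = substr(rest, 1, RSTART - 1)
            # drop the leading space and the trailing " = " to get the bare key
            key = substr(rest, RSTART + 1, RLENGTH - 4)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (key != "") data[key] = rest      # value of the last key on the line
        print data["TargetUserName"] "," data["IpAddress"] "," data["IpPort"]
    }'
}
```

Run as kv_csv < logfile. Fields are looked up by name, so the order of the keys in the log line does not matter.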

icarus
  • +1. I noticed that `TargetDomainName = NT AUTHORITY` was going to be a slight problem because of the embedded space, so (after splitting the input line into date,id, and eventdata variables) used gensub to add a tab before each key in eventdata and then split it by tabs. then i got distracted by dinner and never finished it. this kind of script is much easier in perl, anyway. – cas Oct 22 '19 at 00:16
  • @cas The general problem is still what data is under the attacker's control. I have the assumption that the split on /\s\S+\s=\s/ is going to be valid, but can the WorkstationName have embedded spaces and equal signs? If this is logging information coming from the network, how much of the data is being sanitized before it is being logged? Anyway, I am not unhappy with the split and the line creating the associative array, giving a lot of results for 2 lines of awk. Agreed that perl would probably be a better match for this problem. – icarus Oct 22 '19 at 04:56
  • There's only so much you can do to fix bad data, "garbage in, garbage out". The **real** fix for problems like this is for the program that generates the log entries to do so in a consistent, unambiguous, and easily parsed format. Quotes around both keys and values, for example. Or use json or similar. – cas Oct 22 '19 at 05:14

icarus has already answered this in awk, so here's how to extract the date and ID into variables, and the event data into a hash (associative array) using perl:

#!/usr/bin/perl -l

use strict;

while(<>) {
  if (m/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}).+?\sID\s(\[\d{4}\]).*?Data -> (.*)$/) {
    my ($date,$id,$eventdata) = ($1,$2,$3);

    print $date;
    print $id;

    # decorate the key names with a tab (i.e. add a tab before each) 
    $eventdata =~ s/([^[:blank:]]+) *= */\t$1=/g;
    # remove tab from beginning of $eventdata
    $eventdata =~ s/^\t//;       #/

    # split $eventdata on tabs, and split again into key=value pairs
    # and store in %data hash.    
    my %data = map { my($k,$v) = split("=",$_,2); $k => $v } split(/ *\t/,$eventdata);

    foreach my $key (sort keys %data) { printf "%s=%s\n", $key, $data{$key} };
  };
};

(The #/ comment is only there to fix U&L's broken perl syntax highlighting)

Note that the ,2 at the end of the split("=",$_,2) operation splits each key=value pair into a maximum of two fields: everything up to the first = symbol, and everything after. This means that it doesn't matter if the value contains an = symbol. Things like this are much easier to do in perl than in awk. Working with regexes and capture groups is also easier, as shown in the first two lines at the beginning of the while(<>) loop.

Save it as, e.g. kei.pl, make it executable with chmod +x kei.pl and run it like this:

$ ./kei.pl input 
2017-03-21T02:00:00
[4624]
AuthenticationPackageName=Negotiate
ImpersonationLevel=%%1122
IpAddress=10.0.0.0
IpPort=10.5.3.2
KeyLength=0
LmPackageName=Stainless
LogonGuid={00344000-0000-0000-0000-0000000003440}
LogonProcessName=Lxxoi
LogonType=8
ProcessId=0x0000000000000244
ProcessName=C:/Windows/System32/services.exe
SubjectDomainName=WORKGROUP
SubjectLogonId=0x00000000000004j7
SubjectUserName=PRETENDERS$
SubjectUserSid=S-1-5-18
TargetDomainName=NT AUTHORITY
TargetLogonId=0x00000000000003e7
TargetUserName=SYSTEMS
TargetUserSid=X-12-54-181
TransmittedServices=-
WorkstationName=-

BTW, if you want the date and id in the hash too, add the following after the %data = map ... line (and delete the print $date; and print $id; lines):

$data{'DATE'} = $date;
$data{'ID'} = $id;
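
For comparison with the ,2 limit on split() described above, the "split at the first = only" behaviour can be approximated in awk with index() and substr(). This is only a sketch, assuming one key=value pair per input line. The function name split_first_eq is made up for illustration.

```shell
# split_first_eq: read key=value lines on stdin and split each one at the
# FIRST "=" only, so that "=" characters inside the value are preserved.
# Hypothetical helper for illustration.
split_first_eq() {
    awk '{
        p = index($0, "=")                   # position of the first "="
        if (p == 0) next                     # no "=" on this line: skip it
        printf "key=[%s] value=[%s]\n", substr($0, 1, p - 1), substr($0, p + 1)
    }'
}
```

For example, the input line Filter=a=b is reported as key Filter with value a=b.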
cas