I need to select specific data from log files, and I need two scripts for it:
- One that selects all IP addresses that visited only /page1.
- One that selects all IP addresses that visited /page1 but never visited /page2.
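Both requirements can be sketched as set operations over (IP, path) pairs, one awk pass per script. This is a sketch that assumes the Apache/Nginx "combined" log format used in the sample line in this question (field 1 is the client IP, field 7 is the request path) and treats `/page1?vtid=nb3` as a visit to `/page1`. The function names `only_page1` and `page1_not_page2` are hypothetical.

```shell
# Assumption: combined log format, $1 = client IP, $7 = request path.

# Print every IP whose requests were ALL to /page1 (query strings ignored).
only_page1() {
  awk '
    {
      path = $7
      sub(/[?].*/, "", path)           # drop a query string such as ?vtid=nb3
      if (path == "/page1") {
        hit[$1] = 1
      } else {
        other[$1] = 1                  # this IP also requested another page
      }
    }
    END {
      for (ip in hit) if (!(ip in other)) print ip
    }
  ' "$@" | sort
}

# Print every IP that requested /page1 at least once but never /page2.
page1_not_page2() {
  awk '
    {
      path = $7
      sub(/[?].*/, "", path)
      if (path == "/page1") {
        hit[$1] = 1
      } else if (path == "/page2") {
        bad[$1] = 1
      }
    }
    END {
      for (ip in hit) if (!(ip in bad)) print ip
    }
  ' "$@" | sort
}
```

Usage: `only_page1 access.log > only_page1.txt` or `page1_not_page2 access.log`; with no file argument they read standard input. Because awk array keys are unique, each IP is printed once, so duplicates are already removed.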
My logs are in a .tar file. I want to extract them into a folder, run the script to parse them, and remove ALL duplicate IP addresses from the result.
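The extract, parse, and clean-up steps could look roughly like this. It is a sketch, assuming the archive contains plain-text (not gzipped) log files in the combined format; `unique_ips_from_tar` is a hypothetical name, and the archive path is whatever you pass as the first argument.

```shell
# Unpack an archive of logs into a scratch folder, print the unique client
# IPs found across all files, then delete the scratch folder.
unique_ips_from_tar() {
  local archive=$1
  local workdir
  workdir=$(mktemp -d) || return 1                 # temporary extraction folder
  tar -xf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }
  # Field 1 of each log line is the client IP; sort -u drops duplicates.
  find "$workdir" -type f -exec awk '{ print $1 }' {} + | sort -u
  rm -rf "$workdir"                                # remove the extracted copies
}
```

Usage: `unique_ips_from_tar logs.tar > unique_ips.txt`. Using `mktemp -d` avoids clashing with an existing folder, and running awk per file (rather than `cat`-ing everything together) keeps a file without a trailing newline from merging into the next one.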
This is what I have so far:
```bash
# filter /page1 visitors
grep "/page1" access.log > /tmp/res.txt
# take the IP portion of each record (read the same file written above)
grep -o '^[[:alnum:]]*\.[[:alnum:]]*\.[[:alnum:]]*\.[[:alnum:]]*' /tmp/res.txt > result.txt
```
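The two-step grep pipeline can also be written as a single awk pass. This sketch assumes the combined log format ($1 = IP, $7 = request path); matching on the path field means `/page10` or a `/page1` inside the referrer is not counted, which `grep "/page1"` would pick up. `page1_visitors` is a hypothetical name.

```shell
# All IPs that requested /page1 at least once, each listed once.
page1_visitors() {
  # Accept /page1 exactly, or /page1 followed by a query string.
  awk '$7 == "/page1" || $7 ~ /^\/page1[?]/ { print $1 }' "$@" | sort -u
}
```

Usage: `page1_visitors access.log > result.txt`.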
A typical access log line looks like this:

```
162.158.86.83 - - [22/May/2016:06:31:18 -0400] "GET /page1?vtid=nb3 HTTP/1.1" 301 128 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
```
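Split on whitespace, this line puts the client IP in field 1 and the request path (with its query string) in field 7, which is the layout the awk-based sketches rely on. A minimal check, using a hypothetical helper `ip_and_path`:

```shell
# Print the client IP and the request path without its query string.
ip_and_path() {
  awk '{
    path = $7
    sub(/[?].*/, "", path)     # strip the query string, e.g. ?vtid=nb3
    print $1, path
  }' "$@"
}

line='162.158.86.83 - - [22/May/2016:06:31:18 -0400] "GET /page1?vtid=nb3 HTTP/1.1" 301 128 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"'
printf '%s\n' "$line" | ip_and_path   # prints: 162.158.86.83 /page1
```

Fields after the path (status, size, referrer, user agent) are ignored, so the quoted user-agent string with its spaces does not affect fields 1 and 7.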