
Let's say I send an email containing a link to my website to someone who, I really hope, will visit it (fingers crossed):

http://www.example.com/?utm_source=email392

or

http://www.example.com/somefile.pdf?utm_source=email392

How can I make Linux trigger an action (such as sending an automated email to myself) when this URL is visited, by regularly examining /var/log/apache2/other_vhosts_access.log?

I can't do it at the PHP level because I need to do it for various sources/websites: some of them use PHP, some don't and are just links to files to be downloaded, etc. Even for the websites using PHP, I don't want to modify every index.php to do it from there. That's why I prefer an Apache log-parsing method.

Basj
  • Notice that the URL might be visited by something else (e.g. Google bots) – Basile Starynkevitch Sep 14 '17 at 15:17
  • Yes @BasileStarynkevitch, I checked my logs and bots do visit my website, that's right, but they never do with the precise pattern `/?utm_source=onlycommunicated_tooneperson_viaemail` – Basj Sep 15 '17 at 11:57

6 Answers


Live log monitoring using bash process substitution:

```bash
#!/bin/bash

while IFS= read -r line; do
    # action here; the matching log line is in "$line"
    printf 'visited: %s\n' "$line"
done < <(tail -n 0 -f /var/log/apache2/other_vhosts_access.log |
         grep --line-buffered -F '/somefile.pdf?utm_source=email392')
```

Process substitution feeds the read loop with the output from the pipeline inside <(...). The log line itself is assigned to variable $line.
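One practical reason to prefer process substitution over piping into the loop: with `< <(...)` the while loop runs in the current shell, so variables set inside it (a hit counter, for example) are still visible after the loop ends. This small sketch contrasts the two forms; the function names are made up for the demonstration, and it assumes bash with the `lastpipe` option off, which is the default.

```bash
#!/bin/bash
# A loop fed by a pipe runs in a subshell; a loop fed by
# process substitution runs in the current shell.

count_with_pipe() {
    local n=0 line
    printf '%s\n' "$@" | while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"    # increments happened in a subshell and are lost
}

count_with_procsub() {
    local n=0 line
    while IFS= read -r line; do n=$((n + 1)); done < <(printf '%s\n' "$@")
    echo "$n"    # increments survive
}
```

For example, `count_with_pipe a b c` prints 0, while `count_with_procsub a b c` prints 3.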

Logs are watched using tail -f, which outputs lines as they are written to the logs. If your log files are moved periodically by logrotate, add --follow=name and --retry options to watch the file path instead of just the file descriptor.
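A minimal sketch of the rotation-safe variant, assuming GNU tail and GNU grep. The `watch_url` function name is made up; its default arguments are the log path and URL from the question, and the `printf` stands in for whatever action you actually want.

```bash
#!/bin/bash
# Hypothetical wrapper: follow the log by *name* so the watcher keeps
# working after logrotate replaces the file.
watch_url() {
    local log="${1:-/var/log/apache2/other_vhosts_access.log}"
    local pattern="${2:-/somefile.pdf?utm_source=email392}"

    # -n 0            start at the current end of the file
    # --follow=name   reopen the file if it is renamed or recreated
    # --retry         keep trying while the file is temporarily missing
    tail -n 0 --follow=name --retry "$log" |
        grep --line-buffered -F -- "$pattern" |
        while IFS= read -r line; do
            # Replace with the real action (e.g. sending mail).
            printf 'MATCH: %s\n' "$line"
        done
}
```

Calling `watch_url` with no arguments watches the question's log for the question's URL; Ctrl+C stops the whole pipeline.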

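Output from tail is piped to grep, which keeps only the lines matching your URL. Use grep --line-buffered so each match is passed on immediately instead of sitting in grep's output buffer, and -F so that characters such as . and ? in the URL are matched literally.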

sebasth
  • Much better than my own answer. Upvoted. – Zé Loff Sep 14 '17 at 15:18
  • Thanks! For future reference: `tail -n 0` makes a clean / empty start of tail (no default 10 lines). – Basj Sep 15 '17 at 12:10
  • @sebasth can you post an example with `follow=name` and `retry`? Should I replace `name` by the actual path? – Basj Sep 15 '17 at 12:22
  • @sebasth can I use `tail -F -n 0` instead of `-f --follow=name --retry`? This [answer](https://unix.stackexchange.com/questions/22698/how-to-do-a-tail-f-of-log-rotated-files) seems to suggest it. – Basj Sep 15 '17 at 13:02
  • `tail -n 0 -f --follow=name --retry /var/log/apache2/other_vhosts_access.log | ...` – sebasth Sep 15 '17 at 17:45

You can use a one-liner like this:

```
grep -q "utm_source=email392" /var/log/apache2/other_vhosts_access.log && grep "utm_source=email392" /var/log/apache2/other_vhosts_access.log | mail -s "Accessed!" [email protected]
```

and run it periodically via cron.

In more detail: the first grep only checks whether further action is needed (-q makes it quiet, suppressing any matches it finds). && means that the rest of the line runs only if the first grep finds a match (i.e. returns 0). In that case, the matching line(s) printed by the second grep are piped into mail and sent to [email protected], in an email whose subject is given by the -s argument ("Accessed!").

The same logic (grep -q ... && ...) can be used to perform any other actions. You can run whatever you want after &&, e.g. a shell script for more complex stuff.

Note that if you run this at a higher frequency than the log's rotation -- e.g. checking hourly but rotating the logs daily -- the action might be triggered multiple times, since grep will keep finding the same line(s) over and over again until the log rotates.
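One way around the repeated triggers is to remember how much of the log has already been checked. The sketch below (a hypothetical `check_new_hits` function; the state-file path is up to you) stores the log's size in bytes after each run and, on the next run, searches only the bytes appended since then. If the log has become smaller than the stored size, it assumes logrotate replaced it and starts from the beginning; lines written to the old file between the last run and the rotation are missed, so treat this as a heuristic.

```bash
#!/bin/bash
# Hypothetical helper: print lines containing PATTERN that were appended
# to LOG since the previous call, using STATE to remember the byte offset.
check_new_hits() {
    local log="$1" pattern="$2" state="$3"
    local offset=0 size

    # Byte offset reached by the previous run (0 on the first run).
    [ -s "$state" ] && offset=$(<"$state")

    # Current size in bytes (wc -c is portable; $((...)) trims padding).
    size=$(( $(wc -c < "$log") ))

    # A smaller file most likely means the log was rotated: start over.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi

    # Read only the bytes between the old offset and the size measured
    # above, so lines written while we run are left for the next call.
    tail -c +"$((offset + 1))" "$log" | head -c "$((size - offset))" |
        grep -F -- "$pattern"

    # Remember how far we got.
    printf '%s\n' "$size" > "$state"
}
```

From cron you could then run something like `hits=$(check_new_hits /var/log/apache2/other_vhosts_access.log utm_source=email392 /var/tmp/email392.offset)` and, only when `[ -n "$hits" ]`, pipe `"$hits"` into mail as in the one-liner.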

Zé Loff
  • Thanks for your answer, but this is a problem indeed: "if you run this at a higher frequency than the log's rotation -- e.g. checking hourly but rotating the logs daily -- the action might be triggered multiple times", because I only rotate logs weekly. – Basj Sep 15 '17 at 11:52

While writing my solution I found that the first answer is similar to mine. I would also recommend avoiding cron in this case. Here is my code:

```bash
#!/bin/bash
file="$1"
pattern="$2"

tail -f -n0 "$file" | {
   while IFS= read -r line
   do
      # literal substring match, so "." and "?" in the URL are not regex syntax
      if [[ $line == *"$pattern"* ]]; then
         echo "visited $pattern" | mail [email protected]
      fi
   done
}
```

In addition, you can run it in the background with the & operator:

./checklog.sh /var/log/apache2/other_vhosts_access.log "somefile.pdf?utm_source=email392" &

or start it as a daemon when the system boots up.
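On a systemd-based distribution, one way to do that is a small service unit. This is a sketch under assumptions: the script is installed as /usr/local/bin/checklog.sh and the unit is called checklog.service, both hypothetical names.

```ini
# /etc/systemd/system/checklog.service (hypothetical name and path)
[Unit]
Description=Mail me when a tracked URL shows up in the Apache log
After=network.target apache2.service

[Service]
ExecStart=/usr/local/bin/checklog.sh /var/log/apache2/other_vhosts_access.log "somefile.pdf?utm_source=email392"
Restart=always

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload`, `systemctl enable --now checklog.service` starts it immediately and on every boot; `Restart=always` restarts the watcher if it exits.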

Rui F Ribeiro
Wax
  • Nice! Will this run forever? (I would like this: run forever, until I kill it.) Also, will it reopen the file automatically when the logs rotate? – Basj Sep 15 '17 at 11:59
  • It will run until you kill it or the system goes down (in almost any case) :-) As mentioned in the first answer by sebasth, you can use the `--follow` and `--retry` options when the log rotates. I'm glad I could help. – Wax Sep 15 '17 at 12:12
  • Why did you put `while IFS=(nothinghere) read -r line`? – Basj Sep 15 '17 at 12:38
  • The value of the `IFS` (internal field separator) variable holds the character(s) used to split the string read. With a single variable name, `read` assigns the whole line to `line` anyway, but it strips leading and trailing IFS whitespace; with an empty `IFS` the line is kept exactly as written. – Wax Sep 15 '17 at 13:11
  • Thanks again. Now a [linked issue](https://unix.stackexchange.com/questions/392457/too-many-emails-sent-when-sending-emails-when-specific-url-has-been-visited-in-a) just in case you have an idea ;) – Basj Sep 15 '17 at 14:25
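To see the difference the empty IFS makes, here is a small sketch (the function names are made up for the demonstration) that reads the same input once with the default IFS and once with `IFS=`, printing the result between brackets:

```bash
#!/bin/bash
# Read one line with the default IFS (space, tab, newline)...
read_default() {
    local line
    read -r line <<< "$1"
    printf '[%s]\n' "$line"
}

# ...and with IFS set to the empty string for this read only.
read_empty_ifs() {
    local line
    IFS= read -r line <<< "$1"
    printf '[%s]\n' "$line"
}
```

With the input `'  GET /x  '`, `read_default` prints `[GET /x]`, while `read_empty_ifs` prints `[  GET /x  ]`.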

Try fail2ban with the apache-badbots.conf filter (replace the regex with your URL) and sendmail.conf as the action:

```
[mycustombot]
enabled = true
# your "custom" filter
filter  = apache-badbots
action  = sendmail[name=MyBadBot,[email protected]]
logpath = /your/access/logs/*/path
```

BrenoZan

Posting what I'm finally using, for future reference (OK, I know one-liners are sometimes bad, but...):

tail -F -n0 /var/log/apache2/other_vhosts_access.log | grep --line-buffered "?src=_" | { while IFS= read -r line; do echo "$line" | mail [email protected]; done } &

Notes:

  • I have to use grep --line-buffered because grep buffers its output when writing to a pipe, so matches would otherwise reach the loop late.

  • tail -F is equivalent to --follow=name --retry (that is how GNU tail documents it).
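If a single visit produces a burst of matching log lines (for example, repeated requests for the same file), the one-liner sends one email per line. A hypothetical `throttle_lines` filter, sketched below, could sit between grep and the mail loop: it passes on at most one line per interval and silently drops the rest. The interval in seconds is an assumption you would tune.

```bash
#!/bin/bash
# Hypothetical filter: copy stdin to stdout, but let through at most one
# line every MIN_INTERVAL seconds; lines arriving sooner are dropped.
throttle_lines() {
    local min_interval="$1" last=0 now line
    while IFS= read -r line; do
        now=$(date +%s)                      # current Unix time in seconds
        if (( now - last >= min_interval )); then
            printf '%s\n' "$line"
            last=$now
        fi
    done
}
```

In a script that defines it, the pipeline would become `tail -F -n0 ... | grep --line-buffered "?src=_" | throttle_lines 600 | { while ...; done } &`.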

(Of course credit goes to sebasth and Wax.)

Basj

You can do that using rsyslog and the ommail module:

http://www.rsyslog.com/doc/v8-stable/configuration/modules/ommail.html

Something like:

module(load="ommail")

if $msg contains "/somefile.pdf?utm_source=email392" then {
   action(type="ommail" server="..." port=".."
       mailfrom="...."
       mailto="..."
       subject.text="Page Viewed!")
}

This will only work if Apache is configured to log via syslog.
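On Debian/Ubuntu, one way to get the access log into syslog is Apache's piped logging: an additional CustomLog directive that hands each entry to the logger utility. A hedged sketch, assuming logger lives at /usr/bin/logger; the tag apache and facility local6 are arbitrary example choices.

```apache
# Extra access log sent to syslog via logger (tag and facility are examples).
# vhost_combined is the LogFormat Debian uses for other_vhosts_access.log.
CustomLog "|/usr/bin/logger -t apache -p local6.info" vhost_combined
```

The existing file log keeps working; rsyslog then sees each request line in `$msg`, which is what the `contains` test above matches against.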

Diego Roccia