3

I'm working on a web capture script where I only want to dump the traffic if the HTML body or URL contains a string defined in a .txt file.

Here is my tshark command, which dumps the source IP, the destination host (HTTP Host header) and the request URI. It works like a charm.

tshark -i eth1 'port 80' -R 'http.request' -T fields \
   -e frame.number -e frame.time -e ip.src  -e http.host \
   -e http.request.uri -E header=y -E separator=, \
   -E quote=d -E occurrence=f 

Now here is where I'm stuck. How do I trigger this dump only when a user browses to, let's say, hidemyass, or when the HTML body contains keywords like porn?

schaiba
jemik
  • @Warren This is done on a box that's on a span port, so I'm not able to do any HTTPS traffic inspection. I don't care if users are using proxy services or trying to encrypt their traffic to evade being logged, because we already have application firewalls and IPS implemented. If I wanted to spend $100,000 on a webgateway I would do that, but that's beside the point. In this case I only want to log HTTP traffic. – jemik Mar 06 '13 at 09:31
  • I'm not going for a custom IDS or firewall; as I wrote, we have systems in place for that. And hidemyass was just an example keyword. I don't agree with you that tshark is the wrong tool. And again, you can't inspect SSL-encrypted traffic on a mirrored port. – jemik Mar 06 '13 at 20:36

2 Answers

3

You could use ngrep.

It supports both pcap-filters and regex matching of packets. Example:

ngrep -tqW byline 'somethingbad|banana' port 80

This matches packets on port 80 whose payload contains 'somethingbad' or 'banana', whether they belong to a request or a response.

Caveats:

  • If your keyword is split across multiple packets, it won't match.
  • Only packets matching the keyword will be captured. So if you want an entire request/response body that spans multiple packets related to the transaction, it gets more complicated.
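
Since the keywords in the question live in a .txt file, the ngrep pattern could be built from that file instead of being typed by hand. The following is only a minimal sketch: `build_pattern` and `keywords.txt` are hypothetical names, and it assumes one keyword per line with no regex metacharacters that would need escaping.

```shell
#!/bin/sh
# Build an alternation regex ("word1|word2|...") from a keyword file
# with one keyword per line. Blank lines are skipped. Keywords are used
# as-is, so regex metacharacters in them are NOT escaped.
build_pattern() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd '|' -
}

# Usage (keywords.txt is a hypothetical file name):
#   ngrep -tqW byline "$(build_pattern keywords.txt)" port 80
```

The caveats above still apply: the generated pattern is matched per packet, so a keyword split across two packets is still missed.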
cpugeniusmv
0

Short answer: You can't.

Long answer: Wireshark works on layers 1-6 (best on layers 1-3). The HTTP content you are after is on layer 7.

So if you want to go into depth here, this is what you could do (along these lines):

Constantly watch TCP/HTTP traffic with tcpdump for packets smaller than 900 bytes (the typical length of an initial HTTP request). If you encounter "interesting" URLs, trigger a full dump of the connection in question.

You could do this with a constant full dump, too, but in that case your sniffer server will most probably run into performance problems.

Either way, you need a second process that filters or triggers on the tcpdump output.
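
As a rough illustration of such a second process, the sketch below reads capture lines on stdin and passes through only those that contain a keyword from a file. `filter_keywords` and `keywords.txt` are hypothetical names, and matching is assumed to be case-insensitive on plain substrings. Fed from the tshark command in the question, it only sees the host and URI fields that command prints, not the response body.

```shell
#!/bin/sh
# Print only the stdin lines that contain at least one keyword from the
# file given as $1 (one keyword per line, blank lines ignored).
# Matching is a case-insensitive substring test, not a regex.
filter_keywords() {
    awk -v kwfile="$1" '
        BEGIN {
            # Load the keywords once, lower-cased, skipping blank lines.
            while ((getline kw < kwfile) > 0)
                if (kw !~ /^[ \t]*$/) words[tolower(kw)] = 1
        }
        {
            line = tolower($0)
            for (w in words)
                if (index(line, w)) { print; fflush(); break }
        }
    '
}

# Usage (keywords.txt and hits.csv are hypothetical file names; -l makes
# tshark flush its output after every packet):
#   tshark -l -i eth1 ... | filter_keywords keywords.txt >> hits.csv
```

The filter only logs matching lines; triggering a full dump of the matching connection would still need extra logic on top of it.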

An alternative could be to script the Wireshark GUI, or to use it to capture the packets in question.

Nils