tshark dump only when string is matched

Question

I'm working on a web capture script where I only want to dump the traffic if the HTML body or URL contains a string defined in a .txt file.

Here is my tshark command, which dumps the source IP and destination URL (HTTP host) and works like a charm:

tshark -i eth1 'port 80' -R 'http.request' -T fields \
   -e frame.number -e frame.time -e ip.src  -e http.host \
   -e http.request.uri -E header=y -E separator=, \
   -E quote=d -E occurrence=f

Now here is where I'm stuck: how do I trigger this dump only when a user browses to, let's say, hidemyass, or when the HTML body contains keywords like porn?

@Warren This is done on a box that's on a SPAN port, so I'm not able to do any HTTPS traffic inspection. I don't care if users are using proxy services or trying to encrypt their traffic to evade being logged, because we already have application firewalls and IPS implemented. If I wanted to spend $100,000 on a web gateway, I would do that, but that's beside the point. In this case I only want to log HTTP traffic. — jemik, Mar 06 '13 at 09:31
I'm not going for a custom IDS or firewall; as I wrote, we have systems in place for that. And hidemyass was just an example keyword. I don't agree with you that tshark is the wrong tool. And again, you can't inspect SSL-encrypted traffic on a mirrored port. — jemik, Mar 06 '13 at 20:36

cpugeniusmv · Answer 1 · answered Mar 13 '13 at 01:38

You could use ngrep.

It supports both pcap filters and regex matching on packet payloads. For example:

ngrep -tqW byline 'somethingbad|banana' port 80

will find packets on port 80 whose request or response content contains 'somethingbad' or 'banana'.
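To drive ngrep from the keywords .txt file mentioned in the question, the keyword list can be turned into the same kind of alternation pattern. This is a minimal sketch: the helper name `keywords_to_regex` and the file name `keywords.txt` are hypothetical, and it assumes one keyword per line with no regex metacharacters (a dot in something like `hidemyass.com` still works, it just matches any character).

```shell
#!/bin/sh
# Hypothetical helper: turn a keywords file (one keyword per line) into an
# alternation pattern such as 'hidemyass|porn' for ngrep.
# Assumption: keywords are plain words, not regexes; blank lines are skipped.
keywords_to_regex() {
    # drop empty/whitespace-only lines, then join the rest with '|'
    grep -v '^[[:space:]]*$' "$1" | paste -sd '|' -
}
```

Used, for example, as

    ngrep -d eth1 -i -q -t -W byline -O matches.pcap "$(keywords_to_regex keywords.txt)" port 80

where -i makes the match case-insensitive and -O writes the matching packets to a pcap file for later inspection.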

Caveats:

If your keyword is split across multiple packets, it won't match.
Only the packets that match the keyword are captured, so if you want the entire request or response body of a transaction that spans multiple packets, it gets more complicated.

score 0 · Answer 2 · answered Mar 11 '13 at 13:32

Short answer: You can't.

Long answer: Wireshark works on layers 1-6 (best on layers 1-3), while that HTTP content information is on layer 7.

So if you want to go into depth here, this is roughly what you could do:

Constantly watch TCP/HTTP traffic with tcpdump for packets smaller than 900 bytes (the typical length of an initial HTTP request). If you encounter "interesting" URLs, trigger a full dump of the connection in question.

You could do this with a constant full dump, too, but then your sniffer server will most probably run into performance problems.

Either way, you need a second process that filters and triggers on the tcpdump output.
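A rough sketch of what that second process could look like, under several assumptions: instead of raw tcpdump output it reads the field output of the tshark command from the question (-E separator=, -E quote=d), keeps only lines containing a keyword from a keywords file, and starts a time-limited tcpdump for the client IP. The function names, the eth1 interface, the 60-second window and the dump file name are all hypothetical, and `timeout` comes from GNU coreutils. Note that this only captures traffic after the trigger, not the request that caused it.

```shell
#!/bin/sh
# Hypothetical helpers for the "second process". Expected input lines look like
#   "12","Mar  6, 2013 09:31:00.000000000 CET","10.0.0.5","www.hidemyass.com","/"
# frame.time itself contains a comma, so fields are parsed via their quotes.

# Pass through only lines containing any keyword from the file given as $1
# (fixed strings, case-insensitive). Blank keyword lines are dropped first,
# because an empty pattern would match every line.
match_keywords() {
    patterns=$(grep -v '^[[:space:]]*$' "$1") || return 1
    grep -F -i --line-buffered -e "$patterns"
}

# Print the third quoted field (ip.src) of one tshark output line.
src_ip() {
    printf '%s\n' "$1" | sed -n 's/^"[^"]*","[^"]*","\([^"]*\)".*/\1/p'
}

# Start a background, time-limited capture of that client's HTTP traffic.
# Interface, duration and file name are assumptions; adjust as needed.
trigger_dump() {
    timeout 60 tcpdump -i eth1 -w "dump-$1-$(date +%s).pcap" \
        "host $1 and tcp port 80" &
}
```

Wired together, for example:

    tshark -l -i eth1 'port 80' -R 'http.request' -T fields \
       -e frame.number -e frame.time -e ip.src -e http.host \
       -e http.request.uri -E separator=, -E quote=d -E occurrence=f |
    match_keywords keywords.txt |
    while IFS= read -r line; do trigger_dump "$(src_ip "$line")"; done

Here -l makes tshark flush each line so matches are seen immediately (newer tshark versions want -Y instead of -R for a live display filter). A real script would also avoid starting a second capture for an IP that is already being dumped.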

An alternative could be to script the Wireshark GUI, or to use it to capture the packets in question.
