So you know what a microservice is, and you probably know what an API is, or more specifically a REST API. But do you also know what tcpdump is? If not, let me explain why I think it is a great tool when working with APIs.
Sometimes when I think about what I use in my daily work, my favorite tools and software, what comes to mind first is the IDE or the editor of choice for the task. Or, if I think of something even more critical to me, well, Ubuntu, Firefox… things I can’t replace without crying a lot. Or maybe more specific tools like Postman or git. But I never think of the small things, those that you have or install on almost every single machine you ssh into!
This post is about one of those small things that are always there when needed (or at most an apt-get install away) and that can easily save you hours. This post, as you have guessed, is about tcpdump.
The authors of tcpdump needed something to find out why ARPANET was collapsing but didn’t like etherfind, a tool written by Sun, which was problematic and slow. So they started working on a way to filter packets efficiently, and this is how tcpdump was born.
After that they separated the packet capture and filtering logic into a library so other applications could benefit from it, and this is how libpcap was released.
Finally they also defined a file format (*.pcap) so that all these applications could store and share traffic captures easily.
Such great engineering work! Almost 30 years have passed and we still use it and love it. Oh, and tcpdump/libpcap is free software (BSD license) by the way, which means you can use it in both free and non-free software.
Have you ever ssh’ed into a machine to find out what’s going on with that microservice/API and thought: “it would be great to see the requests”? Or maybe on your own machine, you are using an API client that for some reason is not working as expected, and you would like to see what the request that client is sending looks like… or the response to that request?
Well, that is just one command. Two if you have to install tcpdump. You can see both requests and responses, in real time, in your terminal. Isn’t it awesome? You can filter them, store them to inspect them later with Wireshark… still not impressed? Oh come on, you can feel like Cypher watching what’s going on with all those hidden bytes hitting your API!
In a world embracing microservices, where JSON APIs are everywhere, tcpdump is more useful than ever. I am not saying you should not use other methods to store and analyze traffic, because there are really good ways to do that (I personally like Kibana and Logstash, for example), but you don’t need that complexity just to see traffic on a machine.
There are visual tools too, like Fiddler and Charles, but you can’t use them on a remote machine when all you have is an ssh connection.
I see requests
Okay, I have to admit something before we start: tcpdump output isn’t beautiful. But that’s because TCP was not designed to look beautiful to humans. HTTP looks better, but unfortunately tcpdump, as you can imagine, works at a lower level. In fact, TCP belongs to the Transport Layer while HTTP belongs to the Application Layer.
So don’t panic when you start to see TCP flags, and keep in mind that one HTTP call is not just one packet on TCP. Here is an image that shows what a few packets look like with tcpdump. It is just a GET request from the browser to http://www.tcpdump.org/tcpdump_man.html.
As you can see there are a few packets there for just a request and a response:
- First packet is from my computer to the server, and is a SYN packet, as the Flags field shows
- Second packet is from the server to my machine, and it is a SYN-ACK
- Third packet is an ACK from my machine. This completes the famous 3 way handshake and means the TCP connection has been established.
- The 4th packet is the actual request from my browser, requesting /tcpdump_man.html from the server
- Server responds with an ACK
- Server sends the response to my request. As you can see it says the page was not modified since my last request. If that wasn’t the case, you would see a huge body containing all the HTML just under the headers section.
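To make the flags in that walkthrough easier to read, here is a minimal Python sketch that translates tcpdump's one-character flag notation into TCP flag names. The helper name describe_flags is made up for this example, and the sample sequence is only what such an exchange typically looks like; the exact flags in your own capture may differ.

```python
# Map tcpdump's one-character flag notation to TCP flag names.
TCPDUMP_FLAGS = {
    "S": "SYN",
    "F": "FIN",
    "P": "PSH",
    "R": "RST",
    "U": "URG",
    "W": "CWR",
    "E": "ECE",
    ".": "ACK",
}


def describe_flags(field: str) -> str:
    """Translate a flags field such as '[S.]' into 'SYN-ACK'."""
    chars = field.strip("[]")
    if chars == "none":
        return "no flags"
    # Unknown characters are shown as '?' instead of raising an error.
    return "-".join(TCPDUMP_FLAGS.get(c, "?") for c in chars)


# A typical handshake followed by a request and a response (assumed sequence).
for field in ["[S]", "[S.]", "[.]", "[P.]", "[.]", "[P.]"]:
    print(field, "->", describe_flags(field))
```

The dot is the one that surprises people: tcpdump prints an ACK as "." rather than "A".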
Ok, now that we are no longer scared of tcpdump, let’s see some examples. First of all, if you just run tcpdump you will see all the traffic to and from your machine. That’s not very useful; you need filters depending on what you are looking for. So here are a few scenarios:
Capture traffic on localhost
You want to see traffic from your machine to your machine. If you have an API running on localhost:8080 and you are sending requests with Postman, for example, you can use this command:
sudo tcpdump port 8080 -i lo
Explained: in this particular case the traffic is not going to leave your machine, and thus you have to tell tcpdump to listen on “lo”, or whatever your loopback interface is called (you can find it out with ifconfig). And regarding “port 8080” well, that is a filter! It only shows packets going to port 8080 or coming from port 8080.
Capture traffic on a remote machine
Another scenario: you have something deployed on a remote machine, listening on port 80 for example and traffic is coming from clients. You want to see requests coming from a specific client that is facing problems:
sudo tcpdump port 80 and src 220.127.116.11 -nA
Explained: as before, “port 80 and src 220.127.116.11” is the filter. It will capture packets to or from port 80 that were sent by the client 220.127.116.11. This means that you won’t see what the server is sending back to the client, the response. If you want to see both requests and responses from that client:
sudo tcpdump 'port 80 and (src 220.127.116.11 or dst 220.127.116.11)' -nA
Now you can see everything. Note that we are now using a couple of flags:
- -n tells tcpdump not to translate IPs into names
- -A tells tcpdump to print packets in ASCII. It is helpful when you want to see request bodies that contain JSON, for example, or query strings.
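If you find yourself typing these filters often, the pattern can be wrapped in a small helper. This is only a sketch in Python: the function tcpdump_command and its parameters are hypothetical, 203.0.113.10 is a placeholder documentation address, and the function just builds the command without running it.

```python
import shlex


def tcpdump_command(port, client_ip=None, interface=None, ascii_output=True):
    """Build an argv list for tcpdump; nothing is executed here."""
    argv = ["sudo", "tcpdump", "-n"]  # -n: do not resolve IPs into names
    if ascii_output:
        argv.append("-A")  # -A: print packet contents as ASCII
    if interface:
        argv += ["-i", interface]
    # The filter expression is passed to tcpdump as a single argument.
    expression = f"port {port}"
    if client_ip:
        expression += f" and (src {client_ip} or dst {client_ip})"
    argv.append(expression)
    return argv


# Requests and responses for one client on port 80 (placeholder address).
cmd = tcpdump_command(80, client_ip="203.0.113.10")
print(shlex.join(cmd))
```

Keeping the command as a list means it could later be handed to subprocess.run without worrying about shell quoting of the parentheses.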
Capture and save it for later
Sometimes you just want to capture traffic and analyze it more carefully later. You can do that with tcpdump: just append -w captured.pcap to any tcpdump command and stop it when you feel you have captured enough packets. You can also stop automatically after a given number of packets with -c, or cap the size of each file with -C (in millions of bytes), after which tcpdump rolls over to a new file.
After that you can open the pcap file with Wireshark, a much more visual tool that is built on top of libpcap and is thus compatible with pcap files.
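When Wireshark is not at hand, a quick sanity check of a capture is still possible, because the classic pcap layout is simple: a 24-byte global header followed by records that each carry a 16-byte header. The Python sketch below counts the records; count_packets is a name made up for this example, and it assumes the classic pcap format rather than the newer pcapng.

```python
import struct

PCAP_MAGIC_MICRO = 0xA1B2C3D4  # timestamps in microseconds
PCAP_MAGIC_NANO = 0xA1B23C4D   # timestamps in nanoseconds


def count_packets(data: bytes) -> int:
    """Count the packet records in the bytes of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number reveals the byte order the file was written in.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in (PCAP_MAGIC_MICRO, PCAP_MAGIC_NANO):
            break
    else:
        raise ValueError("not a classic pcap file")
    offset, packets = 24, 0  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: seconds, sub-seconds, captured length, original length
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16 + incl_len
        packets += 1
    return packets


# Usage, with the file written by -w captured.pcap:
# with open("captured.pcap", "rb") as f:
#     print(count_packets(f.read()))
```

Reading the whole file into memory is fine for small captures; for large ones you would want to read record by record instead.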
tcpdump is really powerful. I don’t usually need more advanced filters, but if you are interested here is the manpage: man tcpdump
Theory says that tcpdump is fast and that filters are compiled and optimized, but is it a good idea to capture traffic in a production environment?
Well, let’s check it out. I’ve created a dummy server with Node.js that just returns a bunch of JSON to the client on port 80. I deployed it on a single-core, 512MB RAM server on DigitalOcean and tried 3 different scenarios:
- No tcpdump
- Capturing traffic on port 80 and storing it on a pcap file
- Capturing traffic on port 80 and displaying it on the console with -A option
I used Apache Benchmark to throw 100,000 calls at each of these scenarios, with a concurrency of 100, and measured the results.
So it seems that watching traffic as ASCII on the console hurts a bit. But not too much; I mean, if you filter traffic so that you only see a few packets, that’s not too bad. Also, there is no point in displaying one million packets per minute, because you can’t read them!
On the other hand, storing traffic on a pcap file seems to be quite cheap, as long as your traffic is not huge. For this scenario the pcap file was 150MB and contained around 1M packets.
If you are working for YouTube maybe you should not launch tcpdump on a production streaming server hehe. But for an API I don’t think it is a problem, JSON requests and responses are small. Just be careful not to capture traffic for too long or you will fill the disk.
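To get a feel for how long is "too long", the numbers from the benchmark give a rough average of 150 bytes per captured packet (150MB for about 1M packets, taking MB as 10^6 bytes). Here is a back-of-the-envelope sketch in Python; the function name, the 10 GB of free disk, and the 2,000 packets per second are all made-up values for illustration.

```python
def seconds_until_full(free_bytes, packets_per_second, bytes_per_packet=150):
    """Rough time until a capture file consumes the free disk space."""
    return free_bytes / (packets_per_second * bytes_per_packet)


# 150 MB for about 1M packets, the figures from the benchmark
bytes_per_packet = 150_000_000 / 1_000_000

# Hypothetical server: 10 GB free, 2,000 packets per second
hours = seconds_until_full(10 * 10**9, 2000, bytes_per_packet) / 3600
print(f"{hours:.1f} hours")  # prints "9.3 hours"
```

Real traffic will have a different average packet size, so treat the result as an order of magnitude, not a promise.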