Masscan as a lesson in TCP/IP

By Robert Graham

When learning TCP/IP it may be helpful to look at the masscan port scanning program, because it contains its own network stack. This concept, "contains its own network stack", is so unusual that it'll help resolve some confusion you might have about networking. It'll help challenge some (incorrect) assumptions you may have developed about how networks work.

For example, here is a screenshot of running masscan to scan a single target from my laptop computer. My machine has an IP address of 10.255.28.209, but masscan runs with an address of 10.255.28.250. This works fine, with the program contacting the target computer and downloading information -- even though it has the 'wrong' IP address. That's because it isn't using the network stack of the notebook computer, and hence, not using the notebook's IP address. Instead, it has its own network stack and its own IP address.

At this point, it might be useful to describe what masscan is doing here. It's a "port scanner", a tool that connects to many computers and many ports to figure out which ones are open. In some cases, it can probe further: once it connects to a port, it can grab banners and version information.

The parameters to masscan in the above example are:

  • -p80 : probe for port "80", which is the well-known port assigned for web-services using the HTTP protocol
  • --banners : do a "banner check", grabbing simple information from the target depending on the protocol. In this case, it grabs the "title" field from the HTML from the server, and also grabs the HTTP headers. It does different banners for other protocols.
  • --source-ip 10.255.28.250 : this configures the IP address that masscan will use
  • 172.217.197.113 : the target to be scanned. This happens to be a Google server, by the way, though that's not really important.

Now let's change the IP address that masscan is using to something completely different, like 1.2.3.4. The difference from the above screenshot is that we no longer get any data in response. Why is that?

The answer is that the routers don't know how to send back the response. It doesn't go to me, it goes to whoever owns the real address 1.2.3.4. If you visualize the Internet, the subnetworks are on the edges. The routers in between examine the destination address of each packet and route it in the proper direction. You can send packets from 1.2.3.4 from anywhere in the network, but responses will always go back to the proper owner of that address.
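
To make that concrete, here's a minimal sketch of destination-based routing, written in Python. The routing table and the next-hop names are made up for illustration; real routers hold far larger tables, but the principle (pick the most specific prefix that contains the destination) is the same.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop). The hop names are invented.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),
    (ipaddress.ip_network("10.255.28.0/24"), "my-subnet"),
    (ipaddress.ip_network("1.2.3.0/24"), "owner-of-1.2.3.4"),
]


def next_hop(destination: str) -> str:
    """Pick the most specific route whose prefix contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route 0.0.0.0/0 always matches, so matches is never empty.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


def route_reply(probe_source: str) -> str:
    """A reply is addressed to the probe's source IP; who really sent it is irrelevant."""
    return next_hop(probe_source)


print(route_reply("10.255.28.250"))  # my-subnet
print(route_reply("1.2.3.4"))        # owner-of-1.2.3.4
```

Nothing in the lookup asks where the probe came from. That's why a reply to a spoofed 1.2.3.4 ends up with the real owner of that address.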

Thus, masscan can spoof any address it wants, but if it's an address that isn't on the local subnetwork, then it's never going to see the response -- the response is going to go back to the real owner of the address. By the way, I've made this mistake before. When doing massive scans of the Internet, generating billions of packets, I've accidentally typed the wrong source address. That meant I saw none of the responses -- but the hapless owner of that address was inundated with replies. Oops.

So let's consider what masscan does when you use --source-ip to set its address. It does only three things:

  • Uses that as the source address in the packets it sends.
  • Filters incoming packets to make sure they match that address.
  • Responds to ARP packets for that address.
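
Those three behaviors can be sketched in a few lines of Python. This is not masscan's actual code, just an illustration of how little logic is involved; the names StackConfig, outgoing_source, accept_incoming, and arp_answer are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackConfig:
    source_ip: str   # the value given to --source-ip
    source_mac: str  # the MAC address of the interface being used


def outgoing_source(cfg: StackConfig) -> str:
    """1. Every packet sent carries the configured address as its source."""
    return cfg.source_ip


def accept_incoming(cfg: StackConfig, packet_dst_ip: str) -> bool:
    """2. Incoming packets are kept only if they are addressed to us."""
    return packet_dst_ip == cfg.source_ip


def arp_answer(cfg: StackConfig, requested_ip: str) -> Optional[str]:
    """3. ARP requests for our address are answered with our MAC; others are ignored."""
    return cfg.source_mac if requested_ip == cfg.source_ip else None


cfg = StackConfig("10.255.28.250", "14:63:a3:11:2d:d4")
print(accept_incoming(cfg, "10.255.28.209"))  # False: that's the laptop's own address
print(arp_answer(cfg, "10.255.28.250"))       # 14:63:a3:11:2d:d4
```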

Remember that on the local network, communication isn't between IP addresses but between Ethernet/WiFi addresses. IP addresses identify the remote ends of the network; MAC addresses are how packets travel across the local network. It's like when you send your kid to grab the mail from the mailbox: the kid is Ethernet/WiFi, the address on the envelope is the IP address.

In this case, when masscan transmits packets to the local router, it needs to first use ARP to find the router's MAC address. Likewise, when the router receives a response from the Internet destined for masscan, it must first use ARP to discover the MAC address masscan is using.

As you can see in the picture at the top of this post, the MAC address of the notebook computer's WiFi interface is 14:63:a3:11:2d:d4. Therefore, when masscan sees an ARP request for 10.255.28.250, it must respond with that MAC address.
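
For the curious, here's roughly what such an ARP reply looks like on the wire, as a Python sketch that only builds the bytes (it sends nothing). The field layout is the standard Ethernet/IPv4 ARP format. The router's IP address 10.255.28.1 and the reuse of the router MAC from a later example are assumptions for illustration, and build_arp_reply is a hypothetical helper, not a masscan function.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_reply(my_mac: str, my_ip: str, asker_mac: str, asker_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP reply: 'my_ip is at my_mac'."""
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP)
    ethernet = mac_bytes(asker_mac) + mac_bytes(my_mac) + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += mac_bytes(my_mac) + socket.inet_aton(my_ip)        # sender MAC and IP
    arp += mac_bytes(asker_mac) + socket.inet_aton(asker_ip)  # target MAC and IP
    return ethernet + arp


frame = build_arp_reply(
    my_mac="14:63:a3:11:2d:d4",     # the notebook's WiFi MAC
    my_ip="10.255.28.250",          # the address masscan was told to use
    asker_mac="ac:86:74:78:28:b2",  # assumed router MAC
    asker_ip="10.255.28.1",         # assumed router IP
)
print(len(frame))  # 42
```

The whole answer is 42 bytes: a 14-byte Ethernet header plus a 28-byte ARP message.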

These three steps should impress upon you that there's not actually a lot that any operating system does with the IP address assigned to it. We imagine there is a lot of complicated code involved. In truth, there isn't -- there are only a few simple things the operating system does with the address.

Moreover, this should impress upon you that the IP address is a property of the network, not of the operating system. It's what the network uses to route packets to you, and the operating system has very little control over which addresses will work and which ones won't. The IP address isn't the name or identity of the operating system. It's like how your postal mailing address isn't you, isn't your identity; it's simply where you live, how people can reach you.

Another thing to notice is the difference between phone numbers and addresses. Your IP address depends upon your location. If you move your laptop computer to a different location, you need a different IP address that's meaningful for that location. In contrast, the phone has the same phone number wherever you travel in the world, even if you travel overseas. There have been decades of work in "mobile IP" to change this, but frankly, the Internet's design is better, though that's beyond the scope of this document.

That you can set any source address in masscan means you can play tricks on people. Spoof the source address of some friend you don't like, and they'll get all the responses. Moreover, angry people who don't like getting scanned may complain to their ISP and get them kicked off for "abuse".

To stop this sort of nonsense, a lot of ISPs do "egress filtering". Normally, a router only examines the destination address of a packet in order to figure out the direction to route it. With egress filtering, it also looks at the source address, and makes sure it can route responses back to it. If not, it'll drop the packet. I tested this by sending spoofed packets with the source address 1.2.3.4 to a server of mine on the Internet, and found that they never arrived. (I used the famous tcpdump program to filter incoming traffic looking for those packets.)
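
The check itself is simple. Here's a minimal Python sketch, assuming the ISP knows which prefix lives behind a customer-facing interface; the prefix below and the name egress_allows are hypothetical, and real routers implement this in their own configuration rather than in code like this.

```python
import ipaddress

# Assumption: the only prefix reachable through this customer-facing interface.
CUSTOMER_PREFIX = ipaddress.ip_network("10.255.28.0/24")


def egress_allows(source_ip: str) -> bool:
    """Forward a packet only if a reply to its source would come back this way."""
    return ipaddress.ip_address(source_ip) in CUSTOMER_PREFIX


print(egress_allows("10.255.28.250"))  # True: forwarded
print(egress_allows("1.2.3.4"))        # False: dropped as spoofed
```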

By the way, masscan also has to ARP the local router in order to find its MAC address before it can start sending packets to it. This is the first thing it does when it starts up, and it's the reason there's a short delay when starting the program. You can bypass this ARP by setting the router's MAC address manually.

First of all, you have to figure out what the local router's MAC address is. There are many ways of doing this, but the easiest is to run the arp command from the command-line, asking the operating system for this information. It, too, must ARP the router's MAC address, and it keeps this information in a table.

Then, I can run masscan using this MAC address:

    masscan --interface en0 --router-mac ac:86:74:78:28:b2 --source-ip ....

In the above examples, while masscan has its own stack, it still requests information about the operating system's configuration, to find things like the local router. Instead of doing this, we can run masscan completely independently from the operating system, specifying everything on the command line.

To do this, we have to configure all the following properties of a packet:

  • the network interface of my MacBook computer that I'm using
  • the destination MAC address of the local router
  • the source hardware address of my MacBook computer
  • the destination IP address of the target I'm scanning
  • the source IP address where the target can respond to
  • the destination port number of the port I am scanning
  • the source port number of the connection
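
To show that these values really are all a packet needs, here's a Python sketch that assembles a raw TCP SYN frame from them. It only builds the bytes; the network interface isn't a field in the packet, it just decides where the bytes get written. This is an illustration rather than masscan's code: build_syn is a hypothetical helper, and the source port 40000, the window size, and the sequence number are arbitrary choices.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def checksum(data: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_syn(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port, seq=0):
    """Build Ethernet + IPv4 + TCP headers for a SYN with no options or payload."""
    src, dst = socket.inet_aton(src_ip), socket.inet_aton(dst_ip)
    # Ethernet: destination MAC (the router), source MAC, EtherType IPv4
    eth = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", 0x0800)
    # TCP: ports, sequence, ack, data offset 5 words, SYN flag, window, checksum, urgent
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, seq, 0, 5 << 4, 0x02, 1024, 0, 0)
    pseudo = src + dst + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp))
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp)) + tcp[18:]
    # IPv4: version/IHL, TOS, total length, id, flags/fragment, TTL, protocol, checksum
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     socket.IPPROTO_TCP, 0, src, dst)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    return eth + ip + tcp


frame = build_syn(
    src_mac="14:63:a3:11:2d:d4", dst_mac="ac:86:74:78:28:b2",
    src_ip="10.255.28.250", dst_ip="172.217.197.113",
    src_port=40000, dst_port=80,
)
print(len(frame))  # 54
```

Every input to build_syn corresponds to one item in the list above, apart from the interface. There is no hidden state anywhere else.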

An example is shown below. When I generated these screenshots I was located on a different network, so the local addresses have changed from the examples above. Here is a screenshot of running masscan:

And here is a screenshot from Wireshark, a packet sniffer, that captures the packets involved:

As you can see from Wireshark, the very first packet is sent without any preliminaries, based directly on the command-line parameters. There is no other configuration of the computer or network involved.

When the response comes back from the Internet, the local router has to figure out the MAC address to send it to, so it sends an ARP request in packet #2, to which masscan responds in packet #3. Only then can the incoming response be forwarded, which is packet #4.

After this, the TCP connection proceeds as normal, with a three-way handshake, an HTTP request, an HTTP response, and so forth, with a couple of extra ACK packets (noted in red) that happen because masscan is actually a bit delayed in responding to things.

What I'm trying to show here is again that what happens on the network, the packets that are sent, and how things deal with them, is a straightforward function of the initial starting conditions.

One thing about this example is that I had to set the source MAC address to the same one as my laptop computer. That's because I'm using WiFi. There's actually a bit of invisible setup here where my laptop must connect to the access-point. The access-point only knows the MAC address of the laptop, so that's the MAC address masscan must use. Had this been Ethernet instead of WiFi, this invisible step wouldn't be necessary, and I would be able to spoof any MAC address. In theory, I could also add a full WiFi stack to masscan so that it could create its own independent association with the WiFi access-point, but that'd be a lot of work.

Lastly, masscan supports a feature where you can specify a range of IP addresses. This is useful for a lot of reasons, such as stress-testing networks. An example:

    masscan --source-ip 10.1.10.64-10.1.10.100 ....

For every probe, it'll choose a random IP address from that range. If you really don't like somebody, you can use masscan and flood them with source addresses in the range 0.0.0.0-255.255.255.255. It's one of the many "stupid pet tricks" you can do with masscan that have no purpose, but which come from a straightforward application of the principles of manually configuring things.
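
Picking a random address from a range is easy once you treat IPv4 addresses as 32-bit integers. Here's a minimal Python sketch; random_source is a hypothetical helper, not how masscan itself does it, and sorting the two endpoints so that either order works is a choice made for this sketch.

```python
import ipaddress
import random


def random_source(first: str, last: str) -> str:
    """Return a uniformly random IPv4 address between the two endpoints, inclusive."""
    a = int(ipaddress.IPv4Address(first))
    b = int(ipaddress.IPv4Address(last))
    low, high = sorted((a, b))  # accept the endpoints in either order
    return str(ipaddress.IPv4Address(random.randint(low, high)))


# One source address per probe, all drawn from the same range.
for _ in range(3):
    print(random_source("10.1.10.64", "10.1.10.100"))
```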

Likewise, masscan can be used in DDoS amplification attacks. Like addresses, you can configure payloads. Thus, you can set the --source-ip to that of your victim, a list of destination addresses consisting of amplifiers, and a payload that triggers the amplification. The victim will then be flooded with responses. It's not something the program is specifically designed for, but it's a usage that I can't prevent, as again, it's a straightforward application of the basic principles involved.

Conclusion

Learning about TCP/IP networking leads to confusion about the boundaries between what the operating system does and what the network does. Playing with masscan, which has its own network stack, helps clarify this.

You can download masscan source code at: