M17 callsign routing

M17 is a new protocol for ham radio operators, aiming to compete with DMR and D-STAR.

M17 is still under development, so this is a short post about one aspect of the design - handling direct calls.

We want to be able to dial a user’s callsign and have the radio automagically find the active radio for that user, if there is one. Let’s say SP5WWP wants to chat with N7TAE. SP5WWP would somehow type “N7TAE” into his radio, and if N7TAE is online, they’ll be connected.

This is trivial on one server, of course - have everyone connect to that server, and maintain that UDP connection. The server inspects the source and destination fields of each packet and routes them appropriately. (Handling more than one simultaneous call is more involved, but we’re not there yet.)
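To make the single-server idea concrete, here is a minimal Python sketch. The Packet fields, the Relay class and its method names are made up for this post and are not the M17 wire format. The only point is that the server remembers the last address it saw for each callsign and forwards by destination.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP, UDP port)


@dataclass
class Packet:
    src: str        # sender's callsign
    dst: str        # recipient's callsign
    payload: bytes


class Relay:
    """Hypothetical single-server relay: everyone sends to it, it forwards."""

    def __init__(self) -> None:
        self.clients: Dict[str, Address] = {}

    def handle(self, packet: Packet, seen_from: Address) -> Optional[Address]:
        # Learn (or refresh) where the sender is reachable.
        self.clients[packet.src] = seen_from
        # Return the address to forward to, or None if dst is not online.
        return self.clients.get(packet.dst)
```

A real server would also expire clients that go quiet, which this sketch leaves out.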

One big disadvantage is centralization - the power of a network is held by whoever controls the server. If that server is overloaded, the whole thing falls apart, as anyone who uses Brandmeister on DMR might recognize.

Another issue is fragmentation, a natural consequence of centralization and ornery amateur radio operators, which leads to many servers, all of which now need to communicate among themselves (or be entirely separate kingdoms with no interoperability).

Isolated kingdoms are easy to implement, but they break our original use case. Since we want that global interoperability, we’re back to finding a way to tell other servers which users are on each server.

Well, we could put together a server-server to keep track of all our servers…

So we’ve recreated the original problem, just with more computers, which as everyone knows is how you get simple systems that are easy to reason about.

What if we make the voice calls direct, and use the server as a phone book? Reflectors and other use cases can even register themselves (and maybe their users?), and we can avoid having to pass around IP addresses and manual reflector management.

Direct voice calls would keep the load down, and force the actual routing and traffic handling down to the individual M17 clients, whether that’s a repeater or a handheld radio using the wifi chip.

This scheme is not decentralized, but it’s certainly /less/ centralized than the first description above… and you still need a Schelling Point even in something like BitTorrent (maybe more on that in later posts).

But now that we are doing direct calls, we need to deal with firewalls, and we can’t expect every M17 user to set up a permanent port forward to their radio. Further, we want it to work on networks where the M17 user may not have control of the firewall.

Stateful firewalls, which are necessary for NAT, keep track of ongoing connections and allow traffic that matches an ongoing connection to continue. This means that, without manual configuration, all traffic must start as an outbound connection. There are ways around this, such as UPnP and similar protocols, but there’s a really simple concept we can use to our advantage.

If incoming traffic is only delivered when it matches an existing connection that was started outbound, then two clients behind their firewalls can both try to connect to each other. Both connections are outbound! For example, SP5WWP first tries to talk to N7TAE:

SP5WWP -> SP5WWP’s firewall -> internet -> N7TAE’s firewall -//-> N7TAE

N7TAE’s firewall doesn’t recognize the traffic, since it’s inbound and there aren’t any existing connections related to it. But SP5WWP’s firewall keeps track of the outgoing packet and now knows to accept related inbound packets. So now SP5WWP will see inbound traffic from N7TAE, like so:

N7TAE -> N7TAE’s firewall -> internet -> SP5WWP’s firewall -> SP5WWP

N7TAE’s outbound packet also taught his own firewall to expect replies from SP5WWP. Now both firewalls are allowing traffic through, and SP5WWP can send traffic to N7TAE and vice versa.
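Here is that exchange as a toy Python model. The StatefulFirewall class is invented for illustration and the addresses are placeholders. Real firewalls also track the local port and protocol, and they expire entries after a timeout, none of which is modelled here.

```python
from typing import Set, Tuple

Address = Tuple[str, int]  # (IP, UDP port)


class StatefulFirewall:
    """Toy model: inbound traffic passes only if we sent to that peer first."""

    def __init__(self) -> None:
        self.seen_outbound: Set[Address] = set()

    def outbound(self, peer: Address) -> None:
        # Outbound packets always pass and open a return path for that peer.
        self.seen_outbound.add(peer)

    def inbound_allowed(self, peer: Address) -> bool:
        return peer in self.seen_outbound


sp5wwp = ("76.12.43.71", 17000)   # placeholder addresses
n7tae = ("123.8.8.8", 17000)
fw_sp5wwp, fw_n7tae = StatefulFirewall(), StatefulFirewall()

# Step 1: SP5WWP sends first; N7TAE's firewall has no matching entry yet.
fw_sp5wwp.outbound(n7tae)
first_try = fw_n7tae.inbound_allowed(sp5wwp)

# Step 2: N7TAE sends; SP5WWP's firewall already expects him.
fw_n7tae.outbound(sp5wwp)
reply = fw_sp5wwp.inbound_allowed(n7tae)

# Step 3: SP5WWP's next packet now matches N7TAE's entry as well.
second_try = fw_n7tae.inbound_allowed(sp5wwp)

print(first_try, reply, second_try)  # False True True
```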

The only trick is to somehow communicate to N7TAE and SP5WWP that they should try and connect to each other. Since we already have a phone book server, we can use that to relay messages between users, kinda like the first draft at the very top of this post.

Now our protocol looks something like this:

SP5WWP -> phone book server: “connect”

N7TAE -> phone book server: “connect”

W2FBI -> phone book server: “connect”

(clients like SP5WWP, N7TAE, and W2FBI keep an open connection with the phone book server, which keeps track of the callsign and connection details)

(…some time passes, and SP5WWP initiates a call:)

SP5WWP -> phone book server: “Where is N7TAE?”

phone book server -> N7TAE: “SP5WWP is at 76.12.43.71:17000 and calling you”

phone book server -> SP5WWP: “N7TAE is at 123.8.8.8:17000”

N7TAE -> SP5WWP: “hi!” (content of this packet does not matter, it’s just N7TAE instructing his firewall to allow traffic from SP5WWP through)

SP5WWP -> N7TAE: (voice stream)

and the voice stream will get through!
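The phone book’s side of the exchange above could be sketched in Python roughly like this. The PhoneBook class, its method names and the “is not online” reply are assumptions for illustration, not part of any M17 spec. W2FBI’s address is a placeholder from the documentation range.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]   # (public IP, UDP port) as seen by the server
Message = Tuple[str, str]   # (callsign to send to, text of the message)


class PhoneBook:
    """Hypothetical phone book: tracks who is online and brokers introductions."""

    def __init__(self) -> None:
        self.online: Dict[str, Address] = {}

    def connect(self, callsign: str, seen_from: Address) -> None:
        # Remember where this callsign's open connection comes from.
        self.online[callsign] = seen_from

    def where_is(self, caller: str, callee: str) -> List[Message]:
        # Raises KeyError if the caller never sent "connect".
        caller_ip, caller_port = self.online[caller]
        if callee not in self.online:
            return [(caller, f"{callee} is not online")]  # assumed reply
        callee_ip, callee_port = self.online[callee]
        # Tell the callee first, so he can start punching his firewall.
        return [
            (callee, f"{caller} is at {caller_ip}:{caller_port} and calling you"),
            (caller, f"{callee} is at {callee_ip}:{callee_port}"),
        ]


book = PhoneBook()
book.connect("SP5WWP", ("76.12.43.71", 17000))
book.connect("N7TAE", ("123.8.8.8", 17000))
book.connect("W2FBI", ("192.0.2.44", 17000))

for recipient, text in book.where_is("SP5WWP", "N7TAE"):
    print(f"phone book server -> {recipient}: {text}")
```

Note that the server never touches the voice stream. It only hands each side the other’s address.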

There’s more detail to it, like port numbers, but the raw concept is as you see above. If you want to read more, this technique is called “UDP hole punching”.
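On the port number detail: a NAT may rewrite the source port, so the phone book should hand out the address it actually observes on incoming packets, not whatever the client believes its own address is. Here is a small Python sketch using the standard socket module on loopback. The “connect SP5WWP” text and the function name are made up for illustration, and with no NAT in between the observed address simply equals the client’s local one.

```python
import socket
from typing import Tuple

Address = Tuple[str, int]


def observed_address_demo() -> Tuple[bytes, Address, Address]:
    """Send one registration packet over loopback and report what the server saw."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        client.bind(("127.0.0.1", 0))
        server.settimeout(2.0)

        # Hypothetical registration message, not the real M17 format.
        client.sendto(b"connect SP5WWP", server.getsockname())

        # recvfrom() returns the source address as the server sees it.
        # Behind a NAT this would be the public IP and the mapped port.
        data, observed = server.recvfrom(1024)
        return data, observed, client.getsockname()
    finally:
        server.close()
        client.close()


data, observed, local = observed_address_demo()
print(data, observed, local)
```

For the trick to work, the client would need to reuse this same socket for the direct call, so the peer’s packets arrive on the port mapping the phone book saw. That holds on many home routers but not on every NAT.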

What if we used a distributed hash table instead of the “phone book” server? Tune in next time.