Designing Painless Protocols

Of the several programming jobs I’ve had in my (still relatively short) career, the one thing I’ve had to do the most frequently is implement networking protocols. I’ve implemented standard protocols defined by RFCs, I’ve implemented in-house proprietary protocols, and I’ve implemented experimental protocols for academic research. I’ve yet to be asked to design my own from the ground up, but I have developed a good feel, from an implementer’s perspective, for what works and what doesn’t when it comes to legible, extensible, and robust protocol design.

The point of this post is not to give a guideline for how to design protocols that work well, or are efficient. That is a topic of much larger scope, and if you’re interested in that, RFC 3117 is a good jumping-off point. Rather, this post aims to give a set of suggestions for how to design protocols that will be the least painful for yourself and other programmers to implement, debug, and apply. There is an underlying assumption here that you have already spent the time to decide what your protocol is trying to accomplish and that you have found a way to make it work well (assuming that it is implemented correctly).

1. Don’t Re-invent the Wheel

If you can get away with using an existing protocol, do so. “Don’t re-invent the wheel” is not exactly a protocol-design axiom; it’s more or less a computer programming axiom in general. The premise is that, even if it doesn’t fit your purpose to a T, a well-worn (and therefore well-tested and well-debugged) existing protocol, hopefully with an existing implementation, will save you more in time and avoided headaches than rolling your own would ever save you in efficiency.

The corollary to this rule is that if you can’t use an existing protocol, at least use an existing framework. In particular, if your protocol takes the form of a remote procedure call, there is a wealth of solutions, including SOAP and Google’s Protocol Buffers, that can ease the effort.

2. Prefer determinism

I initially wanted to title this point “prefer simplicity”, but on reflection I realized that that isn’t really what I meant to say. Some protocols are inherently complicated, and there isn’t much that can be done about that. But even complicated protocols can have their complexity made tractable by proper modularization, and modularization is almost impossible without determinism.

The most familiar example of modularization in software is the encapsulation principle in object-oriented programming. The idea is that data is packaged into objects which can only be accessed through well-defined interfaces. This protects against unexpected operations on the data, and leads to less uncertainty (more determinism) about the state of the data and what could potentially happen to it next.

A fantastic example of a complex yet highly deterministic protocol is one half of the backbone of the modern Internet, TCP. The Transmission Control Protocol underlies most of the Internet communications we rely on daily. It provides ordered, reliable delivery, with flow control and congestion control that adapt to varying network conditions. Without TCP, hundreds of higher-level protocols would have to waste effort and add complexity by re-implementing some subset of its features. Yet, for all its power and complexity, its macro-operation can be accurately summarized in the following diagram:

TCP State Diagram. (see footer for credits)

The main thing to note about this diagram is that, given any state, there is usually only one, sometimes two, and at most three other states that can be transitioned to. Any other action is illegal. A SYN packet received in the FIN WAIT 2 state doesn’t have to be interpreted meaningfully.

In contrast, a poorly designed protocol might have the concept that “anything can happen at any time”, forcing an implementer to litter his or her code with complex state tracking and excessive case checks. Worse, non-determinism forces the implementer to consider how the implementation must respond in bizarre and unusual situations. In a deterministic protocol, these situations would either have their reactions explicitly defined, or be declared illegal and free an implementation from the requirement to make a meaningful response. In a non-deterministic protocol, failing to consider some obscure case or another becomes a sure source of bugs, incompatibilities, and potential exploits in young implementations.

Yes, an anything-goes non-deterministic protocol is quicker to design, but by doing so you shift all the effort of thinking out the implications of your design choices to the potentially numerous implementers, all of whom become encumbered with the burden of thinking out consistent reactions to every possibility, and none of whom have the luxury of being able to change the protocol’s design when a failing becomes apparent. They are forced simply to throw more code at the problem, leading to fragile, buggy, and generally poor implementations.
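To make the contrast concrete, here is a minimal Python sketch of the deterministic approach: every legal (state, message) pair is listed in one table, and anything not in the table is rejected without further deliberation. The protocol itself (the CONNECTED/AUTHENTICATED/CLOSED states and the LOGIN, GET, and GOODBYE messages) is hypothetical, not TCP or any real specification.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()
    AUTHENTICATED = auto()
    CLOSED = auto()


class ProtocolError(Exception):
    """Raised when a message is not legal in the current state."""


# Hypothetical protocol: every legal (state, message) pair maps to exactly
# one next state. Anything absent from this table is illegal by definition.
TRANSITIONS = {
    (State.CONNECTED, "LOGIN"): State.AUTHENTICATED,
    (State.CONNECTED, "GOODBYE"): State.CLOSED,
    (State.AUTHENTICATED, "GET"): State.AUTHENTICATED,
    (State.AUTHENTICATED, "GOODBYE"): State.CLOSED,
}


def next_state(state: State, message: str) -> State:
    """Return the state after receiving `message`, or raise ProtocolError."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        # No guessing and no special cases: the specification says "illegal".
        raise ProtocolError(
            f"{message} is not legal in state {state.name}"
        ) from None
```

Adding a message to the protocol means adding rows to the table, and the question “what happens if X arrives in state Y?” always has a single answer: either the table says, or it is illegal.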

3. Prefer human readability

Premature optimization is the root of all evil. Unless your protocol needs, for some reason, to be blazing fast, and you know from testing that squeezing a few extra bytes out of each message will give you the speed you need, just use plain, simple text.

There are a number of advantages to a text protocol over a binary protocol. First is debugging. When your protocol implementation is behaving unexpectedly, it is much easier to read a captured stream of data and be able to immediately understand its contents than it is to be forced to write a protocol dissector that may have its own bugs getting in your way.

Second is discoverability. A human-readable protocol is much easier for others to understand and learn. Unless you are attempting to intentionally impair interoperability (which is usually a misguided idea), this is a benefit. In lieu of, but preferably in addition to, formal documentation, a discoverable protocol is one for which someone wishing to write their own implementation can easily observe the behavior of two existing implementations and learn how they communicate. Here, clear commands like LOGIN or GOODBYE make things a lot easier than raw numeric codes, in the absence of a protocol dissector.

Now, this doesn’t mean “don’t also be machine-readable”. You can combine the two approaches by using very structured syntax and/or keywords that can be machine-parsed with minimal effort, but which can still be read directly by humans.
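As a sketch of what “structured but readable” can look like, assume a hypothetical line format of an uppercase keyword followed by space-separated arguments and terminated by CRLF, loosely in the style of FTP or SMTP commands. The grammar and helper names are assumptions for illustration; the point is that one regular expression parses it, and a captured stream is still readable as-is.

```python
import re
from typing import List, NamedTuple

# Hypothetical grammar: KEYWORD [" " ARG]* CRLF, keyword in uppercase ASCII,
# arguments are non-empty and contain no spaces or line breaks.
COMMAND_RE = re.compile(r"([A-Z]+)((?: [^ \r\n]+)*)\r\n")


class Command(NamedTuple):
    keyword: str
    args: List[str]


def parse_command(line: str) -> Command:
    """Parse one complete protocol line into a keyword and its arguments."""
    match = COMMAND_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed command line: {line!r}")
    keyword, rest = match.groups()
    # `rest` is "" or " arg1 arg2 ..."; split() also drops the leading space.
    return Command(keyword, rest.split())


def format_command(keyword: str, *args: str) -> str:
    """Serialize a command exactly as it will appear on the wire."""
    return " ".join((keyword, *args)) + "\r\n"
```

A captured session then consists of lines like `LOGIN alice` and `GOODBYE`, which a human can read without any dissector at all.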

One final note: if the human-readable part of your protocol is going to contain free-form text (i.e. strings that are not spelled out explicitly in the protocol specification), your protocol really should use Unicode. Next week we will be in the second decade of the twenty-first century, and English is most definitely not the only language on the Internet anymore.
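Here is a sketch of how a free-form text field might be carried, assuming UTF-8 as the Unicode encoding and a hypothetical 2-byte length prefix. Two details matter: the length counts encoded bytes rather than characters (“日本語” is three characters but nine bytes), and decoding is strict, so malformed input raises an error instead of being silently mangled.

```python
import struct

MAX_TEXT_BYTES = 0xFFFF  # limit imposed by the hypothetical 2-byte length prefix


def encode_text_field(text: str) -> bytes:
    """Encode free-form text as UTF-8 behind a big-endian byte-count prefix."""
    data = text.encode("utf-8")
    if len(data) > MAX_TEXT_BYTES:
        raise ValueError("text too long for a 2-byte length prefix")
    # The prefix counts encoded bytes, not characters.
    return struct.pack("!H", len(data)) + data


def decode_text_field(buf: bytes) -> str:
    """Decode a field produced by encode_text_field, rejecting bad input."""
    if len(buf) < 2:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from("!H", buf)
    data = buf[2:2 + length]
    if len(data) != length:
        raise ValueError("truncated text field")
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError (a ValueError).
    return data.decode("utf-8")
```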

4. Insist on network byte ordering

This one is not a suggestion; it’s a rule. If your protocol uses multi-byte binary-encoded numbers, and it does not use network byte ordering, then you have committed a mortal sin. Network byte ordering is called what it is for a reason – so that machines communicating on a network have a common standard for which byte ordering to use when interchanging data with one another. If new protocols don’t adhere to this standard, then what’s the point of having a standard?

The unfortunate part about network byte ordering is that it is big-endian, and as logical as big endianness is, the fact is that most of us program on little-endian machines. So the catch is that if we programmers don’t stop and think about what we’re sending out into the network, our numbers will more likely than not end up with little-endian byte ordering. The reason that this is a problem is that there are standard, portable functions, such as htonl() and htons() and their ntohl() and ntohs() counterparts, that convert between native byte ordering (whatever that may be) and big-endian, but there are no such standard functions for little-endian. Speaking as someone who has more than once had to write an implementation of a little-endian protocol that will run on either architecture, this bites, so please be kind and use the standard.

Oh, and this relates to point 8 below as well, but please specify your protocol’s byte ordering in your documentation, even if it is little-endian for some (hopefully historical) reason. Otherwise, you will end up with people who write implementations that don’t do any conversion at all from their native ordering, and then devices with different endianness can’t talk over your protocol at all.
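In Python, the struct module makes the byte order explicit: the “!” format prefix means network (big-endian) order regardless of the host. This sketch contrasts it with the accidental approach of writing native-order bytes, which is exactly how little-endian protocols get created by mistake. The helper names are my own.

```python
import struct
import sys


def encode_u32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in network (big-endian) byte order."""
    # "!" = network byte order with standard sizes, independent of the host CPU.
    return struct.pack("!I", value)


def decode_u32(data: bytes) -> int:
    """Decode four bytes in network byte order."""
    (value,) = struct.unpack("!I", data)
    return value


def encode_u32_native(value: int) -> bytes:
    """The mistake to avoid: the output depends on the host's byte order."""
    return value.to_bytes(4, sys.byteorder)
```

`encode_u32(0x01020304)` produces `b"\x01\x02\x03\x04"` on every machine, whereas `encode_u32_native(0x01020304)` produces `b"\x04\x03\x02\x01"` on a little-endian host such as a typical x86 PC. If you are stuck implementing a little-endian protocol, struct’s “<” prefix at least makes that ordering explicit rather than accidental.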

5. Make magic numbers meaningful

Let’s say your protocol can send some fixed set of message types. The simplest thing to do is just number them 1, 2, 3, 4, 5, etc., and be done with it. But you could be doing so much more with your numbers! Remember point 3, “Prefer Human Readability.” You can strike a happy compromise between human and machine readability by making your numbers meaningful, and then sparing a few extra bytes to encode them in text rather than as raw binary.

What do I mean by meaningful? Take for example the protocol that you probably just used to read this post, HTTP. Every HTTP response comes equipped with a numeric status code. These codes are not arbitrary. The code that every Internet denizen is most familiar with is 404 (Not Found). The meaning embedded in this code lies in its first digit: HTTP response codes beginning with 4 indicate a client error. This contrasts with the ‘1’ prefix indicating information, ‘2’ indicating success, ‘3’ indicating redirection, and ‘5’ indicating a server error.
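The practical payoff is that a client can react sensibly to a code it has never seen. This Python sketch classifies a status code by its first digit and looks up the registered reason phrase with the standard library’s `http.HTTPStatus` enum; the `describe_status` name and its output format are my own.

```python
from http import HTTPStatus

# What the first digit of an HTTP status code means.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Describe a status code, falling back on its class if it is unregistered."""
    category = STATUS_CLASSES.get(code // 100)
    if category is None:
        raise ValueError(f"{code} is not in the 100-599 range")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Unknown to this implementation, but the first digit still says
        # how to react.
        phrase = "(unregistered)"
    return f"{code} {phrase}: {category}"
```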

The FTP protocol goes even further than this. Each reply code carries two layers of meaning: the first digit indicates which state the protocol instance should be in as a result, and the second digit summarizes what the reply is about. For example, a reply code beginning with 20 indicates that the requested action has completed and the client may now send more commands (2), and that the reply belongs to the syntax category (0), which RFC 959 uses both for syntax errors and for commands that fit no other functional group. A code beginning with 12 indicates that another reply is forthcoming before the client should send a new command (1), and that the reply concerns the control or data connections (2).
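The digit meanings from RFC 959 (section 4.2) fit in two small tables, which is itself a good sign: a decoder that says something useful about every well-formed reply code takes only a few lines. This is a sketch; the function name and the return shape are my own choices.

```python
from typing import Tuple

# Reply-code digit meanings from RFC 959, section 4.2.
FIRST_DIGIT = {
    1: "positive preliminary: expect another reply before sending a new command",
    2: "positive completion: a new command may be sent",
    3: "positive intermediate: send the command that supplies more information",
    4: "transient negative completion: the request may be retried",
    5: "permanent negative completion: do not repeat the same request",
}
SECOND_DIGIT = {
    0: "syntax",
    1: "information",
    2: "connections",
    3: "authentication and accounting",
    4: "unspecified",
    5: "file system",
}


def decode_ftp_reply(code: int) -> Tuple[str, str]:
    """Return (what the client should do next, what the reply is about)."""
    first, second = code // 100, (code // 10) % 10
    if first not in FIRST_DIGIT or second not in SECOND_DIGIT:
        raise ValueError(f"{code} is not a well-formed RFC 959 reply code")
    return FIRST_DIGIT[first], SECOND_DIGIT[second]
```

For instance, `decode_ftp_reply(530)` reports a permanent negative completion in the authentication and accounting group, which matches the meaning of 530 “Not logged in”.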

This method of classifying machine-readable numeric representations of information into meaningful numeric ranges yields two important advantages. First, it is much easier for a human debugging the protocol to remember what the numerically-indicated semantic categories are than to remember each of a few dozen arbitrarily enumerated values individually. Second, it gives logically similar values numerically similar numbers. In other words, you’re not going to end up assigning “Vanilla Ice Cream” to 3, and then when you realize later that you forgot about chocolate, assigning “Chocolate Ice Cream” to 417, right between “Cyanide” and “Adult Diapers”.

Which brings me to my next point…

6. Design for expansion

If your protocol is worth anything, it will be revised. And it will be revised again and again. In fact, the more useful your protocol is, the more likely it is to have multiple revisions and numerous extensions. Prepare for this from the start.

There are two ways to go about this, and you can do both. One is to reserve space for expansion. One form of this is to assign meaningful numbers as described above. If you decree that all numbers beginning with 3 mean “A Type of Ice Cream”, then you’ll have plenty of room to add Pistachio and Rocky Road when you remember them in version 2.0. A similar idea is to explicitly assign some values (or bit positions, in the case of flags) as “reserved” initially, and assign them to something meaningful when the time comes and they are needed.
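Both forms of reserved space are easy to express in code. The sketch below uses a hypothetical numbering scheme in which every 3xx code is a type of ice cream, plus a hypothetical 8-bit flags field whose upper five bits are reserved. The receiver masks off reserved bits rather than rejecting them, following the common convention that senders set reserved bits to zero and receivers ignore them, so a version 2.0 sender can use those bits without breaking version 1.0 receivers.

```python
from enum import IntEnum

# Hypothetical numbering scheme: the whole 3xx range means "a type of ice cream".
ICE_CREAM_CODES = range(300, 400)


class Flavor(IntEnum):
    VANILLA = 300
    CHOCOLATE = 301  # remembered late, but still lands next to vanilla
    PISTACHIO = 302  # version 2.0 addition
    # 303-399 are reserved for flavors nobody has thought of yet.


def is_ice_cream(code: int) -> bool:
    """True for any 3xx code, even one this implementation does not know."""
    return code in ICE_CREAM_CODES


# Hypothetical 8-bit flags field: bits 0-2 are defined, bits 3-7 are reserved.
FLAG_COMPRESSED = 0x01
FLAG_URGENT = 0x02
FLAG_FINAL = 0x04
DEFINED_FLAGS = FLAG_COMPRESSED | FLAG_URGENT | FLAG_FINAL


def known_flags(flags: int) -> int:
    """Keep only the defined bits; reserved bits are ignored on receipt."""
    return flags & DEFINED_FLAGS
```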

Another way to go is to explicitly indicate your protocol version. I’d say that it’s probably always a good idea to do this. This way, one end of the connection (or both) can announce its version, and if the versions mismatch, the implementations can either decide not to communicate, or better, agree on a highest commonly-supported version to use.

The benefit to this is that from the point of agreement forward, the implementations don’t have to worry about backwards-compatibility, even for very basic things such as message structure. Reserving numbers and bits is nice, but once you run out of numbers or bits, you have to change the message format in a way that can’t be pre-determined. To get maximum benefit out of agreeing on a version to use, the version negotiation (or announcement) should occur as soon as possible in the exchange. If possible, the version should be the very first thing sent so that everything else about the protocol can be changed with wild abandon if desired. The other backbone protocol of the Internet, IP, does exactly this: the very first field of every IP header is a 4-bit version number, and that helps make IPv6 possible.
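Here is a sketch of both ideas, assuming a hypothetical protocol whose first byte is a version number and whose peers exchange lists of supported versions. The `ip_version` helper reflects the real IP layout, where the version occupies the high four bits of the first header byte (4 for IPv4, 6 for IPv6); the other names are my own.

```python
from typing import Iterable, Optional

SUPPORTED_VERSIONS = (1, 2, 3)  # hypothetical versions this implementation speaks


def negotiate_version(ours: Iterable[int], theirs: Iterable[int]) -> Optional[int]:
    """Highest version both sides support, or None if they share none."""
    common = set(ours) & set(theirs)
    return max(common) if common else None


def peek_version(message: bytes) -> int:
    """Read the version byte before interpreting anything else in the message."""
    if not message:
        raise ValueError("empty message")
    return message[0]


def ip_version(packet: bytes) -> int:
    """IPv4 and IPv6 headers both begin with a 4-bit version field."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] >> 4
```

A peer that receives a version byte it does not recognize can answer with its own list of supported versions instead of trying to parse the rest of the message.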

7. Don’t be stingy with information

Except to the extent that it becomes a security concern, one end of your protocol should never hide relevant information from the other. In other words, each end of the connection should have within the protocol a mechanism for querying the other end for information. Relying on assumptions about the properties or state of the connection partner will only lead to increased fragility and more backwards-compatibility headaches. In fact, the version announcement mechanism described in the previous point is almost a special case of this. Each end of the connection preemptively answers the implicit query “which version(s) of the protocol do you support?”

For example, consider a case where you are a client trying to retrieve a piece of data from a remote server. The protocol first makes you select a data set, and then the client may either retrieve a specific datum by key, or retrieve the entire set. There are several practical queries that the server should be prepared to respond to, and which the protocol should make possible:

Which data sets exist on the server?
Which data sets does this client have permission to access?
Which data set is currently selected?
Which keys are available in this data set?

In the absence of these queries, the client may have to behave inefficiently, or be unable to operate at all. If there is no query to determine the available data sets, the protocol has made the assumption that the client and the server will always both know and agree upon the names of the available data sets, which makes it hard to modify the data structure without modifying both sides of the protocol. Similarly, if there is no mechanism to query for keys, a client may in some circumstances need to retrieve the entire data set when it could have gotten away with much less traffic. The ability to determine the currently selected set, while seemingly unnecessary, may allow a simplified client implementation with less explicit state tracking, depending on the details of this imaginary protocol.

In general these sorts of things are dead simple to include, and (again aside from security concerns) confer only benefits.
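As a sketch, here is an in-memory Python session object answering the four queries listed above. The class, its method names, and the permission model are hypothetical; in a real protocol each method would correspond to a command on the wire.

```python
from typing import Dict, List, Optional, Set


class Session:
    """Hypothetical server-side session answering the four queries above."""

    def __init__(self, client: str, data: Dict[str, Dict[str, str]],
                 permissions: Dict[str, Set[str]]) -> None:
        self.client = client
        self._data = data                # data set name -> {key: value}
        self._permissions = permissions  # client name -> readable set names
        self._selected: Optional[str] = None

    def list_sets(self) -> List[str]:
        """Which data sets exist on the server?"""
        return sorted(self._data)

    def list_accessible(self) -> List[str]:
        """Which data sets does this client have permission to access?"""
        allowed = self._permissions.get(self.client, set())
        return sorted(allowed & set(self._data))

    def select(self, name: str) -> None:
        if name not in self.list_accessible():
            raise PermissionError(f"{self.client} may not select {name!r}")
        self._selected = name

    def current_set(self) -> Optional[str]:
        """Which data set is currently selected?"""
        return self._selected

    def list_keys(self) -> List[str]:
        """Which keys are available in this data set?"""
        return sorted(self._require_selection())

    def get(self, key: str) -> str:
        """Retrieve a single datum instead of the whole set."""
        return self._require_selection()[key]

    def _require_selection(self) -> Dict[str, str]:
        if self._selected is None:
            raise LookupError("no data set selected")
        return self._data[self._selected]
```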

8. Document your protocol precisely

Code programs computers, and specifications program programmers. Much like you cannot expect a computer to do what you want it to do in the absence of specific, precise code explaining exactly what it should do, you cannot expect a programmer to do what you want him or her to do in the absence of a specific, precise specification explaining exactly what you want him or her to do.

If you ever hope to have other programmers write implementations of your protocol that are interoperable with yours, your protocol had better have good documentation. The documentation needs not only to specify what the packet layout or command syntax is, but also what the observable semantics of the messages should be.

For example, it is a really bad idea to create a flag in one of your packets named “restart connection” and give no further documentation on what should happen when a packet with that flag set is received. You may know what you mean by that, but another programmer working only from your documentation will not know whether this requires a confirmation packet, whether this is a request or a command, what will happen if another packet is sent without restarting the connection, what state the protocol should be in upon being reset, or whether there should be a limit to the number of consecutive restarts before giving up.

Better documentation for such a flag (using RFC 2119 keywords) would be:

Restart Connection Flag. If a packet with the Restart Connection Flag set is received, the receiver MUST NOT send any further packets to the remote host. If any further packets with this connection ID are received by the remote host, they will be rejected with an Invalid Connection ID error. The host receiving the Restart Connection command SHOULD immediately restart the connection by sending a Hello packet to the remote host. No state information from the current connection will be retained in the new connection.

This makes it clear that this is a command, not a request, and does not require (and in fact forbids) a response. It makes clear what will happen if the command is ignored, and tells you exactly how to take the action that the command requires. It also implies (by using SHOULD rather than MUST) that an implementation may choose not to restart and instead just cease communication. Permitting this behavior allows an implementation to, for example, escape from an infinite loop of restarts without violating the protocol semantics.
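Precise documentation translates almost line for line into code. This is a minimal sketch of a receiver applying the Restart Connection text above; the flag bit value, the packet fields, the locally assigned connection IDs, and the retry limit are all hypothetical. Note that the consecutive-restart counter is local policy rather than connection state, which is exactly the freedom the SHOULD grants.

```python
from dataclasses import dataclass
from typing import List, Optional

RESTART_FLAG = 0x04  # hypothetical bit for the Restart Connection Flag


@dataclass
class Packet:
    connection_id: int
    flags: int = 0
    kind: str = "DATA"


class Endpoint:
    """Hypothetical receiver applying the Restart Connection rules quoted above."""

    def __init__(self, max_restarts: int = 3) -> None:
        self.max_restarts = max_restarts
        self.restarts_in_a_row = 0       # local policy, not connection state
        self.conn_id: Optional[int] = None
        self.sent: List[Packet] = []     # stands in for the network
        self._next_id = 1
        self._hello()

    def _hello(self) -> None:
        # A fresh connection: nothing from the old one is carried over.
        self.conn_id = self._next_id
        self._next_id += 1
        self.sent.append(Packet(self.conn_id, kind="HELLO"))

    def send(self, packet: Packet) -> None:
        if self.conn_id is None or packet.connection_id != self.conn_id:
            # MUST NOT send further packets on a restarted connection.
            raise RuntimeError("no open connection with that ID")
        self.sent.append(packet)

    def on_packet(self, packet: Packet) -> None:
        if packet.connection_id != self.conn_id:
            return  # packets for old connection IDs are outside this sketch
        if not packet.flags & RESTART_FLAG:
            self.restarts_in_a_row = 0
            return
        # Stop using this connection immediately and send no reply to the flag.
        self.conn_id = None
        if self.restarts_in_a_row < self.max_restarts:
            self.restarts_in_a_row += 1
            self._hello()  # SHOULD restart by sending a Hello packet
        # Otherwise give up: allowed because the text says SHOULD, not MUST.
```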

9. Follow the Robustness Principle

Also known as Postel’s Law, the Robustness Principle states: “be conservative in what you do, be liberal in what you accept from others.” This was originally coined in RFC 761, the document specifying TCP. This is a very important and widely known, yet also widely misunderstood, aphorism.

The most notorious misapplication of this principle was in the implementation of early HTML parsers. Based on this idea, the parsers would take in any old junk that vaguely resembled HTML and try as hard as possible to make something legible out of it when displaying the results. The result of this extreme laxity was more than a decade of the nightmare known as “tag soup” which is only now beginning to abate.

The real meaning of the Robustness Principle is not that erroneous input should be accepted as valid, but that erroneous input should not cause catastrophic failure, that any valid parts of a partially-erroneous input should be accepted if possible, and that diagnostics should be given for erroneous input when feasible. An HTML parser implementation that properly followed this rule would, upon receiving “tag soup” HTML, produce a warning message that the HTML was invalid, hopefully display some sort of information about what was wrong with it (e.g. unclosed anchor tag, missing doctype, etc.), and only then try to (or give the option to) display the parser’s best approximation of what the author meant.

An HTML parser is not a protocol, but a protocol should behave similarly. Given an illegal command or request, or a set of conflicting options, what a protocol implementation should not do is: crash, silently ignore the problematic messages, or arbitrarily choose one of the conflicting options to ignore. Instead, the implementation should respond with a diagnostic indicating why the message could not be fully processed, and should process any part of the message that was valid if it can be done safely and unambiguously.

Keep in mind that the diagnostics need not be completely machine readable. A set of mutually exclusive options specified together could be responded to with a message like “573 Illegal Options – Foo and Bar cannot be specified at the same time”, where 573 is a machine-readable code indicating that the message was not processed due to an error in the options field, and the rest is something that can be understood by a human programmer when debugging his or her implementation.
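Here is a sketch of that behavior for a hypothetical options field with known options FOO, BAR, and VERBOSE, where FOO and BAR are mutually exclusive. Valid options are kept, unknown ones are reported, and a conflicting pair is dropped with a diagnostic rather than having one side silently chosen. Reusing 573 for unknown options is my assumption, in keeping with the meaning given to that code above.

```python
from typing import List, Tuple

# Hypothetical options field: FOO and BAR are mutually exclusive.
KNOWN_OPTIONS = {"FOO", "BAR", "VERBOSE"}
MUTUALLY_EXCLUSIVE = [("FOO", "BAR")]


def process_options(options: List[str]) -> Tuple[List[str], List[str]]:
    """Return (options that will be honored, diagnostic reply lines)."""
    accepted: List[str] = []
    diagnostics: List[str] = []
    for opt in options:
        if opt not in KNOWN_OPTIONS:
            diagnostics.append(f"573 Illegal Options - {opt} is not a recognized option")
        elif opt not in accepted:
            accepted.append(opt)
    for first, second in MUTUALLY_EXCLUSIVE:
        if first in accepted and second in accepted:
            # Never silently pick a winner: drop both and explain why.
            accepted.remove(first)
            accepted.remove(second)
            diagnostics.append(
                f"573 Illegal Options - {first.title()} and {second.title()} "
                "cannot be specified at the same time"
            )
    return accepted, diagnostics
```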

The only significant exception to applying the robustness principle is in avoiding denial-of-service attacks. Responding verbosely to malformed input may exacerbate the effects of a denial-of-service attack, so it is, for example, reasonable to cease this behavior in the face of repeated offenses originating from the same host in a short period of time.
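One way to implement that exception is a per-host sliding window: verbose diagnostics are sent until a host exceeds a budget, after which the implementation falls back to terse replies. The class name and default limits below are hypothetical; the `clock` parameter exists only so the behavior can be tested deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class DiagnosticLimiter:
    """Permit at most `limit` verbose diagnostics per host within `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow_verbose(self, host: str) -> bool:
        now = self.clock()
        recent = self._recent[host]
        # Forget diagnostics that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # caller should send a terse code or nothing at all
        recent.append(now)
        return True
```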

10. Design for security from the start

In the nine points above I have referenced numerous protocols from the standard Internet protocol suite. I did this not just because they are well-known but also because they are for the most part well-designed (otherwise they would not have survived the explosion in the size of the Internet in the past twenty years).

However, one shortcoming is common to many of them, and we live with its detrimental effects every day. These protocols, designed when the Internet was in its infancy as an academic and governmental experiment, were not designed with security in mind. This is what facilitates spam, denial-of-service, phishing, privacy invasion, and all other sorts of malfeasance on the Internet.

Today, however, the Internet is relatively mature and very, very public, so there is absolutely no excuse to design a new, security-free protocol. Neither is it acceptable to defer the addition of security features to a later version of the protocol. Another vital lesson that we have learned from the Internet protocol suite is that it is incredibly difficult to adopt secure protocol enhancements after a protocol has been widely deployed. If this were trivial, then the entire Internet would be running over IPsec already.

This isn’t to say that all traffic should be encrypted. Of course it is acceptable to forgo encryption if the traffic isn’t sensitive, but things that are completely unacceptable in the modern Internet include sending passwords in the clear, using predictable sequence numbers, and assuming good behavior from the other end of the connection. If you simply avoid those security pitfalls and make use of some established mechanism (e.g. TLS) for producing a cryptographic layer when you need to transmit sensitive information, your protocol will be much, much better off for it.

Note well, though, that to be secure does not mean to be obscure. Encrypting your protocol is quite different from making it incomprehensible. Encryption should be a layer. Once the encryption layer is removed, the protocol should continue to adhere to the design principles articulated above, including human readability, discoverability, meaningful magic numbers, etc.
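In Python, layering TLS under an existing protocol takes a few lines with the standard ssl module, and unpredictable values such as initial sequence numbers can come from the secrets module. This is a sketch: the function names and the TLS 1.2 floor are my choices. Because the protocol runs unchanged over the wrapped socket, everything above still applies once the encryption layer is peeled off.

```python
import secrets
import socket
import ssl


def initial_sequence_number() -> int:
    """An unpredictable 32-bit starting value from the OS's secure RNG."""
    return secrets.randbits(32)


def make_client_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_secure_connection(host: str, port: int) -> ssl.SSLSocket:
    """Open a TCP connection and put TLS underneath the application protocol."""
    raw = socket.create_connection((host, port))
    try:
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the plain socket if the handshake fails
        raise
```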
