Encryption on the Internet has come a long, long way from the oft-ignored little yellow key in the lower left corner of your Netscape Navigator status bar. Today, cryptography is a vital part of all of our Internet lives, whether we realize it or not. Now, if you’re reading this article on Nerdland, chances are that you’re well aware of that, and I don’t need to explain why you need to be sure your online banking is done over an HTTPS connection, and why connecting your laptop to an open, unsecured wireless network is usually a bad idea.
But the little stuff can trip you up just as easily, and if you don’t have a solid understanding of the different facets of cryptography, you may well think that a system meets your security requirements when it does not. After all, modern cryptography is just mathematics. There’s no inherent application for it. Security isn’t a tangible property either; it’s an umbrella term for a whole class of goals. Rather, privacy, authentication, identification, trust, and verification — mechanisms of applied cryptography — are what provide the most commonly desired types of security. Understanding what these terms really mean, how they are implemented, and how they are different is essential to a true understanding of how encryption works to assure your security on the Internet, and even within a single computer.
This article assumes you are familiar with the fundamentals of cryptography: that you know what constitutes encryption, that you know what a key is, and that you know the basic difference between symmetric key cryptography and public key cryptography. I am concerned with describing and clearing up some misconceptions about the practical applications of cryptography to modern computing.
Privacy (or “secrecy”) is the cornerstone of applied cryptography. A commonly desired form of security is making data readable only by certain intended recipients. Whether symmetric or public key cryptography is in use, a person (or machine) proves that they are an intended recipient by possessing the key that can be used to decrypt the message. For simply achieving privacy, either symmetric or public key encryption will do; however, public key encryption is very slow, so in practice it is typically used only to encrypt a symmetric key, which in turn encrypts the rest of the data.
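To make that hybrid approach concrete, here is a Python sketch using textbook RSA with the classic tiny demo parameters (n = 3233) to wrap a random session key, and a SHA-256-derived XOR keystream standing in for a real symmetric cipher such as AES. Every key, value, and function here is illustrative only; real systems use vetted cryptographic libraries and far larger keys.

```python
import hashlib
import secrets

# Textbook-RSA toy key (the classic n = 61*53 = 3233 example).
# Hopelessly insecure; for illustration only.
N, E, D = 3233, 17, 2753  # public modulus, public exponent, private exponent

def rsa_encrypt(m: int) -> int:  # anyone holding the public key can do this
    return pow(m, E, N)

def rsa_decrypt(c: int) -> int:  # only the private-key holder can do this
    return pow(c, D, N)

def keystream_xor(data: bytes, key: bytes) -> bytes:
    # Stand-in for a real symmetric cipher such as AES: XOR the data with
    # a SHA-256-derived keystream. Illustrative only, not secure.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Sender: generate a random symmetric session key, wrap it with the slow
# public-key operation, and encrypt the bulk data symmetrically.
session_key = secrets.randbelow(N - 2) + 2
wrapped_key = rsa_encrypt(session_key)
ciphertext = keystream_xor(b"attack at dawn", session_key.to_bytes(2, "big"))

# Recipient: unwrap the session key with the private key, then decrypt.
recovered_key = rsa_decrypt(wrapped_key)
plaintext = keystream_xor(ciphertext, recovered_key.to_bytes(2, "big"))
print(plaintext)  # b'attack at dawn'
```

Only the short session key ever passes through the expensive public-key operation; the bulk data goes through the fast symmetric cipher.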
Privacy is commonly desired when sensitive data is being transmitted. In the case of web browsing, this is one of the purposes of the Secure HTTP (HTTPS) protocol. When communicating with, for example, your bank’s website, it is important that the information being transacted is private. It is highly undesirable for any other person, even a professional network administrator at your ISP, who happens to control a computer on the Internet through which the data between you and your bank passes, to be able to look at your account numbers and balances.
Similarly, if you store sensitive corporate information or highly personal documents on a laptop, you would want to make sure that these documents remain private if the laptop were ever lost or stolen. For this, you would encrypt the files (or better yet the entire hard drive) and either keep the decryption key outside of the laptop, or keep it protected with a strong passphrase. In the latter case, the passphrase itself is the key to a cryptographic algorithm that will provide the unencrypted version of the decryption key for your files or hard drive, and the passphrase is ideally stored only in your head.
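A minimal sketch of that passphrase-protection scheme, using the standard library's PBKDF2 as the slow key-derivation function (real disk encryption may use scrypt or Argon2, and a proper key-wrap cipher instead of the XOR used here purely for illustration):

```python
import hashlib
import secrets

# The key that actually encrypts the drive, and a per-device random salt.
disk_key = secrets.token_bytes(32)
salt = secrets.token_bytes(16)
passphrase = b"correct horse battery staple"  # lives only in your head

# Derive a key-wrapping key from the passphrase with a deliberately slow
# KDF, then wrap the disk key. XOR stands in for a real key-wrap cipher.
wrap_key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 600_000)
wrapped = bytes(a ^ b for a, b in zip(disk_key, wrap_key))
# Only `salt` and `wrapped` need to be stored on the laptop; without the
# passphrase, the disk key cannot be recovered from them.

# At unlock time, the same derivation recovers the disk key.
unwrap_key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 600_000)
recovered = bytes(a ^ b for a, b in zip(wrapped, unwrap_key))
print(recovered == disk_key)  # True
```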
This is privacy: no third parties can read your data. No more, and no less. A common problem is that users, even technically savvy users, often make the false assumption that privacy implies authentication and verification. While the ability to create privacy is a prerequisite for authentication and verification, and they are often used in conjunction, it is not the case that obtaining privacy implies that the other two types of security have also been obtained.
Authentication is the act of proving who you are, or challenging someone else to prove who they are. The underlying technology for modern authentication schemes is public key cryptography. I said earlier that I was assuming familiarity with public key cryptography, but let me reiterate the most salient aspect of it for the purposes of authentication: In public key cryptography, only Alice’s private key is able to decrypt messages that have been encrypted with Alice’s public key, and only Alice’s private key is able to create encrypted messages that can be decrypted by Alice’s public key. Specifically, a message encrypted with any other private key will produce different (usually meaningless) unencrypted data if Bob attempts to decrypt it using Alice’s public key.
The fundamentals of authentication consist of a challenge-response exchange. If Bob presents (“challenges”) Alice with a piece of arbitrary data, and Alice responds with a piece of encrypted data that decrypts to Bob’s original arbitrary data when decrypted using Alice’s public key, this proves that Alice possesses Alice’s private key. Nobody else other than the person who possesses Alice’s private key (presumably only Alice) could produce encrypted data that would decrypt back to Bob’s initial data using Alice’s public key. If Bob presented Mallory with arbitrary data, and Mallory wanted to impersonate Alice, he could not; without Alice’s private key, he would not be able to produce the expected response that Bob was looking for.
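The challenge-response exchange can be sketched in a few lines of Python, again using insecure textbook-RSA demo parameters (n = 3233); the values and names are illustrative only:

```python
import secrets

# Alice's textbook-RSA toy key pair (insecure demo values).
N, E, D = 3233, 17, 2753  # (N, E) is public and known to Bob; D is Alice's secret

# Bob's challenge: an arbitrary random value.
challenge = secrets.randbelow(N - 2) + 2

# Alice's response: transform the challenge with her PRIVATE key.
response = pow(challenge, D, N)

# Bob's check: the response, run through Alice's PUBLIC key, must yield
# his original challenge. Only the holder of D can satisfy this.
print(pow(response, E, N) == challenge)  # True

# Mallory, who lacks D, can only guess; a random response fails the
# check except with negligible probability.
mallory_response = secrets.randbelow(N - 2) + 2
print(pow(mallory_response, E, N) == challenge)  # almost certainly False
```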
It is clear from this, however, that authentication is only useful if you already know the public key of the person you are hoping to communicate with. One common application of cryptographic authentication on computer networks is Secure Shell (SSH) logins. Commonly, a user will install his or her public key on a server that they wish to log into via SSH, and will keep his or her private key on a personal machine. When logging into the server, the server challenges the client to prove that it holds the private key corresponding to a public key installed for the username that the client is trying to log in as. If the client satisfies the challenge with an appropriate response, the login is allowed without requiring a password for the user.
This is more secure and often more convenient than prompting for a password, since the private key is much harder to steal or guess than a password, and the same public key can be used on multiple servers with none of the security risks that apply to re-using the same password in multiple places. The same sort of thing can be done with web servers using something a little more complicated called a client-side certificate (see below about certificates), although these are uncommon on the public Internet and more often used on corporate intranets.
This is authentication: you can know with certainty who you are talking to. That is all; no more, no less. Note that this carries no implication of privacy. It is perfectly possible to authenticate your counterpart in a conversation and then proceed to have a non-private conversation. That wouldn’t be a common choice, but there’s nothing that prevents it.
More importantly, it is perfectly possible to have a private conversation without authenticating your counterpart. This is where a danger of a false sense of security lies. Bob could be talking over a perfectly private, encrypted connection, but if the person on the other end is Mallory and not Alice, Bob would never know that he is sending his sensitive data to, or receiving critical information from, a different and potentially malicious person.
In other words, just because you are sending your credit card number over a private, encrypted connection, doesn’t mean you aren’t unknowingly sending it directly to a criminal.
Identification is the aspect of applied cryptography that addresses the flaw in the above-described authentication process wherein you must know a priori the public key of the person you wish to communicate with. Perhaps surprisingly, this is the most complex common application of cryptography to security. If Alice and Bob wish to authenticate each other over the Internet, they must first exchange public keys. But they can’t just send them to each other over the Internet! If Bob received a message that purports to be from Alice and to contain Alice’s public key, he has no way to authenticate that the message actually came from Alice (and not from Mallory pretending to be Alice) without already knowing Alice’s public key. It’s a chicken-and-egg problem.
The direct solution to the problem is for Alice and Bob to exchange public keys off-line; to meet at Starbucks and hand each other CDs with their respective public keys on them. But this is not practical if Alice and Bob live thousands of miles apart, it is not practical if Alice is a banking institution and not a person, and it is still not practical if Alice and Bob do not already know each other.
If Alice and Bob are strangers (but still wish to authenticate one another), meeting to exchange CDs at Starbucks, even if physically feasible, isn’t secure. Mallory could show up at Starbucks a few minutes before Alice and, pretending to be Alice, give his own public key to Bob, and now Bob will authenticate Mallory as Alice in future conversations. A way to fix this loophole is to have Bob check Alice’s driver’s license before accepting the CD. This is identification: you can know that a public key purporting to belong to a particular person or entity actually does.
Now, meeting in person and checking driver’s licenses is a human solution to a computing problem. There are of course computer-based solutions to this same problem that will also avoid the impracticalities of first having to meet in person with everyone whom you wish to authenticate later. But these solutions are based on the same principle as the driver’s license check: trust. Bob is willing to accept Alice’s driver’s license as proof that Alice is who she says she is because he trusts that the state government would not issue a license in a false name or with a false photograph (ignoring for the moment the possibility that the license itself is a fake and not issued by the state). Computational identification is based on the same notion of trust.
Ultimately, to accept that a public key belongs to the person it claims to, you must trust that it does. Trust can be simple, if for example the key was given to you in person by your friend Charlie who you are sure is not being impersonated by a shape-shifting alien. Trust can also be more indirect. If Charlie gives you his brother Dan’s public key, and you trust that Charlie is honest and has good reason himself to trust that the key legitimately belongs to Dan, then you can accept Charlie’s assertion that the key belongs to Dan as identification of Dan’s public key.
Computationally, this identification process is based on signatures and certificates. A certificate is like a driver’s license: it identifies a public key as belonging to a named individual, entity, company, or organization. The fundamentals of a certificate are simple. The person wishing to be certified generates a file with their identifying information (in a standardized format), and appends to it their public key. That’s all. But, of course, this certificate is worthless without trust. If a stranger just handed me a card saying “I am Alice, my public key is …”, I would not accept that as their identification, would you?
To be worth anything, certificates must be signed. I’ll get to the mechanics of signatures in the next section, but suffice to say that the goal of a cryptographic signature is to use a private key to produce a non-forgeable endorsement. If Dan produces a certificate for himself, and Charlie signs the certificate using his own private key, this functions as an assertion by Charlie that the contents of Dan’s certificate are accurate. Then, since I already trust my friend Charlie, Dan can simply present me with the signed certificate containing his public key to identify himself to me. I can check Charlie’s signature against Charlie’s public key (which I already have), and from that know that Charlie asserts that Dan’s certificate is accurate, and therefore that Dan’s purported public key actually belongs to him.
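Here is how Charlie's endorsement of Dan's certificate might look as a Python toy, with textbook-RSA demo values and a truncated SHA-256 digest standing in for a real padded signature scheme (all names and numbers are hypothetical):

```python
import hashlib

# Charlie's textbook-RSA toy key pair (insecure demo values). Everyone
# already knows and trusts his public key (N_C, E_C).
N_C, E_C, D_C = 3233, 17, 2753

def toy_sign(data: bytes, n: int, d: int) -> int:
    # Hash the data, then apply the private-key operation to the digest
    # (truncated mod n here; real schemes pad the full digest).
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(h, d, n)

def toy_verify(data: bytes, sig: int, n: int, e: int) -> bool:
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(sig, e, n) == h

# Dan's self-made certificate: just identity plus public key.
dan_cert = b"name=Dan; pubkey=(n=2773, e=17)"

# Charlie endorses it by signing it with his private key.
endorsement = toy_sign(dan_cert, N_C, D_C)

# Anyone who trusts Charlie's public key can now check the endorsement,
# without ever contacting Charlie again.
print(toy_verify(dan_cert, endorsement, N_C, E_C))  # True

# A certificate altered by Mallory fails the check (barring a freak
# hash collision).
print(toy_verify(b"name=Mallory; pubkey=(...)", endorsement, N_C, E_C))
```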
This is trust: you can know that a public key belongs to who it purports to by means of endorsement by a third party. What’s important is that this can all be done without ever actually contacting Charlie, beyond once to obtain and identify his public key in the first place.
Further yet, let’s say that Erin presents a certificate with her public key to me and this certificate is signed by Dan. If I trust that Charlie would only sign Dan’s certificate if Dan himself were trustworthy, then I can trust that Erin’s certificate is valid as well. This sort of peer-to-peer trust acquisition, where an identity certificate can be signed by any number of other individuals who trust the holder (with varying levels of expressed trust), is known as a web of trust, and is commonly used for personal communications amongst security-sensitive Internet users.
But most Internet users never encounter a web of trust explicitly, and don’t really need to know how it works. What they do encounter frequently, however, is the similar notion of a public key infrastructure. This is used to establish Secure HTTP (or, more generally, TLS) connections. When establishing a secure connection to, say, Bank of America, it really does no good just to make the connection private. You must authenticate that the server you are communicating with really does belong to Bank of America. The server will send your browser its public key for authentication, but in order for the authentication to mean anything, the public key itself must first be identified. To facilitate identification, the server will send you a certificate.
In order to be identifiable, the certificate will be signed by a “certificate authority”. A certificate authority is a company that sells certificate endorsements and that has the responsibility to do whatever is necessary to ensure that the contents of the certificates it signs are truthful. Part of this process may be to ask for a faxed-in copy of a driver’s license, or to call the company’s well-known phone number and check with their IT department. The price of the endorsement can itself be a means of ensuring that an applicant is not fraudulent; a large company will have no problem paying over a thousand dollars annually for an endorsement, but to a small-time impersonator, this might be prohibitive.
A public key infrastructure (PKI) differs from a web of trust in two major ways. First, in a PKI, a certificate is signed by only one endorser, while in a web of trust a certificate may have multiple endorsers. Second, while in a web of trust a user is interested in tracing the endorsement chain back to someone that he or she knows personally, in a PKI the browser is interested in tracing the endorsement chain back to a “root” certificate authority. What makes a certificate authority a functional “root” in the context of HTTPS is that the root authorities’ certificates and public keys are pre-installed in the browser, and signed only by themselves. And so, ultimately, you are trusting that the manufacturer of your browser (Microsoft, the Mozilla Foundation, Apple, Google, Opera, etc.) is pre-installing root certificates only for trustworthy certifying authorities.
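As a sketch of what the browser does, the following Python toy builds a three-link chain (server, intermediate, root) out of textbook-RSA keys and walks it back to a pre-installed root. The names and simplified checks are illustrative; real X.509 validation also checks expiry dates, hostnames, revocation, and more.

```python
import hashlib

def make_toy_keys(p: int, q: int, e: int = 17) -> dict:
    # Textbook RSA key generation from two tiny, insecure primes.
    n, phi = p * q, (p - 1) * (q - 1)
    return {"n": n, "e": e, "d": pow(e, -1, phi)}  # needs Python 3.8+

def sign(data: bytes, key: dict) -> int:
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % key["n"]
    return pow(h, key["d"], key["n"])

def verify(data: bytes, sig: int, key: dict) -> bool:
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % key["n"]
    return pow(sig, key["e"], key["n"]) == h

root = make_toy_keys(61, 53)          # pre-installed in the "browser"
intermediate = make_toy_keys(47, 59)
server = make_toy_keys(41, 43)

# Each certificate pairs an identity with a key and is signed by the key
# one step up the chain; the root signs itself.
chain = [
    (b"CN=bank.example", server, intermediate),          # leaf
    (b"CN=Example Intermediate CA", intermediate, root),
    (b"CN=Example Root CA", root, root),                 # self-signed
]
signatures = [sign(body, signer) for body, _, signer in chain]

# The browser walks the chain: every signature must verify against its
# signer's key, and the last link must be a pre-installed trusted root.
trusted_roots = [root]
ok = all(verify(body, sig, signer)
         for (body, _, signer), sig in zip(chain, signatures))
ok = ok and chain[-1][2] in trusted_roots
print(ok)  # True
```

A self-signed certificate from an unknown party would fail only the final membership test, which is exactly the "untrusted certificate" browser warning described below.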
By now, you should know enough about privacy, authentication, and identification to understand what those HTTPS certificate error messages you receive from your browser mean. A browser error or warning message about an HTTPS certificate almost always indicates that a problem was encountered while attempting to use the certificate to identify the remote server (the actual authentication or encryption of the data almost never fails). The most common errors encountered are that a certificate has expired, or that a certificate’s chain of endorsements cannot be traced back to a known root certifying authority. A special case of the latter is a self-signed certificate, which is not signed by any certifying authority, root or otherwise.
These errors are important because they mean that the certificate presented by the server cannot be trusted as identification. You should afford them the same level of trust as identification that you would afford the “I am Alice” card that was handed to you; that is to say, none. And without identification of the public key, any authentication you attempt to perform on the remote server is equally worthless. The person handing you the “I am Alice” card could easily be Mallory and you would never know the difference. Note, however, that this says nothing about the compromise of privacy.
An HTTPS (or TLS) connection using an expired, self-signed or otherwise untrusted certificate allows for private communication, but does not provide authenticated communication.
That is, your data is protected against third-party snoopers on its transit through the Internet, but it is most certainly not protected against your counterpart being a malicious imposter.
I took so much space writing about trust and certificates largely to get to that point, because it is perhaps the most widespread and dangerous misconception about cryptography on the Internet. It is perfectly possible to have a cryptographically private conversation with a cryptographically unauthenticated, unidentified, and untrusted server. Just because you have obtained the “privacy” form of security does not imply that you have all of these other forms of security that you may also desire, so you shouldn’t assume that you do.
This will almost seem like a post-script considering how simple it is compared to identification and trust, and really it should logically appear between identification and trust, since it is the basis for signatures, but I didn’t want to break up the narrative.
Above, I glossed over the fact that a person (in a web of trust) or a certifying authority (in a public key infrastructure) is able to endorse a certificate by “signing” it. But what does that mean, exactly? Cryptographic signatures provide verification, the final common form of cryptographic security in modern computing.
Suppose that Bob writes a will leaving half his estate to Alice and half to Charlie, and disinheriting Mallory. Suppose then that Mallory sneaks into Bob’s home office, finds the will in his desk drawer, and modifies it such that it now leaves the entire estate to Mallory and disinherits Alice and Charlie. When Bob dies and the will is read, how can the executor verify that the will is what Bob wrote and has not been tampered with? In this non-computing situation, the will would have been signed by a witness or a notary public, and the executor would trust the witness or notary to inform him if the document differs from the one that they signed.
In computing, things work essentially the same way. If an e-mail (or document, or certificate) needs to be verified as having not been tampered with, it will be cryptographically signed, and the public key of the signer will be used to verify that the contents of the e-mail, document, or certificate have not changed since the signature was applied. This is verification: you have assurance that the data has not changed since a trusted party signed it. Again, don’t infer that this means more than it does. The document need not be private, and it is important that the signature be authenticated with an identified, trusted key in order to mean anything.
The mechanics of a cryptographic signature are simple. First, a cryptographically secure hash function is applied to the document to obtain a relatively short sequence of bytes. For many years the standard choice was SHA-1, but it is no longer considered secure; today a function from the SHA-2 family, most commonly SHA-256, is typically used. The important property of the sequence of bytes produced is that it is computationally infeasible to create a meaningful document with different contents which would generate the same sequence when the cryptographic hash is applied to it. The output of the cryptographic hash is then encrypted using the signer’s private key and attached to the document.
The recipient can then use an identified and trusted public key belonging to the signer to decrypt the output of the cryptographic hash. If the recipient re-computes the hash on the data and compares it to the decrypted hash output, he can be assured that the document was not tampered with if the outputs match. In the case of certificates, the certifying authority’s signature of the certificate verifies that the identifying information contained within the certificate has not been altered since the time at which the certifying authority validated that the information was true.
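Those two paragraphs condense into a short Python sketch, using a truncated SHA-256 digest and textbook-RSA toy keys (real signature schemes pad the full digest and use far larger keys):

```python
import hashlib

# The signer's textbook-RSA toy key pair (insecure demo values).
N, E, D = 3233, 17, 2753

document = b"I leave half my estate to Alice and half to Charlie."

# Signing: hash the document with SHA-256, then apply the private-key
# operation to the digest (truncated mod N for this toy).
digest = int.from_bytes(hashlib.sha256(document).digest(), "big") % N
signature = pow(digest, D, N)

def verify(data: bytes, sig: int) -> bool:
    # Decrypt the signature with the PUBLIC key and compare it against a
    # freshly recomputed hash of the data actually received.
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
    return pow(sig, E, N) == h

print(verify(document, signature))  # True: the document is unaltered

# Mallory's edit changes the hash, so the old signature no longer
# matches, and he cannot forge a new one without the private key D.
tampered = b"I leave my entire estate to Mallory."
print(verify(tampered, signature))  # almost certainly False
```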
In cases other than certificates, for example documents and e-mails, data is usually signed by its own author. For example, Alice sends an e-mail to Bob and signs it with her own private key. Then, presuming Bob already has an identified, trusted copy of Alice’s public key, he can not only verify that the message has not been tampered with, but he can also authenticate Alice as the author of the message, since no one but Alice could have produced a signature that would decrypt properly using Alice’s public key. If the message or document were signed by someone other than Alice, Bob would have to trust that the signer was being honest when endorsing that the message came from Alice.
What’s important to note is that if Alice simply sends a private message to Bob, this provides neither verification that the message has not been altered nor authentication that the message is actually from Alice. When Alice sends a private message to Bob, she encrypts it using Bob’s public key. This provides privacy and ensures that only the intended recipient (Bob) can read the message. But to provide verification and authentication, Alice must also sign the message with her own private key.
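A sketch of that sign-then-encrypt combination, with two textbook-RSA toy key pairs (insecure demo values; byte-by-byte RSA encryption is used here only to keep the illustration short — real systems use hybrid encryption):

```python
import hashlib

# Two textbook-RSA toy key pairs (insecure demo values).
ALICE = {"n": 3233, "e": 17, "d": 2753}   # 61 * 53
BOB = {"n": 2773, "e": 17, "d": 157}      # 47 * 59

message = b"transfer the funds to Charlie"

# Alice SIGNS with her own private key (verification + authentication)...
h = int.from_bytes(hashlib.sha256(message).digest(), "big") % ALICE["n"]
signature = pow(h, ALICE["d"], ALICE["n"])

# ...and ENCRYPTS with Bob's public key (privacy).
ciphertext = [pow(b, BOB["e"], BOB["n"]) for b in message]

# Bob decrypts with his own private key...
recovered = bytes(pow(c, BOB["d"], BOB["n"]) for c in ciphertext)

# ...and verifies the signature against Alice's public key.
h2 = int.from_bytes(hashlib.sha256(recovered).digest(), "big") % ALICE["n"]
verified = pow(signature, ALICE["e"], ALICE["n"]) == h2
print(recovered == message and verified)  # True
```

Note that the two operations use different keys: privacy comes from Bob's key pair, while verification and authentication come from Alice's.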
Hopefully, this article has helped the reader understand the similarities, differences, and interrelations between the five most common applications of cryptography to modern computing. To wrap up, I’ll repeat the most salient points about each:
- Privacy: No third parties can read your data. Nothing is implied about the identity or trustworthiness of you or your counterpart. Neither you nor your counterpart can know that messages are not being altered or replaced in transit.
- Authentication: You know with certainty that your counterpart possesses a particular private key. Nothing is implied about the identity or trustworthiness of your counterpart. The conversation may not be private, and neither you nor your counterpart can know that messages are not being altered or replaced in transit.
- Identification: You know (somehow) that a particular public key corresponds to a particular identity. There is no “conversation” involved.
- Trust: Due to an endorsement by an already-identified and already-trusted third party, you know that a particular public key corresponds to a particular identity. There is no “conversation” involved, but trust can be securely conveyed over insecure computer networks.
- Verification: You know with certainty that messages between you and your counterpart are not being altered or replaced in transit. The conversation may not be private, and nothing is implied about the identity or trustworthiness of your counterpart.
Ideally, you want all of these things at once, and that’s exactly what HTTPS (or other protocols on top of TLS) give you. That’s why it’s completely secure to give your credit card number and personal details to a bank or other merchant over the Internet, so long as you are using HTTPS and you are not otherwise worried that the bank or merchant will misuse or mishandle this information in some way completely unrelated to having transmitted it over the Internet.
The certificate given by the web server is trusted by your browser because its contents are verified by a certifying authority’s signature. Thus, the certificate can be used to authenticate that you are communicating with the server that the certificate describes. Once all of that is ascertained, the cryptographic key in the certificate is used to ensure that the conversation between you and the web server is private with respect to third parties along the route of data transit.
But, of course, to be truly secure, all of these aspects must be present, and a savvy Internet user must recognize that an HTTPS error displayed by the browser indicates that that is not the case. Moreover, when using or devising security systems that are not as well automated as TLS, one must be sure that each desired aspect of security is in place, and not make the assumption that one aspect implies the others, which is most certainly not the case.