Storing Passwords Securely

Time and time again you hear about a company having all of its users’ passwords, or “password hashes”, compromised, and often there’s a press response including one or more prominent security researchers demonstrating how 1,000 users had the password “batman”, and so on. It’s surprising how often this happens considering that, for several decades, we’ve had ways to do password authentication that don’t expose users’ passwords, or that at least make it significantly harder to crack them.

Personally, I think it boils down to a fundamental misunderstanding about what cryptographic hash functions are and what they are—or should be—used for, and a failure on the part of security researchers and advocates, myself included, to properly explain and emphasize the differences. So here’s an attempt to explain why “SHA 256-bits enterprise-grade password encryption” is only slightly better than storing passwords in plain text.

If you are familiar with cryptographic hash functions like MD5, SHA-1 and SHA-256, and perhaps even use them for password authentication, please jump to Cryptographic Hash Functions Are Not Password Hash Functions.

Password Storage

Typically, system designers choose one of two ways to store their users’ passwords: 1. in their original format, as plain text, or 2. as the digest (output) of a one-way hash function. It probably goes without saying that the first option is a bad idea considering that any kind of compromise of the users/password database immediately exposes login credentials clients may be using on many other sites—but would it surprise you that the latter, as implemented in the majority of web systems, only provides marginally stronger security?

(The popular argument for storing passwords in plain text is that it reduces the required labor for help desk staff in that they’ll easily be able to tell a person what their password is if they happen to forget it. This strikes me as very unconvincing. First, it’s easy to design a page where the user can reset their password if they forget it (the “Forgot your password?” page), and second, if you use a more secure method like storing the output of a hash function instead of the password itself, help desk staff can still reset a user’s password if they’ve forgotten it. It’s probably also a bad idea for help desk staff to have access to the plain text password of a large corporation’s CEO or CFO.)

One-way Hash Functions

The safe way (right?) of storing users’ passwords is by running them through a one-way hash function. A one-way hash function is a series of mathematical operations that transform some input into a sort of “fingerprint” called a digest. If you run the password hunter2 through the popular cryptographic hash function SHA-256, you get the following digest (in hex): f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7

As the name implies, these functions are one-way, meaning you can turn something into a digest, but you can’t turn the digest into what it originally was. When a user creates an account, the system stores their account information, but instead of storing their password on the disk, it runs the password through the one-way hash function and stores the digest instead. When a user wants to log in to the system in the future, the system takes the password that they provide, runs it through the one-way hash function, then compares it to the existing digest stored for that user. The system only needs to be able to check if the output from the hash function is the same; it doesn’t need to store any details about the password, and it doesn’t need to remember the password itself.

A good cryptographic hash function—the sort of one-way hash that we will be discussing—should produce digests that are very different when the input is altered even a little. (This is known as the avalanche effect.) The SHA-256 digest of hunter3, although it is very close to hunter2, is very different: fb8c2e2b85ca81eb4350199faddd983cb26af3064614e737ea9f479621cfa57a
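
To see the avalanche effect for yourself, here is a minimal sketch using only Python's standard library. The helper names sha256_hex and differing_bits are mine and purely illustrative; the sketch hashes the two example passwords and counts how many of the 256 output bits differ, which for a well-behaved hash should land somewhere around half of them.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex("hunter2")
b = sha256_hex("hunter3")
print(a)
print(b)
print(differing_bits(a, b), "of 256 bits differ")
```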

These properties make it a lot harder for someone to infer anything about the original input by e.g. changing a certain character at a time in their guesses. But do they make it harder to guess, or “crack”, user passwords?

Cryptographic Hash Functions Are Not Password Hash Functions

In Python, we might write functions that get a SHA-256 digest of a password, and compare digests, like this, using functions from the standard library:

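import hashlib

def getDigest(password):
    # hashlib operates on bytes, so encode the text password first.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def isPassword(password, digest):
    # Re-hash the supplied password and compare it to the stored digest.
    return getDigest(password) == digest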

This is usually considered a finished password authentication mechanism (save for the storage of the digest) in most web systems. The password can’t be deduced by looking at the digest, so what is to be gained by doing anything further?

Well, there are many problems with simply hashing passwords. The two major ones are:

Recognizability

What is the major downside of the fact that a hash function always returns the same digest for a given message? Well, if an attacker spends a lot of resources pre-computing digests for as many passwords as possible, and they get access to your user database, they can check all of those digests against the ones you’ve stored. Furthermore, they can use the list of digests they’ve created to try to find matches in many different user databases that they’ve compromised. (This form of digest list is commonly, if loosely, called a rainbow table; strictly speaking, a rainbow table is a space-saving variant of the same pre-computation idea.)

What’s even worse is that, if the attacker guesses, or “cracks”, a password, they can simply search your database to find all users with the same password digest—they’ll know that anyone with the same password digest uses the same password as the user whose password they’ve guessed.
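
Both halves of the recognizability problem can be shown in a few lines. This is only a sketch under made-up assumptions: the password list, the usernames and the "leaked" table are all invented for illustration, and a real attacker's table would hold many millions of entries.

```python
import hashlib
from collections import defaultdict

def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The attacker's table: built once, reusable against any unsalted SHA-256 database.
common_passwords = ["123456", "password", "batman", "hunter2"]
lookup = {sha256_hex(p): p for p in common_passwords}

# A made-up leaked table of (username, unsalted digest) pairs.
leaked = [
    ("alice", sha256_hex("batman")),
    ("bob", sha256_hex("correct horse battery staple")),
    ("carol", sha256_hex("batman")),
]

# Step 1: every digest found in the table is cracked by a dictionary lookup.
cracked = {user: lookup[digest] for user, digest in leaked if digest in lookup}

# Step 2: identical digests reveal which users share a password.
by_digest = defaultdict(list)
for user, digest in leaked:
    by_digest[digest].append(user)
shared = [users for users in by_digest.values() if len(users) > 1]

print(cracked)
print(shared)
```

Note that no hashing happens at attack time in step 1; the expensive work was done once, in advance, which is exactly what a per-user salt takes away.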

Speed

Hash functions are used for many things in cryptography, most commonly verifying messages (a message can be a block of data in an encrypted data stream), but they were not designed for storing passwords or other long-lived secrets. They are designed to be very fast so the encryption process isn’t slowed down, and that’s a problem because, when they are used for storing passwords, an attacker can guess at the password by hashing arbitrary strings and comparing the output to the stored digest at a very high speed. (For something like MD5, it’s possible to make 5.6 billion guesses per second, or more, using commodity hardware.) Even if the password cannot be deduced by looking directly at the digest, the password, if it is not very long or very complex, can be guessed very easily. The vast majority of user-chosen passwords are neither very long nor very complex. The speed ends up hurting the user instead of helping them.

Password operations in web applications, for instance, are usually not time sensitive: Consider a person who has just spent 20 seconds entering their username and password to log in to your website. Would they mind very much if it took one second to generate the digest of their password (to check if their login details are valid) instead of a fraction of a millisecond?

Fortunately, there are solutions to these problems:

Salting

A salt is a random sequence of bytes which is added to the hash function, or just to the password string itself, so you create a digest for e.g. 6zvz3ylalpkp03lua8r4yyzdoq7e2js2 + sw0rdf1sh! rather than just sw0rdf1sh!. Anyone who knows what the digest for sw0rdf1sh! is won’t be able to match it against the salted digest. If each user has their own salt, there is no easy way to group users with identical digests to find users with the same passwords. This largely solves the recognizability problem.

Every password should have its own salt, and that salt should be at least 32 bytes long to make it much harder to guess both the correct salt and digest.

To implement password salting, we can rewrite our functions to something like:

import base64
import hashlib
import os

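def getDigest(password, salt=None):
    # Generate a fresh random salt unless we are re-checking an existing one.
    if salt is None:
        salt = base64.b64encode(os.urandom(32))
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def isPassword(password, salt, digest):
    # Hash the supplied password with the stored salt and compare digests.
    return getDigest(password, salt)[1] == digest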

(Note: These examples are for demonstration purposes only.)

We then store the user’s password salt and digest in the database, and run isPassword(providedPassword, storedSalt, storedDigest) on subsequent login attempts to check whether the provided password is the same.

Stretching

If you create a digest of a password, then create a digest of the digest, and a digest of that digest, and a digest of that digest, you’ve made a digest that is the result of four iterations of the hash function. You can no longer create a digest from the password and compare it to the iterated digest, since that is the digest of the third digest, and the third digest is the digest of the second digest. To compare passwords, you have to run the same number of iterations, then compare against the fourth digest. This is called stretching.

A good password storage system takes so long to process a single input, e.g. 0.2 seconds on a modern computer, that guessing a password using brute force will take significantly longer. (With a hash algorithm like SHA-256, this might be 100,000 iterations or more.) Where, previously, one might have been able to compare digests 5.6 billion times per second, it might now be 5 times per second on the same computer without parallelization, or perhaps a few hundred or a few thousand times per second using hardware like GPUs. That is still significantly fewer than 5,600,000,000!

As computers become more powerful, the number of iterations can be increased so it continues to take 0.2 seconds or more to generate each password digest. This might be accomplished like so:

import hashlib

def getDigest(password):
    # First pass over the encoded password, then feed each digest back in.
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    for _ in range(100000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

(Note: This is an over-simplification. Iterating a hash function is significantly better than just running the input through a hash algorithm once, but still loses you a fairly significant amount of entropy.)
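
How do you pick the iteration count so that one digest costs about 0.2 seconds on your own hardware? One rough approach, sketched here under my own assumptions, is to time a small probe run and scale it up linearly. The names calibrateIterations and iterateHash and the probe size of 20,000 are illustrative, and the result will vary from run to run and machine to machine.

```python
import hashlib
import time

def iterateHash(data, iterations):
    # Same naive stretching as above: hash once, then re-hash the digest.
    digest = hashlib.sha256(data).digest()
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest

def calibrateIterations(targetSeconds=0.2, probeIterations=20000):
    """Estimate how many iterations take roughly targetSeconds on this machine."""
    start = time.perf_counter()
    iterateHash(b"benchmark password", probeIterations)
    elapsed = time.perf_counter() - start
    # The cost is close to linear in the iteration count, so scale the probe.
    estimate = int(probeIterations * targetSeconds / max(elapsed, 1e-9))
    return max(estimate, 1)

print(calibrateIterations())
```

Re-running a calibration like this every year or two, and storing the iteration count next to each digest, is what lets the work factor grow with the hardware.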

Salting and stretching might then be implemented like this:

import base64
import hashlib
import os

def getDigest(password, salt=None):
    # Salt the first pass, then stretch by re-hashing the digest many times.
    if salt is None:
        salt = base64.b64encode(os.urandom(32))
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(100000):
        digest = hashlib.sha256(digest).digest()
    return salt, digest.hex()

def isPassword(password, salt, digest):
    return getDigest(password, salt)[1] == digest

Now, we could implement a complete system that uses salts and multiple iterations, but it seems like a system for safely storing passwords should have already been implemented. If we could use a widespread, standardized system, we wouldn’t have to risk getting the implementation wrong.

(Implementing your own cryptographic functions is one of the riskiest things you can do: You can’t test for vulnerabilities if you don’t know what the vulnerabilities might be, and, most likely, you will be the only person to scrutinize and use the implementation. If it turns out that there are errors in your implementation, your users suffer, and you get some very bad press.)

As it turns out, there are several such systems, and variations of them have been used in systems like BSD and Linux for many years.

Adaptive Key Derivation Functions

Adaptive key derivation functions are exactly what we’ve discussed above: Functions that generate digests from passwords whilst applying salting and stretching. They implement all of the above features, and often in a way that would be difficult to achieve using just a programming language’s standard library. For instance, they might work such that the digest computation can’t easily be parallelized, something that is very doable with plain MD5 and all members of the SHA family. In effect, attackers can’t easily apply specialized hardware like GPUs or FPGAs to greatly improve the speed at which passwords can be guessed using a brute force approach.

(Technically, key derivation functions derive strong keys to be used for subsequent encryption; however, since the functions we’ll be discussing are one-way, they can be used for “password digests.”)

Some of the most prominent such functions are:

  • PBKDF2

    Arguably the most widely-used key derivation function, PBKDF2 (Password-Based Key Derivation Function 2) is a container for a hash function, e.g. SHA-1 or RIPEMD-160, which, for each input, applies a salt and iterates the hash function many times (in a way that doesn’t cause as much entropy to be lost), so it takes e.g. 1 second to generate a single digest rather than 0.01 milliseconds. Endorsed by NIST, it is used in U.S. government systems for the purposes of generating strong encryption keys from user-supplied (i.e. weak) passwords. PBKDF2 has the advantages that it’s very lightweight, easy to implement, and it uses only very strong, proven hash functions like the NSA’s SHA.

  • bcrypt

    bcrypt is an adaptive hash function designed specifically for password “storage.” It uses a modified version of Blowfish by Bruce Schneier rather than iterating a hash function. Its designers, Niels Provos and David Mazières, first published their paper describing it, A Future-Adaptable Password Scheme, at USENIX in 1999, yet it is still one of the strongest password hashing mechanisms thanks to its “work factor”, which determines how much processing is needed to produce a single hash digest. It is also easy to use: the password salt and a number indicating the work factor are included in the output so that system designers can keep using bcrypt, but up the work factor over time, without worrying about users being unable to log in. bcrypt is the default password authentication mechanism in OpenBSD, an operating system notorious for being “obsessed with security”, and it is generally considered more future-proof than PBKDF2. Its major limitation is that, unlike PBKDF2 and scrypt (below), it places a hard size limit of 72 bytes/ASCII characters on the input.

  • scrypt

    scrypt is an adaptive key derivation function like PBKDF2 designed by Colin Percival for use in Tarsnap, a “backup solution for the truly paranoid.” It is much stronger than PBKDF2 because it has a significant memory overhead, which means it’s significantly harder to parallelize the function, and thus significantly harder to guess the original input for a key, or password digest, using a brute force approach. It is still steadily gaining library support, but is significantly more future-proof than both PBKDF2 and bcrypt.

  • Argon2

    (Update) The newcomer to the scene, Argon2, was designed by a team of cryptographers at the University of Luxembourg. It won the Password Hashing Competition organized by Jean-Philippe Aumasson, et al. Like scrypt, it significantly increases the amount of memory and CPU required to crack stolen password digests. It is stronger than any of the above, but library support is still sparse. If you can find a library for your language, you may have to store the Argon2 parameters and output manually (whereas bcrypt stores a digest that includes the parameters).
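
Of the functions listed above, PBKDF2 is the one that ships with Python's standard library, as hashlib.pbkdf2_hmac. The following is a sketch, not a vetted implementation: the iteration count of 200,000 and the "iterations$salt$digest" storage format are assumptions of mine, chosen so that the parameters travel with the digest in the way bcrypt's output does.

```python
import hashlib
import hmac
import os

ITERATIONS = 200000  # assumed value; tune it to your own hardware

def getDigest(password, iterations=ITERATIONS):
    salt = os.urandom(32)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical storage format: iterations$salt$digest, hex-encoded.
    return "%d$%s$%s" % (iterations, salt.hex(), key.hex())

def isPassword(password, stored):
    # Recover the parameters that were used when the digest was created.
    iterations, saltHex, keyHex = stored.split("$")
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              bytes.fromhex(saltHex), int(iterations))
    # compare_digest avoids leaking where the first mismatching byte is.
    return hmac.compare_digest(key, bytes.fromhex(keyHex))
```

Because the iteration count is stored with each digest, it can be raised for new digests later without invalidating the old ones.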

So, what can we gather from all of this?

Here is my view:

  1. MD5, SHA-1, SHA-256, SHA-512, et al. are not “password hashing functions.” By all means use them for message authentication and integrity checking, but not for password authentication.
  2. If you are a government contractor, want to be compliant with security certifications or regulations like ISO 27001 or FIPS 140-2, or don’t want to depend on third-party or less-scrutinized libraries, use PBKDF2-HMAC-SHA-256/SHA-512 with a large number of iterations to generate digests of your users’ passwords. (Ideally it should take a second or more to generate a single digest.)
  3. If you want very strong password digests, and a system that is very easy to use, use bcrypt. Simple, easy-to-use libraries exist for nearly every programming language. (Just google “bcrypt <language name>”, and chances are you’ll find a solid implementation.)
  4. If you want the strongest algorithm possible, and don’t mind doing a little more work to implement it, use Argon2 or scrypt (in that order).

It’s easy to switch, too. You can use e.g. bcrypt for all new users, and generate a bcrypt digest for old users whenever they log in (and you have their passwords in memory) to migrate them to the new system.
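
The migrate-on-login idea might look roughly like this. To keep the sketch runnable with only the standard library I have swapped in PBKDF2 where the text says bcrypt; the user record (a dict with scheme, salt and digest fields) and the function names are hypothetical stand-ins for whatever your database layer provides.

```python
import hashlib
import hmac
import os

def legacyDigest(password, salt):
    # The old scheme in this sketch: a single salted SHA-256.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def strongDigest(password, salt, iterations=100000):
    # The new scheme in this sketch: PBKDF2-HMAC-SHA-256 (standing in for bcrypt).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations).hex()

def logIn(user, password):
    """Verify a password and upgrade a legacy record in place on success."""
    if user["scheme"] == "pbkdf2":
        candidate = strongDigest(password, user["salt"])
    else:
        candidate = legacyDigest(password, user["salt"])
    if not hmac.compare_digest(candidate, user["digest"]):
        return False
    if user["scheme"] != "pbkdf2":
        # The plain-text password is in memory right now, so re-hash it.
        user["salt"] = os.urandom(32)
        user["digest"] = strongDigest(password, user["salt"])
        user["scheme"] = "pbkdf2"
    return True
```

A failed login leaves the record untouched, and a user who has already been migrated is simply verified against the new scheme.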

Additional Measures

Adaptive key derivation and hash functions make a world of difference, but they are not a silver bullet. The strength of a password digest still ultimately depends on the entropy (length and randomness) of a user’s password. Therefore, try to enforce a sensible password policy that encourages users to pick strong passwords. By this I mean encourage users to pick long passwords, or passphrases, rather than telling them to include some number of special characters or uppercase letters. Length does far more, and users won’t have as much difficulty remembering their password. (Ideally, users should have very long, all-random and unique passwords that they either write down or store in a password manager, but this is difficult to mandate.) Also, please do not enforce some arbitrary upper limit like 12, or even 8, characters (can you believe some banks still do this?) unless longer inputs cause your hash function to lose entropy that early (in which case you should change your hash anyway).

(As an example, a brute force cracker like ighashgpu for MD5 can make 5,600,000,000 guesses per second. That’s 5.6 billion, or 5,600 million, guesses per second at what your password, based on an MD5 digest, might be. Using this tool, it would take approximately 3 million years to guess great hunter, and only around 3 hours to guess Hun!er2, using a naive brute force approach. A long and random passphrase is often both easier to remember and far more secure than a traditional password of the kind people have been telling you to use. It is possible to make guesses that are combinations of words, of course, so a strong passphrase is typically longer and nonsensical so you can’t guess it using common phrases or excerpts from literature. A good one might be consider the army seahorse clicking the roof. For more inspiration, check out Diceware.)
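
The figures above can be reproduced with a little arithmetic. The one assumption I am adding is the alphabet size: a naive brute force over all 95 printable ASCII characters, trying every string of exactly the password's length. The function name secondsToExhaust is illustrative.

```python
GUESSES_PER_SECOND = 5.6e9  # the ighashgpu MD5 figure quoted above
ALPHABET = 95               # assumption: all printable ASCII characters

def secondsToExhaust(length, alphabet=ALPHABET, rate=GUESSES_PER_SECOND):
    """Worst-case time to try every string of exactly `length` characters."""
    return alphabet ** length / rate

hours = secondsToExhaust(len("Hun!er2")) / 3600
years = secondsToExhaust(len("great hunter")) / (3600 * 24 * 365.25)
print("Hun!er2: about %.1f hours" % hours)
print("great hunter: about %.1f million years" % (years / 1e6))
```

Five extra characters multiply the search space by 95 to the fifth power, roughly 7.7 billion, which is why length beats "complexity" so decisively.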

Although implementing any of the above will make your password digests more secure than those of many large businesses—sadly—there’s more you can do relatively easily. Here are a few things:

  • Rate-limiting/Exponential backoff

    In your web application, keep track of account names and IP addresses when a login attempt is unsuccessful, and block, or slow down responses to requests from a certain such combination after e.g. 5 failed attempts. This won’t stop somebody from cracking your password digests if they are compromised, but it will make brute forcing an account password using your regular interface infeasible.

  • HMAC pepper on hard drive

    If you are using an application which communicates with a database, and especially if there are other applications using the same database which you do not have control over, consider adding an extra secret: Generate a long, random value (a “pepper”) to use in HMAC(userpassword, pepper), and store the pepper (not the HMAC digest) on the disk or in your application itself, then store e.g. bcrypt(hmacdigest) in the database. Even if your database is compromised, or a weakness is found in bcrypt that might leak information about its input, an attacker won’t be able to do much with your digests without the pepper. This is described further here. Sample implementations: Python, Go.

  • Secure Remote Passwords

    If you control the client side of the connection, i.e. if you are not making a simple web application, it is absolutely worth looking into the Secure Remote Password protocol. Using an asymmetric key exchange protocol, this method lets you authenticate user passwords without ever receiving them; that is, without your users ever having to send their password over the Internet, and without you having to worry as much about their storage. Sounds magical? Well, it kinda is. It can be employed in conjunction with one of the key derivation functions (on the client-side) to make the password very hard to recover, even if you can listen in on network traffic.
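
The pepper idea from the list above could be sketched like this. Two assumptions to flag: I use PBKDF2 from the standard library where the text suggests bcrypt, and the pepper is generated on the spot only so the example runs on its own. In a real deployment it would be loaded from a file or configuration that lives outside the database. The function names are illustrative.

```python
import hashlib
import hmac
import os

# Assumption: a real application loads this from disk or configuration,
# never from the database that holds the digests.
PEPPER = os.urandom(32)

def pepperedDigest(password, salt, pepper=PEPPER, iterations=100000):
    # HMAC over the password, keyed with the pepper.
    mac = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # The HMAC output, not the raw password, goes through the slow KDF.
    return hashlib.pbkdf2_hmac("sha256", mac, salt, iterations)

def isPepperedPassword(password, salt, stored, pepper=PEPPER):
    return hmac.compare_digest(pepperedDigest(password, salt, pepper), stored)
```

An attacker who has the database but not the pepper cannot even begin guessing passwords, because every guess has to pass through an HMAC whose key they do not hold.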

Extra

Here is the equivalent of the salting and stretching hash example from above, implemented using py-bcrypt:

import bcrypt

def getDigest(password):
    return bcrypt.hashpw(password, bcrypt.gensalt())

def isPassword(password, digest):
    return bcrypt.hashpw(password, digest) == digest

There is no reason not to do this for your users.


Update: Thanks to several readers who have pointed out that isPassword in my examples doesn’t make use of a constant-time comparison function. To be clear, the code samples are not meant to be used, but serve to demonstrate how salting and stretching work. While timing attacks against a password digest comparison are less of a concern than timing attacks in general, especially given a large salt, they deserve a mention.

When you compare messages of equal length, e.g. digests, you may want to use a function that takes a constant amount of time to run so that an attacker can’t learn anything about a digest by measuring how long it takes for the equality function, i.e. == to return. (== returns quickly because it’s designed to be efficient, and so returns when it encounters the first different byte.) Here is an example for Python (based on passlib.utils.consteq):

import sys

try:
    unicode
except NameError:
    unicode = str  # Python 3 has no separate unicode type

inputMismatchError = TypeError("inputs must be both unicode or both bytes")
def constantTimeCompare(a, b):
    if isinstance(a, unicode):
        if not isinstance(b, unicode):
            raise inputMismatchError
        isPy3Bytes = False
    elif isinstance(a, bytes):
        if not isinstance(b, bytes):
            raise inputMismatchError
        isPy3Bytes = sys.version_info >= (3, 0)
    else:
        raise inputMismatchError

    # Inputs of different length never match; zip() below stops at the shorter one.
    result = 0 if len(a) == len(b) else 1
    if isPy3Bytes:
        for x, y in zip(a, b):
            result |= x ^ y
    else:
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0

And Go (from crypto/subtle):

func ConstantTimeCompare(x, y []byte) int {
        var v byte

        for i := 0; i < len(x); i++ {
                v |= x[i] ^ y[i]
        }

        return ConstantTimeByteEq(v, 0)
}

func ConstantTimeByteEq(x, y uint8) int {
        z := ^(x ^ y)
        z &= z >> 4
        z &= z >> 2
        z &= z >> 1

        return int(z)
}

My bcrypt example might be rewritten as:

import bcrypt
import sys

try:
    unicode
except NameError:
    unicode = str  # Python 3 has no separate unicode type

inputMismatchError = TypeError("inputs must be both unicode or both bytes")
def constantTimeCompare(a, b):
    if isinstance(a, unicode):
        if not isinstance(b, unicode):
            raise inputMismatchError
        isPy3Bytes = False
    elif isinstance(a, bytes):
        if not isinstance(b, bytes):
            raise inputMismatchError
        isPy3Bytes = sys.version_info >= (3, 0)
    else:
        raise inputMismatchError

    # Inputs of different length never match; zip() below stops at the shorter one.
    result = 0 if len(a) == len(b) else 1
    if isPy3Bytes:
        for x, y in zip(a, b):
            result |= x ^ y
    else:
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0

def getDigest(password):
    return bcrypt.hashpw(password, bcrypt.gensalt())

def isPassword(password, digest):
    return constantTimeCompare(bcrypt.hashpw(password, digest), digest)
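
If you are on Python 2.7.7 or later, or Python 3.3 or later, you probably don't need a hand-rolled constantTimeCompare at all: the standard library provides hmac.compare_digest for this purpose. A minimal sketch, where the wrapper name isSameDigest is just for illustration:

```python
import hmac

def isSameDigest(a, b):
    # Both arguments must be bytes, or both ASCII-only str.
    return hmac.compare_digest(a, b)

print(isSameDigest(b"abc123", b"abc123"))
print(isSameDigest(b"abc123", b"abc124"))
```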

More reading here and here.

(Keep in mind that I only describe one timing attack: comparing the digests. An adversary may measure response times for different usernames, or determine which hash function or work factor is used based on the time spent processing different inputs. Timing and side channel attacks are a fairly complicated topic, and the above does not “solve the problem” by any means. However, you’re still much better off by using one of the KDFs described even if you don’t spend a lot of time worrying about doing things in constant time.)

Update 2: If this interested you, I strongly recommend taking a look at the history of password security.

Update 3: If you’re planning to upgrade your existing, weak hash digests to something stronger, but you don’t want to leave your MD5 or SHA-256 digests lying around in the interim, you can opt to do something more advanced, then switch users over to the new mechanism the next time they log in.

Update 4: Added Argon2, the winner of the Password Hashing Competition, as an option above.

What's Old Is New Again

From the OpenSSL mailing list:

A potentially exploitable vulnerability has been discovered in the OpenSSL function asn1_d2i_read_bio.

Any application which uses BIO or FILE based functions to read untrusted DER format data is vulnerable. Affected functions are of the form d2i_bio or d2i_fp, for example d2i_X509_bio or d2i_PKCS12_fp.

Applications using the memory based ASN1 functions (d2i_X509, d2i_PKCS12 etc) are not affected. In particular the SSL/TLS code of OpenSSL is not affected.

Applications only using the PEM routines are not affected.

S/MIME or CMS applications using the built in MIME parser SMIME_read_PKCS7 or SMIME_read_CMS are affected.

The OpenSSL command line utility is also affected if used to process untrusted data in DER format.

Note: although an application using the SSL/TLS portions of OpenSSL is not automatically affected it might still call a function such as d2i_X509_bio on untrusted data and be vulnerable.

Thanks to Tavis Ormandy, Google Security Team, for discovering this issue and to Adam Langley for fixing it.

Affected users should upgrade to OpenSSL 1.0.1a, 1.0.0i or 0.9.8v.

This was posted in response to a security advisory posted by Tavis Ormandy earlier today.

The funny thing about this is that, not only has this problem been known for a while, it’s included in Mark Dowd’s book, “The Art of Software Security Assessment - Identifying and Preventing Software Vulnerabilities”, which was published in 2006:

An excerpt regarding an OpenSSL exploit from The Art of Software Security Assessment

Mark Dowd wrote on Twitter earlier, “I published that bug in our book (TAOSSA) in 2006. I just neglected to mention it was 0day.”

It’s interesting to consider how many security vulnerabilities, known ones included, might have existed for many years. It’s not the first time this has happened, either: earlier this year, the algorithmic complexity attacks against the hash table implementations in many different programming languages garnered widespread attention, but that problem was highlighted by Scott Crosby and Dan Wallach at USENIX in 2003! (The hash function was subsequently randomized in Perl, but not in many others until years later.)

Security Through Obscurity

People often make the black-and-white distinction, “That’s not security. That’s obscurity.” in contexts where it is not appropriate.

Security is not a boolean concept. There is no such thing as absolute security. You are not either completely secure or completely insecure. You are always insecure — but you can complement your security by reducing the likelihood that there will be an attack against you.

I think that, while obscurity should not be mistaken for a substitute for security (as it is in the original sense of “security through obscurity” in cryptography), it doesn’t hurt in other contexts.

Let me give a few examples which I see come up quite often:

  • Refusing DNS zone transfers (AXFR)

    If you follow the logic that it doesn’t matter how obscure the information about your network is if the network is secure, you should allow anyone to read the DNS zone files for your domains. This is exceptionally reckless if you have any non-public yet publicly accessible hosts associated with your domain.

    An example: Your Apache server serving phpmyadmin at admin.mysubhost.mydomain.com only serves that website to clients whose GET request includes the header “Host: admin.mysubhost.mydomain.com”. When your zone file is readable, this information is readily available to anyone who wants to perform experiments with your network. If your zone file is not available, that hostname could essentially be as difficult to guess as a regular password.

    I think it is a mistake to assume that this would make you “immune” to anything, but I also think it is foolish to argue that by limiting your exposure you are not complementing your security in any way. In the above scenario, access to phpmyadmin should be restricted properly to improve the security, but the obscurity doesn’t hurt.

  • Having critical services (e.g. SSH) on non-standard ports

    Many argue that “if your SSH is secure, which port it is on doesn’t matter” because the range of possibilities is too small to make a difference if someone is intent on getting in.

    While this is the argument that I think makes the most sense—because there’s such a limited range of possibilities that anyone with any conviction would overcome that obstacle with ease—a perhaps not so apparent flaw in this argument is that it doesn’t have to be one person who is intent on getting into your machine. Thousands of machines are scanning not just popular targets but entire IP blocks for open common ports in hopes of finding machines that are susceptible to existing attacks, and more importantly compiling lists of machines that will be susceptible to attacks in the future.

    An example: A person in China is using a machine to scan ranges of IP addresses and comes across yours, connects to port 22 TCP, and notes down the header “SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu6”. In the not so distant future, a zero-day exploit for OpenSSH 5.3 appears which this person gets a hold of. He spends 10 minutes writing a script that will connect to every host in his list then install a small backdoor, and runs it. Your machine has now been owned, and you probably don’t know it.

    (This is not just an example, by the way. It is extremely common for port 22 TCP to be swarmed by traffic from Eastern Europe and Asia.)

    Conversely, had your OpenSSH been configured to listen on, say, port 31382, it probably wouldn’t have made it to the list of our person from China, simply because it’s too expensive to scan 65535 ports on every machine in entire IP blocks (and, in some cases, firewalls block subsequent requests after detecting a port scan in progress).

  • Hiding application names and version numbers in application banners

    Many applications—web servers, mail relays, VPN servers, content management solutions, et cetera—proudly broadcast their names and version numbers to any and all visitors. As in the previous example, this leaves the machine open to the same kind of “TODO-list-making” since it helps identify what software is being used, and whether it is out of date.

  • Keeping password hash digests secret

    Okay, I don’t see this one come up, understandably. But, wait. Why is that understandable for people who argue that “obscurity is not related to security”?

    There is no evidence that hash algorithms like bcrypt and SHA-512 (in e.g. PBKDF2) are breakable by any modern machinery, so why are we keeping password digests secret? (For the sake of argument, let’s say the passwords are relatively complex, and that the digests have their own salt so you can’t just compare them.) Or keeping our private key files for our online banking secret? Surely, hiding them is unnecessary? If our crypto is strong enough we don’t need to worry about it, right?

    This is a scenario where the answer seems more obvious, and you are probably thinking, “Herp derp, Patrick. We don’t need to expose stuff like that for no reason.”

    Tell me: What is the difference between this and the previous examples, exactly?

    (It is, in fact, far more likely that a zero-day vulnerability for an otherwise secure application emerges than it is that one of the industry-standard crypto schemes is broken, so I guess we should actually be hiding everything about our applications and not care at all about our online banking keys.)

    A real world analogy: It’s unlikely that somebody could succeed in stealing your identity if your social security number was freely available on your website. Nevertheless it is generally considered a sensitive detail so you probably wouldn’t put it there. I think the same consideration should go for application names and version numbers, and, of course, password digests.

(A few other, pretty obvious real world examples of when obscurity adds to security are: Camouflage, decoys, and witness protection.)
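
To put the port-scanning example above in rough numbers, here is a small back-of-the-envelope sketch. The address block and the scanner throughput are made-up figures for illustration, not measurements; the point is only how the work grows when an attacker has to probe every port instead of just port 22.

```python
import ipaddress

# Hypothetical figures, chosen only for illustration.
NETWORK = ipaddress.ip_network("10.0.0.0/16")  # an example address block
PORTS_PER_HOST = 65535                         # TCP ports 1-65535
PROBES_PER_SECOND = 10_000                     # assumed scanner throughput


def probes_needed(network, ports_per_host):
    """Number of connection attempts needed to cover the whole block."""
    return network.num_addresses * ports_per_host


single_port = probes_needed(NETWORK, 1)
all_ports = probes_needed(NETWORK, PORTS_PER_HOST)

print(f"port 22 only: {single_port:,} probes, "
      f"about {single_port / PROBES_PER_SECOND:.1f} seconds")
print(f"every port:   {all_ports:,} probes, "
      f"about {all_ports / PROBES_PER_SECOND / 3600:.1f} hours")
```

Whatever throughput you assume, scanning every port costs 65,535 times as much as scanning one, which is why moving a service off its default port tends to keep it off the cheap, opportunistic lists.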

There are also times when obscurity can be harmful, where the pejorative does apply, namely:

  • Closed vs. open source software

    If we can’t see the source of the applications we are using, e.g. OpenSSH, how can we be sure they’re secure enough?

  • Cryptographic algorithms

    Similarly: If we don’t know the math behind the hash computation and file encryption we are using, how can we be sure they are secure enough?

In the case of open cryptographic algorithms and open source software we are relying on communities to continuously and thoroughly evaluate security aspects of our applications, and, when the “good” community is larger than the “bad” community, it is often a very successful, even superior, approach.

The former and latter are two different debates, though. One should be careful to properly understand the distinction. Just because some respected security figure said something about obscurity in cryptographic algorithms (where it is not good) doesn’t mean you should tell the world everything about your network setup when doing that simply is not beneficial. (Note: Please let me reiterate that I am not saying you should rely on obscurity. You should not take security any less seriously just because you might have reduced your window of exposure.)

When it’s just you and your machine(s), exposing information that simply doesn’t need to be exposed, and then counting on everything being “secure enough”, doesn’t help your security.

There is a reason black box penetration testing is harder than white box penetration testing.

Update: Just to be clear, I am absolutely not suggesting you share your cryptographic hash digests with anyone. It was a tongue-in-cheek example to demonstrate the fallacy of the “exposing it doesn’t matter if it’s ‘secure’” attitude.

LastPass Disclosure Shows Why We Can't Have Nice Things

A few days ago, LastPass announced they would be forcing their users to change their master passwords in response to what was essentially “something weird”:

We take a close look at our logs and try to explain every anomaly we see. Tuesday morning we saw a network traffic anomaly for a few minutes from one of our non-critical machines. These happen occasionally, and we typically identify them as an employee or an automated script.

In this case, we couldn’t find that root cause. After delving into the anomaly we found a similar but smaller matching traffic anomaly from one of our databases in the opposite direction (more traffic was sent from the database compared to what was received on the server). Because we can’t account for this anomaly either, we’re going to be paranoid and assume the worst: that the data we stored in the database was somehow accessed.

LastPass acted exactly like we wish most companies would act: responsibly. And the media’s response? Declaring LastPass “hacked” and “vulnerable”, and placing them in the same category as Sony—who definitely were hacked—with sensationalist headlines like:

  • WARNING: Your Web Browser’s Master Password May Have Been Stolen – Change It Now
  • LastPass Has Been Hacked And Asking Everyone To Change Their Master Passwords
  • LastPass Hacked, Change of Master Password Urgent
  • LastPass Is Hacked – Change Your Master Password, But Don’t Panic
  • Should the LastPass, Sony hacks make you fear storing data in the cloud?

LastPass announced nothing more than that their recent statistics looked strange, and that they wanted to stay on the safe side just in case there was a breach, unlikely as that was. The press responded exactly as it would have if LastPass had been caught trying to cover up a confirmed leak.

(In the worst case scenario, a breach of LastPass’ data would reveal nothing more than master password hashes that are virtually uncrackable if the original password has just minimal complexity. Everything else, including information about individual websites and passwords, would be nothing more than an encrypted blob, the contents of which are inaccessible without the original password.)
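
As a rough illustration of why a stolen digest is not the same as a stolen password, here is a minimal sketch of a salted, iterated digest using PBKDF2 from Python’s standard library. This is not LastPass’ actual scheme; the function names, the salt length, and the iteration count are my own assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA512 with a per-user salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with different digests.
salt_a, digest_a = hash_password("batman")
salt_b, digest_b = hash_password("batman")
print(digest_a != digest_b)
print(verify_password("batman", salt_a, digest_a))
```

Because every guess costs the attacker the full iteration count, and the salt rules out comparing digests across users, a leaked table like this only yields the passwords that were weak to begin with.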

You can argue about whether it’s wise to store your passwords online, but at least do right by the few companies that act right.

By acting the way they were supposed to, LastPass only hurt themselves — and that’s why we can’t have nice things.