I recently paid for something online using what I considered a secure online payments processor, and they asked that I provide a password to create an account to complete the transaction. You will understand in a second (if you don’t already) why I was so angry when, a few seconds later, I got this:
“Please enter a shorter password.” I couldn’t believe it.
Why does this make me mad? Because it means one of three things:
- You’re storing my password in plaintext, so its size in your database grows as I increase its length
- You’re encrypting the full text of my password using something like AES-256, and potentially using the same key for all passwords, instead of cryptographically hashing it, again resulting in a greater need for storage capacity with length, and potentially worse security
- You don’t understand how hashing works and/or are just too lazy to support large passwords even though there is little overhead to doing so
Hopefully you can see why I wasn’t impressed.
You’ve probably seen this xkcd comic before:
Ars Technica did a piece a while back where they gave a list of 16,000 MD5-hashed passwords to a group of crackers to see how many they could break. The most successful one was able to crack roughly 90% of the passwords. The big takeaways from this article were that:
- Most people use really terrible passwords
- Even the passwords most people consider “good” passwords are breakable
- Something needs to change in how we think about passwords
Some of the passwords recovered were things like “momof3g8kids”, “Coneyisland9/”, and “qeadzcwrsfxv1331”. Clearly the users who made these passwords were trying for something a little harder than “password”, and in years past they might’ve been safe, but we live in a time when it’s trivially easy to spin up cloud servers with specialized GPUs in them to churn through huge numbers of attempted passwords per second. Here’s a quote to give you an idea:
…Gosney’s first stage cracked 10,233 hashes, or 62 percent of the leaked list, in just 16 minutes. It started with a brute-force crack for all passwords containing one to six characters, meaning his computer tried every possible combination starting with “a” and ending with “//////.” … It took him just two minutes and 32 seconds to complete the round, and it yielded the first 1,316 plains of the exercise.
This attack had a keyspace of 742,912,017,120 possible passwords, and he chewed through them in a trivial amount of time on a specialized system. So anyone with a 6 character password is completely screwed. At least we can’t use a 6 character password on our payments site. What about an 8 character password? The next attack was all 7-8 character passwords with just loweralpha characters (26 total characters, 26^7+26^8 keyspace), and this took about 41 seconds. It starts to get much harder around the 9 character mark, so that’s when wordlists come into play.
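If you want to check those keyspace numbers yourself, here’s a minimal sketch. The 95-character alphabet (printable ASCII, space through tilde) is my assumption, not something the article states, but it reproduces the 742,912,017,120 figure exactly. The `keyspace` helper is just a name I made up for illustration.

```python
# Assumption: "every possible combination" means the 95 printable ASCII characters.
PRINTABLE = 95
LOWERALPHA = 26


def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count every string over the alphabet with a length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


brute_1_to_6 = keyspace(PRINTABLE, 1, 6)
lower_7_to_8 = keyspace(LOWERALPHA, 7, 8)

print(f"{brute_1_to_6:,}")  # 742,912,017,120
print(f"{lower_7_to_8:,}")  # 216,858,874,752
```

Note that the loweralpha 7-8 keyspace is actually smaller than the full 1-6 keyspace, which fits with it finishing faster.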
Dictionary attacks have been used for a very long time, and hybrid dictionary attacks almost as long. This is the process of manipulating all the entries in a dictionary file using certain predefined rules, such as adding words together, tacking common years to the end of passwords, or replacing spaces with dollar signs. Again, as some of the revealed passwords show, even using a couple words in your password isn’t necessarily enough. The “correct horse battery staple” example is pretty good, but obviously that exact password is now probably a part of every attacker’s dictionary, which reinforces the necessity of choosing truly random words.
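To make the idea of “predefined rules” concrete, here’s a toy sketch of a hybrid candidate generator. Everything in it (the function name, the two-word list, the years, the separators) is hypothetical and chosen for illustration; real cracking rule sets are far larger and more clever than this.

```python
from itertools import product


def hybrid_candidates(words, years=("2012", "2013"), separators=("", "$")):
    """Yield guesses built from single words and word pairs, with optional year suffixes."""
    suffixes = ("",) + tuple(years)
    # Rule 1: each word on its own, with and without a year tacked on.
    for word, suffix in product(words, suffixes):
        yield word + suffix
    # Rule 2: every ordered pair of words, joined by each separator, plus suffixes.
    for first, second in product(words, repeat=2):
        for sep, suffix in product(separators, suffixes):
            yield first + sep + second + suffix


words = ["coney", "island"]  # toy wordlist
guesses = list(hybrid_candidates(words))
print(len(guesses))  # 30
print("coney$island2013" in guesses)  # True
```

Two words and a handful of rules already produce 30 guesses, and the count multiplies with every rule you add, which is why these attacks get expensive as wordlists grow.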
Even with hybrid dictionary attacks, the cost of stringing together lots of words with lots of string manipulation is still very high. Let’s say your attacker has done a lot of research on you, and they’ve put together a dictionary file of 10,000 words they suspect might be in your password, along with some other common ones like the/and/of/etc. This is a very small dictionary in some ways (one attacker favored a 111 million word dictionary), and a large one in others, as illustrated by the difficulty of combining 4 words together like the xkcd example (a keyspace of 10,000^4). Add to that permutations of spaces, and you have a huge number of passwords to try. Because it’s such a specialized list, it’s also much less valuable to precompute rainbow tables for every permutation, since you don’t expect it to work for future users. But even that’s not a problem, because our site is using randomized salts for each user. Right?
Now, granted, you could be up against someone with a lot of computing power. Let’s up the number of words in the dictionary and in the password. Let’s say the first attack fails because you choose some really obscure words (or misspell some words intentionally), and the attacker decides to generate a bigger wordlist, or use more permutations. Let’s say they bump up the wordlist to 100,000 and the number of words to 5. We’re rapidly growing the keyspace (now 1e25), without growing the difficulty of recall very much. This is 13,460,544,141,911.1 times larger than our original attack of all passwords with length 1-6, which if scaled linearly, would take 64,878,320.3 years ((((100000^5)/(742912017120))*(2+(32/60)))/60/24/365). Throw some extra machines at it, why doncha? It won’t make much difference.
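Here’s that back-of-the-envelope scaling as a small sketch, so you can plug in your own dictionary size and word count. It assumes, like the text does, that cracking time scales linearly with keyspace on the same hardware, and that the baseline keyspace is 95 printable ASCII characters at lengths 1 through 6 (my assumption, chosen because it matches the number above).

```python
# Baseline: the 1-6 character brute force, which took 2 minutes 32 seconds.
baseline_keyspace = sum(95 ** n for n in range(1, 7))  # 742,912,017,120
baseline_minutes = 2 + 32 / 60

# Five words drawn from a 100,000-word list.
passphrase_keyspace = 100_000 ** 5  # 1e25

ratio = passphrase_keyspace / baseline_keyspace
years = ratio * baseline_minutes / 60 / 24 / 365

print(f"{ratio:,.1f} times larger")
print(f"{years:,.1f} years at the same guess rate")
```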
And now we come back to the 8-20 character limit on my password. To get 5 words into 20 characters, I would be severely limited in my word choices and spacing patterns. Let’s take a look:
(1310 3-letter Scrabble words) ^ (5 words per password) = 3.85794897e15
(5526 4-letter Scrabble words) ^ (5 words per password) = 5.1529319e18
This isn’t terrible, but the number of phrases people would actually choose is far smaller than that, because people like passwords that are easy to remember. What if everyone used several words of arbitrary length instead? Well, if we used that 111 million word dictionary...
(111 million word dictionary) ^ (4 words per password) ~= 1.5e32
(111 million word dictionary) ^ (5 words per password) ~= 1.7e40
(111 million word dictionary) ^ (7 words per password) ~= 2.1e56
And that’s without elaborate spacing patterns (what if every other word was delimited by the word “horse”?). Our hacker is basically screwed. Their next attempt would be to figure out what words people are most likely to use, but as long as you have a reasonable amount of entropy (i.e. don’t use quotes by Aristotle over and over again), this would be much more random per-person, and would yield the hacker significantly lower returns for their time. Once you have the attacker trying to do machine learning to get n-grams or spacing patterns that you’re likely to use in your password, you’re in good shape.
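The keyspace figures above are easy to reproduce. This sketch also converts each one to bits, which assumes every word is picked uniformly at random (real users don’t do that, so treat the bit counts as an upper bound). The `passphrase_space` helper is a name I made up.

```python
import math


def passphrase_space(dictionary_size: int, words: int) -> int:
    """Number of ordered word sequences, ignoring spacing and capitalization tricks."""
    return dictionary_size ** words


cases = [
    ("3-letter Scrabble words", 1310, 5),
    ("4-letter Scrabble words", 5526, 5),
    ("111 million word dictionary", 111_000_000, 4),
    ("111 million word dictionary", 111_000_000, 5),
    ("111 million word dictionary", 111_000_000, 7),
]

for label, size, count in cases:
    space = passphrase_space(size, count)
    bits = math.log2(space)
    print(f"{label}, {count} words: {space:.2e} (about {bits:.0f} bits)")
```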
So now we come back to the site from the beginning of this rant. In a question answered on their forums, one of their admins claims there is no maximum account balance on their site, and that users are free to “fill it as full as [they]’d like”. If only they took the same approach to password policy. Of course they want you to store as much money as you possibly can with them. But don’t they want you to be as secure as possible too?
There are clearly benefits to having no maximum password length, or at least a much higher one than is implemented here. Why might our friends have decided to use maximum password lengths? What are the limitations they’re running up against? Let’s start with our hashing algorithms. SHA-256 and SHA-1 have maximum input sizes of about 2048 petabytes (2^64 bits), and MD5 has no hard maximum at all, so that’s not our limiting factor. I’m assuming (with my fingers crossed) that they’re using a hash function or key derivation function (KDF) that produces uniform-length output, so the length of the password doesn’t change the length of the hash and disk space isn’t a limiting factor. If they were to do the crypto client-side, the bandwidth, memory, and CPU costs wouldn’t even be higher. But that’s fraught with its own problems, so we won’t go that way.
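If the “uniform-length output” point seems abstract, here’s a quick sketch using Python’s standard `hashlib`. The two passwords are made-up examples; the point is only that a 7-byte input and a 29,000-byte input produce digests of exactly the same size.

```python
import hashlib

short_password = "hunter2"
long_password = "correct horse battery staple " * 1000  # 29,000 bytes


def hex_digest_length(algorithm: str, password: str) -> int:
    """Return the length of the hex digest for the given algorithm and password."""
    return len(hashlib.new(algorithm, password.encode("utf-8")).hexdigest())


for algorithm in ("md5", "sha1", "sha256"):
    print(
        algorithm,
        hex_digest_length(algorithm, short_password),
        hex_digest_length(algorithm, long_password),
    )
# md5 32 32
# sha1 40 40
# sha256 64 64
```

These fast hashes are only here to show the storage argument. They are not what you’d want for storing passwords on their own.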
Let’s say they do the crypto server-side, and I’m being a bit of a dick, so I choose a 1 megabyte password. It’s unlikely, but it could happen. Let’s pretend everyone did that. That might seem like a lot of bandwidth, until you see that their front page alone is about 5 MB. I care much less about your marketing videos, and much more about my security, so if you have to scrimp on bandwidth to handle my password, I’m giving you permission to go back to marquees and blink tags.
How about memory in our application? PHP’s default max memory allocation is 128MB, so I think we’re probably good there. What about the HTTP server? Hmm, nginx’s default max client body size is 1 MB. They could obviously change this, but let’s say they don’t – shave off a generous half for other information they’re storing in my cookies or signup request. I’m cool with a 512KB password.
What if they’re using bcrypt? There is (potentially) a limit to the input size: the algorithm was specified with a 56-byte key, and most implementations truncate at 72 bytes. Shoot. Oh, wait, we can just SHA-256 it first (32 bytes, or 64 hex characters), then pass that to bcrypt, scrypt, PBKDF2, etc. Doing pre-imaging on all valid outputs of SHA-256 is… highly unlikely, so I’ll take the potential length hit there. It’s much more likely that the attackers would use some type of hybrid dictionary attack than a naive bruteforce of the hash at that point.
The salt length recommended in that post is 128 bits (16 bytes). In fact, another article specifically about how to store passwords securely recommends using a salt the same length as the hash, so 32 bytes (64 hex characters) for SHA-256. (Please let them be using a salt. Please let them be using a salt. Please let them be using a salt.)
Wait a second. That means that either they’re using a less-than-perfectly-secure salting policy, or we’re already generating a thing that’s bigger than the current maximum password length, and storing it in the database. The salt would actually be providing more entropy than our password, if it weren’t already saved in the database in plaintext.
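Putting the last few paragraphs together, here’s a minimal sketch of the prehash-then-KDF idea with a per-user random salt. To keep it standard-library only, I’m using PBKDF2 as a stand-in for bcrypt or scrypt; the function names, the 100,000 iteration count, and the 16-byte salt are my assumptions for illustration, not anything the site is known to do. The digest is base64-encoded before the KDF because raw digest bytes can include NUL bytes, which some bcrypt implementations treat as the end of the input.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

PBKDF2_ITERATIONS = 100_000  # assumed work factor; tune for your own hardware
SALT_BYTES = 16  # 128-bit salt


def prehash(password: str) -> bytes:
    """Collapse a password of any length into 44 ASCII bytes (base64 of SHA-256)."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()  # 32 raw bytes
    return base64.b64encode(digest)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac("sha256", prehash(password), salt, PBKDF2_ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the derived key and compare it in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


# A one-megabyte password costs the same to store as a 20-character one.
huge_password = "x" * 1_000_000
salt, stored = hash_password(huge_password)
print(len(salt), len(stored))  # 16 32
print(verify_password(huge_password, salt, stored))  # True
```

What lands in the database is the salt plus a fixed-size derived key, no matter how long the password was.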
So what do we have left? Maybe a denial of service attack? I’m guessing these guys run firewalls that can ban IPs, and an IDS/WAF that can detect generic POST requests with a megabyte of data in them. If not, why not just use that as an attack vector now? Why not just download that video from the front page a million times?
Maybe they run really old hardware in some old brick and mortar location that might need for some reason to hash your password at some point? Wait, nope. No brick and mortar locations anywhere.
As you can see, I’m really stretching to find any reason to enforce a maximum password length. At some point it comes down to my password potentially being a ridiculous size that no one would ever actually use, and exhausting the server’s memory or bandwidth. But even at that level, there are plenty of counter-measures that they could implement, and would implement, if it meant they could blast more shiny marketing videos at you.
So we’re just left with apathy. I don’t imagine everyone reading this is a security expert, but I do imagine that when you think about the security of your money or possessions, the word “apathy” doesn’t come to mind. When I read about 90% of a database dump getting cracked, I realize that it’s time for a new approach to passwords.
We should be encouraging our users to use outrageously long passwords, not restricting them to 20 characters. The minimum could reasonably be 20. It probably should be.
Addendum, 1/28/20: Ideally, you should be using a password manager to generate unique, random strings as passwords, and never reusing them. The advice in this post is applicable to passwords that you must remember, outside your password manager.