Xkcd Password Generator

Dove · on Aug 11, 2011

I find the discussion surrounding the XKCD strip alarming for the superstition it reveals about password generation. The particular theme I am alarmed by is that people seem to think that if a password looks alien, or was difficult for them to come up with, it will be hard for a machine to guess.

Look, we're working with big numbers here. You need to do the math.

In this thread alone, I've seen suggestions to use a common dictionary word translated into another language, or written in l33tsp34k with some permutations. From a probabilistic perspective, these are still dictionary words, even though they look like gibberish. The same is true of the common method of typing a word with one's fingers displaced on the keyboard.

Conversely, I see a lot of argument that these XKCD passphrases would be easy to guess because they are made up of dictionary words. This misunderstands the math behind the situation. Even if an attacker knows that your password was generated via this method, and even if they know the word list you used, the password is still hard to guess. The difficulty grows exponentially with each word in the phrase, and that's pretty fast.

The key with passwords is not to create something that looks random -- something that if you showed it to another human being, they'd have a hard time deciphering. It's to create something that is random; literally a result of a throw of the dice for every new password.

Human beings are really bad at creating randomness. There's a demonstration done in an early statistics class in which the professor divides the class into two groups. He tells one to toss a coin a hundred times and record the sequence of heads and tails, while the others are to write down a sequence they think is random using their imagination. The papers are completed and mixed and then -- magically! -- he is able to sort them into the two types, easily and with high accuracy.

The lesson is this: even when you think you're being random, you probably aren't. You're probably using the same tricks everyone else is, and making the same mistakes.

I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.
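
For illustration, here is a minimal sketch of what such a script might look like. The tiny `WORDS` list is a hypothetical stand-in (a real list would hold a couple of thousand common words), and `make_passphrase` is a name invented for this example. The one part that matters is that every word comes from a cryptographic random source rather than from a person.

```python
import secrets

# Hypothetical stand-in; a real list would hold ~2,000 common words.
WORDS = ["valve", "tangle", "hastens", "accept",
         "correct", "horse", "battery", "staple"]

def make_passphrase(words, count=4):
    """Pick `count` words independently and uniformly, joined by spaces."""
    return " ".join(secrets.choice(words) for _ in range(count))

print(make_passphrase(WORDS))
```

`secrets.choice` is used instead of `random.choice` because the `random` module is not intended for security purposes.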

Cushman · on Aug 11, 2011

This should be higher up. It's scary to see people — intelligent people, I'm sure — saying things like "And that goes even higher when you add punctuation!"

No, it doesn't. All of the reasonable punctuation you could add to a sentence adds only a few bits of entropy at best. It also makes the sentence harder to remember— was there a comma or not? Adding unreasonable punctuation or symbols is even worse— you get slightly more entropy at the cost of a password that is way harder to remember.

The crucial point here is that four random words, separated by spaces, selected at random only from the 2000 most common English words (EVEN IF your attacker knows that your password is four random English words from the 2000 most common separated by spaces) already is a very long random string. If it's not random enough, each common English word you add adds about 11 bits, and is only marginally harder for most English speakers to remember. Conversely, choosing "random" extra characters to add in makes it slightly longer, very slightly more random, and way, way harder to remember.
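
The "11 bits" figure can be checked with a few lines. This is only a sketch of the arithmetic, assuming each word is an independent, uniform pick from the list; `passphrase_bits` is a name made up for the example.

```python
import math

def passphrase_bits(list_size, word_count):
    """Entropy in bits of word_count independent, uniform picks."""
    return word_count * math.log2(list_size)

print(f"{passphrase_bits(2000, 1):.2f} bits per word")
print(f"{passphrase_bits(2000, 4):.1f} bits for four words")
print(f"{passphrase_bits(2000, 5):.1f} bits for five words")
```

Each extra word adds the same amount again, which is why the search space grows exponentially with the number of words.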

a3camero · on Aug 11, 2011

It's certainly a "very long random string" without context but as people have pointed out above, it's actually not a very good password if people adopted this pattern widely (and you said the attacker knows this).

2000^4 = 16000000000000 possible passwords = 1.6E13 = ([A-Z] + [a-z] + [0-9] + [!@#$%^&()])^7.1ish. So, your four words from the 2000 word list are equal to a 7ish character password that looks like "Av#12GH". I'm not sure if you meant that seven characters was "very long" but I wouldn't say it is. Still a very strong password but maybe not as random as it appears to be when the pattern is known.
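
The "7.1ish" figure can be reproduced as follows. This sketch assumes the 71-symbol alphabet listed above (upper case, lower case, digits, and the nine symbols shown) and asks how many random characters from it match the four-word keyspace.

```python
import math

keyspace = 2000 ** 4                          # four words from a 2000-word list
alphabet = 26 + 26 + 10 + len("!@#$%^&()")    # 71 symbols
equivalent_length = math.log(keyspace) / math.log(alphabet)

print(keyspace)                     # 16000000000000
print(alphabet)                     # 71
print(round(equivalent_length, 2))  # 7.13
```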

Cushman · on Aug 11, 2011

My point was that adding a character to something like "valve tangle hastens accept" is like adding a couple bits to something like "Av#12GH", and yet people feel like it's accomplishing something valuable. They feel that way because "Av#12GH" looks random, but "valve tangle hastens accept" doesn't.

Obviously the literal length of the string is not strictly relevant, and it was probably inarticulate of me to include that.

Dove · on Aug 11, 2011

Knowledge of the pattern has nothing to do with it. That 2048^4 figure is what I mean when I say such a password is strong, and such a figure presumes the attacker knows what system I am using.

Recall that since the passphrase is randomly generated, 1 in 2048^4 is the true probability of guessing it on any one try--all the elements of the set are live possibilities. To compete on equal footing, a seven character password must also be randomly generated.

A password is not necessarily strong simply because it spans a large character set. "Sp1d3r!", for example, may as well be a dictionary word. Raw length ("spiderspiderspider") is not necessarily helpful either. Randomness is what you need.

j_baker · on Aug 12, 2011

And yet the point is moot because no one is going to use a password like "Av#12GH".

bigiain · on Aug 12, 2011

I have a password file here with several hundred passwords just like that (actually, they're all 12 chars with upper/lower case, digits, and "special chars", as chosen and stored by 1Password...)

Joe Public is unlikely to use passwords like that, but I'm 100% sure I'm not the only hackernews reader who does.

keithnoizu · on Aug 11, 2011

This is not entirely correct: we occasionally include non-alphanumeric characters (punctuation) in our passwords because it increases the solution space for a brute force attack.

While this doesn't really improve any individual password, the fact that we occasionally include non-alphanumeric characters increases the possible password set from 62^(password length) to something more like 90^(password length).

Similarly, dictionary attacks are more efficient than brute force attacks because we're talking maybe 200,000^(words in password), if we allow for some common word permutations, versus 90^(password length).

kragen · on Aug 12, 2011

You seem to believe that you are saying something different from the comment you are replying to, but actually you seem to simply not understand it.

keithnoizu · on Aug 14, 2011

are you kidding, i'm pointing out the historical reasons for why we add punctuation chars in our passwords, as it directly impacts the solution space a brute force attack needs to cover.

kragen · on Aug 15, 2011

No, I'm not kidding. And the comment you were replying to already explained that.

jcr · on Aug 11, 2011

I resent your accusation that I use gibberish for my passwords; I actually use perfectly well formed executable code in perl.

nakkiel · on Aug 12, 2011

Had you used brainfuck, I would have downvoted you.

dspillett · on Aug 12, 2011

> I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.

Definitely agree with you here.

I've been using the "few random words" method for passwords I need to remember for some time (and random 20 character mixes of alpha/numeric/symbol for the others, which I have stored in a keepass db), and I know I'm not all that random in my choice of words, so if someone managed to see one or two of my passphrases it would be quite easy to create a script that could brute force the other couple quickly.

I shall have to use a script like this (or throw together my own for paranoia's sake) next time I change one of my passphrases.

Ideka · on Aug 11, 2011

May I ask how you found out about that study?

It sounds very interesting. I've got to try it sometime :).

Dove · on Aug 11, 2011

You mean the statistics demonstration? I'm sure I've seen it in several places.

I know of two tricks for detecting the students. The first is to look for six or seven heads or tails in a row. Over a hundred tosses, a coin will probably do that, but humans "being random" won't. The other is to look at the page as a sequence of "HHH" and "TT" strings and estimate how many there are. A coin, of course, changes from heads to tails 50% of the time, but a human does it more like 70% of the time.

I'm sure there are other characteristics, too, but those two are sufficient to throw out most human attempts at a glance. It's actually kind of obvious, when you see the two side by side.

Me "being random" with the numpad: 10110101001010010101011010100101011010100101010110101

Computer-generated random: 1101110100000000011111110111011000010001010110011111

See? Here's a few more. Try it.

1000100111011001010000001001010110111000011011101011

1101010001010010101110100101000101010111101001001000

1101111010110111100010110001000100001001111001001110

1100000010000010101001000001101001101011111100111001
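
Both tricks are easy to script. The sketch below is one possible way to measure them; the function names are invented for the example, and the two sample strings are the first pair quoted above.

```python
def longest_run(bits):
    """Length of the longest streak of identical symbols."""
    best = current = 1 if bits else 0
    for prev, cur in zip(bits, bits[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def switch_rate(bits):
    """Fraction of adjacent pairs where the symbol changes."""
    if len(bits) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / (len(bits) - 1)

human = "10110101001010010101011010100101011010100101010110101"
computer = "1101110100000000011111110111011000010001010110011111"

for label, bits in (("human", human), ("computer", computer)):
    print(label, longest_run(bits), round(switch_rate(bits), 2))
```

A short longest run together with a switch rate well above 0.5 is the signature of someone "being random" by hand.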

Cushman · on Aug 12, 2011

Cool! Makes sense, too— it feels unrandom to sit there hitting one key a bunch. How do you know when to stop?

So here's what I got: figure I have a bias to switch keys. That means that 01 and 10 are more common than 11 and 00. So what I need to do is group 01 with 00, and 10 with 11. What I do is generate twice as many bits as I need, treat the string as a sequence of two-bit pairs, and reduce each one to its first bit. That looks like this: 01011110000010111000010011111011111111000000111010110001111000111100111100001100011010011101

Gets you past the litmus test, but looks like it goes too far the other way? Hard for me to tell, actually.

Another thing would be something like a sequential xor of each bit in triples (i.e. 010 -> (0 ^ 1) ^ 0 = 1), which segments triples across probability like so:

     -       +           +       -
    000 001 010 011 100 101 110 111
      0   1   1   0   1   0   0   1

You can do that quickly by counting the 1s— 1 or 3 is 1, 0 or 2 is 0. That looks like this: 00011011011011000000111111001011000101010000100100110011011010

I don't know if it adds much more (apparent) entropy, though.

Data:

    011101101010100100010100100010111001010100110000101011101101110
     0 1 0 1 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 0 1
      0  0  0  1  1  0  1  1  0  1  1  0  1  1  0  0  0  0  0  0  1

    111010111010100101000101011110100011011011010001111011110101001
     1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0
      1  1  1  1  1  0  0  1  0  1  1  0  0  0  1  0  1  0  1  0  0

    101110100101101010110101010001101101010010110111010110101001110
     1 1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 1 1 1 0 1
      0  0  1  0  0  1  0  0  1  1  0  0  1  1  0  1  1  0  0  1  0
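
The two reductions described above can be sketched like this. The function names are made up for the example; trailing bits that do not fill a complete pair or triple are simply dropped, which is an assumption about how the rows above were produced.

```python
def first_of_pairs(bits):
    """Keep the first bit of each complete two-bit pair."""
    return "".join(bits[i] for i in range(0, len(bits) - 1, 2))

def xor_of_triples(bits):
    """XOR each complete three-bit group, i.e. the parity of its 1s."""
    out = []
    for i in range(0, len(bits) - 2, 3):
        out.append(str(bits[i:i + 3].count("1") % 2))
    return "".join(out)

raw = "011101101010"            # first 12 bits of the first data row
print(first_of_pairs(raw))      # 010111
print(xor_of_triples(raw))      # 0001
```

Note that neither reduction adds entropy; at best it spreads the existing bias more evenly across fewer output bits.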

ddlatham · on Aug 11, 2011

A lot of comments here seem to be missing the point.

The main point is to use passwords that give you the most "bang for the buck" in the sense of adding the most bits of entropy for the least difficulty of remembering. Adding an extra number, or punctuation, or certain numbers of repetitions generally adds only a little bit of entropy for a significant cost in additional challenge to your memory.

Our minds are well suited to remembering combinations of common words, and by stringing a few such words together, you can generate a larger search space than using a single word with a few substitutions. Even if the attacker knows the scheme you're using, he still must search through the space of combinations of common words, which XKCD is pointing out is quite large.

mortenjorck · on Aug 11, 2011

I've started using song lyrics when given the option of an extra-long password. I can get a very long string with little effort, and it's trivial to remember.

The best part is that any automated attack would have to deal with ringtone popups.

gjm11 · on Aug 11, 2011

Be aware that adding to the length simply by taking more of the lyrics adds very little entropy. If you're trying "Oh say can you see" then it doesn't take a lot of extra bits also to try "Oh say can you see by the dawn's early light what so proudly we hailed at the twilight's last gleaming".

Similarly, extended passages of text -- even if they don't come from a restricted corpus like that of song lyrics -- have less entropy than you'd think. A smaller number of independent random words is likely to be a better tradeoff.

benmathes · on Aug 11, 2011

I can see your point in that the Kolmogorov complexity of two lines in a song isn't much larger than one line. Similarly, 30 digits of pi and 300 digits of pi have very little difference in Kolmogorov complexity.

What I don't know is if state-of-the-art password guessers are great at recognizing larger patterns in the entire canon of human knowledge. I.e. is there a "common phrases" attack that's analogous to a "dictionary attack"?

jfriedly · on Aug 12, 2011

Google released the world's largest corpus and did us a favor by analyzing it for n-grams. For example, they found that the phrase "serve as the initial" was over 100 times more common than the phrase "serve as the insurance". [1] For $150 you can buy the 24GB data set yourself, so it's a fair assumption that makers of password crackers could reliably guess common phrases first. [2]

[1] http://googleresearch.blogspot.com/2006/08/all-our-n-gram-ar...

[2] http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=...

singlow · on Aug 11, 2011

If these types of passwords become popular, brute force crackers will build dictionaries of well known phrases.

spydum · on Aug 11, 2011

That may be true, but we still end up better off. The compute time for the password cracker has gone up quite a bit, making it a more expensive endeavor (they've got to build dictionaries for both WKP's and passwords with fuzzing). It doesn't solve the problem, but it's a start in the right direction (away from fuzzing of dictionary words, which is clearly bad for human memory, and good for password crackers' efficiency).

However, when using randomly chosen dictionary words to build phrases (not well known), the entropy shoots well above the level of being reasonable to crack in a lifetime.

nait · on Aug 12, 2011

Granted, knowing the correct parts of a password that is based on known sources (pi, War and Peace, song lyrics, etc.) drastically reduces the number of possible solutions. But how would an attacker figure out the first part of such a password? What comes to mind are timing attacks: http://en.wikipedia.org/wiki/Timing_attack What other possibilities did I miss?

EDIT: I get that having a long streak of my pass in a dictionary would reduce overall security but it's still unclear how a partial match in the dictionary would be detected.

ScottBurson · on Aug 11, 2011

But there's a long tail of song lyrics. If you pick something obscure, the odds of the attacker even having heard of it become very small (particularly if the attacker is from a different culture than your own). Pick something arty and incomprehensible, and the odds against someone else accidentally stringing those words together in some other context become astronomical.

For instance, I'd wager no cracker has ever heard the song containing the line "We barter images on the matrix". And that's one of the more intelligible lines from the song in question (from a 1978 album by the little-known prog-rock group Happy The Man). Pull it up on Google and you'll see what I mean.

If you don't know the song, of course, lines from it will be about as hard to remember as randomly chosen words. But if you do know it, you have a good mnemonic.

MichaelGagnon · on Aug 11, 2011

This gets into the whole "security through obscurity" thing. Ideally, you should use a password-generation system such that if the attacker knows your password-generation system (e.g. lines from songs) it would still be infeasible to guess your actual password.

That's why the 4-random-words technique is good. According to XKCD, the 4-random-words technique generates about 17 trillion passwords---all equally likely.

But even with a long tail, song-lyric passwords rely on obscurity. I imagine there are far fewer than 17 trillion songs to choose from. And if the attacker knew some information about you (say from looking at your Facebook profile or your search history) I'm sure it could drastically weed out the search space.

JonnieCache · on Aug 12, 2011

The answer is obviously to write your own song or poem and not tell anyone about it. A passpoem, perhaps in the style of Lewis Carroll.

LXicon · on Aug 11, 2011

there might not be 17 trillion songs, but you aren't limited to the first 4 words of the song. there might be 100-300 words per song and you can pick your starting word anywhere you like.

Cushman · on Aug 11, 2011

But it falls into the same boat as any dictionary attack. Most people with a passphrase are probably going to use one from a song. 90% of them are going to use one of the top 1,000 songs, 90% of them are going to start at the beginning of a line. If we say there are ~20 unique lines in the average song, and most people won't use more than ten successive words even if it bridges a line, that's 1000 * 20 * 10 = a keyspace of 200,000. Trivial.

What this means is even if you decide you're going to be really secure and pick, say, the 30,000th most popular song, assume all songs have 200 unique lines (to account for sensical starting points in the middle of lines), and use 20 words from it, you're in a keyspace of only 120 million, which even if it takes 1ms to hash will be cracked in a day.

By contrast, four random English words chosen from the 2,048 most common has a keyspace of ~1.75e13, or 17,500,000,000,000.

Choosing a clever, unusual line from the middle of a very uncommon song is the passphrase land version of choosing a rare English dictionary word and replacing the vowels with numbers. If your hash gets compromised, it might as well be "password".
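
The keyspace estimates above can be reproduced in a few lines. This is only a sketch of the arithmetic; the song, line, and length counts are the guesses from this comment, and the 1 ms per hash is the same assumed guess rate.

```python
top_songs = 1_000 * 20 * 10        # songs * starting lines * phrase lengths
obscure_song = 30_000 * 200 * 20   # the "really secure" lyric scenario
random_words = 2048 ** 4           # four words from a 2,048-word list

ms_per_guess = 1                   # assumed hashing cost
hours = obscure_song * ms_per_guess / 1000 / 3600
years = random_words * ms_per_guess / 1000 / 3600 / 24 / 365

print(top_songs)                   # 200000
print(obscure_song)                # 120000000
print(f"{hours:.1f} hours to exhaust the obscure-lyric space")
print(random_words)                # 17592186044416
print(f"{years:.0f} years to exhaust the four-word space")
```

At the same guess rate, the obscure lyric falls in hours to a day or so, while the four random words hold out for centuries.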

moe · on Aug 12, 2011

There's an easy way to defeat this:

   smellz like T33N SPIRIT!

Trivial to memorize. Unlikely to brute force.

I use phrases like that for the few locations where password managers don't reach (i.e. the password manager master password).

Cushman · on Aug 12, 2011

>.<

How is this an improvement? I now have to remember a song lyric, and some set of random manipulations of that song lyric. I've used that trick for passwords before, and it was a hassle. But that doesn't even matter— unless you're choosing the manipulations randomly (which is a contradiction in terms) you're falling right back into the exact damn trap the comic was about!

You've added ! at the end, replaced s with z, capitalized some words, and replaced vowels with numbers. These are already standard manipulations in a dictionary attack. And it's causing you to ignore the fact that you've chosen what is probably among the top 10 song lyrics used. "p4ssw0rd!" is "password" as far as a dictionary attacker is concerned. Calling this trivial to brute force is demeaning to the word "trivial". Your attacker wouldn't even laugh at you, because there'd be dozens of other hashes in the file just like yours.

It's been said over and over in these comments: the appearance of randomness is not randomness. Humans are horrible at making things random, as you've just demonstrated. Stop trying to make it look weird, and actually do the math.
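
To make "standard manipulations" concrete, here is a toy sketch of the idea. It is not the rule engine of any real cracking tool; the `LEET` table, the `SUFFIXES` list, and the `mangle` function are all assumptions invented for illustration.

```python
from itertools import product

LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}  # assumed table
SUFFIXES = ["", "!", "1"]                                   # assumed suffixes

def mangle(word):
    """Return every combination of leet substitutions and suffixes."""
    options = [(c, LEET[c]) if c in LEET else (c,) for c in word.lower()]
    stems = ("".join(chars) for chars in product(*options))
    return {stem + suffix for stem in stems for suffix in SUFFIXES}

candidates = mangle("password")
print(len(candidates))             # 48
print("p4ssw0rd!" in candidates)   # True
```

Under these assumptions, all the "clever" variants of one dictionary word cost the attacker a few dozen extra guesses, which is under 6 bits.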

moe · on Aug 12, 2011

It's fairly easy for me to remember those manipulations. But you're right insofar as this would probably be both safer and easier to remember:

   Smells like teen spirit, and I like that plenty mucho!

I'm too lazy to do the math on it, perhaps you can help out?

Edit: It's a little annoying to collect these downvotes from people who either haven't done the math themselves or are too lazy to explain their advanced attack methods.

In my naive opinion my string above is at least equivalent to a 12 character password from a set of "Mixed upper and lower case alphabet plus numbers and common symbols.".

I count each word (10) and both symbols (,!) as a character here.

According to [1] an 8 char password of that type would take 83½ Days to crack in a Class-F attack ("supercomputer"). I'm purely guessing that those additional 4 "chars" should put it well into the multi-year range, under the premise my other assumptions are not too far off and that the number of English words is quite a bit larger than the number of ascii characters/symbols.

Any of the downvoters care to debunk that with real math?

I'd be honestly curious about a worst-case analysis that assumes the fragment "Smells like teen spirit" does appear in the attacker's dictionary.

[1] http://www.lockdown.co.uk/?pg=combi

Cushman · on Aug 12, 2011

Yeah, that's what I was getting at. Something like that is pretty much immune to naive brute force, even if we count "Smells like teen spirit" as a word. My guess would be that if it does get cracked, it would be by searching [lyric]+", and"+[some kind of Markov attack], but I honestly have no idea how one would work out the entropy in that model. It depends a lot on how the search is carried out, I think.

I guess we'll find out when passphrases become common :)

alanh · on Aug 11, 2011

What happens when your obscure song makes the soundtrack for a hit movie next summer?

akronim · on Aug 12, 2011

given how prone people are to mis-hearing song lyrics, the corpus isn't the full text of all published song lyrics as you suggest.

gabaix · on Aug 11, 2011

I make fewer typing mistakes with shorter, complicated passwords.

Long passwords are typing-error prone. With mobile devices, it gets worse, as typing is really painful.

ddlatham · on Aug 11, 2011

I've had the opposite experience, where I'm more likely to mistype passwords with mixed case letters and symbols holding the shift key down too long. With mobile devices, it gets worse, as common words are easy to type, but symbols and mixed case are a pain.

It is a good point, though, that for frequently typed passwords on a good keyboard, you can engage your muscle memory. That allows you to type short passwords very quickly, and even remember passwords with your fingers that you've lost from your conscious memory!

singlow · on Aug 11, 2011

That's how I can remember 25 character passwords with symbols and mixed case. I am not thinking "43#gj(eO3%". I'm thinking "4-3-shift3-g-j-shift9-e-shifto-3-shift5". So basically you have about 47 keys with nice characters and they each have two of them.

Just use a random password generator with those 47 characters and type it 25 times into a notepad to pound it into your muscle memory. (and if you're paranoid, clean up your memory and swap file)

For the ones you don't use every few weeks, keep them in a password database like KeePass with 2 factor authentication and keep the key file on a thumb drive on your keychain.

gabaix · on Aug 11, 2011

mmm You got me thinking. You're right, both look error prone. Typing errors increase as it gets longer or more complicated.

lukev · on Aug 11, 2011

I don't think entropy is the whole story. I would argue that although security-through-obscurity is a terrible, awful practice for systems, it's not that bad for personal password schemes. Using a nonce "system" for passwords, even if it's mathematically low-entropy, is still secure, at least enough for personal use.

For example, if I use single dictionary words fed through a trivial Caesar cipher, then that is mathematically very low entropy. Realistically speaking, however, it's relatively safe if the cracker doesn't know that's what I'm doing, because it's impractical for crackers to compute all possible low entropy "alternative dictionaries."

khafra · on Aug 12, 2011

You'd be surprised. JTR does l33tspeak substitutions, one-row-up substitutions, keyboard walks, pretty much all of the common things everyone does because "no hacker would ever think of that."

eLod · on Aug 11, 2011

i think you are missing the point: passwords should be hard to guess first and should be easy to remember second. the former is the stronger need.

let's say there are 500.000 english words you are choosing from and you use 4 words. that gives you 500000^4 possibilities. let's assume the words average about 5 characters, so we will compare this to a 20(=4 words * 5 characters) character long password made of 26 types of character (english alphabet, not using numbers and other special characters), that gives you 26^20 possibilities. and 26^20 - 500000^4 ~= 2x10^28, or put it this way: (26^20) / (500 000^4) = 318 850.382..

i know a random sequence of 20 characters is very hard to remember, but 500.000 is an overestimation too. let's say we use special symbols too (50 characters) and the word dictionary has 100.000 words. (50^12) / (100 000^4) = 2.44 so we can say it is better to have a 12 character long password (made of alphanums + symbols) than 4 random words concatenated (i think 12 is somewhat a 'standard' for 'sensitive' passwords). and i would argue that in the long term multiple concatenated passwords are very hard to remember. i'm not saying this is a terrible approach, just not the silver bullet to the 'password problem' (which xkcd never claimed of course, and for 'non sensitive', 'reused'/'throwaway' passwords it may be a viable option).

edit: and i forgot about case sensitivity too.
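
The two ratios above are easy to verify. This sketch only re-runs the arithmetic with the same assumed dictionary sizes and alphabets; it says nothing about which kind of password is easier to remember.

```python
# 20 random lowercase letters vs 4 words from a 500,000-word dictionary
ratio_letters = 26 ** 20 / 500_000 ** 4

# 12 random characters from 50 symbols vs 4 words from 100,000 words
ratio_symbols = 50 ** 12 / 100_000 ** 4

print(f"{ratio_letters:,.3f}")   # 318,850.382
print(f"{ratio_symbols:.2f}")    # 2.44
```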

ddlatham · on Aug 11, 2011

I'll grant you that it's more important to have a password be difficult to guess, but that doesn't obviate the importance of it also being easy to remember. Even better, let's look for password schemes that are harder to guess and easier to remember at the same time.

It's easy to compare the entropy of two schemes as you're doing in your comment, but it's more difficult to objectively claim which is easier to remember. You argue that a random sequence of 12 characters is easier to remember than 4 words. If so, then I'd agree it would be a better scheme. However, I don't think that's the case. To really settle the argument, we should do some experiments - maybe someone already has?

Here's some examples I used a generator to create:

gangster insert madden quartic

overlong cage figurine hardship

trimmer wholly movie nadir

Bt].iu@0Soc*

Vf+pIW;C>\vp

'.}]Ba,g%@vI

Which do you think are easier to remember?

dpark · on Aug 11, 2011

Are you serious? You think it's easier to remember 12 random characters than to remember four English words?

g6M;`Zt3^,d" vs selected aardvark badminton winnings

The way the human brain works, it would be at least as easy to remember 12 random words as 12 random characters.

barrkel · on Aug 12, 2011

The trouble is that by human intuitions, you think there's a strong inverse correlation between being hard to guess and easy to remember; but that's not always the case.

It's hard for humans to remember meaningless conjunctions of symbols, so we think they are hard to guess; so we err on the side of making them too short. Contrariwise, we think a sequence of just four words couldn't possibly be hard to guess because it's so easy to remember, but it's only easy to remember because we can use the meanings of the words to form an idea or image, something our brains are built for (unlike strings of meaningless characters).

This is why the word technique is better: it corresponds better with how we remember, while reducing two other risks: the risk of losing your password - non-trivial - and choosing too short a password.

eykanal · on Aug 11, 2011

Don't forget spaces. And Poland.

Another point is that letter placement within words is significantly non-random. By intelligently choosing which letters to try in each position, the hacker could at the very least minimize the number of tries by an order of magnitude for the first word.

artmageddon · on Aug 11, 2011

I probably shouldn't announce, in a forum, that using Don't Forget About Poland! as a passphrase seems awfully tempting for someone like me :)

(American by birth, Polish by heritage)

Speaking of the example I just presented, how much more effective would it be to include special characters within these long passphrases? Obviously the goal is to be able to remember them, but surely most if not all of us, are already using special characters for our passwords.

pyre · on Aug 12, 2011

When counting the entropy you would probably count each word as a single entry, and each special character as an entry (and disregard spaces).

* By capitalizing the words you've doubled the search space for words (assuming that the search space starts with all words lowercased)

* You could increase the search space for each word by 200% (from the space of all lowercase words) by including the possibility of words in all caps (it's unlikely for people to start using alternating case in the middle of words).

* The ' in "Don't" doesn't increase the search space that much because there are a small number of (common) contractions like that, and each of them would only break down into 3 permutations:

  don't
  dont
  don t

(though the last one is highly unlikely). So you're adding maybe 30 more words to a search space much larger than that.

* As far as the special character is concerned, it probably doesn't add too much to the search space. You can break down your phrase like so:

  Don't Forget About Poland!
              ||
              \/
  {item} {item} {item} {item}{item}
              ||
              ||  Disregard whitespace (acquire entropy!)
              \/
  {item}{item}{item}{item}{item}

So now you've got 5 items. Each item could be either a word or punctuation. The search space for words is huge. The search space for punctuation is small. Your algorithm just has to realize that if it chooses punctuation for one of the items, then it doesn't bother to use whitespace to separate it from the preceding word ("word," vs "word ,").

* You can also further reduce the effects of punctuation on the search space by realizing that punctuation will almost always follow a word, and not other punctuation. This also discounts punctuation as the first item in the passphrase too.

Edit:

Upon further thought, if the attacker uses a simplified algorithm to account for upper-/lowercase, then it may not have that much of an effect on the search of each individual item (i.e. n!4 instead of (n+4)!). An attacker could break the common instances of case down into:

  * All words lowercased  "don't forget about poland!"
  * All words uppercased  "DON'T FORGET ABOUT POLAND!"
  * All words titlecased  "Don't Forget About Poland!"
  * First word titlecased "Don't forget about poland!"

This discounts the possibility of people alternating titlecase across words, because that's probably as likely to happen as people alternating case within words (e.g. WoRdS lIkE ThIs). Granted, this also discounts proper nouns in the middle of the passphrase (things that don't require extra effort for people to remember to capitalize).
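
As a rough sketch, the four case patterns listed above could be generated like this. `case_variants` is a hypothetical helper written for the example; it capitalizes word by word instead of using `str.title()`, which would turn "don't" into "Don'T".

```python
def case_variants(phrase):
    """Return the four common case patterns for a candidate phrase."""
    words = phrase.lower().split()
    lower = " ".join(words)
    return [
        lower,                                    # all words lowercased
        lower.upper(),                            # all words uppercased
        " ".join(w.capitalize() for w in words),  # all words titlecased
        lower.capitalize(),                       # first word titlecased
    ]

for variant in case_variants("Don't Forget About Poland!"):
    print(variant)
```

Under this model, case handling multiplies the attacker's work by about 4 for the whole phrase, or 2 extra bits.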

zobzu · on Aug 11, 2011

171K words in the English language. 4 words, no spaces: 171k^4 vs 255^8 for an 8 char pass

ddlatham · on Aug 11, 2011

First, that's still beside the point. You shouldn't evaluate a password scheme solely by entropy if it is a password you intend to memorize. XKCD argues that it's easier to remember 4 random words than 8 random characters.

Second, your example isn't very good because it assumes that every 8-bit character (save one) is acceptable, which is rarely the case, especially if you are trying to memorize them.

Finally, as another commenter pointed out, you've got your math wrong, and even your example has more entropy for the words than the characters.

drcode · on Aug 11, 2011

Incorrect: It's 171k^4 and 255^8.

(which works out to 8.55E20 and 1.78E19)

pak · on Aug 11, 2011

Yep, and that's assuming 8 random bytes from extended ASCII. The other point of the article was that nobody actually makes a password from random characters because words are easier to remember. And I think it's disingenuous to suppose people will enter alt-codes and that nonprintable characters would be allowed, so assuming MENSA-quality users with internal random number generators, we get 95^8 ~= 6.6E15, a clear loss of entropy.
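
The figures quoted in this subthread can be checked directly. The sketch below just redoes the arithmetic and adds the equivalent number of bits for each keyspace; the variable names are made up for the example.

```python
import math

words = 171_000 ** 4      # four words from a 171K-word dictionary
all_bytes = 255 ** 8      # eight characters, 255 values each
printable = 95 ** 8       # eight printable ASCII characters

for label, space in (("171k^4", words), ("255^8", all_bytes), ("95^8", printable)):
    print(f"{label}: {space:.2e} ({math.log2(space):.1f} bits)")
```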

kijinbear · on Aug 11, 2011

Actually, since you normally can't use anything but characters in the 0x20-0x7E range, the 8 char password has much less entropy: 95^8 ~= 6.63E15.

I love the backtick in my passwords. If a website accepts it and doesn't give me any issues, it's a decent indicator of basic security.

pbhjpbhj · on Aug 11, 2011

>the 8 char password has much less entropy: 95^8 ~= 6.63E15 //

Most of the word usage is going to be limited though too. testyourvocab.com put the average at 27k I think. We're looking for words one can remember easily so the word pool is going to be a lot lower - 15000^4 ~= 5E16 FWIW.
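
To put the figures quoted in this subthread side by side, here is a small Python sketch. It assumes every symbol (word or character) is picked uniformly and independently, which is the best case for each scheme; the helper name `keyspace` is made up for illustration:

```python
import math

def keyspace(symbols, length):
    """Number of equally likely passwords and the matching entropy in bits."""
    count = symbols ** length
    return count, length * math.log2(symbols)

# Figures quoted in this thread
schemes = {
    "4 words from 171,000": (171_000, 4),
    "8 bytes, 255 values each": (255, 8),
    "8 printable ASCII chars": (95, 8),
    "4 words from 15,000": (15_000, 4),
}

for label, (symbols, length) in schemes.items():
    count, bits = keyspace(symbols, length)
    print(f"{label:26} {count:.3g} combinations, {bits:.1f} bits")
```

Under those assumptions the four-word phrase from a 15,000-word pool still has a larger search space than eight random printable characters.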

drcode · on Aug 11, 2011

Right- I was just correcting the dude's math :-)

zobzu · on Aug 11, 2011

hey it was an honest typo

zobzu · on Aug 11, 2011

yeah thats correct

bnegreve · on Aug 11, 2011

No, the single word password is based on a dictionary word with some chars replaced by other visually similar chars. That's much less than 8^255.

zobzu · on Aug 11, 2011

real complex passwords are more like '"^vmds!w*é$sé550µW"'-à. the point of the post was to show the maximum theoretical possibilities for both. As many pointed out, not all 255 are usually printable and not all 171K words are used. then again, that's for english only, not counting old english and not accounting for possible punctuation

hvs · on Aug 11, 2011

It's actually closer to 92^8 (printable ASCII) or even 62^8 (if they only allow letters and numbers).

nmcfarl · on Aug 11, 2011

I've been using phrases and sentences as passwords for a while, and I've found that there are 2 main problems;

1) A lot of sites, still in this day and age, have max password lengths, so I still have a lot of short passwords. Usually this is bank sites and the like.

2) Password entry fields are often very short visually, and with a long password getting lost is much easier. I find I have to type them over A LOT.

The second is actually the more annoying problem.

dpark · on Aug 11, 2011

These are the real issues with this. Banks seem to be borderline idiots when it comes to password security: case-insensitive, no spaces, 20-character max, small choice of "special characters". These are from Amex, whose password requirements sadly were even worse a few months ago.

With crappy password requirements, it's impossible to use decent passphrases. Getting locked out of your account for 3 failed attempts at typing a 30-character password is pretty obnoxious, too.

In situations that allow passphrases, you don't need a password generator like this. You can grab a sentence from your favorite book and use it. e.g. "How do you do, Miss Doolittle?" That's not the best choice, but it's still got way more entropy than a standard password, probably a lot more entropy than you'll get by choosing a 4-gram composed of words from a corpus of 2k, and it's easier to remember.

kragen · on Aug 11, 2011

It turns out that you are mistaken.

Your favorite book is almost certainly chosen from the 129 million books that Google knows about: http://www.fastcompany.com/1678254/how-many-books-are-there-...

That gives you 27 bits of entropy.

The average book length is probably not over 400 pages. An average page probably doesn't have over 25 sentences on it. So the whole book contains only ten thousand sentences.

That gives you 14 more bits of entropy.

The total is 41 bits of entropy. This is one-eighth as secure as a 4-gram composed of random words from a corpus of 2k, if we measure strictly by entropy.
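
The arithmetic above can be checked without rounding each step. A quick Python sketch, using only the figures from this comment (129 million books, 400 pages, 25 sentences per page, a 2k word corpus):

```python
import math

books = 129_000_000            # Google's book count cited above
sentences_per_book = 400 * 25  # pages x sentences per page

book_bits = math.log2(books)
sentence_bits = math.log2(sentences_per_book)
quote_bits = book_bits + sentence_bits

words_bits = 4 * math.log2(2000)  # four random words from a 2k corpus

print(f"sentence from a book: {quote_bits:.1f} bits")
print(f"four random words:    {words_bits:.1f} bits")
print(f"ratio of search spaces: {2 ** (words_bits - quote_bits):.1f}x")
```

Unrounded, the gap is a little over three and a half bits, so the sentence is somewhere around a tenth as hard to guess; rounding each term up to whole bits gives the one-eighth figure.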

The situation is actually much worse, though: your favorite book is probably a popular book. So the number of bits of entropy provided by the choice of book might be a lot smaller than 27. I would guess that it's perhaps 10.

And many of those 129 million books are not very different. They contain quotes from other books, reprinted short stories, folk tales, set phrases, and so on.

In practice I think it might be difficult to mount a password-guessing attack using the Google Books corpus, because it's hard to get access to that corpus. The Project Gutenberg corpus would not be so hard.

ubernostrum · on Aug 11, 2011

Of course, the flip side of this is that we're veering off into attacks where you're targeting one specific person and know a bit about how they've chosen their password.

If you want to mount such an attack, fine, but most of us are dealing with the much-more-common threat of someone who gets a file or a database of hashed passwords and wants to crack them all in one go.

kragen · on Aug 11, 2011

The analysis applies to that threat as well; it just adds some constant number of bits of entropy.

dpark · on Aug 11, 2011

That's an interesting analysis. I can't really see any major deficiencies with it.

On the plus side, a sentence is probably going to be easier to remember than 4 random words. Personally, I draw some of my "high-security" passwords from literature, but then I modify the case and do the "leetspeak" character substitution, so a naive sentence attack would not work. A more clever one might, though.

alanh · on Aug 11, 2011

Edit: Oops, as dpark points out, I swapped two digits. My apologies. Below, my original, erroneous comment.

41 bits of entropy means you have on the order of a one in 10^12 chance (2^41) of guessing it, and 2,000^4 is on the order of 10^16. So how is the former "one eighth as secure" as the other? Wouldn’t it be 10^4 times less secure, that is, 10,000 times more likely to be cracked?

dpark · on Aug 11, 2011

2000^4 is on the order of 10^13. Specifically it's 1.6E13. Maybe you accidentally swapped the 1.6 and the 13?

ScottBurson · on Aug 11, 2011

The attack you describe is easy to defeat by making a small modification to the selected sentence.

kragen · on Aug 12, 2011

If you choose one of eight small modifications to apply at a randomly-selected character, you get perhaps 6 bits of entropy from the choice of character and 3 bits from the choice of modification. That's better, but adding an extra common word to the end of the sentence would be better still.

colanderman · on Aug 11, 2011

Don't forget sites that require: "your password MUST contain at least one number, one uppercase letter, and one of the following characters: !, @, #, or $, but not %, ^, &, or *". I slap my forehead at how counterproductive these requirements are.

Lexarius · on Aug 11, 2011

This is why, for my lab's password changer, the requirement for short passwords is simply that it must have one upper, one lower, one digit, and one none-of-the-above (and be at least 8 characters).

If you have a long password (at least 16 characters), all other requirements are waived so that you can use passphrases.
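
A rule like this takes only a few lines to implement. The following Python sketch is one possible reading of it, not the lab's actual code; `password_ok` is a made-up name and "none-of-the-above" is taken to mean any non-alphanumeric character:

```python
def password_ok(password):
    """Check a password against the two-tier policy described above."""
    if len(password) >= 16:
        return True  # long passphrases skip the character-class rules
    if len(password) < 8:
        return False
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    has_other = any(not c.isalnum() for c in password)
    return has_upper and has_lower and has_digit and has_other
```

With this check "correct horse battery staple" passes on length alone, while a short password still has to hit all four character classes.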

dpark · on Aug 11, 2011

Wow, sanity in password requirements? Do they also avoid the silly mandatory 30-day password change?

lhnn · on Aug 11, 2011

I hope so; that's annoying and counterproductive.

onemoreact · on Aug 11, 2011

Forcing one or more digits has little value. You are better off with 1 uppercase, one lowercase, and 2 non-alphabet characters. (Users are very likely to just replace a letter with 1 or 0, so 2 options * 8 positions = 16 possibilities = fail.)

pedro_a · on Aug 12, 2011

Which is exactly the sort of terrible rules xkcd is criticizing (paraphrasing glenra).

Instead of 4 extra enforcements you could add 8 extra characters.

Your entropy is (somewhat simplified)

One 8 letter word: 15 bits

1 uppercase = 3 bits (or even just 1 bit, people capitalize the first letter)

reversing 2 rules above: 1 bit

replacing two characters at random places: 8*7/2 = 28 choices = 4.8 bits

inserting 2 random non alphabet characters: 40^2 = 1600 choices = 10.6 bits

Total: 34.4 bits

The entropy of three medium difficulty words is log2(4000^3) = 35.9 bits

Instead of memorizing K!ybo4rd it could be mykeyboardisblue.
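
The tally above can be reproduced in a few lines of Python. The 15, 3 and 1 bit figures are taken from the comment as given; only the last two terms and the three-word figure are computed:

```python
import math

rules_bits = sum([
    15,                    # one 8-letter word (figure from the comment)
    3,                     # position of the single uppercase letter
    1,                     # "reversing 2 rules above" (figure from the comment)
    math.log2(8 * 7 / 2),  # which 2 of 8 positions get replaced: 28 pairs
    math.log2(40 ** 2),    # two inserted symbols, 40 choices each
])
words_bits = 3 * math.log2(4000)  # three words from a 4000-word list

print(f"composition rules: {rules_bits:.2f} bits")
print(f"three words:       {words_bits:.2f} bits")
```

Either way the three plain words come out ahead of the rule-laden eight-character password by roughly a bit and a half.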

bryanlarsen · on Aug 11, 2011

The requirement for many of my websites is simply that it "must not consist solely of lowercase letters". (as well as a minimum length).

glenra · on Aug 11, 2011

>must not consist solely of lowercase letters

Which is exactly the sort of terrible restriction xkcd is criticizing.

bryanlarsen · on Aug 12, 2011

A space is not a lowercase letter, so the xkcd password would pass my test.

pedro_a · on Aug 12, 2011

Then the space would be "the obeisance to the stupid website piece". Note that the entropy of "correct horse battery staple" is only one bit more than "correcthorsebatterystaple".

Dove · on Aug 11, 2011

Yeah, a lot of my passwords look like "securesecretpassphraseA1!"

There's the secure piece, and there's the obeisance to the stupid website piece.

andyking · on Aug 11, 2011

I have a couple of domains registered with 123-reg.

To prevent unauthorised access to your account your password must contain 8 characters.

Wait, what? They're right, too. You can't have 7 characters and you can't have 9.

wgx · on Aug 11, 2011

Yes a friend was complaining about that recently.

It's a bruteforcer's dream.

webfusion · on Aug 12, 2011

Hi

I work on behalf of 123-reg.

We are working on changing this in future control panel updates.

Regards,

Ricky

Simucal · on Aug 11, 2011

What could the reasoning behind those requirements possibly be?

Lexarius · on Aug 11, 2011

Usually the symbols involved are used by SQL or some other layer, and the programmers insert the password directly into the query string because they don't know any better. This leads to SQL injection and other issues.

So rather than discovering the correct way to do things, they try to prevent you from using any characters that might be involved in an SQL injection.

In some cases the guys on the backend know what they're doing, but the requirement can still be passed down from on high from some manager who absorbed the practice from another project.

enduser · on Aug 11, 2011

If anyone knew what they were doing the uncrypted password would be nowhere near a SQL statement.

dpark · on Aug 11, 2011

They're trying to force users to use those characters in an attempt to enlarge the space passwords are drawn from. It doesn't work very well, of course. Instead of "password", you just get "Password1!". That said, I might make the same choice (for short passwords) if I were implementing password policy.

Edit: If you meant the "but not %, ^, &, or *" requirement, that's an indication that the devs don't know how to use prepared statements or at least escape properly.
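
For anyone who hasn't seen what "prepared statements" buys you, here is a minimal Python sketch with the standard-library `sqlite3` module. The table layout, function names, and the PBKDF2 iteration count are assumptions for illustration; the point is that the password only ever reaches the database as a salted hash bound to a `?` placeholder, so no character needs to be banned:

```python
import hashlib
import hmac
import os
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, hash BLOB)")

def derive(password, salt):
    # Iteration count is an illustrative assumption, not a recommendation
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(name, password):
    salt = os.urandom(16)
    # "?" placeholders: values are passed separately from the SQL text
    db.execute("INSERT INTO users VALUES (?, ?, ?)",
               (name, salt, derive(password, salt)))

def check(name, password):
    row = db.execute("SELECT salt, hash FROM users WHERE name = ?",
                     (name,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, derive(password, salt))

register("alice", "pa%ss'; DROP TABLE users;--^&*")
```

A password full of quotes, semicolons and `%^&*` goes in and verifies like any other.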

rmc · on Aug 11, 2011

Those requirements are there for the people who try putting just their name or "password" or their 4 digit ATM PIN as their password. For very short passwords, only having alphabetical (not even alphanumeric) passwords is terrible. Those requirements are there to prevent some really stupid passwords.

eru · on Aug 11, 2011

Covering your ass by disallowing passwords like "password".

Simucal · on Aug 11, 2011

No, I meant specifically why they would allow certain special characters and not others.

jobu · on Aug 11, 2011

Also annoying is that a lot of sites require gibberish. Apple requires at least one uppercase, one lowercase, and one number. Some sites require a symbol as well.

flatline · on Aug 11, 2011

Especially if you are logging into multiple systems regularly using domain credentials, it rapidly becomes apparent that the faster and easier the password is to type, the better. I've found that some passwords with symbols and numbers just roll off the fingertips with a little practice, others not so much, but longer passphrases are for some reason the worst.

jarek · on Aug 11, 2011

This. My password is not a word, not even a word with substitutions, but it is optimized towards typing it on a keyboard (in terms of when caps come in, when numbers are added, switching hands, etc). I can knock it out in a second and it's muscle memory with zero risk of forgetting. correct horse battery staple, not so much. I lose some entropy by making it typing-friendly, but the cracking algorithm to simulate that would be pretty difficult. I'll take the loss.

As an aside, 1000 guesses a second? Seems generous.

ghshephard · on Aug 11, 2011

Very few sites have a short max password length. I use 1password, and of the 63 sites I've stored passwords, all but 2 allow 25 character password lengths. Ironically, my Bank only allows me 15 characters.

I haven't typed a password in 3+ months - don't know what any of mine are anymore, so I find typing is no longer an issue.

cschneid · on Aug 11, 2011

I really like using 1password. I have a long passphrase as my unlock key, easy to remember, then do the randomly generated codes as long as is feasible for each different service.

1) Nothing written down 2) Unique per service 3) Adjustable difficulty & char set per service, to match their stupid requirements.

Seems like the best of several worlds.

r00fus · on Aug 11, 2011

I just use a password manager like LastPass or 1password. If it has submission automation, you don't even need to type your password.

Just choose a nice strong master password.

wisty · on Aug 11, 2011

How about (NOT SECURE YET, IT NEEDS MORE ENTROPY):

    import random

    from nltk.corpus import wordnet as wn  # needs nltk and its 'wordnet' data

    all_animals = set()
    def add_to_set(animal):
        # Synset.name is a method in NLTK 3 (it was an attribute in older releases)
        all_animals.add(animal.name().split('.')[0].replace('_', ' '))
        for child in animal.hyponyms():
            add_to_set(child)

    add_to_set(wn.synset('animal.n.01'))
    all_animals = sorted(all_animals)

    actions = ['ate', 'chased', 'killed', 'fought', 'kissed',
               'talked to', 'hated', 'loved', 'ambushed', 'fled']  # can add more

    # SystemRandom draws from os.urandom, i.e. the operating system's CSPRNG
    rng = random.SystemRandom()

    def make_password():
        choice = rng.choice
        return 'the %s %s the %s' % (choice(all_animals), choice(actions), choice(all_animals))

If you pruned out 90% of the animals (i.e. the obscure, hard to spell, or scientific names), this is still about 20 bits. And the passwords are kind of memorable (I've gotten such gems as "the dodo chased the guppy" or "the tigress killed the king charles spaniel").

You could also add a humorous adjective ("rabid", "talking", "magic", "invisible", "evil" ...) or adverb ("roughly", "quickly", "quietly", "secretly" ...).

You could also add a place name.

Periodic · on Aug 11, 2011

Completely random strings of words can be hard for me to remember, but something like, "the {adjective1} {animal1} {verb} the {adjective2} {animal2}" would be much easier for me to remember because the words relate to each other in ways I already understand.

I expect we can get some fairly high entropy from just simple schemes like this.

However, the length of the password can be a real pain if you have to type it often, even once a day.

wisty · on Aug 11, 2011

You could get about 8 bits per animal, and 5 bits per hand-written verb / adjective / place (32 choices per category). So that's about 7-10 words you need in the frame.

You could get decent entropy with: the {adj} {adj} {animal} {verbed} the {adj} {adj} {animal} from in {place}. That's 5+5+8+5+5+8+5 = 41 bits.
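
A frame like this is easy to prototype. The Python sketch below uses tiny placeholder word lists and a made-up frame (both are assumptions, not the lists discussed above); the entropy is the sum of log2 of each slot's list size, provided every slot is filled independently and uniformly:

```python
import math
import secrets

# Tiny placeholder lists; real ones would hold ~32 hand-picked words per
# category and a few hundred animals.
WORDS = {
    "adj": ["rabid", "talking", "magic", "invisible"],
    "animal": ["dodo", "guppy", "tigress", "heron"],
    "verbed": ["ate", "chased", "kissed", "ambushed"],
    "place": ["Poland", "Boston", "Dallas", "Canada"],
}

# Tokens that are keys of WORDS are slots; everything else is literal text.
FRAME = ["the", "adj", "animal", "verbed", "the", "adj", "animal", "in", "place"]

def make_phrase(frame=FRAME, words=WORDS):
    """Fill each slot independently with a uniformly random word."""
    return " ".join(
        secrets.choice(words[tok]) if tok in words else tok for tok in frame
    )

def frame_bits(frame=FRAME, words=WORDS):
    """Entropy in bits: sum of log2(list size) over the slots."""
    return sum(math.log2(len(words[tok])) for tok in frame if tok in words)
```

With these four-word lists the frame is worth only 12 bits; with 32 words per hand-written category and 256 animals the same six-slot frame would come to 5+8+5+5+8+5 = 36 bits.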

I'm just wondering if it's worth it.

kragen · on Aug 12, 2011

This is truly awesome. You could easily use a more complicated grammar, but it might get tricky to generate a password with a specified amount of entropy.

drcode · on Aug 11, 2011

One slight addition to the xkcd password scheme that would add another order of magnitude of security would be to have your own personal "salt" that you add to all your passphrases. In this case, the salt would be a short, traditional, hard to remember password that you re-use with every xkcd style password. It would be hard to remember, but you'd only need to memorize it once.

So if your personal salt is "@T#23a" you would use "@T#23a correct horse battery staple" on one website and "@T#23a giant bug transistor leech" on another website.

re_todd · on Aug 11, 2011

That is what I do, I have a 4 character personal salt, like "7Pd$", and put it in the middle of a lowercase word or phrase. Having a symbol, lowercase letter, uppercase letter, and number will satisfy most password requirements. I use it on many sites, so it is easy to remember. It also makes it simple to write passwords down, e.g. "correct horse ^ battery staple" which means to me "correct horse 7Pd$ battery staple", but would not be useful to someone who saw it, since they don't know my personal salt. A combination of what xkcd said and a short personal salt that's easy to remember is probably best.

cdavidcash · on Aug 11, 2011

You might want to read the cartoon again to see why this is useless, counterproductive advice.

xyzzyb · on Aug 11, 2011

Yes, but the salt could also be useful for sites that require passwords to include a number, a non alphanumeric character, etc.

dsmithn · on Aug 11, 2011

If this kind of thing takes off, it will become easier for dictionary based password attacks. Using this advice would go a long way towards preventing this.

AdamTReineke · on Aug 11, 2011

Easier, yes, but not easy. A dictionary attack on 4 words is the same as brute forcing 4 letters except now instead of just 26 letters there are thousands. 2000^4 vs 26^4 = about 35,000,000 times as many combinations to check.

alanh · on Aug 11, 2011

Careful! This is only using `Math.random` and does not attempt to use `window.crypto.random` (though most browsers do not support it yet: http://jsfiddle.net/alanhogan/trUYu/) or anything that would attempt to bring real entropy into the process.

I don’t mean to fault the creator of this page, but at the same time, I would not trust this generator for important passwords, simply because you cannot know if others are getting the same 'random' results as you are.

More info on SO: http://stackoverflow.com/questions/5651789/is-math-random-cr...

PDF on the topic: http://www.trusteer.com/sites/default/files/Temporary_User_T...

> In the Javascript engines of IE (Trident), Firefox (Gecko), Safari (WebKit) and Chrome (V8), the output of Math.random() can be used to reconstruct the random seed, and thus provide both this seed and the current “JS mileage” (i.e. the number of times Math.random() was invoked).
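
The same distinction exists outside the browser. As a rough analogy (this is Python, not the JavaScript on the page in question), a seedable general-purpose generator hands out identical "random" phrases to anyone who ends up with the same internal state, while an OS-backed generator does not accept a seed at all. The word list and seed below are arbitrary:

```python
import random
import secrets

WORDS = ["correct", "horse", "battery", "staple",
         "giant", "bug", "transistor", "leech"]

def phrase_from(rng, n=4):
    """Pick n words using the given generator's choice() method."""
    return " ".join(rng.choice(WORDS) for _ in range(n))

# Two generators with the same seed produce the same "random" phrase
same_a = phrase_from(random.Random(2011))
same_b = phrase_from(random.Random(2011))

# SystemRandom (exposed by the secrets module) ignores seeds and reads
# from the operating system's entropy source instead
strong = phrase_from(secrets.SystemRandom())
```

Here `same_a` and `same_b` are always equal, which is exactly the property you don't want in a password generator.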

kragen · on Aug 12, 2011

I wouldn't use a JS program served from somebody else's website to generate my password anyway. How do I know it's not sending them a copy of the passwords it generates?

alanh · on Aug 12, 2011

Well, I watched network connections and saw none. Do that + use Incognito mode = you're probably good.

kragen · on Aug 13, 2011

He recently changed it to use a random seed sent from the server instead of the client-side RNG. Over, I believe, unencrypted HTTP. Your suggested countermeasure would not have detected that attack; indeed, perhaps it was already in place before you reported no evidence of attacks.

It would, however, have made it harder for him (or your ISP) to tell whose password they'd stolen.

IgorPartola · on Aug 11, 2011

Put this in your .bashrc:

  function rpass() {
      strings /dev/urandom | grep -o '[[:alnum:]\/!@#$%^&*()<>,.,{}]' | head -n $1 | tr -d '\n'; echo
  }

Then run $ rpass 16 and get a 16 character random password with a fairly high entropy. Then just use a service like LastPass or a solution like KeePassX or even a single GPG-encrypted file to store your passwords. Problem solved.

Passwords are evil. Most of them should be treated the way you'd treat your private SSH or SSL key. Whenever you can eliminate a password and get the user to authenticate using a third-party identity provider, you are doing them a favor.

Edit: with 80 possible characters, you get 80^16 possible passwords: nearly 10^20 years at 1000 guesses/second.
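
For those who would rather not pipe `/dev/urandom` through `grep`, roughly the same thing can be sketched in Python with the standard `secrets` module. The alphabet below is my reading of the character class in the grep pattern (62 alphanumerics plus 18 symbols, assuming an ASCII locale):

```python
import secrets
import string

# The 80 characters the grep pattern above accepts
ALPHABET = string.ascii_letters + string.digits + "\\/!@#$%^&*()<>,.{}"

def rpass(length=16):
    """Return `length` characters drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    print(rpass(16))
```

Each character is an independent uniform draw, so a 16-character result has 16 * log2(80), a bit over 101 bits.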

yuvadam · on Aug 11, 2011

Actually LastPass has this option built-in. It can generate a strong password in-form and directly save it to your password vault.

Very useful.

IgorPartola · on Aug 11, 2011

Yes, but I prefer to generate the passwords on my own. I also use this to generate random passwords for root accounts (sudo FTW), etc.

duck · on Aug 11, 2011

If you use KeePass there is no need for the script since it will generate one for you based on rules you can set.

parfe · on Aug 11, 2011

1000? Try 600 million passwords a second.

http://www.elcomsoft.com/lhc.html

IgorPartola · on Aug 11, 2011

80^16/(600 x 10^6)/(365 x 24 x 3600) = 10^14 years.
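
The same calculation as a small Python sketch, so other guess rates can be plugged in. It assumes a constant rate and counts the time to try every password (on average an attacker needs half that):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_exhaust(keyspace, guesses_per_second):
    """Years needed to try every password at a constant guess rate."""
    return keyspace / guesses_per_second / SECONDS_PER_YEAR

keyspace = 80 ** 16
slow = years_to_exhaust(keyspace, 1_000)
fast = years_to_exhaust(keyspace, 600e6)
print(f"{slow:.1e} years at 1000/s, {fast:.1e} years at 600 million/s")
```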

slug · on Aug 11, 2011

or use pwgen, apg, etc.

jsulak · on Aug 11, 2011

I prefer using a program like Password Safe (http://passwordsafe.sourceforge.net/), and use a safe password that's a long sentence (with punctuation). Then I can use arbitrarily long and complex passwords for all my accounts, and not have to worry about memorizing them individually. The password safe can even be synced across computers using Dropbox.

nollidge · on Aug 11, 2011

I prefer KeePass simply because it's got implementations on multiple OSs, as does Dropbox (to sync the password database file). So I've got it on my iMac, Android phone, Windows laptop, and Windows work PC.

mtogo · on Aug 11, 2011

If you have an iPhone or don't want to use keepassx, you can use an online password manager like Passpack or Lastpass.

The downside is that you need to really trust the password manager, as they have all of your usernames and passwords.

shinratdr · on Aug 12, 2011

1Password supports all those devices as well.

dredmorbius · on Aug 13, 2011

GPG-encrypted free-form file (though it's fairly structured), edited via vim and a well-known "auto-encrypt/decrypt GPG files" configuration: http://vim.wikia.com/wiki/Encryption

(Actually, from that page, vim now has built-in blowfish encryption, which I'll have to look at -- yet another argument in favor of sharing tips on the 'TarTubes: you may learn something even when you're sharing your own knowledge).

dredmorbius · on Aug 13, 2011

Erm: the blowfish encryption was in reference to the old '-x' vi encryption option (using a now pretty insecure Unix 'crypt' function).

A better "configure GPG edit mode" .vimrc is here: http://vim.wikia.com/wiki/Edit_gpg_encrypted_files

I believe that's based on the one first posted by Wouter Hanegraaff <wouter@blub.net>.

zobzu · on Aug 11, 2011

I prefer using a digital key that's always going to beat the entropy of the memorable passwords

ajross · on Aug 11, 2011

I can't help but think that this is a solution to the wrong problem. The big problem with password security in the modern world really isn't that they're easy to break, but that they're pervasively reused between sites. So breaking them (for example, by reading them in plain text out of a dumb database!) in one place opens up attacks on higher value accounts.

The fix, of course, is to get users to stop re-using passwords between sites.

How does making passwords more memorable fix this? If anything, forcing users to use random base64 strings strikes me as more secure as they will be forced into some sort of password locker implementation by their inability to remember them.

crizCraig · on Aug 11, 2011

Right, maybe if you use the first letter of the words in a sentence, like "Hey Jude, don't make it bad, take a sad song, and make it better." -> "HJ,dmib,tass,amib." Then you can add in some characters that make it different for each site without it being obvious which characters you added. I wrote a blog post on how to create different passwords for sites that are easy to remember: http://craigquiter.com/post/8668237043/creating-and-remember...
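
The first-letter trick itself is mechanical. A minimal Python sketch (the function name is made up, and it only handles punctuation at the end of a word); note that it says nothing about how guessable the underlying sentence is:

```python
import string

def first_letters(sentence):
    """Keep each word's first character plus any trailing punctuation."""
    pieces = []
    for word in sentence.split():
        core = word.rstrip(string.punctuation)
        if not core:  # token was pure punctuation; keep it as-is
            pieces.append(word)
            continue
        pieces.append(word[0] + word[len(core):])
    return "".join(pieces)

print(first_letters(
    "Hey Jude, don't make it bad, take a sad song, and make it better."
))
```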

GFischer · on Aug 11, 2011

The link posted on the article merits a submission by itself:

"The science of password selection" (a breakdown of common passwords by selection practices, as taken from public leaks)

http://www.troyhunt.com/2011/07/science-of-password-selectio...

In short, passwords are chosen from:

People names: this includes a list of about 26,000 common first and last names.

Place names: this is everything from towns to states to countries and includes about 32,000 entries.

English dictionary

The most common passwords by group:

Name:

   1. maggie
   2. michael
   3. jennifer

Place:

   1. dallas
   2. canada
   3. boston

Dictionary Words:

   1. password (oh dear)
   2. monkey
   3. dragon

Numbers:

   1. 123456
   2. 12345678
   3. 123456789

kragen · on Aug 11, 2011

Is it possible that the breached Sony passwords he was analyzing may have been cracked with dictionary attacks? Maybe the reason only 1% of the passwords had a non-alphanumeric character was that the crackers mostly didn't crack the passwords that had any non-alphanumeric characters.

nrbafna · on Aug 11, 2011

"For those of us pedantic enough to want a rule, here it is: The preferred form is "xkcd", all lower-case. In formal contexts where a lowercase word shouldn't start a sentence, 'XKCD' is an okay alternative. 'Xkcd' is frowned upon."

wcoenen · on Aug 11, 2011

Note that 44 bits of entropy is still nothing if you want protection from off-line attacks on password hashes. A couple of GPUs together can calculate a billion hashes per second, which eats through 2^44 possible passwords in only a few hours.

This was recently demonstrated when the mtgox password database was compromised.

edit: but this shouldn't be a problem if the password is properly hashed with bcrypt or some other scheme with a work factor.
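
To make the effect of a work factor concrete, here is a back-of-the-envelope Python sketch. The billion-per-second rate is the one quoted above; the 10,000-per-second rate for a slow hash is purely an assumed figure for comparison:

```python
def crack_time_seconds(bits, hashes_per_second):
    """Seconds to try every password in a space of 2**bits."""
    return 2 ** bits / hashes_per_second

fast_hash_hours = crack_time_seconds(44, 1e9) / 3600

# 10,000 hashes/s is an assumed rate for a deliberately slow hash such as
# bcrypt with a high work factor; real rates depend on hardware and settings.
slow_hash_years = crack_time_seconds(44, 1e4) / (365 * 24 * 3600)

print(f"fast hash: {fast_hash_hours:.1f} hours")
print(f"slow hash: {slow_hash_years:.1f} years")
```

The same 44 bits go from an afternoon to decades when each guess costs a hundred thousand times more.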

salvadors · on Aug 12, 2011

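But this approach scales at a much faster rate. Simply adding a fifth word from the same 2048-word list multiplies the work by 2048, which pushes even a billion-per-second attack from hours to about a year; a sixth word puts it into thousands-of-years territory.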

billybob · on Aug 11, 2011

Example generated phrase: "married greatly snake battle"

These phrases would be easier to remember if they made grammatical sense. Like Chomsky's famous "colorless green ideas sleep furiously" - the words relate to each other grammatically, even though it makes no sense.

Imagine memorizing "married greatly snake battle" vs "married snakes battle greatly." I think the latter is easier.

burgerbrain · on Aug 11, 2011

Entropy would take a serious hit if you did that.

kragen · on Aug 11, 2011

Not necessarily. If only one-fourth of all English words are grammatical after an average prefix, then you lose two bits of entropy off each word after the first. I suspect that the actual situation is not as bad as that. You might end up using "uncommon" words like "deceased", "advent", "fearful", and "ram" to compensate, instead of more common words like "strongly", "contains", "afterwards", and "corporate", but that doesn't seem like a major loss to me.
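
As a rough Python sketch of that estimate (the one-fourth fraction is the hypothetical from this comment, and 11 bits per word assumes a 2048-word list as in the comic):

```python
import math

def bits_after_grammar(words, bits_per_word, grammatical_fraction):
    """Entropy left when each word after the first must fit the grammar."""
    loss_per_word = -math.log2(grammatical_fraction)
    return words * bits_per_word - (words - 1) * loss_per_word

print(bits_after_grammar(4, 11, 0.25))
```

Under those assumptions a four-word phrase drops from 44 to 38 bits, a loss that a somewhat larger word list would make up.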

burgerbrain · on Aug 11, 2011

Any narrowing of the search space will most definitely reduce entropy. By how much is calculable, but I don't have the time or the language statistics right now to do it.

billybob · on Aug 12, 2011

I'm not sure the technical meaning of entropy in this context, but personally, I would offset the narrowing effect of "restrict to grammatical phrases" by adding uncommon words. "Besotted ophthalmoscopes gambol indicatively" forms a coherent, if silly, word picture for me, so I think I can remember it.

As far as possible combinations, my vague memories of linguistics 1001 include the idea that this is one of the essential properties of language: it has so many possible combinations, that every speaker is continually creating sentences that have never before been uttered. Unlike, say, honey bee dances, which are often repeated.

kragen · on Aug 12, 2011

> I'm not sure the technical meaning of entropy in this context

Roughly, it's the logarithm (base 2) of the number of guesses that an optimal password guesser would have to guess in order to guess your password. It's a measure of how unknown your password is.

> it has so many possible combinations, that every speaker is continually creating sentences that have never before been uttered.

Yes, this is why all of the suggested alternatives like "choose a line from a popular song" are so much less secure.

pedro_a · on Aug 12, 2011

You can slightly modify your sentence without losing entropy:

"married greatly snake battle" becomes "a married great snake will battle".

"correct horse battery staple" becomes "correctly the horse inserted the battery staple"

Note that the extra words add little or no entropy, at the cost of increased length.

kragen · on Aug 12, 2011

This is a good idea!

You do lose a little entropy that way: you've merged "great" and "greatly", and "correct" and "correctly", suggesting that your modification process considers adjectives and their corresponding adverbs as equivalent. If those examples are typical, that removes one bit of entropy. But you've probably added more than that back in, if your choices of "a", "will", "the", and especially "inserted" are unpredictable. (Alternatives might include "the", "would/did/could", "a", and "removed".)

gjm11 · on Aug 11, 2011

For what it's worth, Google finds more hits for "fearful" than for "afterwards" and more for "ram" than for "corporate". ("Strongly" and "contains" do beat "deceased" and "advent", though. And yes, many of the hits for "ram" are really for "RAM".)

kragen · on Aug 11, 2011

My frequencies are from this word frequency list from the British National Corpus: http://canonical.org/~kragen/sw/wordlist

ZoFreX · on Aug 11, 2011

I would actually advise going against this advice. While it isn't a best practice, password sharing can and does happen, as does shoulder-surfing. It would take a LOT of effort to memorise my password, but a simple four word password will probably be remembered by accident. In a year's time if I piss a friend off, I don't want my Facebook password to be readily accessible in their memory.

I think more people need to learn to remember arbitrary strings. There really is no way around that problem if you want a decently secure password, and it's rare someone has a "good memory" - in most cases they've just learnt how to remember things well.

(Note: This doesn't really apply to me or most of us here in most cases, but for example my WiFi password is of the form "Mycatsname9" and yet my neighbour still has to ask me for it whenever her phone forgets it)

darklajid · on Aug 11, 2011

How do you share your preferred password? Because I guess everything but sending it per text/mail would be tedious, while it would work better with a couple of words.

Shoulder surfing: It's certainly a risk, but I'd say that prolonged shoulder surfing shouldn't be possible. If I type fast, it will be very hard to make out the phrase. If I type slow, you cannot stand around that long.

And - I'm not a security expert, but how much do you gain if you saw a couple of chars here? My intuition (yeah, shouldn't trust that) says that it's worse if I watch you and know the _first_ character of your password than you seeing the first 1-3 characters of the first word of my passphrase?

(We don't know the name of your cat, so judging the quality of the password or your neighbo(u)r's ability to remember it is hard)

ZoFreX · on Aug 12, 2011

> And - I'm not a security expert, but how much do you gain if you saw a couple of chars here? My intuition (yeah, shouldn't trust that) says that it's worse if I watch you and know the _first_ character of your password than you seeing the first 1-3 characters of the first word of my passphrase?

Novel thought and possibly worth pursuing, I hadn't thought of that. I want to re-iterate this isn't something I broadly apply across all my passwords or even many of them, just that for some users password sharing is a use-case.

nxn · on Aug 11, 2011

So you're advising against what appears to be a more practical and secure methodology on the basis that it's worse when you share your password? If you share your password, your exact problem is that you're sharing your password -- it's not how easy or hard the password is to remember. In fact, why does this even have any significance when the person you're sharing it with can just write it down?

Oh and if within a year's time you do not change your password, that could very well be another problem. I think you'd be better off just using easy to remember pass phrases and changing them every once in a while. Shouldn't be a problem because they are, after all, easy to remember.

ZoFreX · on Aug 12, 2011

> So you're advising against what appears to be a more practical and secure methodology on the basis that it's worse when you share your password?

Iff you share your password then yes, you should use a scheme more suitable for that. And yes we shouldn't be sharing passwords, and if we do we should be changing them, but in the real world where most people don't do that I don't think we should encourage passwords which their friends will remember easily - because that is a very common attack vector.

commandar · on Aug 11, 2011

>I think more people need to learn to remember arbitrary strings.

The entire point is that humans aren't very good at doing this.

>(Note: This doesn't really apply to me or most of us here in most cases, but for example my WiFi password is of the form "Mycatsname9" and yet my neighbour still has to ask me for it whenever her phone forgets it)

This is actually exactly the kind of scenario where using pass phrases makes the most sense. WPA2 is vulnerable to rainbow table attacks; relatively long passphrases are both easier to remember for mere mortals and less likely to be broken by a rainbow table attack.

ZoFreX · on Aug 12, 2011

> The entire point is that humans aren't very good at doing this.

How true is this? Because everyone that tells me "I have a bad memory" doesn't even know the most basic tricks.

> This is actually exactly the kind of scenario where using pass phrases makes the most sense

I agree, actually - I don't mind if my neighbour does remember it, I was just trying to illustrate that things that are easy to remember are remembered by accident, and things like that are easily forgotten without effort.

kragen · on Aug 12, 2011

Yes, if you share your password, it's probably better to use a password that needs to be written down and can't be memorized, in order to have a chance of revocation. (Or you could just change your password.) But for most of us, most of the time, memorizable passwords are a boon.

nakkiel · on Aug 11, 2011

This might come in handy:

    shuf -n4 /usr/share/dict/words | tr '\n' ' '

eru · on Aug 11, 2011

If you allow multiple occurrences of the same word, you can get slightly higher entropy while making the passwords potentially even easier to remember.

    echo $(for i in 1 2 3 4; do shuf -n1 /usr/share/dict/words; done)

(Sorry, I'm not very good at bash, so this loop is probably not idiomatic.)
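For anyone who would rather not do this in bash, here is a rough Python sketch of the same idea: each word is drawn independently, so repeats are allowed. The names `load_words` and `make_passphrase` and the tiny fallback list are made up for illustration, and `/usr/share/dict/words` is simply the path used in the shell version, which may not exist on every system.

```python
import secrets
from pathlib import Path

def load_words(path="/usr/share/dict/words"):
    """Read whitespace-separated words; fall back to a demo list if the file is missing."""
    p = Path(path)
    if p.exists():
        return p.read_text(errors="ignore").split()
    # Hypothetical fallback, far too small for real use.
    return ["correct", "horse", "battery", "staple"]

def make_passphrase(words, count=4):
    """Pick `count` words independently, so the same word may appear twice."""
    return " ".join(secrets.choice(words) for _ in range(count))

if __name__ == "__main__":
    print(make_passphrase(load_words()))
```

The `secrets` module is used here instead of `random` because it is the standard-library interface intended for security-sensitive choices.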

bnegreve · on Aug 11, 2011

for i in `seq 1 4`; :)

bronson · on Aug 11, 2011

If this is golf, you took two more strokes than he did.

eru · on Aug 11, 2011

Thanks. By the way, $(seq 4) is enough.

DufusM · on Aug 11, 2011

I don't think those words are very practical. For example, 4 consecutive runs produced:

shippon preannouncer half-hourly withgang egotize baffs chapter monolater photoengraver beachhead linguidental autoheader hazeled defloration exhumate barretries

none of which seem particularly easy to remember (or spell even).

nakkiel · on Aug 12, 2011

It's like any other program that's supposed to help you pick a password; you run it a couple of times until you find something that clicks.

  Beirut ejecting sidings mourns

scythe · on Aug 11, 2011

You could probably get a few more bits of entropy kind of easily if you use words from other languages. This doesn't help the monolingual among us but it's great for me.

kijinbear · on Aug 11, 2011

Some Koreans do this: they just type up some Korean words. Since most password fields only accept ASCII symbols, the password gets entered as a nonsensical string of letters. For example, the Korean word '비밀번호' (meaning 'password'), when typed on a standard Korean keyboard, becomes 'qlalfqjsgh'.
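To make the mapping concrete, here is a small Python sketch that reproduces the example. It assumes the common two-set (dubeolsik) layout and relies on the standard Unicode arithmetic for splitting a precomposed Hangul syllable into initial, medial, and final jamo. The function name `hangul_to_keys` is made up for illustration.

```python
# QWERTY keys for each jamo on the two-set Korean layout, in Unicode jamo order.
INITIALS = "r R s e E f a q Q t T d w W c z x v g".split()            # 19 initial consonants
MEDIALS = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split()  # 21 vowels
FINALS = [""] + "r R rt s sw sg e f fr fa fq ft fx fv fg a q qt t T d w c z x v g".split()  # 28 finals

def hangul_to_keys(text):
    """Return the Latin keystrokes that produce `text` on a two-set keyboard."""
    out = []
    for ch in text:
        idx = ord(ch) - 0xAC00              # offset into the Hangul Syllables block
        if 0 <= idx < 19 * 21 * 28:
            initial, rest = divmod(idx, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)                  # leave non-Hangul characters untouched
    return "".join(out)

print(hangul_to_keys("비밀번호"))
```

As with the other dictionary tricks in this thread, the result looks random but carries only as much entropy as the choice of Korean word.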

eru · on Aug 11, 2011

Yes, though the number of additional bits you get from increasing the size of the dictionary decreases fast. E.g. suppose English and German have the same number of words, then using both only gives you one more bit per word.

(Actually, slightly less since some words exist in both languages. Like `hell'.)
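A quick Python check of the arithmetic, as a sketch. The list size `n` is an arbitrary assumption; the one-bit gain from doubling holds for any size.

```python
import math

def bits_per_word(dictionary_size):
    """Entropy of one uniformly chosen word, in bits."""
    return math.log2(dictionary_size)

n = 100_000                               # assumed size of one language's word list
one_language = bits_per_word(n)
two_languages = bits_per_word(2 * n)      # assumes the two lists share no words
gain = two_languages - one_language

print(f"{one_language:.2f} -> {two_languages:.2f} bits per word (+{gain:.2f})")
```

Going from two languages to four buys only one more bit per word again, which is why adding a whole extra word to the phrase is usually the cheaper way to gain entropy.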

scythe · on Aug 11, 2011

>Yes, though the number of additional bits you get from increasing the size of the dictionary decreases fast.

Well, sure -- but once you're at around two or three languages, you get to imagine that the attacker doesn't know what languages you're using. If I use English, Japanese, and Spanish, I can figure on the attacker needing to check the Germanic (English, Dutch, German), Romance (Spanish, French, Italian), and Asian (Japanese, Chinese, Korean) languages at a minimum.

Jargon helps too, and proper names. "dijkstra bicycle entonces boojum daihinmin"

eru · on Aug 11, 2011

Always assume the attacker knows your scheme, but not your random bits.

pedro_a · on Aug 12, 2011

>Jargon helps too, and proper names. "dijkstra bicycle entonces boojum daihinmin"

Instead of that, just add an extra common word: "correct horse battery staple bicycle".

numeromancer · on Aug 11, 2011

What a great article! I'm changing all my passwords to "correct horse battery staple" today!

mrspeaker · on Aug 11, 2011

He he, reminds me of the password generator I made concatenating 3 words from the list of the "500 most common passwords": http://www.mrspeaker.net/2009/01/09/make-secure-passwords/

The top 500 list has an awful lot of naughty words - so the phrases are pretty easy to remember ;)

marze · on Aug 11, 2011

Isn't this discussion premised on a server configured to allow fast password guessing indefinitely?

This is 2011, shouldn't every server be configured to allow a guess every two seconds for 20 guesses, then every 10 minutes, or something similar?

I'm not familiar with common practices in this area, but why wouldn't all such services be configured to limit the incorrect guesses?

ry0ohki · on Aug 11, 2011

Most really strong systems lock an account after a couple of incorrect guesses. I assume this is all for systems that may not be secured to prevent brute force.

mtogo · on Aug 11, 2011

Locking the account is the wrong way to go about it since it makes DoS on known accounts trivial.

Blocking the IP or increasing the time between tries is, afaik, the "right way".
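To put a number on what throttling buys, here is a back-of-the-envelope Python sketch using the schedule suggested upthread (one guess every two seconds for the first 20 guesses, then one every 10 minutes) against a 44-bit passphrase. The function name and parameter names are made up for illustration, and it assumes the attacker can only guess through the rate-limited login.

```python
def seconds_for_guesses(n, fast_guesses=20, fast_delay=2, slow_delay=600):
    """Total wait time, in seconds, to make n guesses under the two-tier schedule."""
    fast = min(n, fast_guesses)
    slow = max(0, n - fast_guesses)
    return fast * fast_delay + slow * slow_delay

SECONDS_PER_YEAR = 365 * 24 * 3600
expected_guesses = 2 ** 43   # on average, half of a 44-bit search space

years = seconds_for_guesses(expected_guesses) / SECONDS_PER_YEAR
print(f"about {years:.3g} years")
```

That comes out to well over a hundred million years, but only for online guessing. If the password hashes leak, the attacker guesses offline and the server's throttle no longer applies.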

ck2 · on Aug 11, 2011

I've been doing this for years on sites that allow long passwords - "pass sentences" - but I also throw in a number or two.

buro9 · on Aug 11, 2011

I've also been doing this for years, but with bits of my post code thrown in to fulfil those edge cases where complexity requirements are needed.

petenixey · on Aug 11, 2011

... random combinations of bike bits plus a greater London post code. Give me enough monkeys and typewriters and I could take you ;)

buro9 · on Aug 11, 2011

nah, my typos will laways protect me

jwingy · on Aug 11, 2011

+1 (TM now I guess)

You could have four word phrases that are maybe only ~12 characters, which if there are only alphabetical characters in the password, are still very much crackable via GPU brute force (http://mytechencounters.wordpress.com/2011/04/03/gpu-passwor...)
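For comparison, here is a rough Python sketch of the two search spaces. It assumes the 12 characters are lowercase letters only and that the word list is the comic's 2048 words; both figures are assumptions for illustration.

```python
import math

letter_space = 26 ** 12   # every 12-character lowercase string
word_space = 2048 ** 4    # every 4-word phrase from a 2048-word list

print(f"letter-by-letter: {math.log2(letter_space):.1f} bits")
print(f"word-by-word:     {math.log2(word_space):.1f} bits")
print(f"ratio: {letter_space / word_space:.0f}x")
```

Under these assumptions the character-level brute force is the larger search, so an attacker who knows the scheme would try word combinations first. The 44-bit figure already describes that worst case.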

abecedarius · on Aug 11, 2011

"It's a novel idea."

No, I posted about my own generator in 2005: http://darius.livejournal.com/38591.html (getting the words from Beowulf). Then Zooko or Kragen pointed out some even older system in response (I forget the name).

ajross · on Aug 11, 2011

Compuserve was generating automatic account passwords in the early 1980's from two dictionary words and a non-alpha character in between them. Mine was "sleeve;coast". No doubt they didn't invent the trick either.

_hgt1 · on Aug 11, 2011

Such a password scheme provides much less than 44 "bits" of entropy. Considering the use of 4 randomly chosen words from the c. 170,000 English words in general use, we can guess the passphrase in around 2^22 tries - even less than "Tr0ub4d0r3&".

EDIT: I'm totally wrong, it's more like 2*10^22 ... oops!

drblast · on Aug 11, 2011

Randall's math is spot on in this comic.

Assuming the attacker knows the password creation method (and the math assumes the attacker does in the first place), then the attacker knows the word list and the passphrase algorithm.

11 bits per word gives you a grab bag of 2048 possible common words. To guess the password, assuming each word is unique, the attacker needs to try

2048 * 2047 * 2046 * 2045 = 17,540,692,561,920

possible combinations. Initially, you'd think an eleven character password with totally random uppercase, lowercase, numbers and symbols would give you 66 ^ 11 combinations for 66 bits of entropy, but since nobody can actually remember such a random combination, the resulting passwords using these rules are much less secure than that.
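These figures are easy to check in Python. The sketch below assumes Python 3.8 or later for `math.perm`, and the variable names are just for illustration.

```python
import math

# Four distinct words drawn in order from a 2048-word list.
distinct = math.perm(2048, 4)      # 2048 * 2047 * 2046 * 2045
# Four words where repeats are allowed: exactly 2**44.
with_repeats = 2048 ** 4
# Eleven characters drawn uniformly from a 66-symbol alphabet.
random_chars = 66 ** 11

for label, count in [("distinct words", distinct),
                     ("repeats allowed", with_repeats),
                     ("66^11 characters", random_chars)]:
    print(f"{label:>16}: {count:,} ({math.log2(count):.2f} bits)")
```

Requiring the four words to be distinct costs only a tiny fraction of a bit compared with the full 44, so either way of counting supports the comic's number.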

TetOn · on Aug 11, 2011

Wouldn't you first have to know that the passphrase consists of four randomly chosen words (eg not three, five, or eight)? To me, that's the underlying strength of the approach that the comic (!) is trying to highlight.

burgerbrain · on Aug 11, 2011

The entropy is actually calculated with the assumption that the attacker already knows those things. If they don't, then it is higher.

burgerbrain · on Aug 11, 2011

There is no need for scare quotes around bits. The term is being used in a technically correct fashion. https://secure.wikimedia.org/wikipedia/en/wiki/Bit#Informati...

_hgt1 · on Aug 11, 2011

Err, yes thanks. I was trying to emphasise the fact that the multiplier of entropy is not the "bit", but the "word" (in the linguistic, rather than the computer architecture sense)

meentsbk · on Aug 11, 2011

There is a really interesting discussion on using passphrases from stackexchange that is probably worth linking: http://security.stackexchange.com/questions/6095/xkcd-936-sh...

mathattack · on Aug 12, 2011

The beauty of this discussion is not just "How to create memorable but hard to break password?" but "How much deep insight can a 4-6 frame cartoon contain?"

The signal to noise ratio of xkcd is fantastic! They've again zipped a great discussion in just a few frames.

hm2k · on Aug 11, 2011

What about sites that don't allow spaces?

I know hotukdeals.com only allows [a-zA-Z0-9] which sucks.

wisty · on Aug 11, 2011

And also, sites that have a limit on password length. And sites that have a silent limit on password length, and secretly truncate it. And sites that have different truncation, depending on which form you use. And sites that require capitals, lowercase, and numbers, because nobody would just use name+birthday.

pavel_lishin · on Aug 11, 2011

Leave the spaces out..?

duck · on Aug 11, 2011

Exactly, or if the site requires numbers/symbols, those three spots are a perfect place to put those instead of spaces.

hm2k · on Aug 11, 2011

That's not what is proposed here though.

The point is that one size does not fit all at the moment.

I guess developers need to change the way they handle passwords.

dkokelley · on Aug 11, 2011

I still sense a problem. While these passwords ARE easier to remember, there is still the security flaw that most people reuse passwords. A key-logger or shoulder-surfer could snag this (or a website could store your password in plaintext and be compromised) and then it's game over. Password managers are the future. They can memorize unique passwords of any length and complexity for every website you use, and they can store the passwords with very strong encryption with 1 key that is memorized. That key is where a password like 'correcthorsebatterystaple' could be effectively used.

Khao · on Aug 11, 2011

I remember wanting to sign up on a website that had the worst password "feature" ever : you typed your password in a plain textfield, and once you clicked away it was changed to a password field. Seeing as how this "feature" was on the main page I decided never to use this service and sent the website an e-mail saying that their password field is not clever but instead is a big fat counter-security measure.

Edit : I managed to find which website it was : http://www.advirtus.com/ when you register it shows the password as you type it

CWuestefeld · on Aug 11, 2011

I think that's fantastic.

1: what purpose do the stupid asterisks serve, anyway? I understand them on an ATM machine, but not on my desktop PC or phone.

2: Very frequently (like, maybe 50% of the time) when trying to type a password on my phone, I miss the little "key" and mistype, but can't see that I did. I have to make multiple tries at entering the password. This feature would prevent that.

So it looks like all upside, with no cost (when used only in appropriate contexts).

dfxm12 · on Aug 11, 2011

1. Shoulder surfing a password is way easier than other forms of cracking. Sure, you might not think people are looking at you as you type your password, but then this becomes a crime of opportunity...

2. There are technological ways around this, like finding a bigger keyboard for your phone, using password management software, imploring the website to make an app, practicing typing, etc. Also, BB/iOS/Android all show the most recent character you typed in a password field. Many soft keyboards give feedback about the most recent character you typed as well.

I'm fine with my password fields being obfuscated.

Khao · on Aug 11, 2011

I agree that this feature is good while working with a smartphone, but I'm pretty sure Android has a setting somewhere to always show the last letter you typed in every password field. I would be surprised if there wasn't a setting for that on iOS as well.

The thing is, I think it makes perfect sense to implement this in certain situations, but at an OS or browser level, not in the website or inside an application. Passwords are something we have grown used to and we always expect them to behave the same way! If we were to change the way passwords are handled, it should be consistent across everything.

For example, browsers could implement password fields with a checkbox next to them that lets you show/hide the password at will. The fact that this website has only one setting (always show when in focus) scares me.

roc · on Aug 11, 2011

> "I would be surprised if there wasn't a setting for that also on iOS."

That's the default behavior for password fields in iOS. Trick is, when you have a long password it takes far too long to shift focus from the keyboard to the text field to verify each character before moving on.

I'd very much like to have a client-side show/hide button for password fields.

Aloisius · on Aug 11, 2011

Personally I think password entry should be done in madlib form. Each user would have a unique madlib prompt like:

Username: ___________

Password: Twelve ___(pl. noun)___ jumped over a ___(adjective)___ ___(noun)___ named ___(proper noun)___.

y0ghur7_xxx · on Aug 11, 2011

Reminds me of this previous discussion: http://news.ycombinator.com/item?id=2450972

Maybe Randall was inspired by that post.

tantalor · on Aug 11, 2011

Sorry, but this isn't novel. I can't find it now, but I read a blog post that described this technique recently (~6 months ago).

Edit: y0ghur7_xxx (http://news.ycombinator.com/item?id=2872827) found it: http://www.baekdal.com/tips/password-security-usability

Shenglong · on Aug 11, 2011

Does anyone else here not really remember their main password? Mine's all in muscle memory and I can't write it out unless I imagine a keyboard.

shinratdr · on Aug 12, 2011

I wish that was why I didn't know my password. The real reason is that 1Password manages that part of my life for me so all my passwords are long randomly generated strings that I don't know.