I find the discussion surrounding the XKCD strip alarming for the superstition it reveals about password generation. The particular theme I am alarmed by is that people seem to think that if a password looks alien, or was difficult for them to come up with, it will be hard for a machine to guess.
Look, we're working with big numbers here. You need to do the math.
In this thread alone, I've seen suggestions to use a common dictionary word translated into another language, or written in l33tsp34k with some permutations. From a probabilistic perspective, these are still dictionary words, even though they look like gibberish. The same is true of the common method of typing a word with one's fingers displaced on the keyboard.
Conversely, I see a lot of argument that these XKCD passphrases would be easy to guess because they are made up of dictionary words. This misunderstands the math behind the situation. Even if an attacker knows that your password was generated via this method, and even if they know the word list you used, the password is still hard to guess. The difficulty grows exponentially with each word in the phrase, and that's pretty fast.
The key with passwords is not to create something that looks random -- something that if you showed it to another human being, they'd have a hard time deciphering. It's to create something that is random; literally a result of a throw of the dice for every new password.
Human beings are really bad at creating randomness. There's a demonstration done in an early statistics class in which the professor divides the class into two groups. He tells one to toss a coin a hundred times and record the sequence of heads and tails, while the others are to write down a sequence they think is random using their imagination. The papers are completed and mixed and then -- magically! -- he is able to sort them into the two types, easily and with high accuracy.
The lesson is this: even when you think you're being random, you probably aren't. You're probably using the same tricks everyone else is, and making the same mistakes.
I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.
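For anyone wondering what "a script like this" might boil down to, here is a minimal sketch, not the actual script under discussion. The tiny WORDS list is a hypothetical stand-in (a real list would hold roughly 2000 common words), and the function names are mine. The only thing that matters is that every word is an independent, uniform draw from a cryptographic random source.

```python
import math
import secrets

# Hypothetical stand-in word list; a real one would hold ~2000 common words.
WORDS = ["valve", "tangle", "hastens", "accept",
         "correct", "horse", "battery", "staple"]

def make_passphrase(words, count=4, sep=" "):
    """Pick `count` words independently and uniformly using the OS CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(count))

def passphrase_bits(list_size, count=4):
    """Entropy in bits when every word is an independent uniform draw."""
    return count * math.log2(list_size)

if __name__ == "__main__":
    print(make_passphrase(WORDS))
    print(passphrase_bits(2048))  # four words from a 2048-word list
```

Note that the entropy depends only on the list size and the word count, not on how odd the result looks.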
This should be higher up. It's scary to see people — intelligent people, I'm sure — saying things like "And that goes even higher when you add punctuation!"
No, it doesn't. All of the reasonable punctuation you could add to a sentence adds only a few bits of entropy at best. It also makes the sentence harder to remember— was there a comma or not? Adding unreasonable punctuation or symbols is even worse— you get slightly more entropy at the cost of a password that is way harder to remember.
The crucial point here is that four random words, separated by spaces, selected at random only from the 2000 most common English words -- EVEN IF your attacker knows that your password is four random English words from the 2000 most common separated by spaces -- already is a very long random string. If it's not random enough, each common English word you add adds 11 bits, and is only marginally harder for most English speakers to remember. Conversely, choosing "random" extra characters to add in makes it slightly longer, very slightly more random, and way, way harder to remember.
It's certainly a "very long random string" without context but as people have pointed out above, it's actually not a very good password if people adopted this pattern widely (and you said the attacker knows this).
2000^4 = 16000000000000 possible passwords = 1.6E13 = ([A-Z] + [a-z] + [0-9] + [!@#$%^&()])^7.1ish. So, your four words from the 2000 word list are equal to a 7ish character password that looks like "Av#12GH". I'm not sure if you meant that seven characters was "very long" but I wouldn't say it is. Still a very strong password but maybe not as random as it appears to be when the pattern is known.
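The arithmetic above is easy to check. This is just a sketch of the same comparison; the 71-character alphabet is my count of the classes listed (26 + 26 + 10 + 9 symbols), and the function name is made up for illustration.

```python
import math

def equivalent_length(keyspace, alphabet_size):
    """Length of a uniformly random string over `alphabet_size` symbols
    that has the same number of possibilities as `keyspace`."""
    return math.log(keyspace) / math.log(alphabet_size)

# [A-Z] + [a-z] + [0-9] + the nine symbols !@#$%^&()
ALPHABET_SIZE = 26 + 26 + 10 + 9
FOUR_WORD_KEYSPACE = 2000 ** 4

if __name__ == "__main__":
    print(FOUR_WORD_KEYSPACE)
    print(equivalent_length(FOUR_WORD_KEYSPACE, ALPHABET_SIZE))
    print(math.log2(FOUR_WORD_KEYSPACE))  # same keyspace expressed in bits
```

The comparison only holds if the 7-character password is itself drawn uniformly at random from that alphabet.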
My point was that adding a character to something like "valve tangle hastens accept" is like adding a couple bits to something like "Av#12GH", and yet people feel like it's accomplishing something valuable. They feel that way because "Av#12GH" looks random, but "valve tangle hastens accept" doesn't.
Obviously the literal length of the string is not strictly relevant, and it was probably inarticulate of me to include that.
Knowledge of the pattern has nothing to do with it. That 2048^4 figure is what I mean when I say such a password is strong, and such a figure presumes the attacker knows what system I am using.
Recall that since the passphrase is randomly generated, 1 in 2048^4 is the true probability of guessing it on a single try--all the elements of the set are live possibilities. To compete on equal footing, a seven character password must also be randomly generated.
A password is not necessarily strong simply because it spans a large character set. "Sp1d3r!", for example, may as well be a dictionary word. Raw length (e.g. "spiderspiderspider") is not necessarily helpful either. Randomness is what you need.
I have a password file here with several hundred passwords just like that (actually, they're all 12 chars with upper/lower case, digits, and "special chars", as chosen and stored by 1Password...)
Joe Public is unlikely to use passwords like that, but I'm 100% sure I'm not the only hackernews reader who does.
This is not entirely correct. We include non-alphanumeric characters (punctuation) in our passwords occasionally because it increases the solution space for a brute force attack.
While this doesn't really improve any individual password, the fact that we occasionally include non-alphanumeric characters increases the possible password set from 62^(password length) to something more like 90^(password length).
Similarly, dictionary attacks are more efficient than brute force attacks because we're talking maybe 200,000^(words in password), if we allow for some common word permutations, versus 90^(password length).
are you kidding, i'm pointing out the historical reasons for why we add punctuation chars in our passwords, as it directly impacts the solution space a brute force attack needs to cover.
> I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.
Definitely agree with you here.
I've been using the "few random words" method for passwords I need to remember for some time (and random 20 character mixes of alpha/numeric/symbol for the other, which I have stored in a keepass db), and I know I'm not all that random in my choice of words so if someone managed to see one or two of my passphrases it would be quite easy to create a script that could brute force the other couple quickly.
I shall have to use a script like this (or throw together my own for paranoia's sake) next time I change one of my passphrases.
You mean the statistics demonstration? I'm sure I've seen it in several places.
I know of two tricks for detecting the students. The first is to look for six or seven heads or tails in a row. Over a hundred tosses, a coin will probably do that, but humans "being random" won't. The other is to look at the page as a sequence of "HHH" and "TT" strings and estimate how many there are. A coin, of course, changes from heads to tails 50% of the time, but a human does it more like 70% of the time.
I'm sure there are other characteristics, too, but those two are sufficient to throw out most human attempts at a glance. It's actually kind of obvious, when you see the two side by side.
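Those two tricks are simple enough to sketch in code. The cutoffs below (a run of at least 6, a switch rate above 0.6) are my own rough assumptions taken from the numbers in the comment above, not a validated test, and the function names are invented.

```python
from itertools import groupby

def longest_run(seq):
    """Length of the longest streak of identical symbols."""
    return max(len(list(group)) for _, group in groupby(seq))

def switch_rate(seq):
    """Fraction of adjacent positions where the symbol changes."""
    switches = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return switches / (len(seq) - 1)

def looks_human(seq, min_run=6, max_switch=0.6):
    """Heuristic guess: no long streak AND too many switches.
    Both thresholds are assumptions, not calibrated values."""
    return longest_run(seq) < min_run and switch_rate(seq) > max_switch

if __name__ == "__main__":
    sample = "HTHTTHTHHTHTHTTHTHHT"
    print(longest_run(sample), switch_rate(sample), looks_human(sample))
```

A fair coin will occasionally fail this check too, so treat it as a quick sorting aid rather than a proof.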
Me "being random" with the numpad: 10110101001010010101011010100101011010100101010110101
Cool! Makes sense, too— it feels unrandom to sit there hitting one key a bunch. How do you know when to stop?
So here's what I got: figure I have a bias to switch keys. That means that 01 and 10 are more common than 11 and 00. So what I need to do is group 01 with 00, and 10 with 11. What I do is generate twice as many bits as I need, treat the string as a sequence of two-bit pairs, and reduce each one to its first bit.
That looks like this: 01011110000010111000010011111011111111000000111010110001111000111100111100001100011010011101
Gets you past the litmus test, but looks like it goes too far the other way? Hard for me to tell, actually.
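For concreteness, the pair trick described above amounts to something like this sketch (the function name is mine). It keeps the first bit of each non-overlapping pair and drops a trailing odd bit.

```python
def first_bit_of_pairs(bits):
    """Treat `bits` as non-overlapping two-bit pairs and keep only the
    first bit of each pair, so 00 and 01 give 0 while 10 and 11 give 1."""
    usable = len(bits) - len(bits) % 2  # ignore a dangling final bit
    return bits[0:usable:2]

if __name__ == "__main__":
    raw = "1011010100101001"
    print(first_bit_of_pairs(raw))
```

This only hides the switching bias between neighbouring keystrokes; it cannot add randomness that wasn't in the input.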
Another thing would be something like a sequential xor of each bit in triples (i.e. 010 -> (0 ^ 1) ^ 0 = 1), which segments triples across probability like so:
You can do that quickly by counting the 1s— 1 or 3 is 1, 0 or 2 is 0.
That looks like this:
00011011011011000000111111001011000101010000100100110011011010
I don't know if it adds much more (apparent) entropy, though.
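The triple version can be sketched the same way, using the counting shortcut mentioned above: the XOR of three bits is 1 exactly when the number of 1s is odd. The function name is mine and a trailing partial triple is simply dropped.

```python
def xor_triples(bits):
    """XOR each non-overlapping group of three bits down to one bit.
    Equivalent to the parity of the number of 1s in the triple."""
    usable = len(bits) - len(bits) % 3  # ignore a trailing partial triple
    out = []
    for i in range(0, usable, 3):
        ones = bits[i:i + 3].count("1")
        out.append(str(ones % 2))
    return "".join(out)

if __name__ == "__main__":
    print(xor_triples("010"))        # (0 ^ 1) ^ 0
    print(xor_triples("000111011"))
```

As with the pair trick, this costs three typed bits per output bit and only smooths over the bias; a real random source is still the better option.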
A lot of comments here seem to be missing the point.
The main point is to use passwords that give you the most "bang for the buck" in the sense of adding the most bits of entropy for the least difficulty of remembering. Adding an extra number, or punctuation, or certain numbers of repetitions generally adds only a little bit of entropy for a significant cost in additional challenge to your memory.
Our minds are well suited to remembering combinations of common words, and by stringing a few such words together, you can generate a larger search space than using a single word with a few substitutions. Even if the attacker knows the scheme you're using, he still must search through the space of combinations of common words, which XKCD is pointing out is quite large.
I've started using song lyrics when given the option of an extra-long password. I can get a very long string with little effort, and it's trivial to remember.
The best part is that any automated attack would have to deal with ringtone popups.
Be aware that adding to the length simply by taking more of the lyrics adds very little entropy. If you're trying "Oh say can you see" then it doesn't take a lot of extra bits also to try "Oh say can you see by the dawn's early light what so proudly we hailed at the twilight's last gleaming".
Similarly, extended passages of text -- even if they don't come from a restricted corpus like that of song lyrics -- have less entropy than you'd think. A smaller number of independent random words is likely to be a better tradeoff.
I can see your point in that the Kolmogorov complexity of two lines in a song isn't much larger than one line. Similarly, 30 digits of pi and 300 digits of pi have very little difference in Kolmogorov complexity.
What I don't know is if state-of-the-art password guessers are great at recognizing larger patterns in the entire canon of human knowledge. I.e. is there a "common phrases" attack that's analogous to a "dictionary attack"?
Google released the world's largest corpus and did us a favor by analyzing it for n-grams. For example, they found that the phrase "serve as the initial" was over 100 times more common than the phrase "serve as the insurance". [1] For $150 you can buy the 24GB data set yourself, so it's a fair assumption that makers of password crackers could reliably guess common phrases first. [2]
That may be true, but we still end up better off. The compute time for the password cracker has gone up quite a bit, making it a more expensive endeavor (they've got to build dictionaries for both WKP's and passwords with fuzzing). It doesn't solve the problem, but it's a start in the right direction (away from fuzzing of dictionary words, which is clearly bad for human memory, and good for password crackers' efficiency).
However, when using randomly chosen dictionary words to build phrases (not well known), the entropy shoots well above the level of being reasonable to crack in a lifetime.
Granted, knowledge of the correct parts of a password based on known sources (pi, peace and war, song lyrics etc) drastically reduces the number of possible solutions. But how would an attacker figure out the first part of such a password? What comes to mind are timing attacks http://en.wikipedia.org/wiki/Timing_attack
What other possibilities did I miss?
EDIT: I get that having a long streak of my pass in a dictionary would reduce overall security but it's still unclear how a partial match in the dictionary would be detected.
But there's a long tail of song lyrics. If you pick something obscure, the odds of the attacker even having heard of it become very small (particularly if the attacker is from a different culture than your own). Pick something arty and incomprehensible, and the odds against someone else accidentally stringing those words together in some other context become astronomical.
For instance, I'd wager no cracker has ever heard the song containing the line "We barter images on the matrix". And that's one of the more intelligible lines from the song in question (from a 1978 album by the little-known prog-rock group Happy The Man). Pull it up on Google and you'll see what I mean.
If you don't know the song, of course, lines from it will be about as hard to remember as randomly chosen words. But if you do know it, you have a good mnemonic.
This gets into the whole "security through obscurity" thing. Ideally, you should use a password-generation system such that if the attacker knows your password-generation system (e.g. lines from songs) it would still be infeasible to guess your actual password.
That's why the 4-random-words technique is good. According to XKCD, the 4-random-words technique generates about 17 trillion passwords---all equally likely.
But even with a long tail, song-lyric passwords rely on obscurity. I imagine there are far fewer than 17 trillion songs to choose from. And if the attacker knew some information about you (say from looking at your Facebook profile or your search history) I'm sure it could drastically weed out the search space.
there might not be 17 trillion songs, but you aren't limited to the first 4 words of the song. there might be 100-300 words per song and you can pick your starting word anywhere you like.
But it falls into the same boat as any dictionary attack. Most people with a passphrase are probably going to use one from a song. 90% of them are going to use one of the top 1,000 songs, 90% of them are going to start at the beginning of a line. If we say there are ~20 unique lines in the average song, and most people won't use more than ten successive words even if it bridges a line, that's 1000 * 20 * 10 = a keyspace of 200,000. Trivial.
What this means is even if you decide you're going to be really secure and pick, say, the 30,000th most popular song, assume all songs have 200 unique lines (to account for sensical starting points in the middle of lines), and use 20 words from it, you're in a keyspace of only 120 million, which even if it takes 1ms to hash will be cracked in a day.
By contrast, four random English words chosen from the 2,048 most common has a keyspace of ~1.76e13, or about 17,600,000,000,000.
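Here's the same back-of-the-envelope math as a sketch you can rerun with your own guesses. The 1 ms per hash figure is the one assumed above; the variable and function names are mine, and the time shown is the worst case of trying every candidate (on average an attacker needs about half of that).

```python
SECONDS_PER_DAY = 86_400

def worst_case_days(keyspace, seconds_per_guess=0.001):
    """Days needed to try every candidate at a fixed cost per guess."""
    return keyspace * seconds_per_guess / SECONDS_PER_DAY

# Estimates taken from the comment above.
POPULAR_LYRICS = 1_000 * 20 * 10    # top songs * lines * phrase lengths
OBSCURE_LYRICS = 30_000 * 200 * 20  # the "really secure" lyric choice
FOUR_WORDS = 2_048 ** 4             # four random words, 2,048-word list

if __name__ == "__main__":
    for name, space in [("popular lyrics", POPULAR_LYRICS),
                        ("obscure lyrics", OBSCURE_LYRICS),
                        ("four random words", FOUR_WORDS)]:
        print(name, space, worst_case_days(space))
```

Even granting the lyric scheme every generous assumption, its keyspace stays several orders of magnitude below the four-word one.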
Choosing a clever, unusual line from the middle of a very uncommon song is the passphrase land version of choosing a rare English dictionary word and replacing the vowels with numbers. If your hash gets compromised, it might as well be "password".
How is this an improvement? I now have to remember a song lyric, and some set of random manipulations of that song lyric. I've used that trick for passwords before, and it was a hassle. But that doesn't even matter— unless you're choosing the manipulations randomly (which is a contradiction in terms) you're falling right back into the exact damn trap the comic was about!
You've added ! at the end, replaced s with z, capitalized some words, and replaced vowels with numbers. These are already standard manipulations in a dictionary attack. And it's causing you to ignore the fact that you've chosen what is probably among the top 10 song lyrics used. "p4ssw0rd!" is "password" as far as a dictionary attacker is concerned. Calling this trivial to brute force is demeaning to the word "trivial". Your attacker wouldn't even laugh at you, because there'd be dozens of other hashes in the file just like yours.
It's been said over and over in these comments: the appearance of randomness is not randomness. Humans are horrible at making things random, as you've just demonstrated. Stop trying to make it look weird, and actually do the math.
It's fairly easy for me to remember those manipulations. But you're right insofar as this would probably be both safer and easier to remember:
Smells like teen spirit, and I like that plenty mucho!
I'm too lazy to do the math on it, perhaps you can help out?
Edit: It's a little annoying to collect these downvotes from people who either haven't done the math themselves or are too lazy to explain their advanced attack methods.
In my naive opinion my string above is at least equivalent to a 12 character password from a set of "Mixed upper and lower case alphabet plus numbers and common symbols.".
I count each word (10) and both symbols (,!) as a character here.
According to [1] an 8 char password of that type would take 83½ Days to crack in a Class-F attack ("supercomputer"). I'm purely guessing that those additional 4 "chars" should put it well into the multi-year range, under the premise my other assumptions are not too far off and that the number of English words is quite a bit larger than the number of ASCII characters/symbols.
Any of the downvoters care to debunk that with real math?
I'd be honestly curious about a worst-case analysis that assumes the fragment "Smells like teen spirit" does appear in the attacker's dictionary.
Yeah, that's what I was getting at. Something like that is pretty much immune to naive brute force, even if we count "Smells like teen spirit" as a word. My guess would be that if it does get cracked, it would be by searching [lyric]+", and"+[some kind of Markov attack], but I honestly have no idea how one would work out the entropy in that model. It depends a lot on how the search is carried out, I think.
I guess we'll find out when passphrases become common :)
I've had the opposite experience, where I'm more likely to mistype passwords with mixed case letters and symbols holding the shift key down too long. With mobile devices, it gets worse, as common words are easy to type, but symbols and mixed case are a pain.
It is a good point, though, that for frequently typed passwords on a good keyboard, you can engage your muscle memory. That allows you to type short passwords very quickly, and even remember passwords with your fingers that you've lost from your conscious memory!
That's how I can remember 25 character passwords with symbols and mixed case. I am not thinking "43#gj(eO3%". I'm thinking "4-4-shift3-g-j-shift9-e-shifto-3-shift5". So basically you have about 47 keys with nice characters and they each have two of them.
Just use a random password generator with those 47 characters and type it 25 times into a notepad to pound it into your muscle memory. (and if you're paranoid, clean up your memory and swap file)
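A generator like the one suggested above can be sketched in a few lines. I'm assuming the 47 keys with two characters each correspond to the 94 printable ASCII characters other than space (letters, digits, punctuation); the function name is mine.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation marks = 94 characters,
# i.e. 47 keys with a shifted and an unshifted character each (assumed).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length=25):
    """Draw each character independently and uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    print(random_password())
    print(25 * math.log2(len(ALPHABET)))  # bits of entropy at length 25
```

Since the output is uniformly random, its strength doesn't depend on the attacker not knowing the scheme.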
For the ones you don't use every few weeks, keep them in a password database like KeePass with 2 factor authentication and keep the key file on a thumb drive on your keychain.
I don't think entropy is the whole story. I would argue that although security-through-obscurity is a terrible, awful practice for systems, it's not that bad for personal password schemes. Using a nonce "system" for passwords, even if it's mathematically low-entropy, is still secure, at least enough for personal use.
For example, if I use single dictionary words fed through a trivial Caesar cipher, then that is mathematically very low entropy. Realistically speaking, however, it's relatively safe if the cracker doesn't know that's what I'm doing, because it's impractical for crackers to compute all possible low entropy "alternative dictionaries."
You'd be surprised. JTR does l33tspeak substitutions, one-row-up substitutions, keyboard walks, pretty much all of the common things everyone does because "no hacker would ever think of that."
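To put a number on the Caesar cipher example: this sketch is not how JTR implements its rules, just an illustration with made-up function names. Covering every shift multiplies the attacker's dictionary by only 25, which is under 5 extra bits.

```python
import math
import string

def caesar(word, shift):
    """Shift lowercase letters forward by `shift` positions, wrapping at z."""
    lower = string.ascii_lowercase
    table = str.maketrans(lower, lower[shift:] + lower[:shift])
    return word.translate(table)

def caesar_candidates(dictionary):
    """Yield every non-trivial Caesar shift of every dictionary word."""
    for word in dictionary:
        for shift in range(1, 26):
            yield caesar(word, shift)

EXTRA_BITS = math.log2(25)  # cost to the attacker of covering all shifts

if __name__ == "__main__":
    print(caesar("spider", 1))
    print(EXTRA_BITS)
```

An "alternative dictionary" of this kind is cheap to generate, which is why obscure-looking single-word schemes don't hold up well.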
i think you are missing the point: passwords should be hard to guess first and should be easy to remember second. the former is the stronger need.
let's say there are 500,000 english words you are choosing from and you use 4 words. that gives you 500000^4 possibilities. let's assume the words average about 5 characters, so we will compare this to a 20 (= 4 words * 5 characters) character long password made of 26 types of character (english alphabet, not using numbers and other special characters), which gives you 26^20 possibilities. and 26^20 - 500000^4 ~= 2x10^28, or put it this way: (26^20) / (500000^4) = 318,850.382..
i know a random sequence of 20 characters is very hard to remember, but 500,000 is an overestimation too. let's say we use special symbols too (50 characters) and the word dictionary has 100,000 words. (50^12) / (100000^4) = 2.44 so we can say it is better to have a 12 character long password (made of alphanums + symbols) than 4 random words concatenated (i think 12 is somewhat a 'standard' for 'sensitive' passwords). and i would argue that in the long term multiple concatenated passwords are very hard to remember. i'm not saying this is a terrible approach, just not the silver bullet to the 'password problem' (which xkcd never claimed of course, and for 'non sensitive', 'reused'/'throwaway' passwords it may be a viable option).
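The two ratios above can be reproduced directly. A sketch with names of my own choosing; the dictionary sizes and alphabet sizes are the ones assumed in the comment, and the bit counts are added for an easier side-by-side view.

```python
import math

def bits(keyspace):
    """Express a keyspace size as bits of entropy (uniform choice assumed)."""
    return math.log2(keyspace)

# 20 random lowercase letters vs 4 words from a 500,000-word dictionary
RATIO_LETTERS = 26 ** 20 / 500_000 ** 4
# 12 random characters from 50 symbols vs 4 words from 100,000 words
RATIO_SYMBOLS = 50 ** 12 / 100_000 ** 4

if __name__ == "__main__":
    print(RATIO_LETTERS, RATIO_SYMBOLS)
    print(bits(26 ** 20), bits(500_000 ** 4))
    print(bits(50 ** 12), bits(100_000 ** 4))
```

Both comparisons assume the character passwords are truly random, which is exactly the part people struggle to memorize.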
I'll grant you that it's more important to have a password be difficult to guess, but that doesn't obviate the importance of it also being easy to remember. Even better, let's look for password schemes that are harder to guess and easier to remember at the same time.
It's easy to compare the entropy of two schemes as you're doing in your comment, but it's more difficult to objectively claim which is easier to remember. You argue that a random sequence of 12 characters is easier to remember than 4 words. If so, then I'd agree it would be a better scheme. However, I don't think that's the case. To really settle the argument, we should do some experiments - maybe someone already has?
Here's some examples I used a generator to create:
The trouble is that by human intuitions, you think there's a strong inverse correlation between being hard to guess and easy to remember; but that's not always the case.
It's hard for humans to remember meaningless conjunctions of symbols, so we think they are hard to guess; so we err on the side of making them too short. Contrariwise, we think a sequence of just four words couldn't possibly be hard to guess because it's so easy to remember, but it's only easy to remember because we can use the meanings of the words to form an idea or image, something our brains are built for (unlike strings of meaningless characters).
This is why the word technique is better: it corresponds better with how we remember, while reducing two other risks: the risk of losing your password - non-trivial - and choosing too short a password.
Another point is that letter placement within words is significantly non-random. By intelligently choosing which letters to try in each position, the hacker could at the very least minimize the number of tries by an order of magnitude for the first word.
I probably shouldn't announce, in a forum, that using Don't Forget About Poland! as a passphrase seems awfully tempting for someone like me :)
(American by birth, Polish by heritage)
Speaking of the example I just presented, how much more effective would it be to include special characters within these long passphrases? Obviously the goal is to be able to remember them, but surely most if not all of us, are already using special characters for our passwords.
When counting the entropy you would probably count each word as a single entry, and each special character as an entry (and disregard spaces).
* By capitalizing the words you've doubled the search space for words (assuming that the search space starts with all words lowercased)
* You could increase the search space for each word by 200% (from the space of all lowercase words) by including the possibility of words in all caps (it's unlikely for people to start using alternating case in the middle of words).
* The ' in "Don't" doesn't increase the search space that much because there are a small number of (common) contractions like that, and each of them would only break down into 3 permutations:
don't
dont
don t
(though the last one is highly unlikely). So you're adding maybe 30 more words to a search space much larger than that.
* As far as the special character is concerned, it probably doesn't add too much to the search space. You can break down your phrase like so:
So now you've got 5 items. Each item could be either a word or punctuation. The search space for words is huge. The search space for punctuation is small. Your algorithm just has to realize that if it chooses punctuation for one of the items, then it doesn't bother to use whitespace to separate it from the preceding word ("word," vs "word ,").
* You can also further reduce the effects of punctuation on the search space by realizing that punctuation will almost always follow a word, and not other punctuation. This also discounts punctuation as the first item in the passphrase too.
Edit:
Upon further thought, if the attacker uses a simplified algorithm to account for upper-/lowercase, then it may not have that much of an effect on the search of each individual item (i.e. n!4 instead of (n+4)!). An attacker could break the common instances of case down into:
* All words lowercased "don't forget about poland!"
* All words uppercased "DON'T FORGET ABOUT POLAND!"
* All words titlecased "Don't Forget About Poland!"
* First word titlecased "Don't forget about poland!"
This discounts the possibility of people alternating titlecase across words, because that's probably as likely to happen as people alternating case within words (e.g. WoRdS lIkE ThIs). Granted, this also discounts proper nouns in the middle of the passphrase (things that don't require extra effort for people to remember to capitalize).
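The four case patterns listed above are cheap for an attacker to enumerate. A sketch with a function name of my own; note that Python's str.title() would mangle "don't" into "Don'T", so each word is capitalized separately instead.

```python
import math

def case_variants(phrase):
    """Return the four common casings of a passphrase: all lowercase,
    all uppercase, every word titlecased, and only the first word
    titlecased."""
    lower = phrase.lower()
    return [
        lower,
        lower.upper(),
        " ".join(word.capitalize() for word in lower.split(" ")),
        lower.capitalize(),
    ]

if __name__ == "__main__":
    for variant in case_variants("Don't Forget About Poland!"):
        print(variant)
    print(math.log2(4))  # bits added by the choice of casing
```

Under this model the whole casing decision is a factor of four on top of the word search, no matter how long the phrase is.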
First, that's still beside the point. You shouldn't evaluate a password scheme solely by entropy if it is a password you intend to memorize. XKCD argues that it's easier to remember 4 random words than 8 random characters.
Second, your example isn't very good because it assumes that every 8-bit character (save one) is acceptable, which is rarely the case, especially if you are trying to memorize them.
Finally, as another commenter pointed out, you've got your math wrong, and even your example has more entropy for the words than the characters.
Yep, and that's assuming 8 random bytes from extended ASCII. The other point of the article was that nobody actually makes a password from random characters because words are easier to remember. And I think it's disingenuous to suppose people will enter alt-codes and that nonprintable characters would be allowed, so assuming MENSA-quality users with internal random number generators, we get 95^8 ~= 6.6E15, a clear loss of entropy.
>the 8 char password has much less entropy: 95^8 ~= 6.63E15 //
Most of the word usage is going to be limited though too. testyourvocab.com put the average at 27k I think. We're looking for words one can remember easily so the word pool is going to be a lot lower - 15000^4 ~= 5E16 FWIW.
real complex passwords are more like '"^vmds!w*é$sé550µW"'-à the point of the post was to show the maximum theoretical possibilities for both.
As many pointed out not all 255 are usually printable and not all 171K words are used
then that's for english only and not counting old english and not taking care of possible punctuation
I've been using phrases and sentences as passwords for a while, and I've found that there are 2 main problems;
1) A lot of sites, still in this day and age, have max password lengths, so I still have a lot of short passwords. Usually this is bank sites and the like.
2) Password entry fields are often very short visually, and with a long password getting lost is much easier. I find I have to type them over A LOT.
These are the real issues with this. Banks seem to be borderline idiots when it comes to password security: case-insensitive, no spaces, 20-character max, small choice of "special characters". These are from Amex, whose password requirements sadly were even worse a few months ago.
With crappy password requirements, it's impossible to use decent passphrases. Getting locked out of your account for 3 failed attempts at typing a 30-character password is pretty obnoxious, too.
In situations that allow passphrases, you don't need a password generator like this. You can grab a sentence from your favorite book and use it. e.g. "How do you do, Miss Doolittle?" That's not the best choice, but it's still got way more entropy than a standard password, probably a lot more entropy than you'll get by choosing a 4-gram composed of words from a corpus of 2k, and it's easier to remember.
The average book length is probably not over 400 pages. An average page probably doesn't have over 25 sentences on it. So the whole book contains only ten thousand sentences.
That gives you 14 more bits of entropy.
The total is 41 bits of entropy. This is one-eighth as secure as a 4-gram composed of random words from a corpus of 2k, if we measure strictly by entropy.
The situation is actually much worse, though: your favorite book is probably a popular book. So the number of bits of entropy provided by the choice of book might be a lot smaller than 27. I would guess that it's perhaps 10.
And many of those 129 million books are not very different. They contain quotes from other books, reprinted short stories, folk tales, set phrases, and so on.
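The estimate in this comment can be redone without rounding. A sketch using only the figures quoted here (129 million books, 400 pages, 25 sentences per page, a 2k-word corpus); the names are mine, and uniform choice is assumed everywhere, which is generous to the book scheme.

```python
import math

BOOKS = 129_000_000
SENTENCES_PER_BOOK = 400 * 25

BOOK_BITS = math.log2(BOOKS)                    # choice of book
SENTENCE_BITS = math.log2(SENTENCES_PER_BOOK)   # choice of sentence
SENTENCE_SCHEME_BITS = BOOK_BITS + SENTENCE_BITS
FOUR_WORD_BITS = 4 * math.log2(2000)            # 4-gram from a 2k corpus

if __name__ == "__main__":
    print(BOOK_BITS, SENTENCE_BITS, SENTENCE_SCHEME_BITS)
    print(FOUR_WORD_BITS)
    # How many times larger the four-word keyspace is:
    print(2 ** (FOUR_WORD_BITS - SENTENCE_SCHEME_BITS))
```

The comment rounds each term up to 27 and 14 bits; without rounding the sentence scheme comes out slightly weaker than stated, and the popular-book adjustment lowers it further.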
In practice I think it might be difficult to mount a password-guessing attack using the Google Books corpus, because it's hard to get access to that corpus. The Project Gutenberg corpus would not be so hard.
Of course, the flip side of this is that we're veering off into attacks where you're targeting one specific person and know a bit about how they've chosen their password.
If you want to mount such an attack, fine, but most of us are dealing with the much-more-common threat of someone who gets a file or a database of hashed passwords and wants to crack them all in one go.
That's an interesting analysis. I can't really see any major deficiencies with it.
On the plus side, a sentence is probably going to be easier to remember than 4 random words. Personally, I draw some of my "high-security" passwords from literature, but then I modify the case and do the "leetspeak" character substitution, so a naive sentence attack would not work. A more clever one might, though.
Edit: Oops, as dpark points out, I swapped two digits. My apologies. Below, my original, erroneous comment.
41 bits of entropy means you have on the order of a one in 10^12 chance (2^41) of guessing it, and 2,000^4 is on the order of 10^16. So how is the former "one eighth as secure" as the other? Wouldn’t it be 10^4 times less secure, that is, 10,000 times more likely to be cracked?
If you choose one of eight small modifications to apply at a randomly-selected character, you get perhaps 6 bits of entropy from the choice of character and 3 bits from the choice of modification. That's better, but adding an extra common word to the end of the sentence would be better still.
Don't forget sites that require: "your password MUST contain at least one number, one uppercase letter, and one of the following characters: !, @, #, or $, but not %, ^, &, or *". I slap my forehead at how counterproductive these requirements are.
This is why, for my lab's password changer, the requirement for short passwords is simply that it must have one upper, one lower, one digit, and one none-of-the-above (and be at least 8 characters).
If you have a long password (at least 16 characters), all other requirements are waived so that you can use passphrases.
Forcing one or more digits has little value. You are better off with 1 uppercase, 1 lowercase and 2 non-alphabet characters. (Users are very likely to be replacing a letter with 1 or 0, so 2 options * 8 positions = 16 possibilities = fail.)
Then the space would be "the obeisance to the stupid website piece". Note that the entropy of "correct horse battery staple" is only one bit more than "correcthorsebatterystaple".
Usually the symbols involved are used by SQL or some other layer, and the programmers insert the password directly into the query string because they don't know any better. This leads to SQL injection and other issues.
So rather than discovering the correct way to do things, they try to prevent you from using any characters that might be involved in an SQL injection.
In some cases the guys on the backend know what they're doing, but the requirement can still be passed down from on high from some manager who absorbed the practice from another project.
They're trying to force users to use those characters in an attempt to enlarge the space passwords are drawn from. It doesn't work very well, of course. Instead of "password", you just get "Password1!". That said, I might make the same choice (for short passwords) if I were implementing password policy.
Edit: If you meant the "but not %, ^, &, or *" requirement, that's an indication that the devs don't know how to use prepared statements or at least escape properly.
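For illustration, here is roughly what "use prepared statements" looks like with Python's built-in sqlite3 module. This is only a sketch: the table, column names, and iteration count are made up, and PBKDF2 is just one reasonable choice of slow salted hash.

```python
import hashlib
import hmac
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")


def add_user(name: str, password: str) -> None:
    salt = os.urandom(16)
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    # The ? placeholders send values separately from the SQL text, so quotes,
    # semicolons, %, ^, & or * in the values are treated as plain data.
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (name, salt, pw_hash))


def check_user(name: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    salt, pw_hash = row
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return hmac.compare_digest(candidate, pw_hash)
```

With this shape there is no reason to ban any character from the password field.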
Those requirements are there for the people who try putting just their name or "password" or their 4 digit ATM PIN as their password. For very short passwords, only having alphabetical (not even alphanumeric) passwords is terrible. Those requirements are there to prevent some really stupid passwords.
Also annoying is that a lot of sites require gibberish. Apple requires at least one uppercase, one lowercase, and one number. Some sites require a symbol as well.
Especially if you are logging into multiple systems regularly using domain credentials, it rapidly becomes apparent that the faster and easier the password is to type, the better. I've found that some passwords with symbols and numbers just roll off the fingertips with a little practice, others not so much, but longer passphrases are for some reason the worst.
This. My password is not a word, not even a word with substitutions, but it is optimized towards typing it on a keyboard (in terms of when caps come in, when numbers are added, switching hands, etc). I can knock it out in a second and it's muscle memory with zero risk of forgetting. correct horse battery staple, not so much. I lose some entropy by making it typing-friendly, but the cracking algorithm to simulate that would be pretty difficult. I'll take the loss.
As an aside, 1000 guesses a second? Seems generous.
Very few sites have a short max password length. I use 1Password, and of the 63 sites I've stored passwords for, all but 2 allow 25-character passwords. Ironically, my bank only allows me 15 characters.
I haven't typed a password in 3+ months - don't know what any of mine are anymore, so I find typing is no longer an issue.
I really like using 1Password. I have a long passphrase as my unlock key, easy to remember, then use randomly generated codes as long as is feasible for each different service.
1) Nothing written down
2) Unique per service
3) Adjustable difficulty & char set per service, to match their stupid requirements.
How about (NOT SECURE YET, IT NEEDS MORE ENTROPY):
import random
from nltk.corpus import wordnet as wn

all_animals = set()

def add_to_set(animal):
    # name() looks like 'king_charles_spaniel.n.01'; keep only the readable part.
    # (name is a method in NLTK 3+; older releases exposed it as an attribute.)
    all_animals.add(animal.name().split('.')[0].replace('_', ' '))
    for child in animal.hyponyms():
        add_to_set(child)

# walk the whole WordNet subtree below "animal"
add_to_set(wn.synset('animal.n.01'))
all_animals = list(all_animals)

actions = ['ate','chased','killed','fought','kissed',
           'talked to','hated','loved','ambushed','fled'] # can add more

rng = random.SystemRandom() # is this secure?

def make_password():
    choice = rng.choice
    return 'the %s %s the %s' % (choice(all_animals), choice(actions), choice(all_animals))
If you pruned out 90% of the animals (i.e. the obscure, hard to spell, or scientific names), this is still about 20 bits. And the passwords are kind of memorable (I've gotten such gems as "the dodo chased the guppy" or "the tigress killed the king charles spaniel").
You could also add a humorous adjective ("rabid", "talking", "magic", "invisible", "evil" ...) or adverb ("roughly", "quickly", "quietly", "secretly" ...).
Completely random strings of words can be hard for me to remember, but something like "the {adjective1} {animal1} {verb} the {adjective2} {animal2}" would be much easier for me to remember because the words relate to each other in ways I already understand.
I expect we can get some fairly high entropy from just simple schemes like this.
However, the length of the password can be a real pain if you have to type it often, even once a day.
You could get about 8 bits per animal, and 5 bits per hand-written verb / adjective / place (32 choices per category). So that's about 7-10 words you need in the frame.
You could get decent entropy with: the {adj} {adj} {animal} {verbed} the {adj} {adj} {animal} in {place}. That's 5+5+8+5+5+5+8+5 = 46 bits.
This is truly awesome. You could easily use a more complicated grammar, but it might get tricky to generate a password with a specified amount of entropy.
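One way to keep track of the entropy is to compute it from the frame itself: each slot filled uniformly at random contributes log2 of its list size, as long as no two fillings produce the same string. A rough sketch, where the word lists are tiny placeholders (a real list would need 32 or 256 entries to reach 5 or 8 bits per slot) and the helper names are made up:

```python
import math
import secrets

# Placeholder lists; real ones would be much longer.
WORDS = {
    "adj": ["rabid", "talking", "magic", "invisible"],
    "animal": ["dodo", "guppy", "tigress", "ram"],
    "verbed": ["ate", "chased", "kissed", "fought"],
}
FRAME = ["the", "{adj}", "{animal}", "{verbed}", "the", "{adj}", "{animal}"]


def frame_bits(frame, words):
    """Sum log2(list size) over every slot; fixed words add nothing."""
    return sum(math.log2(len(words[t[1:-1]])) for t in frame if t.startswith("{"))


def fill(frame, words):
    """Fill every slot with an independent, uniform choice from the OS CSPRNG."""
    return " ".join(
        secrets.choice(words[t[1:-1]]) if t.startswith("{") else t for t in frame
    )


print(fill(FRAME, WORDS), f"({frame_bits(FRAME, WORDS):.0f} bits)")
```

With these four-entry lists the frame is worth only 10 bits. The point is that the figure can be checked against a target before a frame is adopted, however complicated the grammar gets.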
One slight addition to the xkcd password scheme that would add another order of magnitude of security would be to have your own personal "salt" that you add to all your passphrases. In this case, the salt would be a short, traditional, hard to remember password that you re-use with every xkcd style password. It would be hard to remember, but you'd only need to memorize it once.
So if your personal salt is "@T#23a" you would use "@T#23a correct horse battery staple" on one website and "@T#23a giant bug transistor leech" on another website.
That is what I do: I have a 4-character personal salt, like "7Pd$", and put it in the middle of a lowercase word or phrase. Having a symbol, lowercase letter, uppercase letter, and number will satisfy most password requirements. I use it on many sites, so it is easy to remember. It also makes it simple to write passwords down, e.g. "correct horse ^ battery staple" which means to me "correct horse 7Pd$ battery staple", but would not be useful to someone who saw it, since they don't know my personal salt. A combination of what xkcd said and a short personal salt that's easy to remember is probably best.
If this kind of thing takes off, it will become easier for dictionary based password attacks. Using this advice would go a long way towards preventing this.
Easier, yes, but not easy. A dictionary attack on 4 words is the same as brute forcing 4 letters, except that instead of just 26 letters there are thousands. 2000^4 vs 26^4 means about 35,000,000 times as many combinations to check.
Careful! This is only using `Math.random` and does not attempt to use `window.crypto.random` (though most browsers do not support it yet: http://jsfiddle.net/alanhogan/trUYu/) or anything that would attempt to bring real entropy into the process.
I don’t mean to fault the creator of this page, but at the same time, I would not trust this generator for important passwords, simply because you cannot know if others are getting the same 'random' results as you are.
> In the Javascript engines of IE (Trident), Firefox (Gecko), Safari (WebKit) and
Chrome (V8), the output of Math.random() can be used to reconstruct the
random seed, and thus provide both this seed and the current “JS mileage” (i.e.
the number of times Math.random() was invoked).
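The same distinction exists outside the browser. As an analogy in Python (not the page's JavaScript), here is a sketch contrasting a seedable general-purpose generator with the OS-backed `secrets` module; the function names and the lowercase-only alphabet are just for illustration.

```python
import random
import secrets
import string

ALPHABET = string.ascii_lowercase


def weak_password(seed: int, length: int = 12) -> str:
    """Deterministic: anyone who learns or guesses the seed gets the same output."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def strong_password(length: int = 12) -> str:
    """Draws from the operating system's CSPRNG; there is no user-visible seed."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Two "independent" users who end up with the same seed get the same password.
print(weak_password(1234) == weak_password(1234))
print(strong_password())
```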
I wouldn't use a JS program served from somebody else's website to generate my password anyway. How do I know it's not sending them a copy of the passwords it generates?
He recently changed it to use a random seed sent from the server instead of the client-side RNG. Over, I believe, unencrypted HTTP. Your suggested countermeasure would not have detected that attack; indeed, perhaps it was already in place before you reported no evidence of attacks.
It would, however, have made it harder for him (or your ISP) to tell whose password they'd stolen.
function rpass() {
    # keep only the 80 allowed characters from the random stream, take the first $1
    strings /dev/urandom | grep -o '[[:alnum:]\/!@#$%^&*()<>,.{}]' | head -n "$1" | tr -d '\n'; echo
}
Then run $ rpass 16 and get a 16 character random password with a fairly high entropy. Then just use a service like LastPass or a solution like KeePassX or even a single GPG-encrypted file to store your passwords. Problem solved.
Passwords are evil. Most of them should be treated the way you'd treat your private SSH or SSL key. Whenever you can eliminate a password and get the user to authenticate using a third-party identity provider, you are doing them a favor.
Edit: with 80 possible characters, you get 80^16 possible passwords: 10^19 years at 1000 guesses/second.
I prefer using a program like Password Safe (http://passwordsafe.sourceforge.net/), and use a safe password that's a long sentence (with punctuation). Then I can use arbitrarily long and complex passwords for all my accounts, and not have to worry about memorizing them individually. The password safe can even be synced across computers using Dropbox.
I prefer KeePass simply because it's got implementations on multiple OSs, as does Dropbox (to sync the password database file). So I've got it on my iMac, Android phone, Windows laptop, and Windows work PC.
GPG-encrypted free-form file (though it's fairly structured), edited via vim and a well-known "auto-encrypt/decrypt GPG files" configuration: http://vim.wikia.com/wiki/Encryption
(Actually, from that page, vim now has built-in blowfish encryption, which I'll have to look at -- yet another argument in favor of sharing tips on the 'TarTubes: you may learn something even when you're sharing your own knowledge).
I can't help but think that this is a solution to the wrong problem. The big problem with password security in the modern world really isn't that they're easy to break, but that they're pervasively reused between sites. So breaking them (for example, by reading them in plain text out of a dumb database!) in one place opens up attacks on higher value accounts.
The fix, of course, is to get users to stop re-using passwords between sites.
How does making passwords more memorable fix this? If anything, forcing users to use random base64 strings strikes me as more secure as they will be forced into some sort of password locker implementation by their inability to remember them.
Right, maybe if you use the first letter of the words in a sentence, like "Hey Jude, don't make it bad, take a sad song, and make it better." -> "HJ,dmib,tass,amib." Then you can add in some characters that make it different for each site without it being obvious which characters you added. I wrote a blog post on how to create different passwords for sites that are easy to remember: http://craigquiter.com/post/8668237043/creating-and-remember...
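The mechanical part of that is easy to sketch in Python. This version assumes the rule is "first character of each word, plus any punctuation that trails the word"; it does not add the per-site characters, and the function name is made up. The strength still depends entirely on how guessable the sentence is, not on the transformation.

```python
import string


def first_letters(phrase: str) -> str:
    """Keep the first character of each word and its trailing punctuation."""
    out = []
    for word in phrase.split():
        stripped = word.rstrip(string.punctuation)
        trailing = word[len(stripped):]
        if stripped:
            out.append(stripped[0] + trailing)
        else:
            out.append(trailing)  # the token was pure punctuation
    return "".join(out)


print(first_letters("Hey Jude, don't make it bad, take a sad song, and make it better."))
```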
Is it possible that the breached Sony passwords he was analyzing may have been cracked with dictionary attacks? Maybe the reason only 1% of the passwords had a non-alphanumeric character was that the crackers mostly didn't crack the passwords that had any non-alphanumeric characters.
"For those of us pedantic enough to want a rule, here it is: The preferred form is "xkcd", all lower-case. In formal contexts where a lowercase word shouldn't start a sentence, 'XKCD' is an okay alternative. 'Xkcd' is frowned upon."
Note that 44 bits of entropy is still nothing if you want protection from off-line attacks on password hashes. A couple of GPUs together can calculate a billion hashes per second, which eats through 2^44 possible passwords in only a few hours.
This was recently demonstrated when the mtgox password database was compromised.
edit: but this shouldn't be a problem if the password is properly hashed with bcrypt or some other scheme with a work factor.
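But this approach scales at a much faster rate. Each extra word from a 2048-word list multiplies the work by 2048: a fifth word stretches even a billion-per-second attack from a few hours to about a year, and a sixth puts it in thousands-of-years territory.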
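A back-of-the-envelope sketch of how that scales, assuming a 2048-word list (11 bits per word), uniformly random words, and an attacker testing a billion candidates per second. These are worst-case times to try every passphrase; the expected time is half of that.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def time_to_exhaust(list_size: int, words: int, guesses_per_second: float) -> float:
    """Seconds needed to try every passphrase of `words` words."""
    return list_size**words / guesses_per_second


for n in range(4, 7):
    secs = time_to_exhaust(2048, n, 1e9)
    print(f"{n} words ({11 * n} bits): "
          f"{secs / 3600:,.0f} hours = {secs / SECONDS_PER_YEAR:,.1f} years")
```

Under these assumptions four words last about five hours, five words a bit over a year, and six words a couple of thousand years, which is why a slow hash like bcrypt matters as much as the extra word.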
Example generated phrase: "married greatly snake battle"
These phrases would be easier to remember if they made grammatical sense. Like Chomsky's famous "colorless green ideas sleep furiously" - the words relate to each other grammatically, even though it makes no sense.
Imagine memorizing "married greatly snake battle" vs "married snakes battle greatly." I think the latter is easier.
Not necessarily. If only one-fourth of all English words are grammatical after an average prefix, then you lose two bits of entropy off each word after the first. I suspect that the actual situation is not as bad as that. You might end up using "uncommon" words like "deceased", "advent", "fearful", and "ram" to compensate, instead of more common words like "strongly", "contains", "afterwards", and "corporate", but that doesn't seem like a major loss to me.
Any narrowing of the search space will most definitely reduce entropy. By how much is calculable, but I have neither the time nor the language statistics right now to do it.
I'm not sure the technical meaning of entropy in this context, but personally, I would offset the narrowing effect of "restrict to grammatical phrases" by adding uncommon words. "Besotted ophthalmoscopes gambol indicatively" forms a coherent, if silly, word picture for me, so I think I can remember it.
As far as possible combinations, my vague memories of linguistics 1001 include the idea that this is one of the essential properties of language: it has so many possible combinations, that every speaker is continually creating sentences that have never before been uttered. Unlike, say, honey bee dances, which are often repeated.
> I'm not sure the technical meaning of entropy in this context
Roughly, it's the logarithm (base 2) of the number of guesses that an optimal password guesser would have to guess in order to guess your password. It's a measure of how unknown your password is.
> it has so many possible combinations, that every speaker is continually creating sentences that have never before been uttered.
Yes, this is why all of the suggested alternatives like "choose a line from a popular song" are so much less secure.
You do lose a little entropy that way: you've merged "great" and "greatly", and "correct" and "correctly", suggesting that your modification process considers adjectives and their corresponding adverbs as equivalent. If those examples are typical, that removes one bit of entropy. But you've probably added more than that back in, if your choices of "a", "will", "the", and especially "inserted" are unpredictable. (Alternatives might include "the", "would/did/could", "a", and "removed".)
For what it's worth, Google finds more hits for "fearful" than for "afterwards" and more for "ram" than for "corporate". ("Strongly" and "contains" do beat "deceased" and "advent", though. And yes, many of the hits for "ram" are really for "RAM".)
I would actually advise going against this advice. While it isn't a best practice, password sharing can and does happen, as does shoulder-surfing. It would take a LOT of effort to memorise my password, but a simple four word password will probably be remembered by accident. In a year's time if I piss a friend off, I don't want my Facebook password to be readily accessible in their memory.
I think more people need to learn to remember arbitrary strings. There really is no way around that problem if you want a decently secure password, and it's rare someone has a "good memory" - in most cases they've just learnt how to remember things well.
(Note: This doesn't really apply to me or most of us here in most cases, but for example my WiFi password is of the form "Mycatsname9" and yet my neighbour still has to ask me for it whenever her phone forgets it)
How do you share your preferred password? Because I guess everything but sending it per text/mail would be tedious, while it would work better with a couple of words.
Shoulder surfing: It's certainly a risk, but I'd say that prolonged shoulder surfing shouldn't be possible. If I type fast, it will be very hard to make out the phrase. If I type slow, you cannot stand around that long.
And - I'm not a security expert, but how much do you gain if you saw a couple of chars here? My intuition (yeah, shouldn't trust that) says that it's worse if I watch you and know the _first_ character of your password than you seeing the first 1-3 characters of the first word of my passphrase?
(We don't know the name of your cat, so judging the quality of the password or your neighbo(u)r's ability to remember it is hard)
> And - I'm not a security expert, but how much do you gain if you saw a couple of chars here? My intuition (yeah, shouldn't trust that) says that it's worse if I watch you and know the _first_ character of your password than you seeing the first 1-3 characters of the first word of my passphrase?
Novel thought and possibly worth pursuing; I hadn't thought of that. I want to reiterate that this isn't something I broadly apply across all my passwords or even many of them, just that for some users password sharing is a use case.
So you're advising against what appears to be a more practical and secure methodology on the basis that it's worse when you share your password? If you share your password, your exact problem is that you're sharing your password -- it's not how easy or hard the password is to remember. In fact, why does this even have any significance when the person you're sharing it with can just write it down?
Oh and if within a year's time you do not change your password, that could very well be another problem. I think you'd be better off just using easy to remember pass phrases and changing them every once in a while. Shouldn't be a problem because they are, after all, easy to remember.
> So you're advising against what appears to be a more practical and secure methodology on the basis that it's worse when you share your password?
Iff you share your password then yes, you should use a scheme more suitable for that. And yes we shouldn't be sharing passwords, and if we do we should be changing them, but in the real world where most people don't do that I don't think we should encourage passwords which their friends will remember easily - because that is a very common attack vector.
>I think more people need to learn to remember arbitrary strings.
The entire point is that humans aren't very good at doing this.
>(Note: This doesn't really apply to me or most of us here in most cases, but for example my WiFi password is of the form "Mycatsname9" and yet my neighbour still has to ask me for it whenever her phone forgets it)
This is actually exactly the kind of scenario where using pass phrases makes the most sense. WPA2 is vulnerable to rainbow table attacks; relatively long passphrases are both easier to remember for mere mortals and less likely to be broken by a rainbow table attack.
> The entire point is that humans aren't very good at doing this.
How true is this? Because everyone that tells me "I have a bad memory" doesn't even know the most basic tricks.
> This is actually exactly the kind of scenario where using pass phrases makes the most sense
I agree, actually - I don't mind if my neighbour does remember it, I was just trying to illustrate that things that are easy to remember are remembered by accident, and things like that are easily forgotten without effort.
Yes, if you share your password, it's probably better to use a password that needs to be written down and can't be memorized, in order to have a chance of revocation. (Or you could just change your password.) But for most of us, most of the time, memorizable passwords are a boon.
If you allow multiple occurrences of the same word, you can get slightly higher entropy while making the passwords potentially even easier to remember.
echo $(for i in 1 2 3 4; do shuf -n1 /usr/share/dict/words; done)
(Sorry, I'm not very good at bash, so this loop is probably not idiomatic.)
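For anyone who would rather not rely on bash, roughly the same thing in Python. This is a sketch that assumes a one-word-per-line list at /usr/share/dict/words, as in the shell version, and uses the standard `secrets` module; the function names are made up for illustration.

```python
import math
import secrets


def load_words(path="/usr/share/dict/words"):
    """Read one word per line, dropping blanks and duplicates."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sorted({line.strip() for line in f if line.strip()})


def passphrase(words, count=4):
    """Pick `count` words independently (repeats allowed) using the OS CSPRNG."""
    return " ".join(secrets.choice(words) for _ in range(count))


if __name__ == "__main__":
    try:
        wordlist = load_words()
    except OSError:
        # tiny fallback so the demo still runs; far too small for real use
        wordlist = ["correct", "horse", "battery", "staple"]
    print(passphrase(wordlist))
    print(f"{4 * math.log2(len(wordlist)):.1f} bits for 4 words")
```

A full system dictionary contains many obscure words, so in practice you would probably want to prune it to a list you can actually spell and remember, at the cost of a few bits per word.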
You could probably get a few more bits of entropy kind of easily if you use words from other languages. This doesn't help the monolingual among us but it's great for me.
Some Koreans do this: they just type Korean words with the keyboard in English mode. Since most password fields only accept ASCII symbols, the password gets entered as a nonsensical string of letters. For example, the Korean word '비밀번호' (meaning 'password'), when typed on a standard Korean keyboard, becomes 'qlalfqjsgh'.
Yes, though the number of additional bits you get from increasing the size of the dictionary decreases fast. E.g. suppose English and German have the same number of words, then using both only gives you one more bit per word.
(Actually, slightly less since some words exist in both languages. Like `hell'.)
>Yes, though the number of additional bits you get from increasing the size of the dictionary decreases fast.
Well, sure -- but once you're at around two or three languages, you get to imagine that the attacker doesn't know what languages you're using. If I use English, Japanese, and Spanish, I can figure on the attacker needing to check the Germanic (English, Dutch, German), Romance (Spanish, French, Italian), and Asian (Japanese, Chinese, Korean) languages at a minimum.
Jargon helps too, and proper names. "dijkstra bicycle entonces boojum daihinmin"
Most really strong systems lock an account after a couple of incorrect guesses. I assume this is all for systems that may not be secured to prevent brute force.
No, I posted about my own generator in 2005: http://darius.livejournal.com/38591.html (getting the words from Beowulf). Then Zooko or Kragen pointed out some even older system in response (I forget the name).
Compuserve was generating automatic account passwords in the early 1980's from two dictionary words and a non-alpha character in between them. Mine was "sleeve;coast". No doubt they didn't invent the trick either.
Such a password scheme provides much less than 44 "bits" of entropy. Considering the use of 4 randomly chosen words from the c. 170,000 English words in general use, we can guess the passphrase in around 2^22 tries - even fewer than for "Tr0ub4d0r3&".
EDIT: I'm totally wrong, it's more like 170,000^4, or about 8*10^20 ... oops!
Assuming the attacker knows the password creation method (and the math assumes the attacker does in the first place), then the attacker knows the word list and the passphrase algorithm.
11 bits per word gives you a grab bag of 2048 possible common words. To guess the password, assuming each word is unique, the attacker needs to try
2048 * 2047 * 2046 * 2045 = 17,540,692,561,920
possible combinations. Initially, you'd think an eleven character password with totally random uppercase, lowercase, numbers and symbols would give you 66 ^ 11 combinations for 66 bits of entropy, but since nobody can actually remember such a random combination, the resulting passwords using these rules are much less secure than that.
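The word-count arithmetic is easy to check, and it also shows how little it matters whether repeated words are allowed. A quick sketch, assuming the 2048-word list and four words used above (`math.perm` needs Python 3.8 or later):

```python
import math

LIST_SIZE = 2048  # 11 bits per word
WORDS = 4

distinct = math.perm(LIST_SIZE, WORDS)  # 2048 * 2047 * 2046 * 2045, no repeats
with_repeats = LIST_SIZE**WORDS         # repeats allowed: exactly 2^44

print(f"no repeats:   {distinct:,} ({math.log2(distinct):.3f} bits)")
print(f"with repeats: {with_repeats:,} ({math.log2(with_repeats):.3f} bits)")
```

Forbidding repeats costs well under a hundredth of a bit, so either way the passphrase is worth about 44 bits.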
Wouldn't you first have to know that the passphrase consists of four randomly chosen words (eg not three, five, or eight)? To me, that's the underlying strength of the approach that the comic (!) is trying to highlight.
Err, yes thanks. I was trying to emphasise the fact that the multiplier of entropy is not the "bit", but the "word" (in the linguistic, rather than the computer architecture sense)
The beauty of this discussion is not just "How to create memorable but hard to break password?" but "How much deep insight can a 4-6 frame cartoon contain?"
The signal to noise ratio of xkcd is fantastic! They've again zipped a great discussion in just a few frames.
And also, sites that have a limit on password length. And sites that have a silent limit on password length, and secretly truncate it. And sites that have different truncation, depending on which form you use. And sites that require capitals, lowercase, and numbers, because nobody would just use name+birthday.
I still sense a problem. While these passwords ARE easier to remember, there is still the security flaw that most people reuse passwords. A key-logger or shoulder-surfer could snag this (or a website could store your password in plaintext and be compromised) and then it's game over. Password managers are the future. They can memorize unique passwords of any length and complexity for every website you use, and they can store the passwords with very strong encryption under one key that you memorize. That key is where a password like 'correcthorsebatterystaple' could be effectively used.
I remember wanting to sign up on a website that had the worst password "feature" ever : you typed your password in a plain textfield, and once you clicked away it was changed to a password field. Seeing as how this "feature" was on the main page I decided never to use this service and sent the website an e-mail saying that their password field is not clever but instead is a big fat counter-security measure.
Edit: I managed to track down which website it was: http://www.advirtus.com/ (when you register, it shows the password as you type it).
1: what purpose do the stupid asterisks serve, anyway? I understand them on an ATM machine, but not on my desktop PC or phone.
2: Very frequently (like, maybe 50% of the time) when trying to type a password on my phone, I miss the little "key" and mistype, but can't see that I did. I have to make multiple tries at entering the password. This feature would prevent that.
So it looks like all upside, with no cost (when used only in appropriate contexts).
1. Shoulder surfing a password is way easier than other forms of cracking. Sure, you might not think people are looking at you as you type your password, but then this becomes a crime of opportunity...
2. There are technological ways around this, like finding a bigger keyboard for your phone, using password management software, imploring the website to make an app, practicing typing, etc. Also, BB/iOS/Android all show the most recent character you typed in a password field. Many soft keyboards give feedback about the most recent character you typed as well.
I'm fine with my password fields being obfuscated.
I agree that this feature is good while working with a smartphone, but I'm pretty sure Android has a setting somewhere to always show the last letter you typed in every password field. I would be surprised if there wasn't a setting for that on iOS as well.
The thing is, I think it makes perfect sense to implement this in certain situations, but at an OS or browser level, not in the website or inside an application. Passwords are something we have grown used to and we always expect them to behave the same way! If we were to change the way passwords are handled, it should be consistent across everything.
For example, browsers could implement password fields with a checkbox next to it that lets you show/hide password at your will. The fact that this website has only one setting (always show when in focus) is scaring me.
> "I would be surprised if there wasn't a setting for that also on iOS."
That's the default behavior for password fields in iOS.
Trick is, when you have a long password it takes far too long to shift focus from the keyboard to the text field to verify each character before moving on.
I'd very much like to have a client-side show/hide button for password fields.
I wish that was why I didn't know my password. The real reason is that 1Password manages that part of my life for me so all my passwords are long randomly generated strings that I don't know.
Four English words selected randomly from a large dictionary are certainly secure. But it's unwieldy to type 20+ character passwords. I prefer 10-character random alphanumeric passwords, although these are hard to remember and type. The best compromise, in my opinion, is to use a hashing function with a moderately difficult passphrase, e.g., Site_Password = Hash(Domain_Name || Passphrase).
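A sketch of what Site_Password = Hash(Domain_Name || Passphrase) could look like in Python. Note the swap: instead of a single plain hash, this uses PBKDF2 from the standard library with the domain as the salt, so that guessing the passphrase from one leaked site password is slower. The function name, iteration count, and output length are arbitrary choices for illustration.

```python
import base64
import hashlib


def site_password(domain: str, passphrase: str, length: int = 16) -> str:
    """Derive a per-site password from a master passphrase and a domain name."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        domain.lower().encode("utf-8"),  # the domain plays the role of the salt
        200_000,                         # iteration count: an arbitrary choice here
    )
    # URL-safe base64 yields letters, digits, '-' and '_'
    return base64.urlsafe_b64encode(raw).decode("ascii")[:length]


print(site_password("example.com", "moderately difficult passphrase"))
```

The same inputs always give the same output, so nothing needs to be stored, but a site that insists on a particular symbol or forces a password change needs special handling.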
This still creates a false sense of security since it seems (and I stress seems) to implicitly suggest you can use the same password on every site (I assume this since the argument for its use is the ease of remembering). If one site you visit handles passwords in plain text and it has your email, upon a breach all your accounts are effectively compromised.
as embarrassing as it may seem, although having read the good part of this thread, i still don't understand why a four word password, can be more random than something i would get like this:
~$ pwgen -s 8
C0olz5KM
Would anybody care to explain this to me, or at least point me to a good place where i can read up on this?
Assuming pwgen isn't actually defective (I haven't looked), you can get more entropy in a shorter password with pwgen. The above looks like 6 bits per character to me, so 48 bits in all. That's 4 bits greater --- 16 times better --- than Randall's estimate for "correct horse battery staple", which is much longer. But "correct horse battery staple office" gets you up to 55 bits. Is it going to be easier to remember "correct horse battery staple office" or "C0olz5KM"? And how about typing them without making errors?
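To put numbers on that comparison, a small sketch. It assumes the pwgen output is 8 independent, uniform picks from the 62 letters and digits (an assumption about `pwgen -s`, not something checked here) and that the passphrase words come from a 2048-word list.

```python
import math


def bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` independent uniform picks from the alphabet."""
    return length * math.log2(alphabet_size)


print(f"8 chars from [A-Za-z0-9]: {bits(62, 8):.1f} bits")
print(f"4 words from 2048:        {bits(2048, 4):.1f} bits")
print(f"5 words from 2048:        {bits(2048, 5):.1f} bits")
```

Sixty-two symbols give slightly under 6 bits per character, so the 8-character password lands just below 48 bits, between the four-word and five-word passphrases.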
If you're bilingual in a non-European language, transliterating obscure phrases from the other language could work well. For example, the poetic title திரிகூடராசப்பகவிராயர் would transliterate to thirikUdarAsappaKavirAyar. Add some substitutions and punctuation and I'm done - very memorable (at least for me) :).
Case is only 8 more bits assuming you randomly assign case to all of your letters. More likely it will be only one more bit. Then requiring an extra character is equally good.
I just hate sites that won't accept spaces and restrict how long a password can be (usually to something relatively short like 8 or 12 characters). Also, many require you to use mixed case and/or numbers or "special" characters. I usually just use complete sentences.
Funny comic as usual, but the 20 years thing is probably invalid. How long would cracking tr0ub4dor&3 on a 486 take? Also I remember some systems didn't allow pass phrases back then. Windows NT in particular had a max password limit of 14 characters, iirc.
I always thought using two password fields with simple words would be much harder to break than one field only (which can be used for really strange passwords but also for simple ones, as we all know). Would someone care to calculate how long it would take to break?
Well it depends how it's stored, but assuming a fairly standard setup it wouldn't particularly help.
The main issue with website security isn't people brute forcing the website login box, it's people cracking the hashes after stealing them. So if you had two easy to crack hashes stored in the database, you crack them both and off you go.
Oh, I was (like the article) assuming you would concatenate both words (add a space or something else in between if you want) and it would be all stored in just one field. What about it?
I tried to generate all possible permutations of 4 of the 2000 most popular words in the English language.
My computer failed miserably after about 2,000,000 permutations, and considering there are about 10^13, I won't be making a rainbow table for this new type of password.
I generally use gpw to generate long random but pronounceable passwords. Something like 'armsdaynistoppo' is fairly entropic, easy enough to remember, and when I'm used to it I can type it much faster than 4 random words.
This article is mathematically true. However, attackers no longer rely on brute force; the most popular method is to attack a weak website, for example a not very popular blog. If they break in, they have one of your passwords and your email address, and if they are very lucky you use the same password for the email account, so they've got you.
Therefore, nowadays it is safer to have different passwords for every site.
Personally, I love to use LastPass for my personal use and KeePass for the office to store and manage passwords. Obviously, the weakest link in the chain is my password for the password manager application.
Any of you use a different password manager?
You can add a variety of two factor authentication options to lastpass (phys OTP, yubikey).
You can also allow/disallow "offline" access to your lastpass account when using these two factor options (force second factor at all times or allow single factor if offline).
This is how I come up with passwords; I find a phrase that I can remember without too much trouble then I use the first letter of each word to make a password.
Phrase: Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone
Password: 3RftE-kuts,7ftD-lithos
Easy to remember and highly secure. I have been using this method for years.
Bonus example: Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty
4sasyaofbfotc,ann,ciL
Less secure than the last example but still strong, especially if you use uncommon strings like the words to a song by a local band or a phrase from the newspaper or an unpopular book. That way even an attack targeting this method will take a long, long time.
This is probably not as secure as the xkcd scheme if you don't make up the phrase yourself. See my comment above with calculations about a variant of this scheme. I suspect that both of your example phrases are among the million most quoted phrases in the English language, giving them entropy of under 20 bits.
This is the same method Apple officially recommended in their help for choosing a secure password—the example they gave was “Tnf,tfws95” (“That’s not flying, that’s falling with style”) followed by the year of Toy Story’s release (where the quote is from). I agree that it’s an excellent combo of passphrase and obfuscation.
Unfortunately, their documentation now[1] gives the same kind of example that XKCD points out will be exceedingly difficult to remember correctly.
Assuming that this method for generating passwords gets popular enough, brute force tools will begin to create an optimized attack for these passwords.
As there are so few words available, if I were to write a brute-forcing tool, I would try combinations of four words from my wordlist once my one-word dictionary attack had failed, before I started trying out all characters.
But all is not lost: Either use more words or vary the amount of spaces you put between words. This way the dumb optimization "try four words delimited by space" wouldn't work on your password and they would have to go over to plain old brute forcing at which point, I agree, the longer, the better.
Properly executed, this will protect you against brute force attacks. No need to do nonsense like adding more spaces.
Of course XKCD botched it and said an inadequate minimum length...
Notable quote from the article: "This level of unpredictability assumes that a potential attacker knows both that Diceware has been used to generate the passphrase, the particular word list used, and exactly how many words make up the passphrase."
He calculated the entropy assuming the attacker had that knowledge; I think he botched it in saying that 44 bits were enough. Of course, if he were to recalculate it without those assumptions, and make it clear that he was doing so, it would be better.
Best to err on the side of suggesting more. If the reader is technical enough to realize you did that, then they're technical enough to estimate a more suitable value themselves.
Here's how I use diceware:
1) throw single die to determine number of languages
2) throw dice to determine which languages (number of iterations depends on the result of #1)
3) throw dice to determine language
4) throw all dice to determine first word in language
5) repeat #3, #4 until you've reached the desired number of words
But the combinatorial explosion starts putting that out of reach fairly quickly.
Let's say you're restricting yourself to a dictionary of only 2000 common words.
A single word, up against a 1000-tries-per-second attack, would therefore fall within 2 seconds. Clearly bad. Adding a second word gives 2000^2 combinations to test against, which would require about an hour. Still not good. A third word, pushing that up to 2000^3, takes us to about three months, which is probably acceptable in many cases. The fourth word, at 2000^4 combinations, gets us to 500 years, which is well beyond what most people are ever going to need for a web-based password.
Now, if you want to bring this into the realm of passwords in other contexts, which can perhaps be brute-forced with several-billion-tries-per-second attacks, this approach still works: you just need to add an extra word or two. At 2,000,000,000 per second the four word combination might only take a couple of hours, but a fifth word takes that up to half a year, and a sixth out to the thousand years mark, whilst still falling easily within the standard 7±2 memory limit and being much much much easier to remember than m(yV7&NlxAIZNx3>&@p&8/kX.
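The figures above are easy to reproduce. Here is a rough sketch that assumes a worst-case search (every phrase tried, so the average is about half of each figure) and uses the 2000-word dictionary and the two guess rates from this comment; the function names are just for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def worst_case_seconds(dictionary_size, words, guesses_per_second):
    """Time to try every phrase of `words` words drawn from the dictionary."""
    return dictionary_size ** words / guesses_per_second

def describe(seconds):
    """Render a duration in the largest unit that fits."""
    for unit, size in (("years", SECONDS_PER_YEAR), ("days", 86400),
                       ("hours", 3600), ("seconds", 1)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.3f} seconds"

for rate in (1_000, 2_000_000_000):
    print(f"--- {rate:,} guesses per second ---")
    for n in range(1, 7):
        print(n, "words:", describe(worst_case_seconds(2000, n, rate)))
```

Each extra word multiplies the time by 2000, which is why one or two more words absorb even a million-fold faster attacker.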
A password doesn't necessarily need to be something that is easy to remember. It just needs to be a unique token that is easy for you personally to present when needed.
I currently use a keepass file stored in my dropbox folder. I am not certain what the silver bullet to online authentication will look like. However I suspect it may not require you to remember more than one secure password, perhaps not even that.
Trust online is hard, though. Looking at the problems of establishing trust online reminds me how clever human beings are: we sometimes make mistakes, but we are pretty good at evaluating trustworthiness in the real world.
You are making the same error that led to the popularity of passwords based on a common word and some substitutions.
Your calculation is for 11 characters, each chosen completely at random out of 63 symbols. People don't choose a password that way - we typically can't generate or remember random symbols.
XKCD's calculation is for a common word + common symbol substitutions and additions: log2(#words) + log2(#capitalization options) + log2(#substitution options) + ...
You calculate password entropy of a password drawn from a set of equally likely passwords by taking the log_2 of the size of the set. (This is a convenient special case of the more general formula for entropy.) A convenient thing about this is that if you concatenate two things (in such a way that you can pull them apart again) the number of possibilities is the product of the possibilities for each of the things, so the entropy is the sum of the entropies.
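To make the product-versus-sum point concrete, here is a small sketch. The three component sizes are made-up illustrative values (a word from a 2048-word list, one of two capitalization options, one digit), not numbers taken from the comic.

```python
import math

def entropy_bits(set_size):
    """Entropy of one uniform choice among `set_size` equally likely options."""
    return math.log2(set_size)

# Hypothetical independent choices (assumed sizes, for illustration only).
choices = {"word": 2048, "capitalization": 2, "digit": 10}

# Concatenating independent parts multiplies the number of possibilities...
total_passwords = math.prod(choices.values())
bits_from_product = entropy_bits(total_passwords)

# ...which is the same as adding the entropy of each part.
bits_from_sum = sum(entropy_bits(n) for n in choices.values())

print(total_passwords, bits_from_product, bits_from_sum)
```

This only holds when every password in the set is equally likely, which is exactly the assumption that breaks when a human picks the parts.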
The XKCD comic is only partially correct. Depending on which source you believe, English text has about 0.6 to 2.3 bits of entropy per character. This means you need somewhere between 4.8 and 18.3 characters in each word to reach 11 bits of entropy per word.
Assuming entropy is closer to 2 bits per character this is a realistic situation. However, when you assume entropy is closer to 1 bit per character the words have to be too long to be realistic.
Randall is saying exactly that: Through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess
If you look at the source, their word list contains around 1600 words. That is just nowhere near enough. Using this would give you a very easy to crack password. You need to make up your own passwords with words you come up with.
1600 words is 10.6 bits per word. If you want to reach 70 bits (safe from offline attacks with custom hardware), that means you need 7 words, which is within most people's capacity to memorize. If you increase your wordlist to 65536 words, you can get 16 bits per word, but you have to include words like "lefeuvre", "aarau", and "aubagne". Then you can reach 70 bits in only 5 words. That's not worth it.
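A quick way to check the word counts, as a sketch: the 70-bit target and the 1600 and 65536 list sizes come from this comment, while 2048 (the comic's list) and 7776 (the standard Diceware list, 6^5 entries) are added for comparison.

```python
import math

def words_needed(target_bits, dictionary_size):
    """Smallest number of uniformly chosen words reaching the target entropy."""
    return math.ceil(target_bits / math.log2(dictionary_size))

for size in (1600, 2048, 7776, 65536):
    print(f"{size:>6} words: {math.log2(size):5.2f} bits/word, "
          f"{words_needed(70, size)} words for 70 bits")
```

Growing the list forty-fold saves only two words, which is the trade-off being called not worth it.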
Inventing your own words is unlikely to produce very random words. You'll probably mostly invent the same few hundred nonsense words that any other speaker of your native language would invent.
In other words, you have no idea what you are talking about and should not have posted.
On what basis is that "very easy"? Four words from 1600 is 1600^4 permutations, which would take over 200 years to test at a 1000/second attack.
Sure, if we're talking about a different type of password, and 2-billion-tries-per-second attacks, it'll fall in about an hour, but simply pushing that out to six words from that dictionary will still stymie that level of attack for a couple of hundred years. The size of the dictionary is much less important than the length of phrase you generate from it.
DANGER: This gives no more entropy than what srand() uses, which (at least for GNU awk) is simply the current UNIX time, which (if we assume that when you generated the password is known to within one year) means only about 25 bits of entropy.
srand in awk is platform specific. On most recent implementations it isn't a straight call to srand().
I have another version that I use that stuffs srand, but in the end I figured srand with a 250k dictionary is still better than picking words out of your head from a ~1k dictionary.
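To put a number on the time-seed worry, and to show one way around it, here is a hedged sketch. It assumes the seed is the UNIX time in whole seconds and that the attacker knows the year; the word list is a placeholder, and using Python's secrets module is my substitution, not what the awk script does.

```python
import math
import secrets

# If the only seed is the UNIX time in whole seconds and the attacker knows
# the year, the seed is one of about 31.5 million values.
seconds_per_year = 365 * 24 * 3600
seed_bits = math.log2(seconds_per_year)

# Placeholder list (assumption): replace with a real word list.
WORDS = [f"word{i:04d}" for i in range(2048)]

def passphrase(n_words=4, words=WORDS):
    """Pick words with the OS CSPRNG rather than a time-seeded PRNG."""
    return " ".join(secrets.choice(words) for _ in range(n_words))

print(f"time seed: {seed_bits:.1f} bits")
print(passphrase())
```

However large the dictionary, a generator seeded this way cannot produce more distinct phrases than it has seeds, so the roughly 25 bits act as a ceiling.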
I think you've got it backwards: the entropy calculation here assumes that the attacker already knows the scheme. The 2^44 possible passwords are therefore a lower bound for the entropy.
In practice the attacker must cast a wider net because he doesn't know exactly which word list you use, or if you are using a completely different password scheme. This increases the difficulty.
When picking a password, you don't just care about the entropy. You also care how far down the password guessing order it is.
People who want to guess a password don't just brute force at random. They use a guessing order that goes through more common classes of password first. So if correct horse battery staple becomes a popular password scheme, these will end up attacked before other password schemes.
(See http://www.schneier.com/essay-148.html)
Unless you're going to use a password safe full of nasty passwords, you should pick your passwords using an unpopular method.
The point is that this approach pushes brute-force guesses out into territory that makes it unlikely anyone will crack it even if they know exactly what scheme you're using.
People seem to be massively underestimating just how long it would take to brute-force four dictionary words in a row.
This scheme could be easily guessed by a dictionary attack that simply ran through combinations of dictionary words instead of individual characters.
If this became a popular scheme, the whole entropy argument goes out the door. It only has more entropy if we compare the two schemes on a character-by-character basis (~10 vs. ~25). Of course the longer string will appear to have more entropy.
But if a password guesser expects the pattern of the "four common words" scheme, as they might if it became popular, it's not nearly as entropic. A better comparison would be to consider each word as a single "character" from a 180,000 sized alphabet (for an English dictionary).
Calculate the entropy of that and you'll find it's in the same ballpark.
If you took the suggestion in your last sentence instead of offering it to the rest of us, you would see that the entire rest of your comment is incorrect.
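For reference, here is the calculation being suggested, as a minimal sketch. It treats each word as one symbol from a 180,000-word alphabet, as proposed above, and compares that with the comic's 2048-word list; it assumes every word is picked uniformly at random.

```python
import math

# Each word is one "character" drawn from the dictionary.
bits_large_dictionary = 4 * math.log2(180_000)
bits_comic_dictionary = 4 * math.log2(2048)

print(f"4 words from 180,000: {bits_large_dictionary:.1f} bits")
print(f"4 words from 2,048:   {bits_comic_dictionary:.1f} bits")
```

Treating words as symbols is already how the comic arrives at 44 bits, and a larger dictionary only pushes the figure up.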
I find the idea incredibly stupid. If I know that someone used that precise generator to produce his password, then I know that the generator has fewer than 2000 words in its dictionary. It then takes me only a few minutes to guess his password, rather than 550 years.
Conclusion: Don't ever use this password generator; write your own, and tell no one about it.
You up for a challenge? I just generated a pass phrase with this generator, and hashed it with SHA-1 (echo -n ... | sha1sum), no salting or anything else special. Feel free to brute force it.
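For anyone tempted, the attack loop itself is short. This is only a sketch on a toy six-word list so that it finishes instantly; the list, the single-space separator, and the function names are assumptions for illustration, not details of the real generator.

```python
import hashlib
import itertools

def sha1_hex(text):
    """Same digest as `echo -n text | sha1sum` (UTF-8, no trailing newline)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def crack(target_hex, words, n_words):
    """Try every space-joined phrase; return (phrase or None, attempts made)."""
    attempts = 0
    for combo in itertools.product(words, repeat=n_words):
        attempts += 1
        candidate = " ".join(combo)
        if sha1_hex(candidate) == target_hex:
            return candidate, attempts
    return None, attempts

# Toy list (assumption): 6**4 = 1296 candidates instead of trillions.
TOY_WORDS = ["correct", "horse", "battery", "staple", "queens", "faulty"]
target = sha1_hex("correct horse battery staple")
found, attempts = crack(target, TOY_WORDS, 4)
print(found, attempts)
```

The code is the easy part; with a list of a couple of thousand words the same loop has on the order of 10^13 candidates to get through.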
It's currently running at 3200000 tries per second on my Xeon machine. I am probably going to get bored before I find the right combination because I calculated it could take up to 52 days. :)
But anyways, it is still a lot less time than trying to bruteforce something like Tr0ub4dor&3 in my opinion.
It seems you like challenges. If I gave you a SHA1 hash of something similar to "Tr0ub4dor&3", would you be able to crack it (without a rainbow table) in under 52 days? I don't think so.
Let's say the dictionary only has 1000 words in it. A phrase of four words in a row from that is still 1,000,000,000,000 possibilities, which is going to take you significantly more than a few minutes to work through at 1000 tries per second.
Not a good idea, sadly. In fact I'd go so far as to say this is a really bad suggestion, because it gives a false sense of security.
There is potentially a lot less entropy in this password than "Tr0ub4d0r&3", assuming the hacker is smart enough to realise he can trivially test combinations of dictionary words in a very short amount of time.
(EDIT: I'm way out of touch with this; it's not as trivial as perhaps I figured. See lower in the thread)
However; it is in the right direction - introducing some sort of extra entropy can invalidate that form of attack and make this as secure as XKCD suggests.
What do I currently do? I take a reasonable length common word, do a string/number replacement as so:
H4ck3r N3ws
And then repeat it 3 or 4 times:
H4ck3r N3ws H4ck3r N3ws H4ck3r N3ws H4ck3r N3ws
For extra entropy mix it up:
H4ck3r N3ws H4ck3r News H4cker News Hacker News
That's a simple example - so long as you have a reasonably random scheme then it is not easy to test against, but is fairly simple to remember.
Bingo :)
(EDIT: for the down voter(s) note: XKCD specifically says random common words - obscure words are another matter)
The comic is already taking worst case scenario into account, that the cracker knows that your password is made of words and which dictionary you used to generate the password.
In the comic he is using a 2048 word dictionary, which gives 11 bits of entropy per word (log2(2048)), with a password made up of four words that gives a total of 44 bits of entropy.
But if we were to assume that the cracker knows nothing about our password, say other than it being all lowercase a-z, then we have an entropy per character of log2(26), or about 4.7 bits. For the phrase "correct horse battery staple", which has a length of 28, the entropy of that phrase under those conditions is 4.7 * 28 = 131.6 bits, which is nearly at the point where the cracker is more likely to find a collision.
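Both numbers can be checked in a few lines. This sketch follows the comment's own simplification of counting the three spaces as if they were letters from a-z; it is an illustration of the two attacker models, not a measure of any real cracker.

```python
import math

phrase = "correct horse battery staple"

# Model 1: the attacker knows the scheme (4 picks from a 2048-word list).
scheme_bits = 4 * math.log2(2048)

# Model 2: the attacker only assumes lowercase a-z, as in the comment above
# (spaces are counted like letters, matching its length of 28).
blind_bits = len(phrase) * math.log2(26)

print(len(phrase), scheme_bits, round(blind_bits, 1))
```

The 44-bit figure is the pessimistic one; anything the attacker does not know about the scheme only moves the real cost toward the larger number.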
> Start with: You don't know the dictionary I used, but have to use one that seems 'good enough' (i.e. a superset of mine, if possible).
People are likely to use a standard English dictionary. In my experience (which is exactly within this field) people use a fairly tight subset of the English vocabulary.
So I would be quite happy to test for a dictionary of, say, 100,000 words and be hopeful of a good hit rate (note that XKCD says common words, which is easily missed)
Our software has a test (which runs about third in its list of tests) which does dictionary combinations up to three words (two words is quite a commonly used password based on our statistics) with a dictionary size of 17,500. (Edit: sorry, I originally wrote 175,000; apparently it is an order of magnitude smaller, I checked with one of the engineers :)) This includes English words plus a few commonly used foreign/slang terms. The hit rate on this is fairly high.
(we crack document/windows passwords mainly)
You could of course choose deliberately obscure words to invalidate this - but they aren't so easy to remember (so people will tend not to).
If someone is going out of their way to secure a password, sure, you're going to hit a brick wall. But what every password scheme tends to forget is the "human factor" whereby people not concentrating on being secure will introduce attack vectors.
If the dictionary really has 100 000 words, you're looking down the barrel of 52 bits of entropy for a three word phrase
In a more likely dictionary of the 5000 most commonly used words in the English language, you still get a three word pass phrase of about 40bits of entropy. Make that a four word passphrase, and you're back up around 52 bits.
This is simply incorrect. If you assume you really do have 100 000 "characters" in your alphabet this is correct. However, your alphabet follows a certain pattern: It's English text.
At that point it's easier to brute force the individual characters. English text has about 1 to 2 bits of entropy per character. Let's assume 1.5 bits per character on average. That means that to really get 52 bits of entropy for a 3 word phrase you need to have at least 35 characters in your 3 word phrase, or about 12 per word.
Your password is only as strong as the weakest link. In this case English is easier to brute force than your 100 000 character alphabet.
It's simply false to assume that "yes no one three" has 44 bits of entropy because it got randomly selected out of a 2048 word dictionary.
Yeah, no. If I have a dictionary of 100000 words, then each word represents about 17 bits of entropy. If I have three words, that makes 3 x 17 = 51 bits of entropy.
You deny the fact that English text can be attacked separately from your dictionary. English text is very predictable, for example e is much more common and q is almost certainly followed by u.
I'm not making this up on my own either. Please check out http://en.wikipedia.org/wiki/Entropy_%28information_theory%2....
Let me quote the important part:
The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.[2]
Consider the following example: You have a wordlist of 100 000 words. It seems only normal that log(100 000)/log(2) is equal to 16.6 bits of entropy.
Now consider you take three words out of that list completely at random. You get the words "a no we". Assuming 16.6 bits of entropy per word you do indeed have to search through a space of 49.8 bits but only if you attack that via the dictionary
It is clear that in this case you can do a different attack. Instead of brute forcing the words you can brute force the characters on their own with a search space of a-z and space. This equals log(27^7)/log(2) or 33.3 bits. A lot less than the 49.8 bits estimated when only considering a dictionary approach. In reality English text is so predictable that you don't have to search even close to 33.3 bits of entropy if you brute force it with an algorithm that is aware of English text. Assuming Shannon's 1.3 bits per character estimate this password has 9.1 bits of entropy.
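Here is a sketch of that comparison, for whoever wants to try other phrases. It assumes the character attacker knows the exact length and uses a 27-symbol alphabet (a-z plus space); the second phrase is one quoted elsewhere in this thread, added to show how the picture changes with longer words.

```python
import math

def dictionary_bits(n_words, dictionary_size=100_000):
    """Search space when guessing whole words from the dictionary."""
    return n_words * math.log2(dictionary_size)

def character_bits(phrase, alphabet_size=27):
    """Search space when guessing character by character at a known length."""
    return len(phrase) * math.log2(alphabet_size)

for phrase in ("a no we", "queens examine faulty charges"):
    n_words = len(phrase.split())
    d = dictionary_bits(n_words)
    c = character_bits(phrase)
    print(f"{phrase!r}: dictionary {d:.1f} bits, characters {c:.1f} bits, "
          f"cheaper attack {min(d, c):.1f} bits")
```

Under these assumptions the character attack is cheaper only when the words are very short; for the longer phrase the dictionary attack is the cheaper one by a wide margin.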
I understand that this is an edge case with very short words. But I chose it to try to show that there are other ways to attack the password, here by using a 27 character alphabet. This is cold hard math and therefore much easier to accept than the magic entropy estimation of English text. Once you see that this approach can reduce your entropy calculation, it's not that hard to accept that there might be more ways to reduce the entropy even further.
1) the predictability of the distribution of characters in the English language has nothing to do with this type of password - the symbols aren't characters, but words.
2) that figure of 1.3 bits of entropy per character only applies to English text, and the figure is low because there are a bunch of small words, like "and" and "the", that are regularly repeated. The entropy per character for words containing 6 letters or more, not arranged in sentences, is a lot higher, about double if I recall correctly. So sure, just as I can expect to get brute-forced if I choose a pin of 0000, I can get brute-forced if I choose a passphrase of 'and the in'. Good luck forcing "queens examine faulty charges" though.
I am merely suggesting that the entropy can be less than what is estimated by looking only at the dictionary.
Re 1: words still consist of characters
Re 2: Certainly correct, but to ignore the possibility of English words having less entropy than it appears at first is odd given the patterns English words often follow.
I'm interested in reading more about those entropy estimations, can you recall where you read about it? According to Applied Cryptography Shannon states that entropy per letter decreases as the text grows. Shannon estimates 2.3 bits per letter for chunks of 8 letters but it drops down to between 1.3 and 1.5 bits per character for 16 character chunks.
Applied Cryptography cites a paper by Shannon called "Prediction and Entropy of Printed English" in the Bell System Technical Journal from 1951.
I have not personally read it yet but will try to find it in the near future.
The mistake you are making is that you keep on wanting to treat a passphrase made up of words as being a 'text' in the sense that Shannon was using, but it is not a text in the Shannon sense of the word.
Shannon was analysing real world messages to arrive at that figure, not a string of random words. Here's a way of thinking it through that should help you see the problem clearly.
Let's imagine that Alice has just used a dictionary to generate a passphrase. Furthermore, let's imagine that the dictionary in question is a collection of 6 character or longer words pulled from "Pride and Prejudice". I'm pretty sure Jane Austen tops the 5000 words needed for such a dictionary.
Now, in the simple example, let's imagine that the passphrase is one word long. Bob is an attacker who knows the dictionary. He will guess the word in a maximum of 5000 tries. After 2500 he will have a 50% chance of having found the word.
Charlie, a second attacker, isn't going to use words as symbols, he's going to try and brute-force the word just by throwing random characters at the problem. He doesn't know the length of the word, so he's going to have to try all lengths of the word starting from one letter and working up. I trust that you can see that Charlie is going to need a lot more than 5000 guesses to find the passphrase.
David is a bit smarter than Charlie. He decides to use a Markov Chain of 3 character length to generate his guesses, so that the generated passphrases start resembling English words. The Markov chain was trained on text from the Sydney Morning Herald. David is going to do a lot better than Charlie, but that Markov Chain has to be capable of generating all of the words in my original dictionary, plus a bunch of other words, plus a bunch of gibberish non-words. He's clearly going to take more attempts than 5000 to find Alice's passphrase.
Evan is smarter still. He trains his Markov Chain using only words that are 6 characters or longer, and furthermore, he increases the chain length to 6 characters. He also teaches his Markov Chain that word separators exist, and the Markov Chain generator is reset when a new word is started. Now we're talking - Evan's system will produce very few strings that are not actual words, but it will generate a bunch of words that were in the SMH, but not in "Pride And Prejudice", so he's still going to need more than 5000 guesses to be sure that he's found Alice's passphrase. He still hasn't done better than treating the words as symbols.
There is one Markov Chain attack that gets nearly equal performance. Take Alice's dictionary, use it to train a Markov Chain generator that knows about word separators, and that knows that the words don't have any probability links between them. But now your Markov Chain generator has just become a fancy way of picking words out of the dictionary. In other words, it has degenerated to a dictionary attack, not a brute force attack, and a dictionary attack is just another way of saying "treat the words as symbols", which means we're back to my original entropy calculations as being the optimal way of determining the entropy of the passphrase.
As I said at the start, your error is that you're trying to use randomly picked dictionary words as an English text in the Shannon sense of the word. Hopefully the example scenario that I just ran through will help you see why the distinction is important.
I understand randomly selected words are not the same as the English text Shannon is talking about. However my point is that entropy may be lower than what it appears to be. I'm not saying it is 0.6 bits per character (or any other number).
I'm saying that unless your words are long enough it is very likely that the entropy is lower than simply taking log(dictionary_size^words_per_passphrase)/log(2).
The references to Shannon and Applied Cryptography were only made as supporting evidence that entropy of English text is lower than what it appears. I'm not claiming their exact numbers apply in this situation.
If we can't agree on that, then we must simply agree to disagree.
> I'm not claiming their exact numbers apply in this situation.
You see, the thing is, you kind of are making that claim. If entropy goes out to 3 bits per character, the one feeble point that you did have gets blown out of the water. The fact that you don't seem to understand that indicates that you really need to go back and reread a few books on Information Theory.
Look, for your own edification, try and come up with a scheme that will reliably beat a dictionary attack in terms of the number of attempts needed before finding a password taken from the dictionary. You could even write a simple program to test the idea. Take a dictionary of 5000 words with a length of 6 characters or longer (much shorter words than what your theory suggests should be secure). Select 100 words from the dictionary at random, and then run any attack of your devising against those words. If you can reliably get a better average number of attempts than 2500, I'll concede the point.
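The baseline half of that experiment is quick to write down. This is a sketch only: the 5000 entries are synthetic stand-ins (only the count matters for the baseline), and it measures the plain dictionary attack that any competing scheme would have to beat.

```python
import random

def average_attempts(dictionary, trials, rng):
    """Average guesses for an attacker who walks the dictionary in a fixed order."""
    position = {word: i + 1 for i, word in enumerate(dictionary)}
    targets = [rng.choice(dictionary) for _ in range(trials)]
    return sum(position[t] for t in targets) / trials

# Synthetic stand-in words (assumption): replace with real 6+ letter words.
DICTIONARY = [f"word{i:04d}" for i in range(5000)]

rng = random.Random(12345)   # fixed seed so the run is repeatable
print(average_attempts(DICTIONARY, 100, rng))       # noisy with only 100 targets
print(average_attempts(DICTIONARY, 100_000, rng))   # settles near 2500
```

With only 100 targets the average wobbles by a hundred or so guesses either way, so a challenger would want many more trials before claiming a win.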
Until then, I'm here to tell you that you don't understand information theory as well as you seem to think you do.
I actually didn't ignore the 'common' limitation (and didn't downvote you - I'm actually interested how you come up with that).
Follow-up questions:
- What are the first tests, before this 3rd that tests for words? I assume tests for passwords of the first/left variety in the comic? Aren't they cheaper?
- 'Up to three words' is reducing the exponent of possible combinations by one. Length/number of words is relevant
Edit: Another issue. You say 'people forget the human factor', while you, yourself, propose something like '4 times Hack News with substitutions' as better. How is that including the 'human factor'?
You know what; it's been so long since I played around with this stuff (it's even a separate company now, that we just consult for) that I'm way out of touch with my thought process :)
You're right; there is nothing particularly wrong with the suggestion that makes it intrinsically very weak for most uses.
I'd best stop commenting before I make a total mess :)
The XKCD criticism of the 'bad' password has the same problems. How does the hacker know I have this kind of password (starting with a real english word, for example)?
It criticizes a particular way to choose passwords, leading to a result that seems 'secure' and even quite good to lots of people. One that easily satisfies braindead corporate password rules.
The entropy given there is based on that way to choose a password and even explained, graphically.
If you choose your password totally different and from different sets of characters, then the number will be off. That's not a surprise though?
This is a terrible idea. It's one thing to ask a person to remember which characters they replaced in a word. It's another thing entirely to ask them to remember three different ways they swapped characters. This is a recipe for having to brute-force your own password. It's also still not as secure as you might imagine. There's very little entropy added by swapping characters (even though I do it as well), because there are very few substitutions that people make. "Hacker News Hacker News Hacker News" is nearly as secure as your convoluted 3rd passphrase, but a lot easier to remember.
The repeat-three-times thing is probably not great advice, either. If this became popular, it would be trivial to add this to brute-forcing code, and it doesn't add as much entropy as adding just one extra character.
I don't think the XKCD suggestion is actually good, either. Open a book and pick a medium-length sentence. There's your passphrase: an n-gram chosen arbitrarily from the corpus of (probably) English literature containing mixed-case and punctuation. You've got a ton of entropy there.
Sorry, should have linked back to that myself. Still, 41 bits, while less than the 44 bits from the XKCD algorithm, is a lot more entropy than most passwords have.
Maybe you have an excellent memory, but I'd forget this in a week. "Now let's see, is it the third 'hacker' that has the 4 AND the 3 in it? Or the 2nd?"
You can add equivalent entropy just by adding a few random special characters, a number, and a letter.
Personally, I prefer to type my passwords until I remember them. Therefore, my metric is "easy to type and hard to guess." If it's easy to type, I remember it via muscle memory, which is unbelievably better than trying to remember abstract symbols.
I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.