I think people (here and below) are getting hung up on definite articles, but Zipf's Law makes no such observation. It says only that a word's frequency in a natural language corpus tends to be in inverse proportion to its rank in a frequency table.
In English, the most frequent words are articles, but the general observation about word frequency holds across languages (whether those languages have articles or not).
"The most frequently appearing words in this pile of untranslatable text are the most common words in the language it is written in" seems like it falls somewhere between blindingly obvious and entirely useless. Unless you have some clue what those words mean, how does that observation help you?
Just from skimming the Wikipedia article, it doesn't seem useful for translating. But it is slightly stronger than "The most frequently appearing words in this pile of untranslatable text are the most common words in the language it is written in." It tells you that, for example, the most popular word should be about twice as popular as the second most popular word.
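To make the "twice as popular" point concrete, here is a minimal sketch of how one might tabulate ranks and counts and compare them with the idealized prediction. The function names `rank_frequency` and `zipf_prediction` are made up for illustration, splitting on whitespace is a crude assumption about what counts as a word, and the exponent of exactly 1 is the textbook idealization rather than something real corpora match precisely.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Assumption: a "word" is any whitespace-separated token, lowercased.
    """
    counts = Counter(text.lower().split())
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_prediction(top_count, rank):
    """Count an idealized Zipf distribution (exponent 1) predicts at `rank`."""
    return top_count / rank


if __name__ == "__main__":
    sample = "a a a a b b c"  # toy input, not real corpus data
    table = rank_frequency(sample)
    top_count = table[0][2]
    for rank, word, count in table:
        print(rank, word, count, zipf_prediction(top_count, rank))
```

On a real corpus you would expect the observed counts to track the prediction only roughly, and you never need to know what any of the words mean to run the comparison.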
It doesn't tell you what those words are, but it is a pretty specific observation about the frequency/rank relationship. So, as the Wikipedia article linked above points out, it can tell us that the Voynich Manuscript was probably written in a language. Of course, it could be a cipher of a real language or something made up, like Elvish in Lord of the Rings, but it probably isn't just a random collection of symbols, because it is unlikely that a random collection of symbols would happen to follow this distribution.
How does this definition use "word"? In analytic languages, most words always appear in the same form, so counting them is relatively easy. But for inflected languages, does this require being able to distinguish the roots of words in order to count them accurately?
It's not just about the presence or absence of articles and prepositions, but about different declensions of the same word. If this analysis requires knowing that homo, hominis, hominem and homine all refer to the same word and should be counted as one, how does it help with analyzing a text for which we don't know the grammar?
> does this require being able to distinguish the roots of words in order to count them accurately?
That can be somewhat recovered via Levenshtein distance.
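As a rough illustration of what that would look like, here is a plain dynamic-programming Levenshtein distance applied to the Latin forms mentioned above. The function name `levenshtein` is just a label for this sketch, and using a distance cutoff to decide that two forms are "the same word" is an assumption, not an established method for unknown scripts.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b.

    Counts single-character insertions, deletions and substitutions,
    keeping only one row of the dynamic-programming table at a time.
    """
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    forms = ["homo", "hominis", "hominem", "homine"]
    for first in forms:
        for second in forms:
            if first < second:
                print(first, second, levenshtein(first, second))
```

The oblique forms come out close to each other (hominem and homine are one edit apart), but homo is three or four edits from the rest, which is about as far as it is from plenty of unrelated short words. So "somewhat" is the right hedge: a distance cutoff will catch regular endings and miss irregular stems.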
> declensions
Is there a difference between an inflection and a declension?