I think people (here and below) are getting hung up on definite articles, but Zi...

seoaeu · on May 15, 2022

"The most frequently appearing words in this pile of untranslatable text are the most common words in the language it is written in" seems like it falls somewhere between blindingly obvious and entirely useless. Unless you have some clue what those words mean, how does that observation help you?

bee_rider · on May 15, 2022

Just from skimming the Wikipedia article, it doesn't seem useful for translating. But it is slightly stronger than "The most frequently appearing words in this pile of untranslatable text are the most common words in the language it is written in." It tells you, for example, that the most popular word should appear about twice as often as the second most popular word.
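A minimal sketch of that rank/frequency check, assuming a naive lowercase, letters-only tokenizer and the classic exponent-1 form of Zipf's law, where the word at rank r appears roughly 1/r as often as the top word. The function names are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Naive tokenizer (an assumption): lowercase runs of ASCII letters only.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common()

def zipf_ratios(pairs):
    """For each rank r, pair the observed count relative to the top word
    with the 1/r ratio that exponent-1 Zipf's law predicts."""
    top = pairs[0][1]
    return [(rank, word, count / top, 1 / rank)
            for rank, (word, count) in enumerate(pairs, start=1)]
```

On a long natural-language text, the third and fourth columns should roughly track each other; a toy sentence is far too short to show the pattern.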

It doesn't tell you what those words are, but it is a pretty specific observation about the frequency/rank relationship. So, as the Wikipedia article linked above points out, it can tell us that the Voynich Manuscript was probably written in a language. Of course, it could be a cipher of a real language or an invented one, like Elvish in The Lord of the Rings, but it probably isn't just a random collection of symbols, because a random collection of symbols is unlikely to happen to follow this distribution.
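One way to make "follows this distribution" concrete is to estimate the exponent s in f(r) ∝ r^(-s) from the slope of log(count) against log(rank). This sketch assumes an ordinary least-squares fit on the log-log data, which is crude; more careful work usually prefers maximum-likelihood fitting. The function name is hypothetical.

```python
import math

def zipf_exponent(counts):
    """Estimate s in f(r) ∝ r**(-s) via least squares on log-log data.

    counts must be sorted in descending order and contain at least two values.
    """
    xs = [math.log(r) for r in range(1, len(counts) + 1)]  # log(rank)
    ys = [math.log(c) for c in counts]                     # log(frequency)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is cov / var; Zipf's law has slope -s.
    return -cov / var
```

An estimate near 1 is consistent with natural-language text, but on its own it is evidence rather than proof.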

sramsay · on May 15, 2022

It doesn't (in this case), and I didn't say it did. And there's nothing "blindingly obvious" about the ubiquity of the Zipf curve.

tremon · on May 15, 2022

How does this definition use "word"? In analytic languages, most words always appear in the same form, so counting them is relatively easy. But for inflected languages, does this require being able to distinguish the roots of words in order to count them accurately?

It's not just about the presence or absence of articles and prepositions, but about different declensions of the same word. If this analysis requires knowing that homo, hominis, hominem and homine all refer to the same word and should be counted as one, how does it help with analyzing a text for which we don't know the grammar?
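To see why this matters, here is a sketch that counts the same tokens twice: once by surface form and once through a lemma table. The `LEMMAS` table is hypothetical and hand-written, which is exactly what is unavailable for a text whose grammar we don't know.

```python
from collections import Counter

# Hypothetical hand-written lemma table mapping inflected forms to one entry.
LEMMAS = {"homo": "homo", "hominis": "homo", "hominem": "homo", "homine": "homo"}

def count_forms(tokens):
    """Count each surface form separately."""
    return Counter(tokens)

def count_lemmas(tokens, lemmas):
    """Count by lemma, falling back to the surface form when a token is unknown."""
    return Counter(lemmas.get(t, t) for t in tokens)
```

With tokens like `["homo", "hominis", "est", "hominem", "homine", "est"]`, counting by form ranks "est" first, while counting by lemma ranks "homo" first, so the rank/frequency curve depends on which notion of "word" you pick.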

SemanticStrengh · on May 16, 2022

> does this require being able to distinguish the roots of words in order to count them accurately?

This can be partly recovered via Levenshtein distance.
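A rough sketch of that idea: compute the edit distance between word forms and greedily group forms that are close relative to their length. The grouping rule and the `max_ratio` threshold of 0.6 are assumptions for illustration, not a standard method.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def group_forms(words, max_ratio=0.6):
    """Greedy grouping (hypothetical rule): add each word to the first group whose
    first member is within max_ratio * (longer length) edits, else start a new group."""
    groups = []
    for w in words:
        for g in groups:
            if levenshtein(w, g[0]) <= max_ratio * max(len(w), len(g[0])):
                g.append(w)
                break
        else:
            groups.append([w])
    return groups
```

This groups homo, hominis, hominem and homine together while keeping an unrelated word like "est" apart, but it is crude: short unrelated words can merge spuriously, and irregular forms such as English "go"/"went" will never be grouped.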

> declensions

Is there a difference between an inflection and a declension?