Minoan language Linear A linked to Linear B in new research (greekreporter.com)
207 points by clouddrover on May 15, 2022 | 90 comments


I spent basically the entire article trying to figure out what the "groundbreaking" research actually was... this is a pretty mangled press release rendering of the scientific research, even worse than the kinds you normally see from university research.

To its credit, this doesn't promise that we can (or will shortly be able to) actually read Linear A texts, and it actually explicates that we won't be able to do that. But that's pretty much the limit of credit due to this article.

Linear A's connection to Linear B has been hypothesized since... well, at least as far back as when I was taught it in school, which considering how long it takes textbooks to update themselves to state-of-the-art archaeology may as well be time immemorial.

As for what the actual "groundbreaking" research is: the sleuthing done in the previous version of the article leads to an academic review of the work in question (here: https://bmcr.brynmawr.edu/2021/2021.04.30/). The layman's version is that it's a detailed analysis of the structural elements of the script to propose how the (unknown) language was encoded into Linear A, combined with some analysis of how individual glyphs varied in time and space--and this results in the conclusion that Linear B is actually a version of a regional [script, not linguistic] dialect of Linear A that was used to write a different language.


this results in the conclusion that Linear B is actually a version of a regional [script, not linguistic] dialect of Linear A that was used to write a different language.

I'm having trouble understanding this. To me this says that Linear A and Linear B are different scripts that express different [spoken] languages, but might share common heritage. Which sounds obvious to me, not groundbreaking, unless they've now identified the common source.

Or are you saying that Linear A and Linear B might be the same script (as in phoneme->symbol encoding), but used to express a different language?


> Or are you saying that Linear A and Linear B might be the same script (as in phoneme->symbol encoding), but used to express a different language?

That was my understanding, but I didn't think that was a new hypothesis - I remember reading years ago that they were a very similar alphabet (for lack of a better word) and that Linear B was just the locals using their alphabet to write ancient Greek.

I believe something similar was done in Germany a few hundred years ago - the Bible was written in Hebrew script that encoded German words.


IMHO grandparent is saying Linear B is a different, yet related script that appears to be directly derived from a regional variant of Linear A.

A script can be used to write arbitrary languages, and does not imply a fully identical mapping to phonemes when used for these different languages. Examples include use of the Arabic script[0] to write Uighur, Urdu, or Farsi.

Generally an existing script is adapted to a new language, often through the addition of diacritics or new glyphs. Further examples would be Vietnamese or the numerous abugidas[1] of Southeast Asia. When an existing script is used without adaptation by a speaker of another language attempting phonetic transliteration, it is generally very lossy.[2] This is why we have the IPA, which is also lossy and not widely understood.[3]

[0] https://en.wikipedia.org/wiki/Arabic_script [1] https://en.wikipedia.org/wiki/Abugida [2] https://www.sinosplice.com/life/archives/2013/07/02/the-fore... [3] https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...


Now I'm imagining future archaeologists trying to figure out how modern Turkish relates to the other languages with a Latin script.


I too was scratching my head about, as you put it, "what the groundbreaking research was". The academic review link that you posted was rather useful, though it was a little technical.

I think the book needs to be read. But thanks a ton.


> and this results in the conclusion that Linear B is actually a version of a regional [script, not linguistic] dialect of Linear A that was used to write a different language.

I guess that's nice to know, but hasn't it always been known that Linear A and B were "related"? Regardless, Linear A still hasn't been deciphered, and that's what's ultimately important.


Thank you for your take on it. I too was hoping for more insights here, having followed the Linear A / Linear B developments from my armchair for many years.


I was expecting a proposed language, like a dialect of Luwian or something. This article is old news.


Yeah, not a good article at all - I bailed half way through and went to the comments here in hope of a tldr/abstract…


What it specifically fails to deliver is any indication of how "the internet" had any role at all in the work or in any progress made.

The only suggestion of progress was that somebody finally noticed the LB symbols were mostly about the same as LA symbols, so they can now pronounce the LA texts. There is no hint why that only just happened.


This article is a reasonable summary of the status quo in Linear A studies since 1956, but the reporter seemingly deliberately obfuscates the nature and scope of Dr. Ester Salgarella's new contribution.

It appears to be the creation of an online corpus in collaboration with Dr. Simon Castellan, linked near the bottom of the article: https://sigla.phis.me/

This will be a great resource and is an important work, but it appears that today we are no closer to deciphering Linear A than we have ever been.


I don't know about "deliberately". Might be just a confused, inarticulate piece by a confused reporter.

When an article opens with "the Minoan language known as Linear A" (no, it's a script) and "Linear B developed later in the prehistoric period" (an oxymoron, prehistoric = pre-literary)… you know not to expect much.

Did anyone manage to parse out what is even being claimed in this article?


The author found a relationship between Linear B (the script) and Linear A (the script) to the point of being able to approximately pronounce Linear A. The actual language written in that Linear A script, Minoan, is still unknown, but this provides some important tools to better understand it.


> The author found a relationship between Linear B (the script) and Linear A (the script) to the point of being able to approximately pronounce Linear A.

Except that has been known for like... 50 years? 60 years?


Exactly as you say, the fact that many signs are shared by Linear B and Linear A has been known for at least half a century.

What I understand from the review of the book is that after a more thorough analysis of the graphic variations of various Linear A signs, many more signs inherited by Linear B from Linear A have been identified than before.

Having more and better phonetic readings of Linear A texts increases the hope that the Minoan language could be identified and understood, even if this remains highly unlikely unless more Linear A texts are discovered.


If we don't know the language it's encoding, how do we know the Linear A pronunciation is correct, approximately or not? Is this done purely on the assumption that Linear A and Linear B might encode similar phonemes in a similar way?


Is it not fair to call a society where writing existed but we can't read it prehistoric? From our perspective, we have no written history. Or is the distinction that they themselves could read it and therefore acted with historical awareness?


Absolutely. If we can read their script, and they wrote a history, we can read that history. Without, we are reduced to relying on archaeology.

We can read historical texts from Egypt and Mesopotamia from the time the Minoans, er, "flourished". In exactly that sense they are not prehistoric. I think the Egyptians mentioned them. (It seems odd if the Egyptians did not mention Santorini blowing up and wiping them out, but maybe they did. Somebody must have mentioned "Thera".)


You'd have to give up the notion that "prehistoric" refers to some fixed moment in time, but other than that you are of course free to try to change its meaning.


No. Was the Egyptian civilization (which lasted for 3000 years until the time of the Romans) prehistoric because we could not read the Egyptian hieroglyphics? Did it only cease to be prehistoric once we deciphered the language?


Yes, exactly. Before we have written history is prehistory. As we get more history, the boundary of prehistory moves back.

So most of the American remnants are prehistoric, despite being coeval with history in the "old world".


https://lineara.xyz/ is also worth a look


That's cool as hell (note that https://linearb.xyz also exists.)


Can't they use Zipf's law? https://en.wikipedia.org/wiki/Zipf%27s_law The steeply decreasing rank-frequency distribution should make it possible to find "the" and some closed-class POS words, so determiners, prepositions and conjunctions.


You really think that in decades of linguists studying Linear A, no one has thought of trying Zipf's law?

If scientists have studied something for this long, and you come up with an idea that fits in a single paragraph, it's probably been tried and didn't work. Unless you're the field's leading expert in which case you would be off doing it, not posting it on HN :)

Edit: typos


Neither of these areas is my field, so I could be entirely misunderstanding this preprint [1] from 2021. The preprint mentions using Zipf's law in its objectives section on attempting to decipher Linear A.

The literature survey section mentions there have been good results using computational methods in 2020 to automatically decipher Linear B. The discussion section mentions "To the best of our knowledge, this is the first study to discuss and show computational analysis of Linear A."

Again, neither of these are my fields, but it looks like if these linguists have tried to use Zipf's law or other computational methods unsuccessfully in deciphering Linear A, the results weren't published. (Or a poor literature survey, or other explanations...) I'm not an academic either, so I don't know what the practices are for publishing unsuccessful results.

[1] https://hal.archives-ouvertes.fr/hal-03207615/document


That looks like an extremely underwhelming bit of research.

Some thoughts:

1. Figure 2 is titled "Evolutionary Tree of ancient scripts reconstructed using Neighbor Joining algorithm in ClustalW2 (Daggumati et al., 2019)", however checking the references (a) that figure does not appear in the cited paper (b) the cited paper does not use ClustalW2 (c) the scripts in the cited paper do not match those in the figure.

2. Speaking of references, they include Classen, M., & Safrany, L. (1975). Endoscopic papillotomy and removal of gall stones. This seems like an odd choice. Why did they include this? I have no idea, I can't find any cite to it in the text.

3. Immediately after "When text is rendered by a computer, the characters in the text can not all be displayed, because no font that supports them is available to the computer" they write: "The Python package Matplotlib also does not render the font as of the current version 3.3.4." I would have thought that rendering with matplotlib would have counted as "render[ing] by a computer". Please also note Figure 5, which has a screenshot of a Python script and a matplotlib plot; the former displays 5 (presumed) Linear A characters correctly while the latter has only boxes. Perhaps they meant 'there is no font in which all Linear A characters are present'. (And I should point out that in less than sixty seconds of searching I discovered that--as you might expect--matplotlib can use user-specified .ttf fonts; a quick sketch follows after this comment.)

(Sorry I know this point is long but another thing bothering me: why bother sticking with unicode codepoints when they could have instead just assigned integers to each character?)

4. For reasons that escape me, they chose to use a word cloud in Figure 15 ("Word cloud of locations the Linear A tablets were found by the number of symbols gathered from each") instead of, for example, a table. Why is 'HT' in there at least twice? Why is 'KH' in there at least six times?

I could go on but I'll stop here and say that if I were reviewing this for a journal it would be an immediate rejection.
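
On point 3, for what it's worth, here's a minimal sketch (not the preprint's code) of what using a user-specified .ttf with matplotlib looks like. It assumes a font covering the Linear A Unicode block (U+10600 onward), such as Noto Sans Linear A, is available locally; the filename below is a placeholder.

    # Minimal sketch: render a few Linear A codepoints in matplotlib by pointing
    # FontProperties at a user-supplied .ttf. The path is a placeholder; any font
    # covering the Linear A block (e.g. Noto Sans Linear A) should do.
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties

    linear_a_font = FontProperties(fname="NotoSansLinearA-Regular.ttf")  # hypothetical local path

    # A handful of codepoints from the Linear A block (U+10600 onward).
    sample = "".join(chr(cp) for cp in range(0x10600, 0x10605))

    fig, ax = plt.subplots()
    ax.text(0.5, 0.5, sample, fontproperties=linear_a_font, fontsize=32,
            ha="center", va="center")
    ax.set_axis_off()
    plt.show()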


Interesting. If they have only very recently tried Zipf's law then there may be some other more advanced stuff they haven't tried.

I'm thinking word embeddings. Like maybe you could do a word embedding based on cooccurence and look for similarly shaped clusters in Linear A and Early Greek.
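
A minimal sketch of that idea, with toy placeholder sign groups rather than real Linear A or Greek data: build a windowed co-occurrence matrix and keep the top components of an SVD; the resulting vector geometry is what you would then try to compare across the two corpora.

    # Toy co-occurrence embeddings: count neighbours within a window, then take
    # a low-rank SVD of the count matrix. The "documents" below are placeholders.
    import numpy as np

    corpus = [["A1", "A2", "A3", "A1"], ["A2", "A1", "A4"]]
    vocab = sorted({t for doc in corpus for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    window = 2
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in corpus:
        for i, t in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if i != j:
                    counts[index[t], index[doc[j]]] += 1

    # Keep the top two singular directions as 2-D embeddings.
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    embeddings = U[:, :2] * S[:2]
    print(dict(zip(vocab, embeddings.round(2))))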


It shows that we had to wait until 2021 for someone to try it. Thanks for reporting anyway!


I understand the impulse to point out the obvious, but when the question is asked honestly rather than arrogantly or dismissively, it is even better to wait for someone to provide the specific answer; in this case, the reason that Zipf's law is of no help.


It wasn't my intent to be overly dismissive. But I see this sort of thing all the time, and I find this phenomenon interesting, so I wanted to engage with that aspect of it, specifically.

I agree with you in general though. Dismissing these things out of hand isn't helpful either. But multiple people had already made substantive replies to the actual content of their idea, anyway.


Useless dismissal; I asked a question, I didn't make an assertion. Besides, it allows for an exploration of the search space of solutions, which stimulates deeper discussion and might allow finer-grained questions that could then turn out to be innovative.


Hardly useless, since it apparently inspired people to dig deeper and find more info :)


Edit: I hope this will make you think twice next time. Using Zipf's law for Linear A was only attempted for the first time in 2021 (https://hal.archives-ouvertes.fr/hal-03207615/document), so had I commented last year it would have been prior art. I agree the idea is not very original, and yet we had to wait that long for it to be tried.


The total extant corpus of Linear A amounts to fewer than 10,000 characters (and this is, I believe, the largest corpus of any undeciphered script).

There's not enough text to do statistical analysis.


If all that text had been a story, there would still have been a chance to decipher it.

Even worse than the small number of texts is that all, or almost all, are just bookkeeping records, so they contain few words besides numbers, symbols for useful goods, e.g. wine, olive oil, barley, wool and so on, and proper names of places or people.

So even if there are a few hundred texts, most just reproduce the same phrases, only with different numbers and names substituted in them.

Any statistics on this handful of stereotyped phrases will offer no information about the statistics of the words of the Minoan language as used in normal conversation or storytelling.


I'm skeptical of the claim that this would work at all even if there was a larger corpus. Let's say you had a million pages of classical Chinese text, but absolutely no context about what the text meant or was about. By looking at it closely and using statistical analysis you could certainly determine various rules of the grammar, and you might even be able to guess that certain characters are grammatical constructions representing things like conjunctions and prepositions. But this isn't really going to let you translate anything.


My guess is that if you had a really big, wide-ranging and high-quality corpus of a completely unknown human language then you probably would be able to decipher and translate it. If you could deduce or guess the grammatical structure the next step might be to look at which nouns can be subjects of which verbs, for example, and it might then be possible to guess which nouns refer to humans and which verbs describe actions that can normally only be performed by a human, and then ... well, there's lots of statistical stuff you can do with a really huge corpus ... It's an interesting problem to think about but it's not a problem we're ever likely to encounter in real life. It's more likely we'll discover a corpus of some alien, non-human language than a huge corpus of a completely unknown human language.


The Voynich manuscript is fairly extensive (over 200 pages), illustrated, and we still do not know what the heck of a language it is written in.

https://en.wikipedia.org/wiki/Voynich_manuscript


Yes, the Voynich manuscript is worth mentioning here, but it doesn't qualify as a really big, wide-ranging and high-quality corpus: it's about 35 thousand words (if the things that look like words are words), whereas a million words would still be a small corpus by today's standards; it's a single document (for example, it might contain no first-person pronouns, no future tenses, almost no mention of individual people, ...); and we have no idea about the quality (for example, perhaps it was copied by someone who didn't understand it and is such a bad copy that even someone who knows the language and the script would find it hard to understand). Still, it is definitely worth mentioning here.


After seeing your comment, I watched a TED Ed about it[1], and now I am convinced, for it to have survived for so long, it is a relic and time forgot why it was saved. It's Leonardo Da Vinci's childhood notebook, has to be, there is no other explanation for why it would have been kept since the 15th Century. So it's likely to be derivative of Italian, Latin, and other languages he spoke and wrote. No, wait... it very well could have been created by some other weird kid from the 15th Century no one's heard of. Come on.

[1] https://www.youtube.com/watch?v=8NS4CbBJQ84


...assuming it is really language at all, and not just deliberate gibberish.


Entropy analyses seem to indicate it is natural-language-like: not too much redundancy and not too much noise/randomness.
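
For anyone curious what that kind of analysis looks like, a minimal sketch (toy input, not a Voynich transliteration): unigram entropy plus the conditional entropy of the next character given the previous one. Natural-language text tends to land in a moderate band for both, while uniform gibberish pushes them toward the maximum and heavy repetition pushes them down.

    # Character-level entropy estimates on a toy string.
    import math
    from collections import Counter

    def unigram_entropy(text):
        counts = Counter(text)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def conditional_entropy(text):
        # H(next char | previous char), estimated from bigram counts.
        pair_counts = Counter(zip(text, text[1:]))
        prev_counts = Counter(text[:-1])
        total = sum(pair_counts.values())
        h = 0.0
        for (a, b), c in pair_counts.items():
            h -= (c / total) * math.log2(c / prev_counts[a])
        return h

    sample = "this is only a toy sample standing in for a transliterated manuscript"
    print(unigram_entropy(sample), conditional_entropy(sample))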


Hey, this was really interesting to read. I like the Aztec hypothesis. Anyway, hoax or not, it seems it received a tremendous amount of work at a time when half of those concepts hadn't even been discovered, unless of course they had conserved the paper for 200 years, which I don't find that unlikely tbh.


Zipf's law is a statement about the results of a frequency analysis. Frequency analysis has been a core part of Linear A efforts for a long time, as you'd expect considering how important it was to Ventris in deciphering Linear B.

See for example "On the Language of Linear A" from 1958 (https://gredos.usal.es/bitstream/handle/10366/73246/On_the_L...) or "Greek-like Elements in Linear A" from 1963 (https://grbs.library.duke.edu/article/viewFile/11991/4031). For something more recent, there's "Linear A and Linear B: Structural and contextual concerns" from 2017 (https://core.ac.uk/download/pdf/200196986.pdf). (One of the authors of that one, Steele, also has a blog: https://crewsproject.wordpress.com/tag/linear-a/)


That makes assumptions about the language. Of the languages I know: German has three definite articles and Latin doesn't have any, so it is not obvious what looking for "the" would turn up.


I think people (here and below) are getting hung up on definite articles, but Zipf's Law makes no such observation. It says only that a word's frequency in a natural language corpus tends to be in inverse proportion to its rank in a frequency table.

In English, the most frequent words are articles, but the general observation about word frequency holds across languages (whether those languages have articles or not).


"The most frequently appearing words in this pile of un-translateable text are the most common words in the language it is written in" seems like it falls somewhere between blindingly obvious, and entirely useless. Unless you have some clue what those words mean, how does that observation help you?


Just from skimming the wikipedia article, it doesn't seem useful for translating. But it is slightly stronger than "The most frequently appearing words in this pile of un-translateable text are the most common words in the language it is written in." It tells you that, for example, the most popular word should be about twice as popular as the second most popular word.

It doesn't tell you what those words are, but it is a pretty specific observation about the frequency/rank relationship. So, as the Wikipedia article linked above points out, it can tell us that the Voynich Manuscript was probably written in a language (of course, it could be a cypher of a real language or something made up, like Elvish in Lord of the Rings, but it probably isn't just a random collection of symbols, because it is unlikely that a random collection of symbols would happen to follow this distribution).
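
A minimal sketch of that rank/frequency check, on a placeholder token list rather than any real corpus: if the distribution is roughly Zipfian, frequency times rank stays in the same ballpark as you go down the table.

    # Rank/frequency table for a toy corpus; freq * rank is roughly constant
    # when Zipf's law holds.
    from collections import Counter

    tokens = "the cat sat on the mat and the dog sat on the rug".split()
    for rank, (word, freq) in enumerate(Counter(tokens).most_common(), start=1):
        print(f"{rank:>2}  {word:<4}  freq={freq}  freq*rank={freq * rank}")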


It doesn't (in this case), and I didn't say it did. And there's nothing "blindingly obvious" about the ubiquity of the Zipf curve.


How does this definition use "word"? In analytic languages, most words always appear in the same form, so counting them is relatively easy. But for inflected languages, does this require being able to distinguish the roots of words in order to count them accurately?

It's not just about the presence or absence of articles and prepositions, but about different declensions of the same word. If this analysis requires knowing that homo, hominis, hominem and homine all refer to the same word and should be counted as one, how does it help with analyzing a text for which we don't know the grammar?


> does this require being able to distinguish the roots of words in order to count them accurately?

It can be somewhat retrieved via Levenshtein distance.

> declensions

Is there a difference between an inflection and a declension?
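
A minimal sketch of the Levenshtein idea, using a toy Latin paradigm rather than Linear A sign groups: forms at a small edit distance from each other are candidates for being inflections of the same stem. Real morphology is messier than this, so it is only a rough heuristic.

    # Pairwise edit distances between word forms; small distances suggest a shared stem.
    def levenshtein(a, b):
        # Standard dynamic-programming edit distance with a rolling row.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution (or match)
            prev = cur
        return prev[-1]

    forms = ["homo", "hominis", "hominem", "homine"]
    for i, x in enumerate(forms):
        for y in forms[i + 1:]:
            print(f"{x:<8} {y:<8} distance={levenshtein(x, y)}")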


Can we assume that any of those features would be part of this language?


A language without those would be hella weird and primitive, like stereotypical robotic talking. To answer your question, I don't know; does Linear B have them?


Weird, perhaps; primitive, no.

One of the historical issues with linguistics is that it analyzed every language as if it were Classical Latin or Classical Greek, and if that language had elements that didn't work out... well, that can't be proper then, can it? You still see some residuum of this in English prescriptivist poppycock, like the prohibition against ending sentences in prepositions.

As linguists actually started inventorying world languages, it became more and more clear that there is a very wide diversity of grammatical features that don't necessarily translate well to familiar languages. There are vanishingly few features that are actually universal to all languages--the noun may well be the only universal part of speech. That a language doesn't choose to mark a feature in a particular way doesn't make it more primitive than another language. English doesn't have a numerical classifier... is it more primitive than an Australian Aboriginal language? Or is it more primitive than Japanese for not having a way to mark register (~ politeness)?

(FWIW, Linear B is used to write Mycenaean Greek, and this has been known for ~70 years.)


LOL, my native language (Czech) has no articles, but it is so heavily inflected and permits so many subtle, meaning-carrying changes in the word order of a sentence that it is actually hard to carry over some of those subtleties into written English.

The only thing "robotic" about it is the fact that "robot" is a Czech word that was adopted worldwide.


Latin doesn't have articles (the/a), and frequently drops the verb. Aramaic encodes the article in a suffix. Arabic and Hebrew omit the vowels, leaving the interpretation depending on contextual clues. There are languages without auxiliary verbs. And there are tons of other constructions that English doesn't have.


Written Arabic and Hebrew often omit vowels; the actual spoken languages do not, of course.


The problem isn't in deciphering spoken Linear A.


There are languages used today that don't have a separate word for "the", such as Hebrew (which uses a prefix to denote "the"), or Chinese, which apparently doesn't use articles.[1]

Also, knowing what the most common words are wouldn't really help you much if you didn't know what the documents are about. For example, if they were trade records, they might contain a lot of text saying something like "X agrees to buy 20 pounds of olives from Y for $50 if delivered by next week". But if they were historical records of wars, other words may be more common.

[1] https://mylanguages.org/chinese_articles.php


> A language without those would be hella weird and primitive, like stereotypical robotic talking. To answer your question, I don't know; does Linear B have them?

I think there are very few assumptions of the form "every reasonable language has […]" that hold up even for all current languages, let alone historical ones.


Anyone claiming "surely every language needs X" had better look at Riau Indonesian (https://en.wikipedia.org/wiki/Riau#Language) first to check if that language has X. If it doesn't, then X is almost certainly not required for communication.


Did you just call all Slavs "weird and primitive"?


In addition to the other comments on difficulties of such analyses, an additional difficulty may be the type of inscriptions in the corpus. We understand Linear B, for example, because it is early Greek. But the texts are not narrative prose or poetry: they're administrative records, mostly lists and inventories. If Linear A texts are of similar types, then trying to decipher the language from them alone may be challenging or impossible, unless it can be linked to a known language: the forms of speech used may simply be too limited.

Trying to understand English grammar by looking only at bare financial statements would likely be extremely hard.


The language might not have that. E.g. Latin doesn't have a word that corresponds to "the". Languages may indicate conjunctions or some types of prepositions or determiners by changing the case of a noun or the conjugation of a verb.


I think the issue with Linear A is that the amount of preserved material in that language is incredibly small, so whatever statistics you can obtain from it are of limited use.


I wonder whether they can construct a deep learning model that encodes human languages from script shapes and then somehow figure out the God language from which the Linear A script/language is derived.

There's too little data for Linear A. But it might be enough if there's a God language oracle waiting to be fed new descendant languages.


This is a God (language)-of-the-gaps argument: we can't figure out this sparsely attested language, but maybe we can figure out an entirely unattested language instead, and also learn the correspondence between it and Linear A.

Deep learning can predict plenty of phenomena in the world, sure, but it needs data, not aspirations.


> figure out an entirely unattested language

I did not say that. Human languages evolve in similar ways, use similar vocabulary, grammar etc. Linguistics has already unraveled the structure of many languages and the structure of evolution of language through time.

I am not saying DL is THE approach to take, but given that there are only ~10k characters of Linear A, it is hard to tackle the problem without a common representation of multiple languages that are close to it. That's the whole point of DL: how to build better and better representations, not how to accurately model uncertainty (which is what you get by doing statistics).

I would say XLM [0] builds a common representation of a collection of languages and then works better on machine translation for languages for which the data is scarce but that are related to the languages in the model. (what it also does is discover and represent the structure of part-of-speech, grammar, entities etc. without being told about those particular things)

Does there exist an abundance of data for languages close to Linear A? If not, then I admire the work of all that try to untangle this with their brains alone.

0: https://github.com/facebookresearch/XLM


> Does there exist an abundance of data for languages close to Linear A? If not, then I admire the work of all that try to untangle this with their brains alone.

In the article, Dr. Ester Salgarella says: "we have not yet identified the linguistic family the Minoan language belongs to (unless it has to be taken as an ‘isolated’ language)"

If we knew that the Minoan language belonged to some extant language family and we had an abundance of data, the mystery of Linear A would already have been solved decades ago.

In general, there's very little data for any of the Palaeo-European languages that got replaced by Indo-European languages.

Linguistic relatives of the Minoan language could have gone extinct when their speakers shifted to Greek or some other Indo-European language. It is also possible that other Minoan languages died out centuries or millennia before the arrival of the Indo-Europeans. I don't believe we will ever know.


Yeah, I guess the conclusion is that humans are still much better than DL. :D


Article from 2021, duplicate of https://news.ycombinator.com/item?id=27191364


>I am afraid there is currently no exact translation of the sign-sequences (= words) attested on Linear A tablets (as well as other document types). This is primarily because we have not yet identified the linguistic family the Minoan language belongs to (unless it has to be taken as an ‘isolated’ language)

Seems as though they may have made some kind of advancement in the relationship between symbols, but as always we do not have nearly enough written material to approach deciphering.


Feels like a fun intro to comp.sci exercise would be taking texts from ancient languages and writing compression schemes, n-gram analyzers, regular expressions, and symbol call graphs for them. It's a bit like the apocryphal story of some old hackers in the 80s "decoding" a Chinese takeout menu (Jobs/Woz?), but it could get kids interested in archeology in a way that is smarter than that alien TV show.
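
The n-gram part of such an exercise fits in a few lines; a minimal sketch on a placeholder symbol sequence rather than a real transcription:

    # Count sign bigrams and trigrams in a stand-in sequence.
    from collections import Counter

    def ngrams(seq, n):
        return zip(*(seq[i:] for i in range(n)))

    signs = list("ABABCABDAB")  # placeholder for a transliterated sign sequence
    for n in (2, 3):
        print(n, Counter(ngrams(signs, n)).most_common(3))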



Wow, has this been applied to Linear A?? Nice to see Google doing original work. Also, the MIT omnipresence is humiliating for other universities, as usual.


Work has been done on using Markov models etc to predict missing symbols in these texts. But it feels like with all the data now available, and the fact that some signs' meanings are known, we must be able to at least reduce the constellation of possible meanings of some of the unknown signs. There are only so many things it was possible to say about olives in the ancient world, and presumably the semantic space wouldn't be so different to other vaguely contemporaneous languages (not just Linear B). Does anybody know of any work in this direction?
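
A minimal sketch of the bigram-Markov idea (toy sequences, not a model trained on the actual corpus): estimate transition counts and rank candidates for a damaged sign given its left neighbour.

    # Bigram transition counts and a candidate ranking for a missing sign.
    from collections import Counter, defaultdict

    sequences = [list("ABCAB"), list("ABDAB"), list("CABAB")]  # placeholder sign sequences

    transitions = defaultdict(Counter)
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            transitions[left][right] += 1

    def candidates(left_sign, k=3):
        """Most likely fillers for a damaged sign that follows left_sign."""
        counts = transitions[left_sign]
        total = sum(counts.values())
        return [(sign, count / total) for sign, count in counts.most_common(k)]

    print(candidates("A"))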


TED-Ed made a nice video explaining Linear A https://youtu.be/iePEw_cHp8s


I wonder if a machine learning model could shed some 𐝌 on the decipherment.


Linear A seems to derive from Cretan hieroglyphs: https://en.wikipedia.org/wiki/Cretan_hieroglyphs What do Cretan hieroglyphs come from? And how does a population create a language? That's absurdly difficult to initiate.


I assume you mean "writing system" and not "language" here.

Writing systems were independently invented no fewer than three times (Mesopotamian, Chinese, and Mayan are unquestionably independent inventions) and probably more, while also being reinvented from scratch numerous times after that (Cherokee being perhaps the most well-documented such reinvention--Sequoyah knew of writing from the Americans, but had no other conception of how it worked, and his documentation of the process of developing the Cherokee syllabary is a nice compression of the history of the stages of writing systems). It does not seem to be a particularly challenging invention.

There appear to be two key hurdles that are required for the development of writing. The first is the creation of a systematic inventory of stylized representations of objects, for example knowing that this symbol represents "sun" and that one represents "eye". In particular, I'd draw the "systematic" inventory here as the challenge--merely representing concepts in visual drawings seems to be a pretty universal capability. The second hurdle is the re-encoding of (some of) these symbols to represent phonetic values in an abstract way. (Note that being able to represent any phonetic utterance of a language is the distinguishing characteristic between proto-writing and writing.)

If you actually want to know how language is created, well, there is a recent community of deaf people who spontaneously invented their own sign language de novo, which suggests that language is actually incredibly easy to invent.


"Writing systems were independently invented no fewer than three times (Mesopotamian, Chinese, and Mayan are unquestionably independent inventions)"

"Note that being able to represent any phonetic utterance of a language is the distinguishing characteristic between proto-writing and writing."

Huh... does Chinese meet that criterion? https://en.wikipedia.org/wiki/Logogram#Differences_in_proces...


The Chinese writing system is a syllabary, but with a large number of symbols that map to ~1200 spoken syllables, and memorized spelling rules for which to use, often with "just-so" stories about logograms as memory aids.

Strangely, many, perhaps most users of it think it is not a syllabary. This is largely because it is what they were told as children, and it makes little difference in use, within bounds of Mandarin.

People who speak other Chinese languages, which Mandarin speakers are taught to think of as dialects (i.e. basically the same as Mandarin but for variant pronunciation), know better. It is common among Mandarin speakers to believe they can read texts in these other "dialects", not realizing that what they read has been translated to Mandarin. (Writing "in dialect" is done very rarely, except for transcribed loan words, because it is not taught; the elaborate spelling rules for Mandarin would not work, as is.)

The CCP encourages this misapprehension.


Source please? I'd like to read more about this.


I picked it up mostly from Language Log, the U of Penn collective, and Victor Mair's postings there. No doubt he has a book.


Hm. " Chinese, they are fused with logographic elements used phonetically; such "radical and phonetic" characters make up the bulk of the script. Both languages relegated the active use of rebus to the spelling of foreign and dialectical words. "

I guess that counts.


I think after cave paintings and tally marks, modern written language is fairly straightforward. It just requires a need to pack more information into less space, which farming provides.


The less you know about something, the simpler it seems.





