
A large part of the natural language processing is indeed done in Mathematica, and the last time I looked, I believe there were about 15 million lines of Mathematica code in the main repository. Note that this massive number is largely the result of the many Mathematica scripts used to insert raw data and relations into the database. Just from glancing at folder sizes, I'd estimate that around 40% of the repository is parsing code, so roughly 6 million lines of Mathematica.

Note that a line of Mathematica code tends to do a lot of processing, so this would be equivalent to many times more lines in another language. Hooking a new feature into Wolfram Alpha is quite an interesting process, and some developers described it as the "mud bowl" because when you broke something, you just had to throw more mud at it.

I'm not allowed to disclose details about the technology stack, but I can say that the database querying functionality was kept separate from the actual parsing and semantic analysis, and was implemented in a different language.

If you're interested in NLP, which by the way is a wonderful and exciting field where mysteries abound, Mathematica is indeed a great way to get started quickly. I recommend that everyone do their work in an open-source-able way with a popular language like Python or Java, but I built a Swahili translator in Mathematica during my freshman year of college. Here it is on GitHub:

https://github.com/keshavsaharia/Swahili-Translator

Best of luck. Feel free to shoot me an email at the address listed on my GitHub if you have any questions or need help getting started.



> I'm not allowed to disclose details about the technology stack, but I can say that the database querying functionality was kept separate from the actual parsing and semantic analysis, and was implemented in a different language.

I'm definitely curious about this. Are you allowed to disclose which language?


Just thought about it, and I'd really like to share this detail, but I signed a very ambiguous non-disclosure agreement.



