
I would say that parsing logs probably works best with awk, grep and the like, because that's more or less what they were designed for. But not everything is Unix logs, and not everything is over 10GB. Having said that, Python can absolutely handle 10GB data sets. In fact, with things like PySpark, you can go much bigger.
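
To illustrate the point about size: plain Python can get through a file far larger than RAM as long as it reads line by line instead of loading everything at once. A minimal sketch, assuming a hypothetical log format where each line contains a level word such as ERROR or INFO (the regex and function names are made up for illustration):

```python
import re
from collections import Counter
from typing import Iterable

# Assumed, hypothetical format: "2024-01-01 12:00:00 ERROR something broke"
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log levels from any iterable of lines, one line at a time."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_levels_in_file(path: str) -> Counter:
    """Stream a file from disk; memory use stays flat regardless of file size."""
    # A file object yields one line per iteration, so a 10GB file is never
    # held in memory as a whole.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_levels(handle)
```

This is single-threaded and will usually be slower than grep for a simple match, but the memory footprint does not grow with the input.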


You're right - in the end my solution for this particular project was to use grep and awk to parse the log lines into a CSV-ish format. That was then read by Python and plotted with matplotlib to create beautiful graphs.
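
Roughly what the Python half of a pipeline like that could look like. This is only a sketch: the "timestamp,status" column layout, the per-minute bucketing, and the function names are assumptions for illustration, not the format actually used in the project.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def lines_per_minute(rows: Iterable[str]) -> List[Tuple[str, int]]:
    """Bucket CSV-ish rows by minute.

    Assumed row format (hypothetical): "YYYY-MM-DDTHH:MM:SS,status"
    """
    counts = Counter()
    for record in csv.reader(rows):
        if len(record) < 2:
            continue  # skip blank or malformed rows
        minute = record[0][:16]  # "YYYY-MM-DDTHH:MM"
        counts[minute] += 1
    return sorted(counts.items())


def plot_counts(series: List[Tuple[str, int]], out_path: str = "lines.png") -> None:
    """Draw the per-minute counts as a line chart and save it to a file."""
    # Imported here so the parsing step works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [minute for minute, _ in series]
    values = [count for _, count in series]
    positions = list(range(len(values)))

    fig, ax = plt.subplots()
    ax.plot(positions, values, marker="o")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_xlabel("minute")
    ax.set_ylabel("log lines")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Since an open file is an iterable of lines, the output of the grep/awk step can be passed to lines_per_minute directly, either from a saved file or from sys.stdin at the end of a shell pipe.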



