
Python is slow - parsing 10GB of logs works best with awk, grep, etc.


Python isn't going to beat grep, but it beats awk in a lot of cases. (Cases that awk isn't well suited to, to be fair. Python doesn't beat awk for 99% of what people use awk for.)

It's faster than people think it is. Especially when you add in libraries like pandas, it's fantastic for data analysis.

Of course, by the time you get to using pandas, you have to have everything in memory.

This isn't true of Python in general, though. For simpler tasks, you can easily write generators that read from stdin and write to stdout, so only one line is in memory at a time.
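
As a rough sketch of that generator approach (not anyone's production code): the log layout "DATE TIME LEVEL MESSAGE" and the function names below are assumptions made up for illustration. Each stage pulls one line at a time from the previous one, so memory use stays flat no matter how large the input is.

```python
import sys
from typing import Iterable, Iterator, TextIO


def read_lines(stream: Iterable[str]) -> Iterator[str]:
    """Yield lines lazily, without the trailing newline."""
    for line in stream:
        yield line.rstrip("\n")


def only_errors(lines: Iterable[str]) -> Iterator[str]:
    """Keep lines whose level field is ERROR (assumed layout, see above)."""
    for line in lines:
        if " ERROR " in line:
            yield line


def extract_fields(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'DATE TIME LEVEL MESSAGE' into 'DATETTIME<tab>MESSAGE'."""
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) == 4:  # silently skip lines that don't fit the layout
            date, time, _level, message = parts
            yield f"{date}T{time}\t{message}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # Nothing is read until the loop asks for the next record.
    for record in extract_fields(only_errors(read_lines(stdin))):
        stdout.write(record + "\n")
```

With a final `main()` call added and the file saved as, say, errors.py (a hypothetical name), it slots into a pipeline like any other filter: `python errors.py < app.log > errors.tsv`.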

I'm not saying that it's better for things like log parsing, but for more complicated ASCII formats, I'd far rather use Python than awk.

That having been said, people who don't learn and use awk are missing out. It's a fantastic tool.

I've just seen one too many unreadable, 1000-line awk programs that do something that takes a dozen lines of Python.


I would say that log parsing perhaps works best with awk, grep and the like, because that's more or less what they were designed for. But not everything is Unix logs, and not everything is over 10GB. Having said that, Python can absolutely handle 10GB data sets. In fact, with things like PySpark, you can go much bigger.


You're right - in the end my solution for this particular project was using grep and awk to parse the loglines into a CSV-ish format. That was then interpreted by Python and matplotlib to create beautiful graphs.
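
For anyone curious what the Python half of such a setup might look like, here is a minimal sketch. It is not the actual project code: the column layout (ISO timestamp first, then other fields), the function names, and the output filename are all assumptions. The aggregation uses only the standard library; matplotlib is imported inside the plotting function so the counting part works without it.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_hour(rows: Iterable[str]) -> Counter:
    """Count records per hour from CSV-ish lines.

    Assumes the first column is a timestamp like 2024-01-01T12:34:56.
    """
    counts: Counter = Counter()
    for record in csv.reader(rows):
        if len(record) < 2:  # skip blank or malformed lines
            continue
        counts[record[0][:13]] += 1  # "YYYY-MM-DDTHH"
    return counts


def plot_counts(counts: Counter, path: str = "lines_per_hour.png") -> None:
    """Save a bar chart of the counts (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    hours = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar(hours, [counts[h] for h in hours])
    ax.set_xlabel("hour")
    ax.set_ylabel("log lines")
    fig.autofmt_xdate()  # tilt the tick labels so they don't overlap
    fig.savefig(path)
```

Since `count_by_hour` accepts any iterable of lines, it can be fed `sys.stdin` directly at the end of a grep/awk pipeline.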


I hear you on this. I'm very interested in non-performance-based reasons. Some of the Python libraries are optimized for big data too, no?

I guess the reason I ask is much of the "manipulate and check" that I do happens before I get things to where a one liner will work. That could very well be a programmer competency issue on my part though. :-)


Use PyPy. At 10 GB the bottleneck will probably be storage I/O anyway.



