Not really. Mikhail Tal was easily one of the strongest calculators in chess history. Definitely the strongest in his time besides maybe Fischer.
The idea that Tal mostly made dubious sacrifices is largely a myth, based heavily on a joke he himself made. In actual fact he always calculated deeply and knew that no easy refutation existed, and that he had a draw by perpetual check in hand (until Ding beat it a few years ago, Tal actually held the record streak of unbeaten games in classical chess). He was taking calculated risks, knowing his opponents were unlikely to out-calculate him. He also had a very deep understanding of positional play; he just had a very different style of expressing it, relying more on positional knowledge to create sharp positions centered around material imbalance.
Nice, you have experience with data frames in R, Python, and Julia! Which one do you like the most? I know the ecosystems aren't really comparable, but from your experience, which one is the best to work with for core operations, etc.?
Not OP, but R data.table + dplyr is an unbeatable combo for data processing. I handily worked with a 1bn-record time series on a 2015 MBP.
The rest of the tidyverse stuff is OK (like forcats), but the overall ecosystem is a little weird. The focus on "tidy" data itself is nice up to a point, but sometimes you just want to move data around in an imperative style without figuring out which "tidy verb" to use, or learning yet another symbol interpolation / macro / non-standard evaluation system, because they seem to have a new one every time I look.
Pandas is a real workhorse overall. Data.table is like a fast sports car with a very complicated engine, and Pandas is like a work van. It's a little of everything and not particularly excellent at anything and that's ok. Also its index/multiindex system is unique and powerful. But data.table always smoked it for single-process in-memory performance.
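To illustrate the index/multiindex point, here's a minimal pandas sketch (the sales data is made up for the example):

    import pandas as pd

    # Hypothetical sales data with a two-level (store, date) index
    df = pd.DataFrame(
        {
            "store": ["A", "A", "B", "B"],
            "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
            "units": [10, 12, 7, 9],
        }
    ).set_index(["store", "date"])

    # Label-based selection on the outer level
    print(df.loc["A"])

    # Cross-section on the inner level
    print(df.xs(pd.Timestamp("2021-01-01"), level="date"))

    # Aggregate by an index level without resetting it
    print(df.groupby(level="store")["units"].sum())

Once rows are keyed by a hierarchical index like this, a lot of selection and aggregation becomes label-based rather than positional, which is where much of the power (and the learning curve) comes from.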
Until DuckDB and Polars, there was no Python equivalent of data.table at all. They're great when you want high performance, native Arrow (read: Parquet) support, and/or an interface that feels more like a programming library than a data-processing tool. If you're coming from a programming background, or if you need to do some data processing or analytics inside of a production system, those might be good choices. The Polars API will also feel very familiar to users of Spark SQL.
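As a rough sketch of what that looks like (file name and columns are hypothetical; group_by is the spelling in recent Polars releases):

    import duckdb
    import polars as pl

    # Polars: lazy scan of a Parquet file (Arrow-backed); nothing runs until collect()
    result = (
        pl.scan_parquet("events.parquet")
          .filter(pl.col("status") == "ok")
          .group_by("user_id")
          .agg(pl.col("latency_ms").mean().alias("avg_latency"))
          .sort("avg_latency")
          .collect()
    )

    # DuckDB: query the same Parquet file directly with SQL
    con = duckdb.connect()
    df = con.execute(
        "SELECT user_id, avg(latency_ms) AS avg_latency "
        "FROM 'events.parquet' WHERE status = 'ok' "
        "GROUP BY user_id ORDER BY avg_latency"
    ).df()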
For geospatial data, Pandas is far superior to the alternatives thanks to GeoPandas and now SpatialPandas. There is an alpha-stage GeoPolars library, but I have no idea who's working on it or how productive they will be.
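For a sense of the GeoPandas workflow, a minimal sketch (the file paths and columns are hypothetical):

    import geopandas as gpd

    # Read vector files into GeoDataFrames and reproject to a metric CRS
    parcels = gpd.read_file("parcels.shp").to_crs(epsg=3857)
    zones = gpd.read_file("zones.shp").to_crs(epsg=3857)

    # Ordinary pandas operations work alongside spatial ones
    big = parcels[parcels.geometry.area > 10_000]

    # Spatial join: attach the attributes of the containing zone to each parcel
    joined = gpd.sjoin(big, zones, how="left", predicate="within")

The nice part is that the result is still a (Geo)DataFrame, so everything else in the pandas ecosystem keeps working on it.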
If you had to learn one and only one, Pandas might still be the best option. Python is a much better general-purpose language than R, as much as I love R, and Pandas is probably the most flexible option. Its index system is idiosyncratic among its peers, but it's quite powerful once you get used to it, and it enables some interesting performance optimizations that help it scale up to data sets it otherwise couldn't handle. Pandas also has pretty good support for time series data, e.g. aggregating on monthly intervals, and it is the most extensible/customizable, with support for things like custom array back ends and custom data types. Its plotting methods can also help make Matplotlib less verbose.
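A quick sketch of the time series and extension-type points (the data here is synthetic):

    import numpy as np
    import pandas as pd

    # Synthetic minute-level measurements spanning roughly two months
    idx = pd.date_range("2023-01-01", periods=80_000, freq="min")
    ts = pd.DataFrame({"value": np.random.randn(len(idx))}, index=idx)

    # Aggregate to monthly intervals with resample
    monthly = ts["value"].resample("MS").mean()

    # Nullable extension dtype: integers that can hold missing values
    counts = pd.array([1, 2, None], dtype="Int64")

    # Thin plotting wrapper around Matplotlib
    ax = monthly.plot(title="Monthly mean")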
I've never gotten past "hello world" with Julia, not for lack of interest, but mostly for lack of time and need. I would be interested to hear about that comparison as well.
At a previous job, I regularly worked with dfs of millions to hundreds of millions of rows and many columns. It was not uncommon for the objects I was working with to use 100+ GB of RAM. I coded initially in Python, but moved to Julia when the performance issues became too painful (10+ minute operations in Python that took < 10 s in Julia).
DataFrames.jl, DataFramesMeta.jl, and the rest of the ecosystem are outstanding. Very similar to pandas, and much, much faster. If you are dealing with small dfs (obviously subjective as to the definition of small) of around 1,000-10,000 rows, sticking with pandas and Python is fine. If you are dealing with large amounts of real-world time series data, with missing values and a need for data cleanup as well as analytics, it is very hard to beat Julia.
FWIW, I'm amazed by DuckDB and have played with it. The DuckDB Julia connector gives you the best of both worlds. I don't need DuckDB at the moment (though I can see this changing), and use Julia for my large-scale analytics. Python's regex support is fairly crappy, so my data extraction is done with Perl. Python is left for small scripts that don't need to process lots of information and can fit within a single terminal window (due to its significant-whitespace handicap).
I'm not the person you replied to, but I have experience with all of these. My background is computer science / software engineering, incorporating data analysis tools a few years into my career, rather than starting with a data analysis focus and figuring out tools to help me with that. In my experience, this seems to lead to different conclusions than the other way around.
tldr: Julia is my favorite.
I could never click with R. It is true that data.table and dplyr and ggplot are well done and I think we owe a debt of gratitude to the community that created them. But the language itself is ... not good. But that's just, like, my opinion!
I've also never really clicked with Pandas. But I like Python a lot more than R, and pandas basically works. For what it's worth, the Polars API style is more my thing, but most of the data scientists I work with prefer the pandas style, :shrug:.
But I really like this part of Julia. It feels more "native" to Julia than pandas does to Python; more like data.table in R, but embedded in what is, IMO, an even better language than Python. The only issue is that Julia itself remains immature in a number of ways, and who knows whether it will ever overcome that. But I hope it does!
The problem with that is that you will probably have downtime of several minutes (or at least many seconds) each day. That's not optimal for a site where, at any time of day, thousands of people are playing chess...
Somewhat offtopic, but sesse.net runs a Stockfish instance with the (probably) deepest analysis of live chess games: http://analysis.sesse.net/
I visit it daily for the Chess World Championship match.
These types of repositories have existed since the first Hacktoberfest. They might even be useful for beginners learning how to prepare their first pull request. But the idea of Hacktoberfest is clearly not to submit all your pull requests to such repositories.
It even seems as though DigitalOcean is blocking these types of repositories. From the FAQ:
My pull request is marked as being on an excluded repository. What does this mean?
Unfortunately, your pull request was made on a repository that doesn’t align with the core values of Hacktoberfest. We’ve decided that pull requests made to this repository will not count toward completing the challenge.