Well, intelligence itself is hard to define. We'd consider pretty much all humans "intelligent" even though things that come easily to some humans are near impossible for others. The common denominator of human intelligence that people generally seem to imply when they talk about AGI is a multivariate overlap of ability to learn, embodiment, abstract pattern recognition, visual and motor acuity, and emotional understanding. However, many of those facets are hard to test.
Intelligence has thousands of variables. It is a spectrum with many points of "emergence" where something that was near impossible before becomes possible. There are of course the many benchmarks we give LLMs (MMLU, ARC, etc.), but the more practical test is whether a model can completely replace a human in an economically viable activity.
Am I the only one seeing that the emperor has no clothes here?