Many of the most popular benchmarks for AI models are outdated or poorly designed. Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks.