January 31, 2025 • This week was hard on the conflict-averse. But if you're up on nursery rhymes, prehistoric bodily fluids and Renaissance art, you'll get at least three right.
Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark for testing AI 'reasoning' models.