Every Sunday, NPR host Will Shortz, The New York Times’ crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment called the Sunday Puzzle. While written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants. That’s why some experts think they’re a promising way to test the limits of AI’s problem-solving abilities. In a recent study, a team of researchers hailing from Wellesley College, Oberlin College, the University of Texas at Austin, Northeastern University, Charles University, and startup Cursor created an AI benchmark using riddles from Sunday Puzzle episodes. The team says their test uncovered surprising insights, such as that reasoning models (OpenAI’s o1, among others) sometimes “give up” and provide […]
Original web page at techcrunch.com