A recent study examined how well AI can solve visual puzzles typically used in human IQ assessments. The results were not impressive.

Are artificial intelligence (AI) systems capable of tackling cognitive challenges designed for human intelligence tests? The findings were mixed.

AI’s Challenges in Nonverbal Abstract Reasoning

Scientists from the USC Viterbi School of Engineering’s Information Sciences Institute (ISI) explored the ability of multimodal large language models (MLLMs) to solve abstract visual tests usually given to humans.

Recently presented at the Conference on Language Modeling (COLM 2024) in Philadelphia, the study tested “the nonverbal abstract reasoning abilities of open-source and closed-source MLLMs” by checking whether image-processing models could go beyond simple recognition and demonstrate reasoning skills when faced with visual puzzles.

“For instance, if a yellow circle changes into a blue triangle, can the model recognize and apply that transformation in a new context?” explained Kian Ahrabian, a research assistant on the project, as reported by Neuroscience News. Such a task requires the model to combine visual perception with logical reasoning, much as humans do, which makes it a far harder problem than simple recognition.

The researchers evaluated 24 different MLLMs on puzzles based on Raven’s Progressive Matrices, a common test of abstract reasoning, and the models fell well short.

“They did really badly. They couldn’t figure out anything,” Ahrabian said. The models had trouble both understanding the visuals and interpreting patterns.

However, the results weren’t uniform. Overall, the study found that open-source models struggled more with the visual reasoning puzzles than closed-source models such as GPT-4V, though even those fell short of human performance. The researchers managed to improve some models’ performance with a technique called Chain of Thought prompting, which guides the model step by step through the reasoning portion of the test.
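To make that technique concrete, here is a minimal sketch of what Chain of Thought prompting for a visual puzzle could look like, assuming an OpenAI-style chat completions API. The model name, image URL, and prompt wording are illustrative placeholders, not the prompts used in the study.

```python
# Minimal sketch of Chain of Thought prompting for a visual reasoning puzzle.
# Assumes the OpenAI Python SDK; the model name, image URL, and prompt text
# are hypothetical placeholders, not the study's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PUZZLE_IMAGE_URL = "https://example.com/raven_style_puzzle.png"  # hypothetical

# A plain prompt would just ask for the answer; the Chain of Thought version
# walks the model through intermediate reasoning steps before it answers.
cot_prompt = (
    "Look at the 3x3 grid of shapes. The bottom-right cell is missing.\n"
    "Think step by step:\n"
    "1. Describe each shape in the grid (color, form, size).\n"
    "2. Identify how the shapes change across each row and down each column.\n"
    "3. Apply those transformation rules to predict the missing cell.\n"
    "4. Only then choose the answer option (A-F) that matches your prediction."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any multimodal chat model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": cot_prompt},
                {"type": "image_url", "image_url": {"url": PUZZLE_IMAGE_URL}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The difference from a plain “pick the answer” prompt is the numbered scaffold: the model is asked to describe the shapes, infer the rule, and apply it before committing to an answer, mirroring the step-by-step guidance the study describes.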

Closed-source models are thought to perform better on such tests because they are purpose-built, trained on larger datasets, and backed by private companies’ computing resources. “GPT-4V showed some capability with reasoning, but it’s still far from flawless,” Ahrabian observed.

“We still have much to learn about what new AI models are capable of,” added Jay Pujara, a research associate professor and an author of the study. “Without understanding these limitations, we can’t refine AI to be better, safer, and more practical. This paper sheds light on a crucial gap, highlighting where AI currently falls short.”

By identifying weaknesses in AI models’ reasoning abilities, research like this can help guide future efforts to build those skills toward human-level logic. But don’t worry: for now, these models are no match for human cognition.

Discover How AI Reasoning Could Shape the Future

Curious about AI’s limitations in visual reasoning and its journey toward human-like logic? Take your AI understanding to the next level with our Prompt Engineering for Leaders course. Perfect for non-technical leaders and innovators, this course empowers you to navigate AI capabilities confidently, even without a technical background.

Explore our course here and start mastering the world of AI today!