arXiv cs.AI papers May 26 2026

16 items · 1 source · updated 22d ago · trend 0

This topic collects 16 papers posted to arXiv's cs.AI section on May 26, 2026. They span reasoning efficiency in LLMs, agent system reliability, knowledge representation, and specialized applications including medical dialogue robustness, quantum computing integration, and assistive robotics. Topics range from confidence calibration and redundancy in chain-of-thought reasoning to runtime execution safety in autonomous agents and web agent skill modeling.

  • Confidence Calibration study shows LLMs are overconfident on hard tasks but underconfident on easy ones; introduces LifeEval benchmark for difficulty-stratified evaluation
  • Reasoning redundancy paper quantifies unnecessary deliberation in long chain-of-thought traces across reasoning-capable models at scale
  • Med-Stress framework reveals knowledge-robustness gap: nine frontier LLMs abandon correct diagnoses under escalating clinical pressure despite strong benchmarks
  • BODHI domain-knowledge prompting achieves 55.10% Pass@1 on OSV-Bench's 245 OS kernel specification generation tasks
  • DRIVE separates reasoning (abstract, transferable) from interaction (page-specific) knowledge for web agents under continual learning
In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models
arXiv cs.AI · Sam Earle, Kay Arulkumaran, Andrew Dai, Akarsh Kumar, Julian Togelius, Sebastian Risi · 22d
Confidence Calibration in Large Language Models
arXiv cs.AI · Noam Michael, Daniel BenShushan, Jacob Bien, Don A. Moore · 22d
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
arXiv cs.AI · Zhiyuan Zhai, Xinkai You, Wenjing Yan, Xin Wang · 22d
Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction
arXiv cs.AI · Gregory Magarshak · 22d
Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
arXiv cs.AI · Ya-Ting Yang, Quanyan Zhu · 22d
Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game
arXiv cs.AI · Saad Mankarious · 22d
BODHI: Precise OS Kernel Specification Inference
arXiv cs.AI · Zhiming Chang, Ziyang Li · 22d
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
arXiv cs.AI · Boyu Xiao, Xiuqi Tian, Xuwen Song, Haochun Wang, Guanchun Song, Sendong Zhao, Bing Qin · 22d
Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model
arXiv cs.AI · Wang Rui, Lu Diannan · 22d
Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems
arXiv cs.AI · Marcelo Fernandez - TraslaIA · 22d
Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications
arXiv cs.AI · Takaaki Fujita, Florentin Smarandache · 22d
BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization
arXiv cs.AI · Bruno F. Lourenço, Hesham Morgan, Ana Ozaki, Aleksandar Pavlović, Emanuel Sallinger · 22d
Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors
arXiv cs.AI · Long Zhang, Zi-bo Qin, Wei-neng Chen · 22d
DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning
arXiv cs.AI · Xirui Liu, Sihang Zhou, Yanning Hou, Rong Zhou, Haoyuan Chen, Maolin He, Siwei Wang, Hao Chen, Jian Huang · 22d
Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
arXiv cs.AI · Sebastien Kawada · 22d
MEMOR-E: In-Context and Fine-Tuned LLM Personalization for Alzheimer's Assistive Robotics
arXiv cs.AI · Maissa Abir Smaili, Eren Sadikoglu, Ransalu Senanayake · 22d