arXiv cs.AI papers, May 26, 2026

16 items●1 source●updated 22d ago●trend 0

┌─ summary ─────────────────────────────┐

On May 26, 2026, arXiv's cs.AI section published 16 papers spanning reasoning efficiency in LLMs, agent system reliability, knowledge representation, and specialized applications including medical dialogue robustness, quantum computing integration, and assistive robotics. Topics ranged from confidence calibration and redundancy in chain-of-thought reasoning to runtime execution safety in autonomous agents and web agent skill modeling.

┌─ key points ──────────────────────────┐

Confidence calibration study shows LLMs are overconfident on hard tasks but underconfident on easy ones; introduces the LifeEval benchmark for difficulty-stratified evaluation
Reasoning redundancy paper quantifies unnecessary deliberation in long chain-of-thought traces across reasoning-capable models at scale
Med-Stress framework reveals a knowledge-robustness gap: nine frontier LLMs abandon correct diagnoses under escalating clinical pressure despite strong benchmark scores
BODHI domain-knowledge prompting achieves 55.10% Pass@1 on OSV-Bench's 245 OS kernel specification generation tasks
DRIVE separates reasoning (abstract, transferable) from interaction (page-specific) knowledge for web agents under continual learning

┌─ items (16) ──────────────────────────┐

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

arXiv cs.AI · Sam Earle, Kay Arulkumaran, Andrew Dai, Akarsh Kumar, Julian Togelius, Sebastian Risi · 22d

Confidence Calibration in Large Language Models

arXiv cs.AI · Noam Michael, Daniel BenShushan, Jacob Bien, Don A. Moore · 22d

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

arXiv cs.AI · Zhiyuan Zhai, Xinkai You, Wenjing Yan, Xin Wang · 22d

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction

arXiv cs.AI · Gregory Magarshak · 22d

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

arXiv cs.AI · Ya-Ting Yang, Quanyan Zhu · 22d

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

arXiv cs.AI · Saad Mankarious · 22d

BODHI: Precise OS Kernel Specification Inference

arXiv cs.AI · Zhiming Chang, Ziyang Li · 22d

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

arXiv cs.AI · Boyu Xiao, Xiuqi Tian, Xuwen Song, Haochun Wang, Guanchun Song, Sendong Zhao, Bing Qin · 22d

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

arXiv cs.AI · Wang Rui, Lu Diannan · 22d

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

arXiv cs.AI · Marcelo Fernandez - TraslaIA · 22d

Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications

arXiv cs.AI · Takaaki Fujita, Florentin Smarandache · 22d

BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization

arXiv cs.AI · Bruno F. Lourenço, Hesham Morgan, Ana Ozaki, Aleksandar Pavlović, Emanuel Sallinger · 22d

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors

arXiv cs.AI · Long Zhang, Zi-bo Qin, Wei-neng Chen · 22d

DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning

arXiv cs.AI · Xirui Liu, Sihang Zhou, Yanning Hou, Rong Zhou, Haoyuan Chen, Maolin He, Siwei Wang, Hao Chen, Jian Huang · 22d

Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

arXiv cs.AI · Sebastien Kawada · 22d

MEMOR-E: In-Context and Fine-Tuned LLM Personalization for Alzheimer's Assistive Robotics

arXiv cs.AI · Maissa Abir Smaili, Eren Sadikoglu, Ransalu Senanayake · 22d