arXiv cs.AI papers May 26 2026
16 items from 1 source (updated 22 days ago)
On May 26, 2026, arXiv's cs.AI section published 16 papers spanning reasoning efficiency in LLMs, agent system reliability, knowledge representation, and specialized applications including medical dialogue robustness, quantum computing integration, and assistive robotics. Topics ranged from confidence calibration and redundancy in chain-of-thought reasoning to runtime execution safety in autonomous agents and web agent skill modeling.
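The calibration finding (overconfident on hard tasks, underconfident on easy ones) comes down to comparing stated confidence with actual accuracy within each difficulty level. The sketch below illustrates that comparison under stated assumptions: it is not LifeEval's method, whose metrics are not described here. The function names, the "easy"/"hard" labels, and the toy records are hypothetical. The signed gap (mean confidence minus accuracy) per difficulty level shows the direction of miscalibration. The binned expected calibration error (ECE) is a common summary metric, included for comparison.

```python
from collections import defaultdict
from statistics import mean


def _conf_and_acc(items):
    """Mean confidence and empirical accuracy of (confidence, correct) pairs."""
    avg_conf = mean(conf for conf, _ in items)
    accuracy = mean(1.0 if ok else 0.0 for _, ok in items)
    return avg_conf, accuracy


def calibration_gap_by_difficulty(records):
    """Hypothetical helper: signed gap (mean confidence - accuracy) per difficulty.

    records: iterable of (difficulty_label, confidence in [0, 1], correct: bool).
    A positive gap means overconfident; a negative gap means underconfident.
    """
    groups = defaultdict(list)
    for difficulty, confidence, correct in records:
        groups[difficulty].append((confidence, correct))
    gaps = {}
    for difficulty, items in groups.items():
        avg_conf, accuracy = _conf_and_acc(items)
        gaps[difficulty] = avg_conf - accuracy
    return gaps


def expected_calibration_error(records, n_bins=10):
    """Standard binned ECE over confidence; ignores the difficulty labels."""
    bins = defaultdict(list)
    for _, confidence, correct in records:
        # Confidence 1.0 falls into the last bin rather than an extra one.
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, correct))
    total = sum(len(items) for items in bins.values())
    ece = 0.0
    for items in bins.values():
        avg_conf, accuracy = _conf_and_acc(items)
        ece += len(items) / total * abs(avg_conf - accuracy)
    return ece


# Toy, made-up records: the model hedges on easy items but is sure of itself on hard ones.
records = (
    [("easy", 0.6, True)] * 4
    + [("hard", 0.9, True)]
    + [("hard", 0.9, False)] * 3
)

gaps = calibration_gap_by_difficulty(records)
for level, gap in gaps.items():
    label = "overconfident" if gap > 0 else "underconfident" if gap < 0 else "calibrated"
    # easy -> underconfident, hard -> overconfident for these toy records
    print(f"{level}: gap={gap:+.2f} ({label})")
print(f"ECE: {expected_calibration_error(records):.3f}")
```

A single pooled ECE can hide opposite errors at different difficulty levels. That is why the per-difficulty gap is reported separately here.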
- Confidence Calibration study shows LLMs overconfident on hard tasks but underconfident on easy ones; introduces LifeEval benchmark for difficulty-stratified evaluation
- Reasoning redundancy paper quantifies unnecessary deliberation in long-chain-of-thought traces across reasoning-capable models at scale
- Med-Stress framework reveals a knowledge-robustness gap: nine frontier LLMs abandon correct diagnoses under escalating clinical pressure despite strong benchmark scores
- BODHI domain-knowledge prompting achieves 55.10% Pass@1 on OSV-Bench's 245 OS kernel specification generation tasks
- DRIVE separates reasoning (abstract, transferable) from interaction (page-specific) knowledge for web agents under continual learning
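The BODHI figure can be checked with simple arithmetic. OSV-Bench's exact scoring protocol is not described on this page. The sketch below assumes Pass@1 is the mean over tasks of the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), evaluated at k = 1. Under that assumption, and with one sample per task, 55.10% of 245 tasks is consistent with 135 tasks solved, since 135 / 245 ≈ 0.5510. The 135/110 split is inferred for illustration, not reported by the source. The helper names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Hypothetical helper: mean per-task pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Assumed scenario: one sample per task, 135 of 245 tasks solved (inferred, not reported).
results = [(1, 1)] * 135 + [(1, 0)] * 110
score = benchmark_pass_at_k(results, k=1)
print(f"{100 * score:.2f}%")  # 55.10%
```

With several samples per task, per-task pass@1 reduces to c / n. The benchmark score then need not correspond to a whole number of solved tasks, so the 135-task reading holds only under the single-sample assumption.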
Source: blog/rss (16 items)
In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models
Confidence Calibration in Large Language Models
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction
Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game
BODHI: Precise OS Kernel Specification Inference
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model
Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems
Fuzzy, Neutrosophic, and Uncertain Graph Theory: Properties and Applications
BoxLitE: A Faithful Knowledge Base Embedding Based on Convex Optimization
Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors
DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning
Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
MEMOR-E: In-Context and Fine-Tuned LLM Personalization for Alzheimer's Assistive Robotics