arXiv cs.AI papers May 29 2026
For May 29, 2026, this feed lists 50 papers from arXiv's cs.AI category, spanning reinforcement learning, language models, AI safety, and applications in engineering, education, and clinical research. Topics range from temporal-difference learning methods and concept erasure in diffusion models to LLM evaluation frameworks, hallucination mitigation, and the challenges of deploying autonomous agents.
Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction
Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction
The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling
Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems
Review Arcade: On the Human Alignment and Gameability of LLM Reviews
Orthogonal Concept Erasure for Diffusion Models
Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes
VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis
BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation
Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild
When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis
Mind Your Tone: Does Tone Alter LLM Performance?
Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence
Differentiable Belief-based Opponent Shaping
Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
Robust and Efficient Guardrails with Latent Reasoning
Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics
The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane
The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure
Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
Beyond Consensus: Trace-Level Synthesis in Mixture of Agents
PRO-CUA: Process-Reward Optimization for Computer Use Agents
The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
Governing Technical Debt in Agentic AI Systems
Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction
Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents
ReasonOps: Operator Segmentation for LLM Reasoning Traces
GTA: Generating Long-Horizon Tasks for Web Agents at Scale
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility
Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth
Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI
DenseSteer: Steering Small Language Models towards Dense Math Reasoning
Provably Secure Agent Guardrail
OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling
When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop
Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies
CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models
Rubric-Guided Process Reward for Stepwise Model Routing
ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark
When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs