
arXiv cs.AI papers June 12 2026

50 items · 1 source · updated 4d ago · trend 0

On June 12, 2026, arXiv's cs.AI section published the 50 papers listed below, spanning agent frameworks, formal reasoning, safety evaluation, and multimodal learning. Topics include tool-use optimization for LLM agents, tree-search cognition layers, clinical LLM deployment, AGI definitions, and unlearning benchmarks for multimodal models.

  • ToolSense (arXiv:2606.12451) audits parametric tool retrieval in LLMs using virtual token encoding fine-tuned in two stages
  • Arbor (arXiv:2606.12563) introduces tree search as shared working memory across multi-agent systems in stateful action spaces
  • Pythagoras-Prover (arXiv:2606.12594) offers compute-efficient Lean theorem provers at 4B and 32B parameters, plus a diffusion-based variant
  • SciAgentArena (arXiv:2606.12736) benchmarks AI agents on real-world scientific tasks with interactive evaluation support
  • MLUBench (arXiv:2606.12809) provides a large-scale benchmark with 127 examples for lifelong unlearning in multimodal LLMs
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
arXiv cs.AI · Ashutosh Hathidara, Sai Shruthi Sistla, Sebastian Schreiber, Sahil Bansal · 5d
Arbor: Tree Search as a Cognition Layer for Autonomous Agents
arXiv cs.AI · Neha Prakriya, Chaojun Hou, Zheng Gong, Huasha Zhao, Xi Zhao, Mou Li, Zhenyu Gu, Emad Barsoum · 5d
Strategic Decision Support for AI Agents
arXiv cs.AI · Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani · 5d
Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation
arXiv cs.AI · Joshua Ong Jun Leang, Zheng Zhao, Mihaela Cătălina Stoian, Qiyuan Xu, Haonan Li, Wenda Li, Shay B. Cohen, Eleonora Giunchiglia · 5d
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation
arXiv cs.AI · Mahmoud Srewa, Praneetsai Iddamsetty, Mohammad Abdullah Al Faruque, Salma Elmalaki · 5d
"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms
arXiv cs.AI · Alan Cooney, David Africa, Geoffrey Irving · 5d
TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation
arXiv cs.AI · Siyu Li, Toan Tran, Lingyi Zhao, Khurram Shafique, Li Xiong · 5d
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
arXiv cs.AI · Kushal Raj Bhandari, Ling Yue, Ching-Yun Ko, Dhaval Patel, Shaowu Pan, Pin-Yu Chen, Jianxi Gao · 5d
From AGI to ASI
arXiv cs.AI · Tim Genewein, Matija Franklin, Alexander Lerchner, Laurent Orseau, Samuel Albanie, Adam Bales, Cole Wyeth, Stephanie Chan, Iason Gabriel, Joel Z. Leibo, Allan Dafoe, Marcus Hutter, Thore Graepel, Shane Legg · 5d
Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System
arXiv cs.AI · Alyssa Unell, Miguel Fuentes, Brenna Li, Bridget Lin, Meena Jagadeesan, Sanmi Koyejo, Nigam Shah · 5d
Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI
arXiv cs.AI · J. E. Aguilera Briones · 5d
The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism
arXiv cs.AI · Nikolos Gurney, Stacy Marsella · 5d
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
arXiv cs.AI · Rafal Kocielnik, Pengrui Han, Peiyang Song, Myrl G. Marmarelis, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez · 5d
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales
arXiv cs.AI · Tianyu Liu, Allen Xin Wang, Antonia Panescu, Lisa Xinyi Chen, Wenxin Long, Xinyu Wei, Yueqian Jing, Ziyao Zeng, Jihang Chen, Sihan Jiang, Ziqing Wang, Siyi Gu, Siyu Chen, Xinyang Hu, Haoran Shao, Leqi Xu, Wangjie Zheng, Zhiyuan Cao, Ada Fang, Botao Yu, Kunyang Sun, Rex Ying, Arman Cohan, Qingyu Chen, Lingzhou Xue, Kaize Ding, Yuanqi Du, Wengong Jin, Zhuoran Yang, Marinka Zitnik, James Zou, Hua Xu, Hongyu Zhao · 5d
Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices
arXiv cs.AI · Farough Shayeste Roodi, Parham Zilouchian Moghaddam, Mahdi Mohammadi-nasab, Mehdi Modarressi, Mostafa Ersali Salehi Nasab, Masoud Daneshtalab · 5d
Prefill Awareness in Large Language Models
arXiv cs.AI · Andy Wang, Parv Mahajan, David Demitri Africa, Alexandra Souly, Jordan Taylor, Robert Kirk · 5d
Constructing Evaluation Datasets for Procedural Reasoning: Balancing Naturalness, Grounding, and Multi-Hop Coverage
arXiv cs.AI · Sarah Elshabrawy, Rahul K. Dass, Ashok K. Goel · 5d
A Tutorial on World Models and Physical AI
arXiv cs.AI · Il-Seok Oh · 5d
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements
arXiv cs.AI · Md Jafrin Hossain, Mohammad Arif Hossain, Weiqi Liu, Nirwan Ansari · 5d
MLUBench: A Benchmark for Lifelong Unlearning Evaluation in MLLMs
arXiv cs.AI · He Li, Haoang Chi, Qizhou Wang, Yunxin Mao, Zhiheng Zhang, Jie Tan, Tongliang Liu, Wenjing Yang, Bo Han · 5d
Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents
arXiv cs.AI · Yudong Zhang (Honor Device Co., Ltd), Lei Hu (Honor Device Co., Ltd), Daoyang Liu (The Chinese University of Hong Kong, Hong Kong, China), Jiawei Liu (Honor Device Co., Ltd), Yangfan Luo (Honor Device Co., Ltd), Xingyu Liu (Honor Device Co., Ltd), Zuojian Wang (Honor Device Co., Ltd), Zhilin Gao (Honor Device Co., Ltd) · 5d
GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models
arXiv cs.AI · Gabriel Diaz-Ireland, Diego Prieto-Herráez, Mario García Peces, Javier Velázquez, Devika Jain · 5d
Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics
arXiv cs.AI · Rasul Khanbayov, Hasan Kurban · 5d
Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement
arXiv cs.AI · Woong Shin, Craig A. Bridges, Marshall T. McDonnell, Rafael Ferreira da Silva · 5d
(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable
arXiv cs.AI · Chen Zhu, Xiaolu Wang, Weilong Zhang · 5d
WISE: A Long-Horizon Agent in Minecraft with Why-Which Reasoning
arXiv cs.AI · Renmin Cheng (The Hong Kong University of Science and Technology), Changhao Chen (The Hong Kong University of Science and Technology) · 5d
DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks
arXiv cs.AI · Jingxuan Han, Wei Liu, Mingyang Zhu, Youpeng Wang, Ziwen Wang, Lin Qiu, Xuezhi Cao, Xunliang Cai, Zheren Fu, Licheng Zhang, Zhendong Mao · 5d
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
arXiv cs.AI · Xiaoxuan Wang, Haixin Wang, Alexander Taylor, Jason Cong, Yizhou Sun, Wei Wang · 5d
The Hidden Power of Scaling Factor in LoRA Optimization
arXiv cs.AI · Zicheng Zhang, Haoran Li, Jiaxing Wang, Guoqiang Gong, Anqi Li, Yudong Hu, Ting Xiong, Yurong Gao, Junxing Hu, Zhida Jiang, Yifeng Zhang, Pengzhang Liu, Qixia Jiang · 5d
Zero-source LLM Hallucination Detection with Human-like Criteria Probing
arXiv cs.AI · Jiahao Yang, Shuhai Zhang, Hailong Kang, Feng Liu, Qi Chen, Mingkui Tan · 5d
MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback
arXiv cs.AI · Zehong Wang, Yijun Ma, Connor R. Schmidt, Tianyi Ma, Weixiang Sun, Ziming Li, Xiaoguang Guo, Chuxu Zhang, Matthew J. Webber, Yanfang Ye · 5d
Iterating Toward Better Search: A Two-Agent Simulation Framework for Evaluating Agentic Search Architectures in E-Commerce
arXiv cs.AI · Jetlir Duraj, Jayanth Yetukuri, Shuang Zhou, Dhruv Varma, Rui Kong, Ishita Khan, Qunzhi Zhou · 5d
MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling
arXiv cs.AI · Wenbo Chen, Puheng Li, Mengyang Liu, Weijie Su, Tianpei Xie · 5d
PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization
arXiv cs.AI · Hao Jiang, Xin Li, Annan Wang, Zhi Yang, Haoxiang Zhang, Yichi Zhang, Weisi Lin · 5d
Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory
arXiv cs.AI · Zhibao Chen, Qian Cheng · 5d
OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models
arXiv cs.AI · Ibrahim Gulluk, Max Van Puyvelde, Olivier Gevaert · 5d
Multi-Modal Agents for Power Distribution Defect Detection: An Evaluation of Foundation Models
arXiv cs.AI · Quan Quan · 5d
A Mathematical Forum Platform for Collaborative Problem Solving and Dataset Generation for AI Reasoning
arXiv cs.AI · Akbar Erkinov, Nurmukhammad Abdurasulov · 5d
Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation
arXiv cs.AI · En-Ming Huang, Yu-Hung Kao, Ren-Hao Deng, Wei-Po Hsin, Yao-Ting Hsieh, Cheng Liang, Hsiang-Yu Tsou, Mu-Chi Chen, Yu-Kai Hung, Shao-Chun Ho, Po-Hsuang Huang, Shih-Hao Hung, H. T. Kung · 5d
APCyc: Property-Informed Design of Cyclic Peptides via Automated Cyclization
arXiv cs.AI · Yifan Zhao, Lang Qin, Jintai Chen · 5d
The Illusion of Multi-Agent Advantage
arXiv cs.AI · Prathyusha Jwalapuram, Hehai Lin, Chuyuan Li, Fangkai Jiao, Sudong Wang, Yifei Ming, Zixuan Ke, Chengwei Qin, Giuseppe Carenini, Shafiq Joty · 5d
Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer
arXiv cs.AI · Zhanglu Yan, Jiayi Mao, Kaiwen Tang, Fanfan Li, Gang Pan, Tao Luo, Bowen Zhu, Qianhui Liu, Weng-Fai Wong · 5d
SciR: A Controllable Benchmark for Scientific Reasoning in LLMs
arXiv cs.AI · Pierre Beckmann, Marco Valentino, Andre Freitas · 5d
Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior
arXiv cs.AI · Haowei Qian · 5d
Augmentation techniques for video surveillance in the visible and thermal spectral range
arXiv cs.AI · Vanessa Buhrmester, Ann-Kristin Grosselfinger, David Munch, Michael Arens · 5d
AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction
arXiv cs.AI · Fabien Maury (Imagine - U1163, HeKA | U1346), Solène Grosdidier (Imagine - U1163), Maud de Dieuleveult (Imagine - U1163), Adrien Coulet (HeKA | U1346) · 5d
Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
arXiv cs.AI · Yuho Lee, Jisu Shin, Nicole Hee-Yeon Kim, Jihwan Bang, Juntae Lee, Kyuwoong Hwang, Fatih Porikli, Hwanjun Song · 5d
TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?
arXiv cs.AI · Dat Tien Nguyen, Thao Nguyen, Fadillah Adamsyah Maani, Huy M. Le, Muhammad Umer Sheikh, Numan Saeed, Muhammad Haris Khan, Salman Khan · 5d
Mental-R1: Aligning LLM Reasoning for Mental Health Assessment
arXiv cs.AI · Xin Wang, Boyan Gao, Yibo Yang, David A. Clifton · 5d
Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach
arXiv cs.AI · Ruichao Mao, Zhou Fang, Teng Guo, Hao Yang, Yaping Li, Shaohua Peng, Maji Huang, Xiaoyu Lin, Shuoyang Liu, Xuepeng Li, Yuyu Zhang, Hai Rao · 5d