arXiv cs.AI papers June 6 2026

50 items · 1 source · updated 11d ago · trend 0

This feed collects 50 papers from arXiv's cs.AI section for June 6, 2026, spanning multi-agent communication efficiency, time series forecasting, AI evaluation benchmarks, program synthesis, medical literature summarization, and AI governance. Topics include LLM-based agents for long-horizon tasks, interpretability frameworks, quantization methods for efficient deployment, and technical verification of frontier AI training.

  • LeanMarathon: multi-agent harness for research-level Lean autoformalization using evolving blueprint abstraction with contract-scoped agents for construction, auditing, proving, and repair
  • SentinelBench: benchmark for evaluating long-running monitoring agents that sustain attention rather than continuous action, measuring performance on tasks spanning minutes to hours
  • SAGE-PTQ: ultra-low-bit quantization framework for LLMs that minimizes hidden scaling overhead by separating salient and unsalient weights using distributional statistics
  • Agents' Last Exam (ALE): benchmark evaluating AI agents on long-horizon, economically valuable real-world tasks with verifiable outcomes, addressing gap between benchmark performance and professional deployment
  • Zero-knowledge proof framework proposed for verifying frontier AI training compute without self-reporting, enabling technical verification for international AI governance agreements
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arXiv cs.AI · Kokil Jaidka, Saifuddin Ahmed · 11d
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
arXiv cs.AI · Chen Huang, Yuhao Wu, Wenxuan Zhang · 11d
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
arXiv cs.AI · Shanhong Liu, Rui Cao, Pai Chet Ng, De Wen Soh · 11d
GITCO: Gated Inference-Time Context Optimization in TSFMs
arXiv cs.AI · Manya Pandey, Dhruv Kumar, Murari Mandal, Saurabh Deshpande · 11d
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
arXiv cs.AI · Nehal Afifi, Mehdi Khabou, Victor Mas, Jonas Hemmerich, Patric Grauberger, Stefan Dietrich, Volker Schulze, Sven Matthiesen · 11d
SentinelBench: A Benchmark for Long-Running Monitoring Agents
arXiv cs.AI · Matheus Kunzler Maldaner, Adam Fourney, Amanda Swearngin, Hussein Mozannar, Gagan Bansal, Maya Murad, Rafah Hosn, Saleema Amershi · 11d
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
arXiv cs.AI · Jincheng Yu, Haoyang Li, Yiwen Liu, Shen Liu, Rachel Yuanbao Chen, C. Kent Kwoh, Hongxu Ding, Xiaoxiao Sun · 11d
Synthetic Contrastive Reasoning for Multi-Table Q&A
arXiv cs.AI · Ankit Pratap Singh, Xin Su, Phillip Howard · 11d
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
arXiv cs.AI · Srimonti Dutta, Akshata Kishore Moharir · 11d
Residual Modeling for High-Fidelity Learned Compression of Scientific Data
arXiv cs.AI · Liangji Zhu, Sanjay Ranka, Anand Rangarajan · 11d
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
arXiv cs.AI · Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu · 11d
Harnessing Generalist Agents for Contextualized Time Series
arXiv cs.AI · Zihao Li, Kaifeng Jin, Yuanchen Bei, Jiaru Zou, Avaneesh Kumar, Xuying Ning, Yanjun Zhao, Mengting Ai, Baoyu Jing, Hanghang Tong, Jingrui He · 11d
Agents' Last Exam
arXiv cs.AI · Yiyou Sun, Xinyang Han, Weichen Zhang, Yuanbo Pang, Tianyu Wang, Yuhan Cao, Yixiao Huang, Chris Duroiu, Haoyun Zhang, Jeffrey Lin, Weishu Zhang, Tyler Zeng, Ying Yan, Bo Liu, Hanson Wen, Mingyang Xu, Xiaoyuan Liu, Zimeng Chen, Weiyan Shi, Amanda Dsouza, Vincent Sunn Chen, Patrick Bryant, Carl Boettiger, Yamini Rangan, Bradley Rothenberg, Kyle Steinfeld, Arvind Rao, Tapio Schneider, Georgios Yannakakis, Laure Zanna, Kaan Ozbay, Ida Sim, Tarek Zohdi, George Em Karniadakis, Jack Gallant, Teresa Head-gordon, Yushan Li, Wenxi Deng, Tao Sun, Huiqi Wang, Zhun Wang, Justin Xu, Chris Yuhao Liu, Yafei Cheng, Rongwang Hu, Aras Bacho, Shengcao Cao, Zengyi Qin, Yixiong Chen, Hengduan Fan, Hao Liu, Lin Zeng, Shashank Muralidhar Bharadwaj, Litian Gong, Yingxuan Yang, Maojia Song, Ruheng Wang, Zongzheng Zhang, Honglin Bao, Shuo Lu, Jianhong Tu, Zhonghua Wang, Zheng Zhang, Zijiao Chen, yanqiong Jiang, Zhendong Li, Bohan Lyu, Chang Ma, Peiran Xu, Benran Zhang, Shangding Gu, Haoyue Hua, Haoyang Li, Wanzhe Liao, Chengzhi Liu, Junbo Peng, Haoran Sun, Zechen Xu, Bo Chen, Jiayi Cheng, Yi Jiang, Keying Kuang, Yuan Li, Youbang Pan, Ziyan Rao, Alexander Schubert, Yifan Shen, Vincent Siu, Xiatao Sun, Kangqi Zhang, Xiaopan Zhang, Yuchen Zhu, Ishaan Singh Chandok, Lei Ding, Jingxuan Fan, Andrew Glover, Jiaming Hu, Yiran Hu, Wenbo Huang, Zixin Jiang, Haoran Jin, Lukas Kim, Ming Liu, Yang Liu, Alireza Rafiei, Xuhuan Shen, Kunyang Sun, Sophia Sun, Ting Sun, Eric Wang, Yixin Wang, Hanwen Xing, Sihan Xu, Yuzheng Xu, Zhongxing Xu, Zhiling Yan, Boqin Yuan, Ruiqi Zhang, Yifan Zhang, Zibo Zhao, Liana, Santanu Bosu Antu, Haoyue Bai, Carlo Bosio, Joseph Cavanagh, Patricia Cavazos-Rehg, Tianxing Chen, Xuewen Chen, Yipu Chen, Zhu Chenyu, Chen Dai, Stefano De Castro, Yunfu Deng, Kaustubh Dhole, Jiayuan Ding, Chenchen Du, Zhehang Du, Hao Fan, Run-ze Fan, Hengyu Fu, Shi Gu, Yifan Gu, Charlie Guo, Baihe Huang, Baixiang Huang, Rimika Jaiswal, Zhihan Jiang, Ran Jin, Erin Kasson, Xin Lan, Joseph 
Lee, Deren Lei, Chenyu Li, Daofeng Li, Haitao Li, Hongwei Li, Jingyan Li, Xiao Li, Yi Li, Yinsheng Li, Yuangang Li, Zhixu Li, Wenyu Liang, Longtai Liao, Kevin Qinghong Lin, AndyZeyi Liu, Che Liu, Jiaming Liu, Kaiyuan Liu, Xuan Liu, Pan Lu, Wenbo Lv, Yicheng Lv, Qiuyang Mang, Kyle Montgomery, Yuzhou Nie, Ruoxi Ning, Jorin Overwiening, Xu Pan, Layna Paraboschi, Core Francisco Park, Justin Purnomo, Swati Rajwal, Scott Rankin, Bixuan Ren, Yiren Rong, HaoYang Shang, Ventus Shaw, Fiona Shen, Jiawei Shen, Minqi Shi, Qiu Shi, Huaxiu Yao, Tianneng Shi, Jonah So, Vladislav Susoy, Hannah Szlyk, Haocheng Wang, Jialu Wang, Wei Wang, Xinyu Wang, Zehao Wang, Dowling Wong, Angela Wu, Dehao Wu, Fangyu Wu, Mengyuan "Millie" Wu, Yu Wu, Yuchen Wu, Yuhao Wu, Qingpo Wuwu, Weihang Xiao, Yongyi Xiong, Fan Xu, Ruiling Xu, Mingxuan Yan, Benjamin Yang, Jirong Yang, Sen Yang, Xiaoli Yang, Yushi Yang, Haoran Ye, Xiaohu Yu, Zhengming Yu, Chenlong Zhang, Chi Zhang, Hanning Zhang, Hanwen Zhang, Junge Zhang, Kunpeng Zhang, Song Zhang, Wenjin Zhang, Wenshuo Zhang, Ying Zhang, Yizhi Zhang, Brian Zhao, Qijian Zhao, Yimin Zhao, Yuhaohua Zheng, Liwei Zhou, Tianyue Zhou, Sichen Zhu, Siqi Zhu, Yan Zhu, Yishu Zhu, Jierui Zuo, Chonghao Cai, Helena Casademunt, Wenjia Chen, Benjamin Cheng, Nawen Deng, Rao Fu, Tianfu Fu, Yifan Han, Ren He, Zhenyu He, Qiao Jin, Lang Lang, Yuetai Li, Sylvia Liu, Lu Lu, Qing Lu, Subhabrata Mukherjee, Yunqi Ouyang, Yin Ren, Dawei Shi, Haoran Wu, Zhiyue Wu, Hannah Yao, Zhuoran Yi, Jenny Yu, Rhea Zhan, Hang Zhou, Blake Zhu, Junfan Zhu, Alan Yuille, Yang Liu, Russell Alan Poldrack, Jiachen Li, Zhenglu Li, Molei Tao, Jing Huang, Wenqi Shi, Costas Spanos, Lichao Sun, Chenguang Wang, Orson Xu, Zhen Dong, Hector Gomez, Aylin Caliskan, Ali Emami, Haimin Hu, Zhi Li, Lihui Liu, Murphy Niu, Yi Shao, Jianxin Sun, Mikko Tolonen, Ting Wang, Sanjiv Das, Yanjun Gao, Wenbo Guo, Erika J Schneider, Zhiyong Lu, Mark Mueller, Radha Poovendran, Somayeh Sojoudi, Dawn Song · 11d
Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution
arXiv cs.AI · Can Gurkan, Forrest Stonedahl, Uri Wilensky · 11d
A Motivational Architecture for Conversational AGI
arXiv cs.AI · Anna Mikeda, Ben Goertzel · 11d
Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers
arXiv cs.AI · Gianluca Guidi, Francesca Dominici, Tiziano Squartini, Callaway Sprinkle, Jonathan Gilmour, Kevin Butler, Eric Bell, Scott Delaney, Falco J. Bargagli-Stoffi · 11d
Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
arXiv cs.AI · Rayyan Abdalla, Amir Hussein, Min Wu, Dinesh Manocha · 11d
Zero knowledge verification for frontier AI training is possible
arXiv cs.AI · Pierre Peigné, Ky Nguyen, Paul Wang · 11d
Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison
arXiv cs.AI · Alejandro Lozano, Keiko Ihara, Ping-Hao Yang, Carrie E. Robertson, Jennifer Stern, Allan Purdy, Hsiangkuo Yuan, Pengfei Zhang, Yulia Orlova, Olga Fermo, Jennifer Hranilovich, Fred Cohen, Todd J. Schwedt, Jenelle A. Jindal, Serena Yeung-Levy, Chia-Chun Chiang · 11d
Brick-Composer: Using MLLMs for Assembly with Diverse Bricks
arXiv cs.AI · Jiateng Liu, Bingxuan Li, Zhenhailong Wang, Rushi Wang, Kaiwen Hong, Cheng Qian, Jiayu Liu, Denghui Zhang, Katherine Driggs-Campbell, Manling Li, Heng Ji · 11d
Insurance of Agentic AI
arXiv cs.AI · Quanyan Zhu · 11d
Output Type Before Quality: A Standards-Derived XAI Admissibility Rubric for Autonomous-Driving Safety
arXiv cs.AI · Abhinaw Priyadershi, Mandar Pitale, Jelena Frtunikj, Maria Spence · 11d
PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage
arXiv cs.AI · Keqi Han, Ryan Young, Annabel Strauss, Lindsey Hughes, Katharine M. Nesbitt, Nicole Schueler, Che Ngufor, Carl Yang, Yuan Xue, Zhijun Yin · 11d
Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces
arXiv cs.AI · Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar · 11d
Severity-Aware Curriculum Learning with Multi-Model Response Selection for Medical Text Generation
arXiv cs.AI · Ahmed Alansary, Molham Mohamed, Ali Hamdi · 11d
EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts
arXiv cs.AI · Yiming Lu, Sihang Zeng, Zhengxu Tang, Max Lau, Fei Liu, Wei Jin · 11d
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization
arXiv cs.AI · Kuangshi Ai, Haichao Miao, Kaiyuan Tang, Shusen Liu, Chaoli Wang · 11d
When Should We Protect AI? A Precautionary Framework for Consciousness Uncertainty
arXiv cs.AI · Anna Mikeda · 11d
Individual Gain, Collective Loss: Metacognitive Adaptation in AI-Assisted Creativity
arXiv cs.AI · Anna Mikeda · 11d
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
arXiv cs.AI · Taewon Yun, Hyeonseong Park, Jeonghwan Choi, Hayoon Park, Yeeun Choi, Hwanjun Song · 11d
GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection
arXiv cs.AI · Paulo Ricardo Ferreira Neves, Edson Rodrigues da Cruz Filho, Paulo Henrique Eleuterio Falsetti, João Vitor Pavan, Ian Degaspari, Henrique Vieira Laturrague, Patrick Vieira Laturrague, Guilherme Nielsen Dias, Marccello Wilson Perez Berto, Gustavo Voltani Von Atzingen · 11d
Fix the Mind, Not the Move: Interpretable AI Assistance via Knowledge-Gap Localization
arXiv cs.AI · Ayano Hiranaka, Ya-Chuan Hsu, Stefanos Nikolaidis, Erdem Bıyık, Daniel Seita · 11d
Multilingual Fine-Tuning via Localized Gradient Conflict Resolution
arXiv cs.AI · Long P. Hoang, Yiran Zhao, Wei Lu, Wenxuan Zhang · 11d
Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack
arXiv cs.AI · Long P. Hoang, Hai V. Le, Shaoyang Xu, Wei Lu, Wenxuan Zhang · 11d
Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking
arXiv cs.AI · Bonan Shen, Youting Wang, Dingyan Shang, Tao Ning · 11d
Evaluation of LLMs for Mathematical Formalization in Lean
arXiv cs.AI · Tyson Klingner, Drew Bladek, Escher Crawford, Bohao Chen, Ariel Fu, Kaira Nair, Jarod Alper, Giovanni Inchiostro, Vasily Ilin · 11d
Answer Presence Drives RAG Rewriting Gains
arXiv cs.AI · Yuejie Li, Yueying Hua, Ke Yang, Li Zhang, Yueping He, Ruiqi Li, Bolin Chen, Tao Wang, Bowen Li, Chengjun Mao · 11d
FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG
arXiv cs.AI · Zhe Yu, Wenpeng Xing, Tiancheng Zhao, Mohan Li, Changting Lin, Meng Han · 11d
Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?
arXiv cs.AI · Jingheng Ye, Huiqi Zou, Simon Yu, Weiyan Shi · 11d
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments
arXiv cs.AI · Parth Asawa, Christopher M. Glaze, Gabriel Orlanski, Ramya Ramakrishnan, Benji Xu, Asim Biswal, Vincent Sunn Chen, Frederic Sala, Matei Zaharia, Joseph E. Gonzalez · 11d
Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows
arXiv cs.AI · Yuhang Fu, Ruishan Fang, Jiaqi Shao, Huiyu Zheng, Zhengtao Zhu, Bing Luo, Tao Lin · 11d
Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation
arXiv cs.AI · Fangbo Tu, Junhua Zhao, Chi Liu, Xin Chen, Haifeng Wu, Jian Wan, Srinivasan Manoharan · 11d
AdaMEM: Test-Time Adaptive Memory for Language Agents
arXiv cs.AI · Yunxiang Zhang, Yiheng Li, Ali Payani, Lu Wang · 11d
PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation
arXiv cs.AI · Nicolas Bougie, Xiaotong Ye, Gian Maria Marconi, Narimasa Watanabe · 11d
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
arXiv cs.AI · Haoyu Zhou, Qing Qing, Caichong Li, Qixin Zhang, Yongcheng Jing, Ziqi Xu, Juncheng Hu, Xikun Zhang, Renqiang Luo · 11d
Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving
arXiv cs.AI · Muhammad Talha Sharif, Abdul Rehman · 11d
DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance
arXiv cs.AI · Yansi Li, Zhuosheng Zhang · 11d
When AI Says It Feels
arXiv cs.AI · Shin-nosuke Ishikawa, Seiya Ikeda, Hirotsugu Ohba · 11d
Class-Specific Branch Attention for Mitigating Gradient Interference under Class Imbalance
arXiv cs.AI · Arush Singhal, Umang Soni · 11d
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
arXiv cs.AI · Wenxuan Wang, Haoyu Sun, Fukuan Hou, Mingyang Song, Weinan Zhang, Yu Cheng, Yang Yang · 11d