arXiv cs.CL papers June 5 2026

50 items · 1 source · updated 11d ago · trend 0

This feed collects 50 papers posted to arXiv's cs.CL section on June 5, 2026. Topics include model collapse dynamics, self-supervised learning objectives, medical LLM fine-tuning, multimodal safety benchmarks, streaming ASR punctuation, interpretability frameworks, long-context memory, sycophancy audits, reasoning distillation, persuasion tracing, personalization, language processing dynamics, reasoning trace analysis, early failure detection, schema discovery, machine translation complexity, and multilingual coreference resolution.

  • Bilayer SIR/SIRS framework models cross-contamination of synthetic data across AI ecosystem populations (arXiv:2606.05168)
  • JEPA-inspired hybrid pre-training combines latent-space prediction with masked language modeling for deeper semantic representations (arXiv:2606.05173)
  • GRPO post-training with variance-aware rubric rewards applied to Qwen2.5-3B for heart-focused medical QA (arXiv:2606.05174)
  • MCBench introduces 1,196 multimodal safety scenarios across vision, audio, and text for Omni LLMs (arXiv:2606.05177)
  • LANTERN memory layer recovers 78.3% of ground-truth facts in compacted conversations with under 25 ms of latency per turn (arXiv:2606.05182)
  • Sycophancy audit across Gemini 2.0, 2.5, and 3.0 variants reveals social-compliance behaviors beyond binary false outputs (arXiv:2606.05183)
Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
arXiv cs.CL · Xiangyu Wang · 12d
Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning
arXiv cs.CL · Aimen Boukhari · 12d
Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
arXiv cs.CL · Arash Ahmadi, Parisa Masnadi, Sarah Sharif, Charles Nicholson, David Ebert, Mike Banad · 12d
Generic Triple-Latent Compression with Gated Associative Retrieval
arXiv cs.CL · Liu Xiao · 12d
PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis
arXiv cs.CL · Lucas Tamic, Ilan Jaffeux-Cheniout, Xavier Marjou · 12d
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models
arXiv cs.CL · Manh Luong, Tamas Abraham, Junae Kim, Amar Kaur, Rollin Omari, Gholamreza Haffari, Trang Vu, Lizhen Qu, Dinh Phung · 12d
Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems
arXiv cs.CL · Sungmook Woo, Hyungu Kang, Chanwoo Kim · 12d
From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment
arXiv cs.CL · Ivo Bueno, Babette Bühler, Philipp Stark, Tim Fütterer, Ulrich Trautwein, Dorottya Demszky, Heather Hill, Enkelejda Kasneci · 12d
Multi-Granularity Reasoning for Natural Language Inference
arXiv cs.CL · Chunling Xi, Di Liang · 12d
LANTERN: Layered Archival and Temporal Episodic Retrieval Network for Long-Context LLM Conversations
arXiv cs.CL · Rahul Subramani · 12d
The Granularity Gap: A Multi-Dimensional Longitudinal Audit of Sycophancy in Gemini Models
arXiv cs.CL · Patrick Keough · 12d
LoRi: Low-Rank Distillation for Implicit Reasoning
arXiv cs.CL · Ryan Solgi, Jiayi Tian, Zheng Zhang · 12d
A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing
arXiv cs.CL · Jared Moore, Noah Goodman, Nick Haber, Max Kleiman-Weiner · 12d
Self-supervised User Profile Generation for Personalization
arXiv cs.CL · Clark Mingxuan Ju, Yuwei Qiu, Tong Zhao, Neil Shah · 12d
Trajectory Dynamics in Language Model Hidden States Predict Human Processing Costs Beyond Surprisal
arXiv cs.CL · Elan Barenholtz · 12d
ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces
arXiv cs.CL · Jinu Lee, Shivam Agarwal, Amruta Parulekar, Siddarth Madala, Dilek Hakkani-Tur, Julia Hockenmaier · 12d
When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories
arXiv cs.CL · Avinash Baidya, Xinran Liang, Ruocheng Guo, Xiang Gao, Kamalika Das · 12d
Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval
arXiv cs.CL · Padmaja Jonnalagedda, Yuguang Yao, Xiang Gao, Hilaf Hasson, Kamalika Das · 12d
ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation
arXiv cs.CL · Joseph Marvin Imperial, Junhong Liang, Belal Shoer, Abdullah Barayan, Rodrigo Wilkens, Omar Mussa, Dawn Knight, Eugénio Ribeiro, Ekaterina Kochmar, Sowmya Vajjala, Fernando Alva-Manchego, Harish Tayyar Madabushi · 12d
Multilingual Coreference Resolution via Cycle-Consistent Machine Translation
arXiv cs.CL · Adriana-Valentina Costache, Eduard Poesina, Silviu-Florin Gheorghe, Paul Irofti, Radu Tudor Ionescu · 12d
Localizing Prompt Ambiguity in Large Language Models with Probe-Targeted Attribution
arXiv cs.CL · Govind Ramesh, Yao Dou, Wei Xu · 12d
MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text Summarization
arXiv cs.CL · Ahmed Alansary, Ali Hamdi · 12d
CHASE: Adversarial Red-Blue Teaming for Improving LLM Safety using Reinforcement Learning
arXiv cs.CL · Rahul Markasserithodi, Aditya Joshi, Yuekang Li, Ishmanbir Singh, Chris Yoo, Alan Niu · 12d
Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach
arXiv cs.CL · Nadine Yasser Abdelhalim, Emmanuel Akinrintoyo, Nicole Salomons · 12d
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
arXiv cs.CL · Woojung Song, Nalim Kim, Sangjun Song, Chaewon Heo, Jongwon Lim, Yohan Jo · 12d
AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents
arXiv cs.CL · Yang Li, Jiaxiang Liu, Jiang Cai, Mingkun Xu · 12d
InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization
arXiv cs.CL · Xueyang Wu, Siyuan Liu, Kezhuo Yang, Guang Ling · 12d
Using Large Language Models to Support High Volume Application Review for an Undergraduate Research Program
arXiv cs.CL · Varun Aggarwal, Kay Kobak, John Howarter · 12d
Domain-Aware Mispronunciation Detection and Diagnosis Using Language-Specific Statistical Graphs
arXiv cs.CL · Huu Tuong Tu, Hanh Nguyen, Thien Van Luong, Nguyen Tien Cuong, Vu Huan, Nguyen Thi Thu Trang · 12d
TensorBench: Benchmarking Coding Agents on a Compiler-Based Tensor Framework
arXiv cs.CL · Bobby Yan, Fredrik Kjolstad · 12d
Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training
arXiv cs.CL · Yongwei Zhou, Juncheng Diao, Junlin Shang, Peiguang Li, Rongxiang Weng · 12d
What's in a Name? Morphological Shortcuts by LLMs in Pharmacology
arXiv cs.CL · Kaijie Mo, Thomas Yang, Chantal Shaib, Qing Yao, William Rudman, Ramez Kouzy, Kanishka Misra, Byron C. Wallace, Junyi Jessy Li · 12d
An ERP Study on Recursive Locative Processing in Mandarin-Speaking Children with Autism
arXiv cs.CL · Xiaoyi Wang, Chenxi Fu, Ziman Zhuang, Caimei Yang · 12d
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
arXiv cs.CL · Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji · 12d
When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer
arXiv cs.CL · Zhen Sun, Yifan Liao, Zhicong Huang, Jiaheng Wei, Cheng Hong, Yutao Yue, Xinlei He · 12d
Bootstrapping Semantic Layer from Execution for Text-to-SQL
arXiv cs.CL · Youngwon Lee, Jaejin Kim, Seung-won Hwang · 12d
QueryAgent-R1: Bridging Query Generation and Product Retrieval for E-Commerce Query Recommendation
arXiv cs.CL · Dike Sun, Zheng Zou, Jingtong Zang, Qi Sun, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng · 12d
Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models
arXiv cs.CL · Hancheol Park, Geonho Lee, Tairen Piao, Tae-Ho Kim · 12d
Rethinking LoRA Memory Through the Lens of KV Cache Compression
arXiv cs.CL · Chunsheng Zuo, Liaoyaqi Wang, William Jurayj, William Fleshman, Benjamin Van Durme · 12d
Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems
arXiv cs.CL · Yingzhuo Liu · 12d
Interpreting Style Representations via Style-Eliciting Prompts
arXiv cs.CL · Junghwan Kim, David Jurgens · 12d
Narrative Knowledge Weaver: Narrative-Centric Retrieval-Augmented Reasoning for Long-Form Text Understanding
arXiv cs.CL · Qiuyu Tian, Fengyi Chen, Yiding Li, Youyong Kong, Fan Guo, Yuyao Li, Jinjing Shen, Zhijing Xie, Yiyun Luo, Xin Zhang, Yingce Xia, Zequn Liu · 12d
AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding
arXiv cs.CL · Runheng Liu, Jincheng Xie, Wen Hu, Xingchen Xiao, Heyan Huang · 12d
PlanBench-V: A Spatial Planning Map Benchmark for Vision-Language Models
arXiv cs.CL · Minxin Chen, He Zhu, Junyou Su, Wen Wang, Yijie Deng, Wenjia Zhang · 12d
MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA
arXiv cs.CL · Kaifeng Chen, Hongtao Liu, Qiyao Peng, Jian Yang, Yongqiang Liu, Xiaochen Zhang, Qing Yang · 12d
CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement
arXiv cs.CL · Hong Qian, Yuanhao Liu, Zihan Zhou, Zongbao Zhang, Hanjie Ge, Haotian Shi, Liang Dou, Xiangfeng Wang, Jingwen Yang, Aimin Zhou · 12d
Can LLMs Be Constrained to the Past? Improving Knowledge Cutoff through Recall-Based Prompting
arXiv cs.CL · Michiro Asai, Ailiang Lin, Yu Kishimoto, Takao Obi, Satoshi Kosugi, Kotaro Funakoshi, Manabu Okumura · 12d
ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL
arXiv cs.CL · Zhaorui Yang, Huawei Zheng, Sen Yang, Yuhui Zhang, Haoxuan Li, Zhizhen Yu, Xuan Yi, Chen Hou, Defeng Xie, Chao Hu, Minfeng Zhu, Dazhen Deng, Haozhe Feng, Danqing Huang, Yingcai Wu, Peng Chen, Wei Chen · 12d
Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads
arXiv cs.CL · Ruoxi Sun, Quantong Qiu, Juntao Li, Zecheng Tang, Yihang Lou, Min Zhang · 12d
Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs
arXiv cs.CL · Gio Paik, Hyunseo Shin, Soungmin Lee · 12d