
arXiv cs.CL papers May 29 2026

50 items · 1 source · updated 17d ago

This feed lists 50 papers posted to arXiv's cs.CL section on May 29, 2026, covering LLM interpretability, specialized language models, retrieval-augmented generation, safety evaluation, and agent consistency. Topics range from mechanistic interpretability frameworks and multimodal defect detection to Arabic-specialized sub-1B models, function-calling data generation, and streaming bias evaluation protocols.

  • MechELK framework uses mechanistic interpretability to extract latent knowledge from LLMs beyond surface outputs
  • RightNow-Arabic-0.5B-Turbo: 518M-parameter Arabic-specialized model built via vocabulary injection on Qwen2.5-0.5B
  • Benchmarking study evaluated 14 open-source safety guard models on 79,331 samples across 8 NIST AI Risk Framework categories
  • GenesisFunc: automated multi-agent pipeline that generates function-calling training data, with improved tool scalability and quality control
  • GPF-LiveNews streaming protocol audits group-conditioned framing across 42 identity labels on emerging news events
Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment
arXiv cs.CL · Tao Wang, Lipeng Zhu, Jiayong Li, Feng Gao, Siwen Liang · 19d
What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs
arXiv cs.CL · Mohamed Abdelwahab, Michelle Yu Collins, Sihan Chen, Yi Cheng Zhao, Zafarullah Mahmood, Jiading Zhu, Soliman Ali, Jonathan Rose · 19d
A Modular Architecture for Typologically Controlled Lexicon Generation
arXiv cs.CL · Sankalp Tattwadarshi Swain, Dhruv Kumar · 19d
MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models
arXiv cs.CL · Ji-jun Park, Soo-joon Choi, Jiwon Jeong, Taeyang Yoon, Ju-Wan Lee · 19d
From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale
arXiv cs.CL · Rohan Mahapatra · 19d
RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment
arXiv cs.CL · Jaber Jaber, Osama Jaber · 19d
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
arXiv cs.CL · Yujie Feng, Jian Li, Zhihan Zhou, Pengfei Xu, Yujia Zhang, Xiaoyu Li, Xiaohui Zhou, Alan Zhao, Xi Chen, Xiao-Ming Wu · 19d
Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning
arXiv cs.CL · Ritvik Rastogi, Vishal Singh, Tejas Chaudhari, Sandeep Varma · 19d
Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation
arXiv cs.CL · Reetu Raj Harsh, Bhaskarjit Sarmah, Stefano Pasquali · 19d
S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering
arXiv cs.CL · Encheng Su, Jinouwen Zhang, Jianyu Wu, Qiucheng Yu, Chen Tang, Pengze Li, Lintao Wang, Yizhou Wang, Xinzhu Ma, Shixiang Tang, Aoran Wang · 19d
A comparative study of transformer-based embeddings for topic coherence
arXiv cs.CL · Alex Ding, Tarun Rapaka, Willy Rodriguez, Jason Yang · 19d
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
arXiv cs.CL · Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik · 19d
Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning
arXiv cs.CL · Gus Lathouwers, Wieke Harmsen, Catia Cucchiarini, Helmer Strik · 19d
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
arXiv cs.CL · Hao-Xiang Xu, Chong Deng, Jiaqing Liu, Wen Wang, Qian Chen, Lujia Bao, Xiangang Li, Zhen-Hua Ling · 19d
No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand
arXiv cs.CL · Jimin Jung, MyoungJin Kim, Jaehyung Seo, Heuiseok Lim · 19d
SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation
arXiv cs.CL · Gyumin Kim, Juhwan Park, Jaeha Kim, Seunggyun Han, Kyungrak Son, Ikbeom Jang · 19d
Specialty-Specific Medical Language Model for Immune-Mediated Diseases
arXiv cs.CL · Veysel Kocaman, Gursev Pirge, Yigit Gul, Ace Vo, Zhenya Nargizyan, David Talby · 19d
How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines
arXiv cs.CL · Abel Yagubyan · 19d
Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning
arXiv cs.CL · Dong Liu, Yanxuan Yu, Ying Nian Wu · 19d
GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models
arXiv cs.CL · Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Roy George · 19d
Large language models reorganize representational geometry during in-context learning
arXiv cs.CL · Hua-Dong Xiong, Li Ji-An, Robert C. Wilson, Kwonjoon Lee, Xue-Xin Wei · 19d
From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization
arXiv cs.CL · Yutong Qu, Wei Zhang · 19d
GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human
arXiv cs.CL · Yihang Lin, Yunze Gao, Zeyang Lin, Dongbo Li, Kun Peng, Chenglong Song, Yue Liu · 19d
Hallucination Detection-Guided Preference Optimization for Clinical Summarization
arXiv cs.CL · Shamanth Kuthpadi Seethakantha, Dung Ngoc Thai, Vara Prasad Gudi, Simran Tiwari, Rami Matar, Avijit Mitra, Wenlong Zhao, Wael Salloum, Andrew McCallum · 19d
Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models
arXiv cs.CL · Xinyuan Cheng, Beiduo Chen, Philipp Mondorf, Barbara Plank · 19d
The Trust Paradox: How CS Researchers Engage LLM Leaderboards
arXiv cs.CL · Pouya Sadeghi, Anamaria Crisan, Jimmy Lin · 19d
Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization
arXiv cs.CL · Aarik Gulaya · 19d
Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction
arXiv cs.CL · Yuchun Zou, Junhong Tong, Jun Li · 19d
Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation
arXiv cs.CL · Xinming Yang, Jun Li · 19d
LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English
arXiv cs.CL · Lauren Levine, Amir Zeldes · 19d
Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies
arXiv cs.CL · Abhilekh Borah · 19d
Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception
arXiv cs.CL · Neemias da Silva, Myriam Delgado, Rodrigo Minetto, Daniel Silver, Thiago H Silva · 19d
Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text
arXiv cs.CL · Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu · 19d
Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG
arXiv cs.CL · Yubo Li, Rema Padman, Ramayya Krishnan · 19d
SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation
arXiv cs.CL · Xinyu Wang, Hanwei Wu, Zhenghan Tai, Sicheng Lyu, Qincheng Lu, Ziyu Zhao, Jijun Chi, Jingrui Tian, Xiao-Wen Chang, Ziyang Song · 19d
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
arXiv cs.CL · Volodymyr Ovcharov · 19d
Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches
arXiv cs.CL · Ting Gong, Shangquan Sun · 19d
Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents
arXiv cs.CL · Aditya Nawal, Manit Baser, Mohan Gurusamy · 19d
Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment
arXiv cs.CL · Laerdon Kim, Vivian Nguyen, Cristian Danescu-Niculescu-Mizil · 19d
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
arXiv cs.CL · Jinheon Baek, Soyeong Jeong, Sangwoo Park, Woongyeong Yeo, Minki Kang, Patara Trirat, Heejun Lee, Sung Ju Hwang · 19d
DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents
arXiv cs.CL · Rongsheng Zhang, Jiji Tang, Junnan Ren, Zuyi Bao, Weijie Chen, Ruofan Hu, Zhou Zhao, Tangjie Lv, Yan Zhang · 19d
Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits
arXiv cs.CL · Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang, Haozheng Luo, Tianfan Fu, Aarthy Nagarajan · 19d
Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization
arXiv cs.CL · Yun Wang, Xin Xia, Xuansheng Wu, Xiaoming Zhai, Ninghao Liu · 19d
Prompt-Level Reward Specifications for Open-Ended Post-Training
arXiv cs.CL · Zijun Weng, Xiaohui Hu, Shuangyong Song, Yongxiang Li, Kaidong Yu, Xuanjing Huang · 19d
Accommodation Goes Both Ways: Studying Linguistic Convergence Between Humans and Language Models
arXiv cs.CL · Terra Blevins · 19d
MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs
arXiv cs.CL · Daeyong Kwon, Qiyu Wu, Shinobu Kuriya, Junghyun Koo, Shuyang Cui, Zhi Zhong, Wei-Hsiang Liao, Hiromi Wakaki, Yuki Mitsufuji · 19d
GrepSeek: Training Search Agents for Direct Corpus Interaction
arXiv cs.CL · Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung, Razieh Rahimi, Fernando Diaz, Hamed Zamani · 19d
PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration
arXiv cs.CL · Shuyu Zhang, Yaqi Shi, Lu Wang · 19d
FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning
arXiv cs.CL · Juneyoung Park, Seongbae Lee, Han-Sang Lee, Kyuho Lee, Minjae Kim, Seungheon Hyeon, Kiduk Kwon, Seongwan Kim, Jaeho Lee · 19d
Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective
arXiv cs.CL · Shenghao Ye, Yuxiang Wang, Yu Guo, Dong Jin, Shuangwu Chen, Jian Yang · 19d