arXiv cs.CL papers, May 29, 2026

┌─ summary ─────────────────────────────┐

The 50 papers listed here were posted to arXiv's cs.CL section around May 29, 2026. They cover LLM interpretability, specialized language models, retrieval-augmented generation, safety evaluation, and agent consistency. Topics range from mechanistic interpretability frameworks and multimodal defect detection to Arabic-specialized sub-1B models, function-calling data generation, and streaming protocols for bias evaluation.

┌─ key points ──────────────────────────┐

MechELK framework uses mechanistic interpretability to extract latent knowledge from LLMs beyond surface outputs
RightNow-Arabic-0.5B-Turbo: 518M-parameter Arabic-specialized model built via vocabulary injection on Qwen2.5-0.5B
Benchmarking study evaluated 14 open-source safety guard models on 79,331 samples across 8 NIST AI Risk Management Framework categories
GenesisFunc automated pipeline generates function-calling training data with improved tool scalability and quality control
GPF-LiveNews streaming protocol audits group-conditioned framing across 42 identity labels on emerging news events

┌─ items (50) ──────────────────────────┐

Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

arXiv cs.CL · Tao Wang, Lipeng Zhu, Jiayong Li, Feng Gao, Siwen Liang · 19d

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

arXiv cs.CL · Mohamed Abdelwahab, Michelle Yu Collins, Sihan Chen, Yi Cheng Zhao, Zafarullah Mahmood, Jiading Zhu, Soliman Ali, Jonathan Rose · 19d

A Modular Architecture for Typologically Controlled Lexicon Generation

arXiv cs.CL · Sankalp Tattwadarshi Swain, Dhruv Kumar · 19d

MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models

arXiv cs.CL · Ji-jun Park, Soo-joon Choi, Jiwon Jeong, Taeyang Yoon, Ju-Wan Lee · 19d

From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale

arXiv cs.CL · Rohan Mahapatra · 19d

RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

arXiv cs.CL · Jaber Jaber, Osama Jaber · 19d

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

arXiv cs.CL · Yujie Feng, Jian Li, Zhihan Zhou, Pengfei Xu, Yujia Zhang, Xiaoyu Li, Xiaohui Zhou, Alan Zhao, Xi Chen, Xiao-Ming Wu · 19d

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

arXiv cs.CL · Ritvik Rastogi, Vishal Singh, Tejas Chaudhari, Sandeep Varma · 19d

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

arXiv cs.CL · Reetu Raj Harsh, Bhaskarjit Sarmah, Stefano Pasquali · 19d

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

arXiv cs.CL · Encheng Su, Jinouwen Zhang, Jianyu Wu, Qiucheng Yu, Chen Tang, Pengze Li, Lintao Wang, Yizhou Wang, Xinzhu Ma, Shixiang Tang, Aoran Wang · 19d

A comparative study of transformer-based embeddings for topic coherence

arXiv cs.CL · Alex Ding, Tarun Rapaka, Willy Rodriguez, Jason Yang · 19d

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

arXiv cs.CL · Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik · 19d

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

arXiv cs.CL · Gus Lathouwers, Wieke Harmsen, Catia Cucchiarini, Helmer Strik · 19d

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

arXiv cs.CL · Hao-Xiang Xu, Chong Deng, Jiaqing Liu, Wen Wang, Qian Chen, Lujia Bao, Xiangang Li, Zhen-Hua Ling · 19d

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand

arXiv cs.CL · Jimin Jung, MyoungJin Kim, Jaehyung Seo, Heuiseok Lim · 19d

SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation

arXiv cs.CL · Gyumin Kim, Juhwan Park, Jaeha Kim, Seunggyun Han, Kyungrak Son, Ikbeom Jang · 19d

Specialty-Specific Medical Language Model for Immune-Mediated Diseases

arXiv cs.CL · Veysel Kocaman, Gursev Pirge, Yigit Gul, Ace Vo, Zhenya Nargizyan, David Talby · 19d

How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

arXiv cs.CL · Abel Yagubyan · 19d

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

arXiv cs.CL · Dong Liu, Yanxuan Yu, Ying Nian Wu · 19d

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

arXiv cs.CL · Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Roy George · 19d

Large language models reorganize representational geometry during in-context learning

arXiv cs.CL · Hua-Dong Xiong, Li Ji-An, Robert C. Wilson, Kwonjoon Lee, Xue-Xin Wei · 19d

From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization

arXiv cs.CL · Yutong Qu, Wei Zhang · 19d

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

arXiv cs.CL · Yihang Lin, Yunze Gao, Zeyang Lin, Dongbo Li, Kun Peng, Chenglong Song, Yue Liu · 19d

Hallucination Detection-Guided Preference Optimization for Clinical Summarization

arXiv cs.CL · Shamanth Kuthpadi Seethakantha, Dung Ngoc Thai, Vara Prasad Gudi, Simran Tiwari, Rami Matar, Avijit Mitra, Wenlong Zhao, Wael Salloum, Andrew McCallum · 19d

Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

arXiv cs.CL · Xinyuan Cheng, Beiduo Chen, Philipp Mondorf, Barbara Plank · 19d

The Trust Paradox: How CS Researchers Engage LLM Leaderboards

arXiv cs.CL · Pouya Sadeghi, Anamaria Crisan, Jimmy Lin · 19d

Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

arXiv cs.CL · Aarik Gulaya · 19d

Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction

arXiv cs.CL · Yuchun Zou, Junhong Tong, Jun Li · 19d

Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation

arXiv cs.CL · Xinming Yang, Jun Li · 19d

LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English

arXiv cs.CL · Lauren Levine, Amir Zeldes · 19d

Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies

arXiv cs.CL · Abhilekh Borah · 19d

Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception

arXiv cs.CL · Neemias da Silva, Myriam Delgado, Rodrigo Minetto, Daniel Silver, Thiago H Silva · 19d

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

arXiv cs.CL · Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu · 19d

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

arXiv cs.CL · Yubo Li, Rema Padman, Ramayya Krishnan · 19d

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

arXiv cs.CL · Xinyu Wang, Hanwei Wu, Zhenghan Tai, Sicheng Lyu, Qincheng Lu, Ziyu Zhao, Jijun Chi, Jingrui Tian, Xiao-Wen Chang, Ziyang Song · 19d

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

arXiv cs.CL · Volodymyr Ovcharov · 19d

Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches

arXiv cs.CL · Ting Gong, Shangquan Sun · 19d

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

arXiv cs.CL · Aditya Nawal, Manit Baser, Mohan Gurusamy · 19d

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

arXiv cs.CL · Laerdon Kim, Vivian Nguyen, Cristian Danescu-Niculescu-Mizil · 19d

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

arXiv cs.CL · Jinheon Baek, Soyeong Jeong, Sangwoo Park, Woongyeong Yeo, Minki Kang, Patara Trirat, Heejun Lee, Sung Ju Hwang · 19d

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

arXiv cs.CL · Rongsheng Zhang, Jiji Tang, Junnan Ren, Zuyi Bao, Weijie Chen, Ruofan Hu, Zhou Zhao, Tangjie Lv, Yan Zhang · 19d

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

arXiv cs.CL · Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang, Haozheng Luo, Tianfan Fu, Aarthy Nagarajan · 19d

Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization

arXiv cs.CL · Yun Wang, Xin Xia, Xuansheng Wu, Xiaoming Zhai, Ninghao Liu · 19d

Prompt-Level Reward Specifications for Open-Ended Post-Training

arXiv cs.CL · Zijun Weng, Xiaohui Hu, Shuangyong Song, Yongxiang Li, Kaidong Yu, Xuanjing Huang · 19d

Accommodation Goes Both Ways: Studying Linguistic Convergence Between Humans and Language Models

arXiv cs.CL · Terra Blevins · 19d

MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

arXiv cs.CL · Daeyong Kwon, Qiyu Wu, Shinobu Kuriya, Junghyun Koo, Shuyang Cui, Zhi Zhong, Wei-Hsiang Liao, Hiromi Wakaki, Yuki Mitsufuji · 19d

GrepSeek: Training Search Agents for Direct Corpus Interaction

arXiv cs.CL · Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung, Razieh Rahimi, Fernando Diaz, Hamed Zamani · 19d

PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration

arXiv cs.CL · Shuyu Zhang, Yaqi Shi, Lu Wang · 19d

FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning

arXiv cs.CL · Juneyoung Park, Seongbae Lee, Han-Sang Lee, Kyuho Lee, Minjae Kim, Seungheon Hyeon, Kiduk Kwon, Seongwan Kim, Jaeho Lee · 19d

Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective

arXiv cs.CL · Shenghao Ye, Yuxiang Wang, Yu Guo, Dong Jin, Shuangwu Chen, Jian Yang · 19d