arXiv cs.CL papers May 29 2026
The 50 papers listed here were posted to arXiv's cs.CL section on May 29, 2026. They cover LLM interpretability, specialized language models, retrieval-augmented generation, safety evaluation, and agent consistency. Topics range from mechanistic interpretability frameworks and multimodal defect detection to Arabic-specialized sub-1B models, function-calling data generation, and streaming bias evaluation protocols.
- MechELK framework uses mechanistic interpretability to extract latent knowledge from LLMs beyond surface outputs
- RightNow-Arabic-0.5B-Turbo: 518M-parameter Arabic-specialized model built via vocabulary injection on Qwen2.5-0.5B
- Benchmarking study evaluated 14 open-source safety guard models on 79,331 samples across 8 NIST AI Risk Management Framework categories
- GenesisFunc automated pipeline generates function-calling training data with improved tool scalability and quality control
- GPF-LiveNews streaming protocol audits group-conditioned framing across 42 identity labels on emerging news events
Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment
What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs
A Modular Architecture for Typologically Controlled Lexicon Generation
MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models
From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale
RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning
Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation
S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering
A comparative study of transformer-based embeddings for topic coherence
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand
SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation
Specialty-Specific Medical Language Model for Immune-Mediated Diseases
How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines
Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning
GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models
Large language models reorganize representational geometry during in-context learning
From Data to Insights: Exploring Program-of-Thoughts Prompting for Chart Summarization
GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human
Hallucination Detection-Guided Preference Optimization for Clinical Summarization
Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models
The Trust Paradox: How CS Researchers Engage LLM Leaderboards
Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization
Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction
Error as a Lens: Probing LLM Reasoning through Synthetic Misconception Generation
LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English
Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies
Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception
Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text
Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG
SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation
UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning
Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches
Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents
Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents
Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits
Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization
Prompt-Level Reward Specifications for Open-Ended Post-Training
Accommodation Goes Both Ways: Studying Linguistic Convergence Between Humans and Language Models
MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs
GrepSeek: Training Search Agents for Direct Corpus Interaction
PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration
FoRA: Fisher-orthogonal Rank Adaptation for Parameter-Efficient Fine-Tuning
Rethinking Stepwise Model Routing: A Cost-Efficient Table Reasoning Perspective