arXiv cs.CL papers June 11 2026
This feed collects 50 papers from arXiv's cs.CL track for June 11, 2026. Topics in language models and NLP include quality evaluation frameworks for decentralized inference, retrieval-augmented generation improvements, jailbreak detection across languages, structured sequence generation, safety data extraction benchmarks, fine-tuning methods, biomedical reasoning, multimodal reasoning with process rewards, and multilingual safety evaluation.
PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference
The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track
Detecting AI-Generated Content on Social Media with Multi-modal Language Models
One Jailbreak, Many Tongues: Learning Language-Insensitive Intention Representations for Multilingual Jailbreak Detection
LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis
Benchmarking Large Language Models for Safety Data Extraction
Compatibility-Aware Dynamic Fine-Tuning for Large Language Models
BioDivergence: A Benchmark and Evaluation Framework for Hidden Contextual Contradictions in Biomedical Abstracts
ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward
T2MM: An LLM Supported Architecture For Inquiry-Based Modeling
Calibration Drift Under Reasoning: How Chain-of-Thought Budgets Induce Overconfidence in Large Language Models
EverydayGPT: Confidence-Gated Routing for Efficient and Safe Hybrid GPT-RAG Conversational QA
Beyond Compaction: Structured Context Eviction for Long-Horizon Agents
Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents
LifeSentence: Language models can encode human life course trajectories from longitudinal panel data
A Geometric Profile of Semantic Information in Text: Frame-Conditional Uniqueness and a Trade-Off Triangle for Scalar Summaries
Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs
Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite
Schützen: Evaluating LLM Safety in Bulgarian and German Contexts
When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval
The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales
When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis
Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering
Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining
Scenario-based Probing and Steering Cultural Values in Large Language Models--Extended Version
Context-Aware Multimodal Claim Verification in Spoken Dialogues
SOMA-SQL: Resolving Multi-Source Ambiguity in NL-to-SQL via Synthetic Log and Execution Probing
Agent Skill Evaluation and Evolution: Frameworks and Benchmarks
AI Coding Agents Can Reproduce Social Science Findings
AI Coding Agents in Social Science: Methodologically Diverse, Empirically Consistent, Interpretively Vulnerable
APEX: Automated Prompt Engineering eXpert with Dynamic Data Selection
The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes
Hubs or Fringes: Pretraining Data Selection via Web Graph Centrality
When Roleplaying, Do Models Believe What They Say?
SAGE: Answer-Conditioned Uncertainty Targets for Verbal Uncertainty Alignment
ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories
Measuring language complexity from hierarchical reuse of recurring patterns
Pretrained self-supervised speech models can recognize unseen consonants
Teaching Diffusion to Speculate Left-to-Right
When is Your LLM Steerable?
Multi-Agent Reasoning with Adaptive Worker Allocation for Stance Detection
Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models
Improving Cross-Format Robustness in Language Models with Multi-Format Training
Can AI Reason Like an Urban Planner? Benchmarking Large Language Models Against Professional Judgment
UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness
Goal-Autopilot: A Verifiable Anti-Fabrication Firewall for Unattended Long-Horizon Agents
Substrate Asymmetry in User-Side Memory: A Diagnostic Framework
Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild