arXiv cs.CL papers June 8 2026
This feed collects 50 papers from arXiv's cs.CL section for June 8, 2026, spanning multilingual factual consistency, LLM personalization, reasoning failure diagnosis, retrieval-augmented generation, and cultural alignment. Topics range from cross-lingual QA datasets and web agent architectures to behavioral biometrics in prompts, evidence utilization diagnostics, and tone-aware health communication systems.
Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
Re-Centering Humans in LLM Personalization
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs
How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures
CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures
The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment
What Do People Actually Want From AI? Mapping Preference Plurality
HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule
Signal-Driven Observation for Long-Horizon Web Agents
Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation
Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles
Modular Monolingual Adaptation using Pretrained Language Models
When to Think Deeply: Inhibitory Deliberation for LLM Reasoning
Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection
PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs
A Four-Condition Diagnostic Protocol for Evidence Utilization in Long-Context and Retrieval-Augmented Language Models
When Better Codebooks Are Not Enough: Predictive Performance and Behavioral Reliability in LLM Political Event Coding
Explain Like I'm 5 or Whatever I Choose: Evaluating the Interactive Potential of Language Model Responses
TA-RAG: Tone-Aware Retrieval-Augmented Generation for Peer-Support Health Communication
Korean Culture into LLM Alignment: Toward Cultural Coherence
Quantifying Media Representation Dynamics Across 25 Years of News Reporting on Policing-related Deaths
Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards
The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models
Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces
CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification
Interpreting Brain Responses to Language with Sparse Features from Language Models
Are Large Language Models Suitable for Graph Computation? Progress and Prospects
An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection
EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering
ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning
Didact: A Cross-Domain Capability Discovery System for Defence
Auditing Training Data in Domain-adapted LLMs: LoRA-MINT
OpenHalDet: A Unified Benchmark for Hallucination Detection across Diverse Generation Scenarios
Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments
Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition
Principles of Concept Representation in Sentence Encoders
MADE: Beyond Scoring via a Multilingual Agentic Diagnosing Engine for Fine-Grained Evaluation Insights
Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling
TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents
Modeling semantic association in self-paced reading with language model embeddings
mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages?
SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices
Style or Content? Evaluating Style Classifiers with Controlled Content Overlap
Learning Perspectivist Social Meaning via Demographic-Conditioned Fusion Embeddings
Explicit Evidence Grounding via Structured Inline Citation Generation
UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding
Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
Adversarial Creation and Detection of AI-Generated Social Bot Content