arXiv cs.LG papers June 4 2026
49 items ● 1 source
This topic collects 49 papers from arXiv's cs.LG category for June 4, 2026, spanning reinforcement learning, transformer optimization, LLM quantization and compression, time-series analysis, and physics-informed machine learning. Key contributions include a position paper on continual RL in deployment, quantization methods with continuous bit-width control, transformer architecture variants, and techniques for efficient LLM inference on edge devices.
- IEEE P3109 defines parameterized binary floating-point formats optimized for machine learning, with variable width, precision, and infinity handling
- SDPG (Self-Distilled Policy Gradient) combines self-distillation with policy gradients, using full-vocabulary on-policy supervision for sparse-reward reinforcement learning
- LiftQuant enables continuous bit-width LLM quantization via a lift-then-project mechanism, bridging the deployment gap left by rigid integer bit-widths
- RUBAS introduces rubric-based RL for agent safety, decomposing behavior into tool-use, argument, response safety, and helpfulness dimensions
- Muon, an optimizer built on Newton-Schulz orthonormalization, has been adopted by state-of-the-art open-source LLMs; its spectral scaling laws are analyzed for momentum matrices
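The Newton-Schulz orthonormalization mentioned in the Muon item can be illustrated with a small sketch. This uses the textbook cubic iteration X ← 1.5·X − 0.5·X·Xᵀ·X, which is an assumption made for illustration and not necessarily the coefficients or step count Muon itself uses. The helper names (`matmul`, `transpose`, `newton_schulz_orthogonalize`) and the example matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix m.

    Textbook cubic Newton-Schulz iteration (an illustrative choice, not
    Muon's tuned variant): X <- 1.5 * X - 0.5 * X @ X.T @ X.
    """
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the cubic iteration.
    fro = math.sqrt(sum(v * v for row in m for v in row))
    if fro == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxtx)]
    return x


# Hypothetical 2x2 "momentum" matrix, used only to exercise the sketch.
momentum = [[3.0, 1.0], [1.0, 2.0]]
q = newton_schulz_orthogonalize(momentum)
gram = matmul(transpose(q), q)  # should be close to the identity matrix
```

The iteration pushes every singular value of the scaled matrix toward 1 while leaving the singular vectors unchanged, so `gram` ends up close to the identity. The example matrix is symmetric positive definite, so its orthogonal factor is the identity itself.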
Novel Aspects of IEEE SA P3109 Arithmetic Formats for Machine Learning
Position: Deployed Reinforcement Learning should be Continual
Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent
Do Transformers Need Three Projections? Systematic Study of QKV Variants
Inverse Critical Experiment Design via Gradient Optimization and a Multigroup Attention-Based Neural Network Architecture
Self-Distilled Policy Gradient
Bayes-Sufficient Representations in Supervised Learning
Unlocking Feature Learning in Gated Delta Networks at Scale
LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection
RUBAS: Rubric-Based Reinforcement Learning for Agent Safety
A Goal-Set Characterization of Task Composition in the Boolean Task Algebra
Spectral Scaling Laws of Muon
LLM Compression with Jointly Optimizing Architectural and Quantization choices
TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection
Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting
Large Language Models Hack Rewards, and Society
Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials
Building The Ph(ysical)AI Layer Of Machine Intelligence
Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification
dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
Stationarity-Aware Retrieval-Augmented Time Series Forecasting
Physics-Informed Machine Learning for Short-Term Flood Prediction
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms
When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction
ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models
Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning
When Autoregressive Consistency Hurts Safety Alignment
Low-rank Distributional Matrix Completion
KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models
Exact Unlearning in Reinforcement Learning
Dual Advantage Fields
Metric-Aware Hybrid Forecasting for the CTF4Science Lorenz Challenge
Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval
A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support
Edge of Stability Selectively Shapes Learning Across the Data Distribution
Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data
RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
Derivative Informed Learning of Exchange-Correlation Functionals
The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning
Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling
Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models
PE-MHL: Physics-Encoded Modular Hybrid Layers for Scalable Learning of Complex Systems
Offline-to-Online Learning in Linear Bandits
Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models
Latent Anchor-Driven Test Generation for Deep Neural Networks
Testing Neural Networks via Bayesian-Guided Exploration of Decision Landscapes
OpenRFM: Dissecting Relational In-Context Learning
Neural Galerkin Normalizing Flows for Bayesian Inference of Diffusions with Inaccessible Boundaries