arXiv cs.LG papers, June 4, 2026

49 items · 1 source · updated 13d ago · trend 0

This feed collects 49 papers posted to arXiv's cs.LG category on June 4, 2026, spanning reinforcement learning, transformer optimization, LLM quantization and compression, time-series analysis, and physics-informed machine learning. Key contributions include a position paper arguing that deployed RL should be continual, quantization methods with continuous bit-width control, a systematic study of transformer QKV projection variants, and techniques for recovering accuracy in aggressively quantized LLMs.

  • IEEE SA P3109 defines parameterized binary floating-point formats for machine learning, with variable width, precision, and infinity handling
  • SDPG combines self-distillation with policy gradients, using full-vocabulary on-policy supervision for sparse-reward reinforcement learning
  • LiftQuant enables continuous bit-width LLM quantization via a lift-then-project mechanism, bridging the deployment gap left by rigid integer bit-widths
  • RUBAS introduces rubric-based RL for agent safety, decomposing behavior into tool-use, argument, response safety, and helpfulness dimensions
  • The Muon optimizer, which uses Newton-Schulz orthonormalization, has been adopted by state-of-the-art open-source LLMs; spectral scaling laws are analyzed for its momentum matrices
Novel Aspects of IEEE SA P3109 Arithmetic Formats for Machine Learning
arXiv cs.LG · Andrew Fitzgibbon, Christoph M. Wintersteiger, Jeffrey Sarnoff · 13d
Position: Deployed Reinforcement Learning should be Continual
arXiv cs.LG · Parnian Behdin, Kevin Roice, Golnaz Mesbahi · 13d
Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent
arXiv cs.LG · Ahanaf Hasan Ariq · 13d
Do Transformers Need Three Projections? Systematic Study of QKV Variants
arXiv cs.LG · Ali Kayyam, Anusha Madan Gopal, M Anthony Lewis · 13d
Inverse Critical Experiment Design via Gradient Optimization and a Multigroup Attention-Based Neural Network Architecture
arXiv cs.LG · Will Savage, Logan Burnett, Dean Price · 13d
Self-Distilled Policy Gradient
arXiv cs.LG · Yifeng Liu, Shiyuan Zhang, Yifan Zhang, Quanquan Gu · 13d
Bayes-Sufficient Representations in Supervised Learning
arXiv cs.LG · Vasileios Sevetlidis · 13d
Unlocking Feature Learning in Gated Delta Networks at Scale
arXiv cs.LG · Yifeng Liu, Quanquan Gu · 13d
LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection
arXiv cs.LG · Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du · 13d
RUBAS: Rubric-Based Reinforcement Learning for Agent Safety
arXiv cs.LG · Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang · 13d
A Goal-Set Characterization of Task Composition in the Boolean Task Algebra
arXiv cs.LG · Eduardo Terrés-Caballero, Herke van Hoof · 13d
Spectral Scaling Laws of Muon
arXiv cs.LG · Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar · 13d
LLM Compression with Jointly Optimizing Architectural and Quantization choices
arXiv cs.LG · Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha · 13d
TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection
arXiv cs.LG · Xiancheng Wang, Zhibo Zhang, Ran Li, Rui Wang, Minghang Zhao, Shisheng Zhong, Lin Wang · 13d
Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting
arXiv cs.LG · Federico Zucchi, Yi Xie, Chao Zhang, Keyuan Luo, Thomas Lampert, Ziyue Li · 13d
Large Language Models Hack Rewards, and Society
arXiv cs.LG · Wei Liu, Xinyi Mou, Hanqi Yan, Zhongyu Wei, Yulan He · 13d
Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials
arXiv cs.LG · Joanna Zou, Fraser Birks, Dallas Foster, Youssef Marzouk · 13d
Building The Ph(ysical)AI Layer Of Machine Intelligence
arXiv cs.LG · Ulbert Jose Botero, Liam Smith, Brooks Olney, Pooya Khorrami, Steven Kusiak, Watson Jia, Sage Trudeau, Daniel Capecci · 13d
Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification
arXiv cs.LG · Neeti Pokharna, Olivier Jeunen, Yatharth Saraf, Aleksei Ustimenko · 13d
dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
arXiv cs.LG · Giuseppe Franco, Ian Colbert, Pablo Monteagudo-Lago, Felix Marty, Nicholas Fraser · 13d
Stationarity-Aware Retrieval-Augmented Time Series Forecasting
arXiv cs.LG · Shiqiao Zhou, Holger Schöner, Zipeng Wu, Edouard Fouché, IAG Wilson, Shuo Wang · 13d
Physics-Informed Machine Learning for Short-Term Flood Prediction
arXiv cs.LG · Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni · 13d
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms
arXiv cs.LG · Guilin Zhang, Chuanyi Sun, Shahryar Sarkani, John M. Fossaceca · 13d
When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction
arXiv cs.LG · Tyler Crosse, Alan Nadelsticher Ruvalcaba, Dustin Khang LeDuc, Thomas Trask, Nicholas Lytle, David Joyner · 13d
ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models
arXiv cs.LG · Sotirios Vavaroutas, Yu Yvonne Wu, Ali Etemad, Cecilia Mascolo · 13d
Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning
arXiv cs.LG · Dimitris Michailidis, Sennay Ghebreab, Fernando P. Santos · 13d
When Autoregressive Consistency Hurts Safety Alignment
arXiv cs.LG · Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu · 13d
Low-rank Distributional Matrix Completion
arXiv cs.LG · Jiayi Wang, Raymond K. W. Wong · 13d
KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models
arXiv cs.LG · Youqi Wu, Mohammad Jalali, Farzan Farnia · 13d
Exact Unlearning in Reinforcement Learning
arXiv cs.LG · Thanh Nguyen-Tang, Raman Arora · 13d
Dual Advantage Fields
arXiv cs.LG · Alexey Zemtsov, Maxim Bobrin, Alexander Nikulin, Dmitry V. Dylov, Fakhri Karray, Vladislav Kurenkov, Martin Takáč, Arip Asadulaev · 13d
Metric-Aware Hybrid Forecasting for the CTF4Science Lorenz Challenge
arXiv cs.LG · Cen Lu · 13d
Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval
arXiv cs.LG · Christian Lysenstøen · 13d
A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support
arXiv cs.LG · Ioanna Gemou, Matteo Gamba, Randall Balestriero, Ritambhara Singh · 13d
Edge of Stability Selectively Shapes Learning Across the Data Distribution
arXiv cs.LG · Shauna Kwag, Anakha Ganesh, Tomaso Poggio, Pierfrancesco Beneventano · 13d
Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data
arXiv cs.LG · Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao · 13d
RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training
arXiv cs.LG · Rachit Bansal, Clara Mohri, Tian Qin, David Alvarez-Melis, Sham Kakade · 13d
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
arXiv cs.LG · Saket Tiwari, Tejas Kotwal, George Konidaris · 13d
Derivative Informed Learning of Exchange-Correlation Functionals
arXiv cs.LG · Eike S. Eberhard, Luca A. Thiede, Abdul Aldossary, Andreas Burger, Nicholas Gao, Vignesh Bhethanabotla, Alán Aspuru-Guzik, Stephan Günnemann · 13d
The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning
arXiv cs.LG · Justinas Zaliaduonis, Patrick Putzky, Till Richter, Sergios Gatidis · 13d
Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling
arXiv cs.LG · Yifan Wang, Jinyi Mu, Mayank Jobanputra, Yu Wang, Ji-Ung Lee, Soyoung Oh, Isabel Valera, Vera Demberg · 13d
Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models
arXiv cs.LG · Alessio Barboni, Massimiliano Lupo Pasini, Bishal Lakha, Edoardo Serra · 13d
PE-MHL: Physics-Encoded Modular Hybrid Layers for Scalable Learning of Complex Systems
arXiv cs.LG · Ismail Hassaballa, Mircea Lazar · 13d
Offline-to-Online Learning in Linear Bandits
arXiv cs.LG · Kushagra Chandak, Toshinori Kitamura, Xiaoqi Tan · 13d
Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models
arXiv cs.LG · Jun Hu · 13d
Latent Anchor-Driven Test Generation for Deep Neural Networks
arXiv cs.LG · Bin Duan, Matthew B. Dwyer, Guowei Yang · 13d
Testing Neural Networks via Bayesian-Guided Exploration of Decision Landscapes
arXiv cs.LG · Bin Duan, Meiru Che, Guowei Yang · 13d
OpenRFM: Dissecting Relational In-Context Learning
arXiv cs.LG · Zhikai Chen, Junyu Yin, Jialiang Gu, Siheng Xiong, Xiaoze Liu, Ruowang Zhang, Keren Zhou, Kai Guo · 13d
Neural Galerkin Normalizing Flows for Bayesian Inference of Diffusions with Inaccessible Boundaries
arXiv cs.LG · Riccardo Saporiti, Fabio Nobile · 13d