Qiang HE

Bochum, Germany

I’m currently a ~~first~~ second-year PhD student at Ruhr-University Bochum, having moved here in November 2023 with my supervisor Setareh Maghsudi from Tuebingen. I am co-supervised by Prof. Tianyi Zhou at University of Maryland. I eared my Master’s degree in Theory and Method of Artificial Intelligence from Institute of Automation, Chinese Academy of Sciences. I work closely with Prof. Meng Fang at University of Liverpool.

For masters student in RUB: We could provide master’s thesis topic, which focuses on (deep) reinforcement learning or deep RL for large language models. Please contact Setareh and me if you are interested in our work.

I am actively looking for research collaborations! For master/undergrad students looking for research experience or PhD students looking for collaborations, feel free to drop me an email.

Research Interests: Reinforcement Learning, Human-AI Alignment, Large Language Models

I'm broadly interested in reinforcement learning, large language models, and machine learning. Currently, my research aims to i) understand the structural information of deep RL & LLMs and how to leverage it to improve agent performance in the wild (e.g., dealing with biased, noisy, or redundant data, or extrapolating to unseen tasks/environments), ii) develop controllable AI in both training and inference/adaptation; and iii) theory and real-world application of Human-AI alignment. And Yes we are developing these methods for RL and LLMs.

My working title of PhD thesis is "Towards Trustworthy Reinforcement Learning: Structural Analysis, Control Mechanisms, and Human-AI Alignment".

Our research is built upon empirical and theoretical analysis of the learning dynamics, utilizing tools from stochastic processes, functional analysis, algebra, optimization, information theory, and large language models. Our goal is to develop efficient, stable, trustworthy agents based on coevolution between humans and agents.

Contact information

Email: qianghe97 AT gmail DOT com, Qiang DOT He AT ruhr-uni-bochum DOT de. Since I have left Tuebingen, my Tuebingen e-mail is not available. Please contact me via Gmail or Bochum mail.
WeChat: pposac

Professional Service

Reviewer for NeurIPS, DMLR, ICPR

news

May 6, 2024	I attent the ICLR’24. Feel free to chat with me! Check Poster 1 and Poster 2.
May 3, 2024	I attent the AISTATS’24. Feel free to chat with me!
May 2, 2024	Our paper “Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment” is accepted by ICML 2024! We introduce Shūkai, a game agent trained in the game Naruto Mobile. This work is the first example of a deep RL agent deployed in a commercial fighting game, and has been deployed for a year.
Jan 17, 2024	2 papers accepted to ICLR 2024 and one of them is spotlight. Thank my supervisor and collaborators for their help!
Nov 2, 2023	I officially move to Ruhr-University Bochum with my supervisor.

selected publications

ICML’24
Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

Chen Zhang , Qiang He , Yuan Zhou , and 4 more authors

In Forty-first International Conference on Machine Learning , 2024

Abs Bib

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent’s behavior with human expectations. Shūkai’s ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 15% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.
@inproceedings{ICML24, title = {Advancing {DRL} Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment}, author = {Zhang, Chen and He, Qiang and Zhou, Yuan and Liu, Elvis S. and Wang, Hong and Zhao, Jian and Wang, Yang}, booktitle = {Forty-first International Conference on Machine Learning}, year = {2024}, url = {https://openreview.net/forum?id=eN1T7I7OpZ}, }
Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

Qiang He , Tianyi Zhou , Meng Fang , and 1 more author

Twelfth International Conference on Learning Representations, 2024

Abs Bib PDF Code

Representation rank is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement learning (DRL), which measures the expressive capacity of value networks. Existing studies focus on unboundedly maximizing this rank; nevertheless, that approach would introduce overly complex models in the learning, thus undermining performance. Hence, fine-tuning representation rank presents a challenging and crucial optimization problem. To address this issue, we find a guiding principle for adaptive control of the representation rank. We employ the Bellman equation as a theoretical foundation and derive an upper bound on the cosine similarity of consecutive state-action pairs representations of value networks. We then leverage this upper bound to propose a novel regularizer, namely BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent’s performance. We first validate the effectiveness of automatic control of rank on illustrative experiments. Then, we scale up BEER to complex continuous control tasks by combining it with the deterministic policy gradient method. Among 12 challenging DeepMind control tasks, BEER outperforms the baselines by a large margin. Besides, BEER demonstrates significant advantages in Q-value approximation. Our anonymous code is available at https://anonymous.4open.science/r/BEER-3C4B.
@article{ICLR2024-BEER, author = {He, Qiang and Zhou, Tianyi and Fang, Meng and Maghsudi, Setareh}, title = {Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation}, journal = {Twelfth International Conference on Learning Representations}, year = {2024}, }
Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning

Yucheng Yang , Tianyi Zhou , Qiang He , and 3 more authors

Spotlight, Twelfth International Conference on Learning Representations, 2024

Spotlight

Abs Bib PDF

Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task’s policy. Our new theoretical analysis shows that the diversity and separatability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee them. To improve MISL, we propose a novel disentanglement metric LSEPIN and build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with Wasserstein distance. We extend the geometric analysis to it, which leads to a novel skill-learning objective WSEP. It is theoretically justified to be helpful to task adaptation and it is capable of discovering more initial policies for downstream tasks than MISL. We further propose a Wasserstein distance-based algorithm PWSEP can theoretically discover all potentially optimal initial policies.
@article{ICLR2024-spotlight, author = {Yang, Yucheng and Zhou, Tianyi and He, Qiang and Han, Lei and Pechenizkiy, Mykola and Fang, Meng}, title = {Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning}, journal = {Spotlight, Twelfth International Conference on Learning Representations}, year = {2024}, highlight = {Spotlight}, }
Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

Chao Li , Chen Gong , Qiang He , and 1 more author

Thirty-seventh Conference on Neural Information Processing Systems, 2023

Abs Bib PDF

The combination of deep reinforcement learning (DRL) with ensemble methods has been proved to be highly effective in addressing complex sequential decision-making problems. This success can be primarily attributed to the utilization of multiple models, which enhances both the robustness of the policy and the accuracy of value function estimation. However, there has been limited analysis of the empirical success of current ensemble RL methods thus far. Our new analysis reveals that the sample efficiency of previous ensemble DRL algorithms may be limited by sub-policies that are not as diverse as they could be. Motivated by these findings, our study introduces a new ensemble RL algorithm, termed \textbfTrajectories-awar\textbfE \textbfEnsemble exploratio\textbfN (TEEN). The primary goal of TEEN is to maximize the expected return while promoting more diverse trajectories. Through extensive experiments, we demonstrate that TEEN not only enhances the sample diversity of the ensemble policy compared to using sub-policies alone but also improves the performance over ensemble RL algorithms. On average, TEEN outperforms the baseline ensemble DRL algorithms by 41% in performance on the tested representative environments.
@article{NIPS23, author = {Li, Chao and Gong, Chen and He, Qiang and Hou, Xinwen}, title = {Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control}, journal = {Thirty-seventh Conference on Neural Information Processing Systems}, year = {2023}, }
ECML’23
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Qiang He , Meng Fang , Tianyi Zhou , and 1 more author

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Abs Bib PDF Code

We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC) for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error tending towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. Besides, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods for 20. Besides, it shows significant advantages in Q-value approximation and variance reduction. Our code is available at this https URL.
@article{ecml23, author = {He, Qiang and Fang, Meng and Zhou, Tianyi and Maghsudi, Setareh}, title = {Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning}, journal = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases}, volume = {abs/2205.14557}, year = {2023}, url = {https://arxiv.org/abs/2306.16750}, doi = {10.48550/arXiv.2205.14557}, eprinttype = {arXiv}, eprint = {2205.14557}, }
CVPR’23
Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning

Qiang He , Huangyuan Su , Jieyu Zhang , and 1 more author

The Thirty-Fourth IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Abs Bib PDF Code

Deep reinforcement learning (DRL) gives the promise that an agent learns good policy from high-dimensional information, whereas representation learning removes irrelevant and redundant information and retains pertinent information. In this work, we demonstrate that the learned representation of the Q-network and its target Q-network should, in theory, satisfy a favorable distinguishable representation property. Specifically, there exists an upper bound on the representation similarity of the value functions of two adjacent time steps in a typical DRL setting. However, through illustrative experiments, we show that the learned DRL agent may violate this property and lead to a sub-optimal policy. Therefore, we propose a simple yet effective regularizer called Policy Evaluation with Easy Regularization on Representation (PEER), which aims to maintain the distinguishable representation property via explicit regularization on internal representations. And we provide the convergence rate guarantee of PEER. Implementing PEER requires only one line of code. Our experiments demonstrate that incorporating PEER into DRL can significantly improve performance and sample efficiency. Comprehensive experiments show that PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9 out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of our knowledge, PEER is the first work to study the inherent representation property of Q-network and its target. Our code is available at this https URL.
@article{cvpr2023, author = {He, Qiang and Su, Huangyuan and Zhang, Jieyu and Hou, Xinwen}, title = {Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning}, journal = {The Thirty-Fourth IEEE/CVF Conference on Computer Vision and Pattern Recognition}, volume = {abs/2205.14557}, year = {2023}, url = {https://openaccess.thecvf.com/content/CVPR2023/papers/He_Frustratingly_Easy_Regularization_on_Representation_Can_Boost_Deep_Reinforcement_Learning_CVPR_2023_paper.pdf}, doi = {10.48550/arXiv.2205.14557}, eprinttype = {arXiv}, eprint = {2205.14557}, }