zikele

Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks

2508.14536v1

Title#

Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks

Abstract#

The performance of Deep Q-Networks (DQN) is critically dependent on the ability of its underlying neural network to accurately approximate the action-value function. Standard function approximators, such as multi-layer perceptrons, may struggle to efficiently represent the complex value landscapes inherent in many reinforcement learning problems. This paper introduces a novel architecture, the Chebyshev-DQN (Ch-DQN), which integrates a Chebyshev polynomial basis into the DQN framework to create a more effective feature representation. By leveraging the powerful function approximation properties of Chebyshev polynomials, we hypothesize that the Ch-DQN can learn more efficiently and achieve higher performance. We evaluate our proposed model on the CartPole-v1 benchmark and compare it against a standard DQN with a comparable number of parameters. Our results demonstrate that the Ch-DQN with a moderate polynomial degree (N=4) achieves significantly better asymptotic performance, outperforming the baseline by approximately 39%. However, we also find that the choice of polynomial degree is a critical hyperparameter, as a high degree (N=8) can be detrimental to learning. This work validates the potential of using orthogonal polynomial bases in deep reinforcement learning while also highlighting the trade-offs involved in model complexity.
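
The abstract does not spell out how the Chebyshev basis is wired into the network, so the following is only a minimal sketch of one plausible feature map, not the authors' implementation. It assumes each state dimension is first squashed into [-1, 1] with `tanh` (an assumption; the domain mapping is not given in the abstract) and then expanded into the Chebyshev polynomials T_0 through T_N via the standard three-term recurrence. The function names `chebyshev_features` and `encode_state` are hypothetical, and the sample observation is made up for illustration.

```python
import numpy as np

def chebyshev_features(x, degree):
    """Evaluate T_0..T_degree elementwise on x (values expected in [-1, 1]).

    Returns an array of shape x.shape + (degree + 1,).
    """
    x = np.asarray(x, dtype=np.float64)
    feats = [np.ones_like(x)]          # T_0(x) = 1
    if degree >= 1:
        feats.append(x)                # T_1(x) = x
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x)
        feats.append(2.0 * x * feats[-1] - feats[-2])
    return np.stack(feats, axis=-1)

def encode_state(state, degree=4):
    """Map a raw 1-D state vector to a flat Chebyshev feature vector.

    Assumption: tanh is used to squash observations into [-1, 1];
    the abstract does not say how inputs are normalized.
    """
    squashed = np.tanh(np.asarray(state, dtype=np.float64))
    return chebyshev_features(squashed, degree).reshape(-1)

# Hypothetical 4-dimensional observation, e.g. from a CartPole-like task.
state = np.array([0.02, -0.5, 0.03, 1.2])
phi = encode_state(state, degree=4)
print(phi.shape)  # (20,) = 4 state dimensions x 5 basis functions
```

In a DQN-style agent, a vector like `phi` would presumably be fed to the Q-value head in place of (or alongside) the raw state. The feature count grows linearly with the degree N, which is one concrete way the choice of N affects model complexity.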

Article Page#

Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks

PDF Access#

View the Chinese PDF - 2508.14536v1
