Article citations
Sutton, R.S., Maei, H.R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., et al. (2009) Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, 14-18 June 2009, 993-1000.
https://doi.org/10.1145/1553374.1553501
has been cited by the following article:
TITLE:
A Unified Gradient Temporal Difference Learning Algorithm for Off-Policy Learning
AUTHORS:
Yafei Zhao, Long Yang
KEYWORDS:
Reinforcement Learning, Off-Policy Learning
JOURNAL NAME:
Journal of Applied Mathematics and Physics, Vol.14 No.6, June 24, 2026
ABSTRACT: In this paper, we propose GQ(σ, λ), a unified gradient temporal difference (GTD) learning algorithm for off-policy learning. The proposed GQ(σ, λ) ranges from gradient Tree-Backup(λ) to GQ(λ) as σ ranges from 0 to 1. We investigate the structure of the TD fixed point of GQ(σ, λ) and prove that GQ(σ, λ) converges to its TD fixed point with probability one. Furthermore, we prove that GQ(σ, λ) converges to an arbitrarily small neighborhood of the optimal solution with probability one. Empirical results show that GQ(σ, λ) with an intermediate value σ ∈ (0, 1), which creates a mixture of GQ(λ) and gradient Tree-Backup(λ), achieves better performance than either extreme, σ = 0 or σ = 1.
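The abstract does not give the update rule, so the following is only a rough illustration of what a σ-weighted mixture between an expected backup and a sampled backup can look like in the simplest one-step, tabular case. It is not the GQ(σ, λ) algorithm itself: the function name `sigma_mixed_td_error`, its arguments, and the toy numbers are all assumptions made for this sketch, and the gradient correction, eligibility traces, and function approximation described in the paper are omitted.

```python
import numpy as np

def sigma_mixed_td_error(q, s, a, r, s_next, a_next, pi_next, sigma, gamma=0.99):
    """One-step TD error with a sigma-weighted backup (illustrative only).

    q        : array of shape (n_states, n_actions), current action values
    pi_next  : array of shape (n_actions,), target-policy probabilities at s_next
    sigma    : 0 gives a purely expected (tree-backup-style) target,
               1 gives a purely sampled target
    """
    sampled = q[s_next, a_next]            # value of the action actually taken next
    expected = np.dot(pi_next, q[s_next])  # expectation under the target policy
    backup = sigma * sampled + (1.0 - sigma) * expected
    return r + gamma * backup - q[s, a]

# Hypothetical toy values, chosen only to show how sigma shifts the target.
q = np.array([[0.0, 1.0],
              [2.0, 4.0]])
pi_next = np.array([0.5, 0.5])
deltas = {
    sig: sigma_mixed_td_error(q, s=0, a=1, r=1.0, s_next=1, a_next=1,
                              pi_next=pi_next, sigma=sig, gamma=0.5)
    for sig in (0.0, 0.5, 1.0)
}
```

In this toy setup the TD error moves linearly between the two extremes as σ goes from 0 to 1, which is the sense in which an intermediate σ "mixes" the two backups.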