
Sutton, R.S., Maei, H.R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., et al. (2009) Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, 14-18 June 2009, 993-1000.
https://doi.org/10.1145/1553374.1553501

has been cited by the following article:

TITLE: A Unified Gradient Temporal Difference Learning Algorithm for Off-Policy Learning

AUTHORS: Yafei Zhao, Long Yang

KEYWORDS: Reinforcement Learning, Off-Policy Learning

JOURNAL NAME: Journal of Applied Mathematics and Physics, Vol.14 No.6, June 24, 2026

ABSTRACT: In this paper, we propose GQ(σ, λ), a unified gradient temporal difference (GTD) learning algorithm for off-policy learning. As σ ranges from 0 to 1, GQ(σ, λ) spans the family from gradient Tree-Backup(λ) (σ = 0) to GQ(λ) (σ = 1). We investigate the structure of the TD fixed point of GQ(σ, λ) and prove that GQ(σ, λ) converges to its TD fixed point with probability one. Furthermore, we prove that GQ(σ, λ) converges with probability one to an arbitrarily small neighborhood of the optimal solution. Empirical results show that GQ(σ, λ) with an intermediate value σ ∈ (0, 1), which creates a mixture of GQ(λ) and gradient Tree-Backup(λ), achieves better performance than either extreme, σ = 0 or σ = 1.
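The abstract does not give the update rule, so the following is only a hypothetical sketch of how a single parameter σ could interpolate between gradient Tree-Backup(λ) and GQ(λ) with linear features. It assumes that (a) the GQ(λ) update of Maei and Sutton (2010) is reused unchanged, and (b) σ only mixes the eligibility-trace coefficient between π(A|S) (Tree-Backup) and the importance ratio ρ = π(A|S)/μ(A|S) (GQ). All function names and parameters (`gq_sigma_lambda_step`, `alpha`, `beta`, etc.) are illustrative and are not the authors' implementation.

```python
"""Hypothetical sketch of one GQ(sigma, lambda)-style update with linear features.

Assumptions (not taken from the paper): the GQ(lambda) update is reused as-is,
and sigma only interpolates the trace coefficient between pi(A|S)
(Tree-Backup) and rho = pi(A|S) / mu(A|S) (GQ).
"""
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def expected_features(pi_probs: Sequence[float],
                      phis: Sequence[Sequence[float]]) -> Vector:
    """phi_bar(s') = sum_a pi(a|s') * phi(s', a)."""
    n = len(phis[0])
    return [sum(p * phi[i] for p, phi in zip(pi_probs, phis)) for i in range(n)]


def trace_coefficient(pi_a: float, mu_a: float, sigma: float) -> float:
    """sigma = 0 -> pi(A|S) (Tree-Backup); sigma = 1 -> rho (GQ(lambda))."""
    rho = pi_a / mu_a
    return (1.0 - sigma) * pi_a + sigma * rho


def gq_sigma_lambda_step(theta: Vector, w: Vector, e: Vector,
                         phi: Vector, phi_bar_next: Vector, reward: float,
                         pi_a: float, mu_a: float, sigma: float,
                         gamma: float, lam: float, alpha: float, beta: float
                         ) -> Tuple[Vector, Vector, Vector, float]:
    c = trace_coefficient(pi_a, mu_a, sigma)
    # Decay the previous trace by gamma * lambda * c, then add current features.
    e = [f + gamma * lam * c * ei for f, ei in zip(phi, e)]
    # TD error bootstraps from the target-policy expectation over next actions.
    delta = reward + gamma * dot(theta, phi_bar_next) - dot(theta, phi)
    w_e, w_phi = dot(w, e), dot(w, phi)
    # Main weights: TD step plus the GQ(lambda) gradient-correction term.
    theta = [t + alpha * (delta * ei - gamma * (1.0 - lam) * w_e * pb)
             for t, ei, pb in zip(theta, e, phi_bar_next)]
    # Secondary weights, updated with the GQ(lambda) rule for w.
    w = [wi + beta * (delta * ei - w_phi * f) for wi, ei, f in zip(w, e, phi)]
    return theta, w, e, delta
```

Under these assumptions, σ = 1 decays the trace by ρ as in GQ(λ), σ = 0 decays it by π(A|S) as in Tree-Backup(λ), and σ ∈ (0, 1) gives the mixture the abstract reports as performing best; the authors' actual GQ(σ, λ) may differ in how σ enters the update.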
