Yu, H.Z. (2017) On Convergence of Some Gradient-Based Temporal-Differences Algorithms for Off-Policy Learning. arXiv: 1712.09652.

has been cited by the following article:

TITLE: A Unified Gradient Temporal Difference Learning Algorithm for Off-Policy Learning

AUTHORS: Yafei Zhao, Long Yang

KEYWORDS: Reinforcement Learning, Off-Policy Learning

JOURNAL NAME: Journal of Applied Mathematics and Physics, Vol.14 No.6, June 24, 2026

ABSTRACT: In this paper, we propose GQ(σ, λ), a unified gradient temporal difference (GTD) learning algorithm for off-policy learning. GQ(σ, λ) ranges from gradient Tree-Backup(λ) to GQ(λ) as σ ranges from 0 to 1. We investigate the structure of the TD fixed point of GQ(σ, λ) and prove that GQ(σ, λ) converges to its TD fixed point with probability one. Furthermore, we prove that GQ(σ, λ) converges to an arbitrarily small neighborhood of the optimal solution with probability one. Empirical results show that GQ(σ, λ) with an intermediate value σ ∈ (0, 1), which mixes GQ(λ) and gradient Tree-Backup(λ), achieves better performance than either extreme, σ = 0 or σ = 1.
