Xu, T.Y., Zou, S.F. and Liang, Y.B. (2019) Two Time-Scale Off-Policy TD Learning: Non-Asymptotic Analysis over Markovian Samples. arXiv: 1909.11907.

has been cited by the following article:

TITLE: A Unified Gradient Temporal Difference Learning Algorithm for Off-Policy Learning

AUTHORS: Yafei Zhao, Long Yang

KEYWORDS: Reinforcement Learning, Off-Policy Learning

JOURNAL NAME: Journal of Applied Mathematics and Physics, Vol.14 No.6, June 24, 2026

ABSTRACT: In this paper, we propose GQ(σ, λ), a unified gradient temporal difference (GTD) learning algorithm for off-policy learning. The proposed GQ(σ, λ) ranges from gradient Tree-Backup(λ) to GQ(λ) as σ ranges from 0 to 1. We investigate the structure of the TD fixed point of GQ(σ, λ) and prove that GQ(σ, λ) converges to its TD fixed point with probability one. Furthermore, we prove that GQ(σ, λ) converges to an arbitrarily small neighborhood of the optimal solution with probability one. Empirical results show that GQ(σ, λ) with a value σ ∈ (0, 1), which creates a mixture of GQ(λ) and gradient Tree-Backup(λ), achieves better performance than either extreme, σ = 0 or σ = 1.
