V

VR_REINFORCE

Variance-Reduced REINFORCE-type Policy Gradient Methods