Training the Policy (Sample + Update Diagram)

This video walks through the diagram of our current process for training the policy. Training has two parts: sampling experience (trajectories) from the replay buffer, and updating the network weights by minimizing the loss between the target and current actor-critic networks.
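
The sample-then-update loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diagram: it assumes a DDPG-style deterministic actor, scalar states and actions, linear actor and critic functions, and Polyak (soft) target updates, none of which the description confirms. All names (`ReplayBuffer`, `train_step`, `tau`, and so on) are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement (an assumption, not from the source).
        return rng.sample(list(self.storage), batch_size)


def q_value(w, s, a):
    # Assumed linear critic: Q(s, a) = w[0]*s + w[1]*a + w[2]
    return w[0] * s + w[1] * a + w[2]


def train_step(buffer, actor, critic, target_actor, target_critic, rng,
               batch_size=32, gamma=0.99, lr=0.01, tau=0.05):
    """One sample + update step. Parameter lists are modified in place."""
    batch = buffer.sample(batch_size, rng)

    # Critic update: squared error between current Q and the target-network estimate.
    grad = [0.0, 0.0, 0.0]
    loss = 0.0
    for s, a, r, s2, done in batch:
        a2 = target_actor[0] * s2  # assumed linear actor: a = k * s
        target = r + (0.0 if done else gamma * q_value(target_critic, s2, a2))
        td_error = q_value(critic, s, a) - target
        loss += td_error ** 2 / batch_size
        for i, x in enumerate((s, a, 1.0)):
            grad[i] += 2.0 * td_error * x / batch_size
    for i in range(3):
        critic[i] -= lr * grad[i]

    # Actor update: gradient ascent on Q(s, actor(s)); dQ/dk = w[1] * s.
    actor_grad = sum(critic[1] * s for s, _, _, _, _ in batch) / batch_size
    actor[0] += lr * actor_grad

    # Soft update: move each target network a fraction tau toward its online network.
    for online, target in ((actor, target_actor), (critic, target_critic)):
        for i in range(len(online)):
            target[i] = (1.0 - tau) * target[i] + tau * online[i]
    return loss


# Usage: fill the buffer with synthetic transitions, then run a few updates.
rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
for _ in range(200):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    buffer.add((s, a, -(s + a) ** 2, rng.uniform(-1, 1), False))

actor, critic = [0.0], [0.0, 0.0, 0.0]
target_actor, target_critic = list(actor), list(critic)
losses = [train_step(buffer, actor, critic, target_actor, target_critic, rng)
          for _ in range(50)]
```

The target networks here trail the online networks by a factor `tau` per step, which is one common way to keep the regression target stable; a periodic hard copy would be an equally plausible reading of the diagram.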
Miro Board (with original diagram)

Zoom Recording ID: 91354111057 UUID: nLK7vNc1TTix/khiR+TTSA== Meeting Time: 2021-06-23T17:00:43Z