- This video walks through the diagram of our current policy-training process. Training the policy consists of two steps: sampling experience (trajectories) from the replay buffer, and updating the network weights by minimizing the loss between the current actor-critic networks' estimates and the targets computed with the target actor-critic networks.
- Miro Board (with original diagram)
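A minimal sketch of the two training steps described above. It assumes a DDPG-style actor-critic, which the video does not name. It also uses scalar linear actor and critic models so the gradients can be written by hand without a deep-learning library. `ReplayBuffer`, `Transition`, `train_step`, `soft_update` and the hyperparameters `gamma`, `tau` and `lr` are hypothetical names and values for illustration, not taken from the diagram.

```python
import copy
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: float
    action: float
    reward: float
    next_state: float
    done: bool


class ReplayBuffer:
    """Fixed-size FIFO store of past transitions (hypothetical helper)."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement; needs len(self) >= batch_size.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


class Actor:
    """Linear deterministic policy mu(s) = w * s + b (assumed form)."""

    def __init__(self, w: float = 0.0, b: float = 0.0):
        self.w, self.b = w, b

    def __call__(self, s: float) -> float:
        return self.w * s + self.b


class Critic:
    """Linear action-value estimate Q(s, a) = ws * s + wa * a + b (assumed form)."""

    def __init__(self, ws: float = 0.0, wa: float = 0.0, b: float = 0.0):
        self.ws, self.wa, self.b = ws, wa, b

    def __call__(self, s: float, a: float) -> float:
        return self.ws * s + self.wa * a + self.b


def soft_update(target, source, tau: float) -> None:
    # Polyak averaging: target <- tau * source + (1 - tau) * target.
    for name, value in vars(source).items():
        setattr(target, name, tau * value + (1 - tau) * getattr(target, name))


def train_step(buffer, actor, critic, actor_t, critic_t,
               batch_size=4, gamma=0.99, lr=0.01, tau=0.005):
    batch = buffer.sample(batch_size)  # 1. sample experience from the buffer
    n = len(batch)

    # 2a. Critic: minimize squared error to targets built from the target networks.
    critic_loss, g_ws, g_wa, g_b = 0.0, 0.0, 0.0, 0.0
    for t in batch:
        not_done = 0.0 if t.done else 1.0
        y = t.reward + gamma * not_done * critic_t(t.next_state, actor_t(t.next_state))
        err = critic(t.state, t.action) - y
        critic_loss += err ** 2 / n
        g_ws += 2 * err * t.state / n
        g_wa += 2 * err * t.action / n
        g_b += 2 * err / n
    critic.ws -= lr * g_ws
    critic.wa -= lr * g_wa
    critic.b -= lr * g_b

    # 2b. Actor: maximize Q(s, mu(s)), i.e. minimize its negative.
    # Chain rule for the linear forms: dQ/dw = wa * s and dQ/db = wa.
    actor_loss, g_w, g_ab = 0.0, 0.0, 0.0
    for t in batch:
        actor_loss -= critic(t.state, actor(t.state)) / n
        g_w -= critic.wa * t.state / n
        g_ab -= critic.wa / n
    actor.w -= lr * g_w
    actor.b -= lr * g_ab

    # 3. Let the target networks slowly track the online networks.
    soft_update(actor_t, actor, tau)
    soft_update(critic_t, critic, tau)
    return critic_loss, actor_loss
```

Calling `train_step` repeatedly, once the buffer holds at least `batch_size` transitions, reproduces the sample-then-update loop. A real implementation would replace the hand-written gradients with a neural-network library's autograd and optimizer.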
Zoom Recording ID: 91354111057
UUID: nLK7vNc1TTix/khiR+TTSA==
Meeting Time: 2021-06-23T17:00:43Z