tags: GFlowNet Bengio probability modelling reinforcement learning date: 10-29-2023 19:17 gmt

GFlowNet Tutorial

  • It's basically like RL, except that the reward is treated as a scaled, unnormalized probability or 'flow'.
  • Unlike RL, GFlowNets are constructive: actions only add elements, so the resulting state graph is a DAG or a tree. (No state aliasing.)
  • Also unlike RL / REINFORCE / actor-critic, the objective is to match forward and reverse flows, both parameterized by NNs (see the sketches after this list). Hence, rather than BPTT or unrolls, information propagates via the reverse policy model. This forward/backward matching loss is reminiscent of self-supervised Barlow Twins, BYOL, Siamese networks, or [1][2]. Bengio even has a paper talking about it [3].
    • The fact that it works well means that it must be doing some sort of useful regularization, which is super interesting.
    • Or it just means there are N+1 ways of skinning the cat!
  • Adopting a TD(λ)-style approach of sampling trajectories for reward back-propagation improves convergence/generalization. Really not that different from RL.
  • At least 4 different objectives (losses):
    • Matching per-state in and out flow (flow matching)
    • Matching per-state forward and backward flow (detailed balance)
    • Matching whole-trajectory forward and backward flow (trajectory balance)
    • Subsampling portions of whole trajectories and matching their flows (sub-trajectory balance)
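
As a concrete illustration of the per-state forward/backward matching above (the detailed-balance objective), here is a minimal PyTorch sketch, not from the tutorial: the module names (DetailedBalanceGFN, db_loss), the tensor shapes, and the convention of indexing P_B by the same action index are all assumptions. The learned state flow F(s) and the two policies are trained so that F(s) * P_F(s'|s) = F(s') * P_B(s|s') holds in log space, with the flow of terminal states pinned to the reward, which is exactly how the reward enters as an unnormalized probability.

import torch
import torch.nn as nn

class DetailedBalanceGFN(nn.Module):
    """Toy GFlowNet: learned state flow F(s) plus forward/backward policies."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.log_flow = nn.Sequential(            # predicts log F(s)
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.forward_logits = nn.Sequential(      # logits of P_F(a | s)
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.backward_logits = nn.Sequential(     # logits of P_B(a | s')
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def db_loss(self, s, a, s_next, log_reward, done):
        """Squared detailed-balance residual in log space for a batch of
        transitions (s -a-> s').  For terminal transitions the flow of s'
        is replaced by the (log) reward."""
        log_F_s = self.log_flow(s).squeeze(-1)                      # [B]
        log_F_sn = self.log_flow(s_next).squeeze(-1)                # [B]
        log_F_sn = torch.where(done, log_reward, log_F_sn)
        log_PF = torch.log_softmax(self.forward_logits(s), dim=-1)
        log_PF = log_PF.gather(1, a.unsqueeze(1)).squeeze(1)        # log P_F(a|s)
        log_PB = torch.log_softmax(self.backward_logits(s_next), dim=-1)
        log_PB = log_PB.gather(1, a.unsqueeze(1)).squeeze(1)        # log P_B(a|s')
        residual = (log_F_s + log_PF) - (log_F_sn + log_PB)
        return (residual ** 2).mean()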
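
Similarly, a sketch of the whole-trajectory version (trajectory balance), where a single learned scalar log Z stands in for the total flow and the reward acts as the unnormalized target probability; the function name and the [B, T] tensor layout are assumptions for illustration.

import torch

def trajectory_balance_loss(log_Z, log_PF_steps, log_PB_steps, log_reward):
    """Trajectory-balance objective, enforced in log space:
        Z * prod_t P_F(s_{t+1}|s_t)  =  R(x) * prod_t P_B(s_t|s_{t+1})
    log_PF_steps, log_PB_steps: [B, T] per-step log-probs of sampled
    trajectories (pad unused steps with 0); log_reward: [B];
    log_Z: a learned scalar, e.g. log_Z = torch.nn.Parameter(torch.zeros(()))."""
    lhs = log_Z + log_PF_steps.sum(dim=-1)        # log forward flow of each trajectory
    rhs = log_reward + log_PB_steps.sum(dim=-1)   # log backward flow from the terminal x
    return ((lhs - rhs) ** 2).mean()

The TD(λ)-style subsampling in the last bullet corresponds to applying this same residual to λ-weighted sub-trajectories rather than only to whole ones.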