Implements the Adam optimizer.
This is currently quite slow on the CPU and WASM backends. On the GPU backend, a single Adam update step is only slightly slower than an SGD update step, and Adam typically converges in far fewer steps.
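For reference, the sketch below shows what a single Adam update computes for a flat parameter array. The names (AdamState, adamStep) and the default hyperparameters are assumptions for illustration only, not the library's actual API.

```typescript
// Illustrative sketch of one Adam update over a flat parameter array.
interface AdamState {
  m: Float64Array; // first-moment (mean) estimates
  v: Float64Array; // second-moment (uncentered variance) estimates
  t: number;       // step counter, used for bias correction
}

function adamStep(
  params: Float64Array,
  grads: Float64Array,
  state: AdamState,
  lr = 1e-3,
  beta1 = 0.9,
  beta2 = 0.999,
  eps = 1e-8,
): void {
  state.t += 1;
  for (let i = 0; i < params.length; i++) {
    // Update biased moment estimates.
    state.m[i] = beta1 * state.m[i] + (1 - beta1) * grads[i];
    state.v[i] = beta2 * state.v[i] + (1 - beta2) * grads[i] * grads[i];
    // Bias-correct the estimates.
    const mHat = state.m[i] / (1 - Math.pow(beta1, state.t));
    const vHat = state.v[i] / (1 - Math.pow(beta2, state.t));
    // Apply the parameter update.
    params[i] -= (lr * mHat) / (Math.sqrt(vHat) + eps);
  }
}
```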
Zeros all gradients of the model parameters. This should be called after each optimization step, when the gradients are no longer needed.
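A minimal sketch of the step/zero-grad pattern follows. The interface and method names (step, zeroGrad) are assumptions for illustration; consult the class reference for the actual signatures.

```typescript
// Hypothetical optimizer interface used only to illustrate call order.
interface Optimizer {
  step(): void;     // apply one parameter update from the accumulated gradients
  zeroGrad(): void; // reset all parameter gradients to zero
}

function trainStep(optimizer: Optimizer, computeLossAndBackward: () => void): void {
  computeLossAndBackward(); // forward pass + backward pass accumulates gradients
  optimizer.step();         // consume the gradients to update the parameters
  optimizer.zeroGrad();     // clear gradients so the next step starts fresh
}
```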