pbloem (@pbloem@sigmoid.social)

@futurebird @dr2chase @baldur It's "low" as in the number of bits used to represent each number, so yes, it means fewer trustworthy significant digits.

The key is to use it only for the parts of the network that are linear, like matrix multiplications, since the errors caused by low precision don't blow up there.
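A minimal sketch of that point (my example, not the poster's): run the same matrix multiply in float32 and in float16 and compare. Because the operation is linear, the rounding errors stay roughly proportional to the values instead of compounding.

```python
import numpy as np

# Compare a matrix multiply done in float32 against the same multiply
# done in float16. The relative error stays small and bounded.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

full = a @ b                                                   # float32 reference
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

# Relative error of the low-precision result against the reference.
rel_err = np.abs(full - half).max() / np.abs(full).max()
print(rel_err)  # small, roughly on the order of float16's precision
```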

It definitely matters after training too (during "inference"). Once a network is trained, you can get down to an even lower number of bits, like 4, which helps with speed and battery life.
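To make the 4-bit idea concrete, here's a rough sketch (an assumption on my part, not the poster's method) of simple symmetric post-training quantization: 4 bits give 16 integer levels, so we store small integer codes plus one float scale per tensor, and the round-trip error is at most half a quantization step.

```python
import numpy as np

# Sketch of symmetric 4-bit weight quantization: map each weight to
# an integer in [-8, 7] plus a single shared float scale.
def quantize_4bit(w):
    scale = np.abs(w).max() / 7.0                   # largest weight maps to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32) * 0.1   # toy "trained weights"
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print(np.abs(w - w_hat).max())  # bounded by half a step, i.e. s / 2
```

Real 4-bit schemes (per-channel scales, non-uniform levels) are more elaborate, but the storage-and-scale idea is the same.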
