mixedmath

Explorations in math and programming
David Lowry-Duda



At today's Paper Talk session, cohosted by IDI, I'm giving a talk on the paper "The Free Transformer" by Fleuret (preprint available here). This paper describes a modification of the standard decoder-only transformer that learns to condition token generation on random latent variables learned without supervision.

We describe some of the intuition and background for the new architecture.
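As a rough illustration of what "conditioning token generation on a random latent variable" means, here is a toy sketch. It is not the architecture from the paper: the vocabulary, the table `LATENT_BIAS`, the constant `base_logits`, and the choice to sample a single latent per sequence from a uniform prior are all assumptions made up for this example. The only point is that a latent `z` is sampled first, and every next-token distribution then depends on it.

```python
import math
import random

# Everything below is hypothetical and chosen only for illustration.
VOCAB = ["a", "b", "c"]
NUM_LATENTS = 2

# Assumed per-latent logit offsets: z = 0 favors "a", z = 1 favors "c".
LATENT_BIAS = [
    [2.0, 0.0, -2.0],
    [-2.0, 0.0, 2.0],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(base_logits, z):
    """Next-token distribution, conditioned on the latent value z."""
    logits = [b + d for b, d in zip(base_logits, LATENT_BIAS[z])]
    return softmax(logits)


def generate(length, rng):
    """Sample a latent once, then sample `length` tokens conditioned on it."""
    z = rng.randrange(NUM_LATENTS)  # assumed uniform prior over latents
    base_logits = [0.0, 0.0, 0.0]   # stand-in for a real decoder's output
    tokens = []
    for _ in range(length):
        probs = next_token_probs(base_logits, z)
        tokens.append(rng.choices(VOCAB, weights=probs)[0])
    return z, tokens


if __name__ == "__main__":
    z, tokens = generate(10, random.Random(0))
    print(z, "".join(tokens))
```

In this toy, sequences generated with `z = 0` are mostly `a` and those with `z = 1` are mostly `c`, so the latent acts as a global decision made before any token is emitted. In a real model the decoder's hidden states would take the place of `base_logits`, and how the latent is learned and injected is exactly what the paper specifies.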

My talk is a RevealJS presentation, and is available here.

This architecture is very straightforward to implement, and there are some natural follow-up experiments to do. On the other hand, after some quiet experiments on my own, I think this new architecture isn't going to take over the world. It's very interesting, though!

If you are interested in discussing this further, let me know.

