the map interpretation of attention

August 19, 2020 at 10:30 AM | categories: talk, three_strikes_rule

based on my three strikes rule, i wrote a talk on the map interpretation of neural attention.

it starts with the first NLP problem i saw where attention was used, goes through the way i think about attention as a soft lookup in a map, shows how this soft lookup solves that NLP problem, and finishes with the small mods required to turn it into the building block for the transformer architecture.
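
to make the soft lookup idea concrete, here's a minimal sketch in plain python. it isn't the code from the talk; the function name `soft_lookup`, the `scale` parameter and the toy keys and values are all made up for illustration. a normal map returns the value for the one key that matches exactly, whereas a soft lookup scores the query against every key, turns the scores into weights with a softmax, and returns the weighted average of the values.

```python
import math

def soft_lookup(query, keys, values, scale=1.0):
    """soft version of map[query]; name and signature are hypothetical."""
    # score the query against every key with a dot product
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # softmax: turn scores into positive weights that sum to 1
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # return the weighted average of the values, plus the weights
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

# toy map with two entries (illustrative data only)
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]

# the query matches the first key best, so the result leans toward 10.0
soft_out, soft_weights = soft_lookup([1.0, 0.0], keys, values)

# a large scale sharpens the weights until it behaves like a hard lookup
hard_out, hard_weights = soft_lookup([1.0, 0.0], keys, values, scale=50.0)
```

with `scale=1.0` the result is a blend of both values (about 12.7 here), and with a large `scale` it collapses to the value stored under the best matching key, which is just an ordinary map lookup.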

here's a recording; check it out!