Transformers for 🇻🇦→🇬🇧 NMT

Started by Geremia, April 11, 2026, 01:52:05 AM


Geremia

Excellent explanation of Transformers: the "3Blue1Brown visualizations and explanations by Grant Sanderson," cited in Natural Language Processing in Action §9.2.2 as "a mind-expanding walk through the modern GPT architecture":
(from the full Neural Nets playlist)
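
The centerpiece those videos build up to is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, as defined in the Vaswani et al. paper cited below. Here is a minimal numpy sketch of just that operation; the token count, dimensions, and random inputs are toy values of my own choosing, not anything from the videos:

Code:
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity matrix
    return softmax(scores, axis=-1) @ V  # each output row is a weighted mix of V's rows

# Toy sizes (illustrative only): 4 tokens, d_k = 8, d_v = 16.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
print(attention(Q, K, V).shape)  # -> (4, 16)

In a full Transformer, Q, K, and V are learned linear projections of the token embeddings, and many such attention heads run in parallel per layer.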

Geremia

AquinasLatinEnglish / AquinasLatinEnglishModel uses Transformers and Byte-Pair Encoding.

original Transformers paper:
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. "Attention Is All You Need." arXiv:1706.03762. Preprint, arXiv, August 2, 2023 [1st ed.: 2017].

original byte-pair encoding (BPE) paper:
  • Gage, Philip. "A New Algorithm for Data Compression." The C Users Journal 12, no. 2 (February 1994).
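
For orientation: BPE builds a subword vocabulary by starting from single characters and repeatedly merging the most frequent adjacent symbol pair into a new symbol. A minimal sketch of that merge-learning loop; the tiny Latin corpus and merge count are illustrative stand-ins, not taken from the AquinasLatinEnglish code:

Code:
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word is a tuple of symbols, starting as single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Toy Latin corpus (illustrative only).
corpus = "dominus deus sanctus spiritus filius".split()
print(bpe_merges(corpus, 5))

On this corpus the first merge is ('u', 's'), since every word ends in -us; a real tokenizer learns thousands of merges over a full corpus.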

Geremia

Quote from: Geremia on April 18, 2026, 11:25:11 PM
original Transformers paper:
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. "Attention Is All You Need." arXiv:1706.03762. Preprint, arXiv, August 2, 2023 [1st ed.: 2017].
Another good explanation of Transformers: