Transformers for 🇻🇦→🇬🇧 NMT

Started by Geremia, April 11, 2026, 01:52:05 AM

Geremia

Excellent explanation of Transformers, the "3Blue1Brown visualizations and explanations by Grant Sanderson" cited in Natural Language Processing in Action §9.2.2 as "a mind-expanding walk through the modern GPT architecture":
(from full Neural Nets playlist)

Geremia

AquinasLatinEnglish / AquinasLatinEnglishModel uses Transformers and Byte-Pair Encoding.
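The idea behind byte-pair encoding is simple enough to sketch in a few lines: starting from character-level words, repeatedly merge the most frequent adjacent symbol pair into a new vocabulary symbol. This is a toy illustration of the technique, not the actual AquinasLatinEnglishModel tokenizer code:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Toy BPE: learn merge rules by repeatedly fusing the most
    frequent adjacent symbol pair in the training vocabulary."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

Applying the learned merges to new text yields subword units, so rare Latin inflections can be segmented into pieces the model has seen before rather than mapped to an unknown token.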

original Transformers paper:
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. "Attention Is All You Need." arXiv:1706.03762. Preprint, arXiv, August 2, 2023.
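The core operation of that paper, scaled dot-product attention, softmax(QKᵀ/√d_k)V, can be sketched in a few lines of NumPy (an illustrative sketch, not the model's actual implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, as in
    Vaswani et al., "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax gradients well-behaved for large d_k.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value rows.
    return weights @ V
```

The full Transformer runs many of these attention heads in parallel, with learned projection matrices producing Q, K, and V from the token embeddings.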

original byte-pair encoding (BPE) paper: