Embeddings
Dense vector representations of tokens that capture semantic meaning; they are the foundation of how neural models understand text.
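A minimal sketch of the idea, using a made-up vocabulary and random values in place of trained vectors: each token id indexes a row of an embedding matrix, and similarity of meaning becomes geometric closeness, here measured with cosine similarity.

```python
# Sketch: an embedding layer is a lookup table mapping token ids to dense
# vectors. Vocabulary, dimensions, and values are illustrative, not real.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"cat": 0, "dog": 1, "car": 2}
embedding_matrix = rng.normal(size=(len(vocab), 8))   # (vocab_size, embedding_dim)

def embed(token: str) -> np.ndarray:
    # Looking up a token's embedding is just selecting a row of the matrix.
    return embedding_matrix[vocab[token]]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With trained embeddings, cosine("cat", "dog") would exceed cosine("cat", "car").
print(cosine(embed("cat"), embed("dog")), cosine(embed("cat"), embed("car")))
```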
Types
- Word2Vec — static word embeddings trained with the skip-gram or CBOW objective
- GloVe — global vectors trained on corpus-wide co-occurrence statistics
- FastText — subword (character n-gram) embeddings that can handle out-of-vocabulary words
- Contextual Embeddings — a different vector for each occurrence depending on context (e.g. BERT or GPT hidden states); see the sketch after this list
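To illustrate the contextual case, a small sketch assuming the Hugging Face transformers and torch packages and the bert-base-uncased checkpoint: the word "bank" gets a different vector in each sentence, so their cosine similarity is below 1.

```python
# Sketch: contextual embeddings give the same word a different vector per
# sentence. Assumes transformers, torch, and the bert-base-uncased model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    # Return the contextual embedding of the first subword matching `word`.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

river = word_vector("She sat on the bank of the river.", "bank")
money = word_vector("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(river, money, dim=0))  # < 1.0: the vectors differ by context
```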
Related
- Vocabulary (each token gets an embedding)
- Retrieval-Augmented Generation (uses embeddings to retrieve relevant passages; see the sketch below)
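A sketch of embedding-based retrieval in the RAG setting, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model: documents and the query are embedded into the same vector space, and the closest document by cosine similarity is returned as context.

```python
# Sketch: embedding-based retrieval as used in RAG. Assumes the
# sentence-transformers package and the all-MiniLM-L6-v2 checkpoint.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "FastText builds word vectors from character n-grams.",
    "GloVe is trained on global co-occurrence counts.",
    "BERT produces a different vector for each context.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)          # (3, 384)

query_vec = model.encode(["Which method uses subword information?"],
                         normalize_embeddings=True)[0]
scores = doc_vecs @ query_vec          # dot product of unit vectors = cosine similarity
print(docs[int(np.argmax(scores))])    # most relevant passage for the query
```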