Vision Transformers

Vision Transformers apply the Transformer architecture to image patches rather than text tokens. ViT (Vision Transformer) splits an image into fixed-size patches, embeds each patch as a vector, and processes the resulting sequence with standard Transformer blocks. ViTs are competitive with CNNs at scale, particularly when trained on large datasets.
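
The patch-splitting and embedding step can be illustrated with a minimal pure-Python sketch. It is not taken from any particular ViT implementation: the function names `patchify` and `embed`, the toy 4x4 image, and the 12 x 2 projection matrix are all hypothetical, chosen only to show how an image becomes a sequence of token vectors. Real models also add a class token and position embeddings before the Transformer blocks, which this sketch omits.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single vector of length P*P*C.
            patch = [
                channel
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel in pixel
            ]
            patches.append(patch)
    return patches


def embed(patches, weights):
    """Linearly project each flattened patch: (N, P*P*C) x (P*P*C, D) -> (N, D)."""
    return [
        [sum(x * w for x, w in zip(patch, column)) for column in zip(*weights)]
        for patch in patches
    ]


# Toy 4x4 RGB image whose pixel value encodes its position (assumed data).
image = [[[r * 4 + c] * 3 for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)      # 4 patches, each of length 2*2*3 = 12
weights = [[1.0, 0.0] for _ in range(12)]    # hypothetical 12 x 2 projection matrix
tokens = embed(patches, weights)             # 4 tokens of dimension 2
```

The sequence length grows as (H / P) * (W / P), so halving the patch size quadruples the number of tokens the Transformer blocks must attend over.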

Transformers (underlying architecture)
CNN Architectures (what ViT competes with)

computer-vision vision-transformers vit

Software Engineering KB