vit
Based on the ViT implementation from the BLIP code base.
Classes
- class models.coda_prompt_utils.vit.Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)
Bases: Module
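The page lists only the constructor signature for Attention. As a rough illustration of how dim, num_heads and qk_scale usually interact in multi-head attention, here is a minimal pure-Python sketch. The helper name attention_shapes is hypothetical, and the default scale of head_dim ** -0.5 is an assumption based on the common ViT convention, not something this page states.

```python
def attention_shapes(dim, num_heads=8, qk_scale=None):
    """Return (head_dim, scale) for a multi-head attention layer.

    Hypothetical helper, not part of models.coda_prompt_utils.vit.
    Assumes the usual convention: the embedding is split evenly across
    heads and attention logits are scaled by head_dim ** -0.5 unless an
    explicit qk_scale is supplied.
    """
    if dim % num_heads != 0:
        # Illustrative check only; the real class may not validate this.
        raise ValueError("dim must be divisible by num_heads")
    head_dim = dim // num_heads
    scale = qk_scale if qk_scale is not None else head_dim ** -0.5
    return head_dim, scale


# ViT-Base style configuration: 768-dim embedding, 12 heads.
head_dim, scale = attention_shapes(768, num_heads=12)
print(head_dim, scale)  # 64 0.125
```

Under these assumptions, each of the 12 heads attends over 64-dimensional query and key vectors.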
- class models.coda_prompt_utils.vit.Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)
Bases: Module
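To give a sense of what mlp_ratio and qkv_bias control, the sketch below estimates the MLP hidden width and the learnable parameter count of one block. It assumes the standard ViT block layout (two LayerNorms, a fused qkv projection, an output projection with bias, and a two-layer MLP); that layout is an assumption, since this page does not show the implementation, and block_param_count is a hypothetical helper.

```python
def block_param_count(dim, mlp_ratio=4.0, qkv_bias=False):
    """Return (mlp_hidden_dim, total_params) for one transformer block.

    Hypothetical helper. Assumes a standard ViT block: LayerNorm ->
    attention -> LayerNorm -> MLP, with biases on every linear layer
    except (optionally) the fused qkv projection.
    """
    hidden = int(dim * mlp_ratio)
    qkv = dim * 3 * dim + (3 * dim if qkv_bias else 0)  # fused q, k, v
    proj = dim * dim + dim                              # attention output
    fc1 = dim * hidden + hidden                         # MLP expand
    fc2 = hidden * dim + dim                            # MLP contract
    norms = 2 * (2 * dim)                               # weight + bias, twice
    return hidden, qkv + proj + fc1 + fc2 + norms


hidden, total = block_param_count(768, mlp_ratio=4.0, qkv_bias=True)
print(hidden, total)
```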
- class models.coda_prompt_utils.vit.Mlp(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)
Bases: Module
MLP as used in Vision Transformer, MLP-Mixer and related networks.
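The signature leaves hidden_features and out_features as None by default. A plausible reading, assumed here rather than confirmed by this page, is that both fall back to in_features. The hypothetical helper resolve_mlp_widths sketches that resolution.

```python
def resolve_mlp_widths(in_features, hidden_features=None, out_features=None):
    """Return (in, hidden, out) layer widths for a two-layer MLP.

    Hypothetical helper. Assumes that unspecified widths default to
    in_features, as is common in ViT-style MLP modules.
    """
    hidden = hidden_features if hidden_features is not None else in_features
    out = out_features if out_features is not None else in_features
    return in_features, hidden, out


# Defaults keep every width equal to the input width.
print(resolve_mlp_widths(768))
# An explicit hidden width, as a block with a 4x expansion would pass.
print(resolve_mlp_widths(768, hidden_features=3072))
```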
- class models.coda_prompt_utils.vit.VisionTransformer(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, representation_size=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=None, ckpt_layer=0)
Bases: Module
Vision Transformer. A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale -