vit

Classes

class models.coda_prompt_utils.vit.Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)

Bases: Module

forward(x, register_hook=False, prompt=None)
get_attention_map()
get_attn_gradients()
save_attention_map(attention_map)
save_attn_gradients(attn_gradients)
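
The method names above suggest that `forward` can record the attention weights and their gradients when `register_hook=True`, and that `prompt` changes the attention computation. The following is a minimal sketch of one way this could work, not the module's actual source. It assumes that `prompt` is a pair of key and value prefixes, each of shape (batch, prompt_len, dim), in the style of prefix tuning; `SketchAttention` and its internals are hypothetical stand-ins.

```python
import torch
import torch.nn as nn


class SketchAttention(nn.Module):
    """Hypothetical stand-in for illustration; not the documented class."""

    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop=0.0, proj_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)
        self.attention_map = None
        self.attn_gradients = None

    def save_attention_map(self, attention_map):
        self.attention_map = attention_map

    def get_attention_map(self):
        return self.attention_map

    def save_attn_gradients(self, attn_gradients):
        self.attn_gradients = attn_gradients

    def get_attn_gradients(self):
        return self.attn_gradients

    def forward(self, x, register_hook=False, prompt=None):
        B, N, C = x.shape
        H, D = self.num_heads, C // self.num_heads
        qkv = self.qkv(x).reshape(B, N, 3, H, D).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # each (B, H, N, D)

        if prompt is not None:
            # Assumption: prompt = (key_prefix, value_prefix), each (B, P, C).
            pk, pv = prompt
            pk = pk.reshape(B, -1, H, D).permute(0, 2, 1, 3)
            pv = pv.reshape(B, -1, H, D).permute(0, 2, 1, 3)
            k = torch.cat((pk, k), dim=2)  # (B, H, P + N, D)
            v = torch.cat((pv, v), dim=2)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, N, P + N)
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        if register_hook:
            # Keep the weights now; the gradient arrives during backward().
            self.save_attention_map(attn)
            attn.register_hook(self.save_attn_gradients)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj_drop(self.proj(out))


torch.manual_seed(0)
layer = SketchAttention(dim=32, num_heads=4)
x = torch.randn(2, 5, 32)
prompt = (torch.randn(2, 3, 32), torch.randn(2, 3, 32))
out = layer(x, register_hook=True, prompt=prompt)
out.sum().backward()  # triggers the gradient hook
```

In this sketch the prefixes extend only the keys and values, so the output keeps the input's sequence length while each query attends over `prompt_len + N` positions. The gradient getter returns `None` until a backward pass has run.
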
class models.coda_prompt_utils.vit.Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)

Bases: Module

forward(x, register_hook=False, prompt=None)
class models.coda_prompt_utils.vit.Mlp(in_features, hidden_features=None, out_features=None, act_layer=<class 'torch.nn.modules.activation.GELU'>, drop=0.0)

Bases: Module

MLP as used in Vision Transformer, MLP-Mixer, and related networks.

forward(x)
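
As a rough illustration of the constructor arguments, here is a minimal sketch of a two-layer feed-forward block with the same signature. It assumes that `hidden_features` and `out_features` fall back to `in_features` when left as `None`, and that dropout follows each linear layer; `SketchMlp` is a hypothetical stand-in, and the documented class may differ in detail.

```python
import torch
import torch.nn as nn


class SketchMlp(nn.Module):
    """Hypothetical stand-in for illustration; not the documented class."""

    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.0):
        super().__init__()
        # Assumed fallbacks when the optional sizes are not given.
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.drop(self.act(self.fc1(x)))
        return self.drop(self.fc2(x))


mlp = SketchMlp(in_features=768, hidden_features=3072)
tokens = torch.randn(2, 197, 768)  # (batch, sequence, features)
mlp_out = mlp(tokens)
```

The block acts on the last dimension only, so the batch and sequence dimensions pass through unchanged.
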
class models.coda_prompt_utils.vit.VisionTransformer(img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, representation_size=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=None, ckpt_layer=0)

Bases: Module

Vision Transformer. A PyTorch implementation of "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
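
To make the default constructor arguments concrete, the short sketch below works out the sizes they imply. It assumes non-overlapping square patches and a single class token prepended to the patch sequence, which is the usual ViT layout but is not confirmed by this page; `vit_shapes` is a hypothetical helper, not part of the module.

```python
def vit_shapes(img_size=224, patch_size=16, embed_dim=768,
               num_heads=12, mlp_ratio=4.0):
    """Derive token and layer sizes from ViT hyperparameters (hypothetical helper)."""
    patches_per_side = img_size // patch_size
    num_patches = patches_per_side ** 2
    return {
        "num_patches": num_patches,
        "seq_len": num_patches + 1,  # assumes one class token
        "head_dim": embed_dim // num_heads,
        "mlp_hidden": int(embed_dim * mlp_ratio),
    }


shapes = vit_shapes()
print(shapes)
# {'num_patches': 196, 'seq_len': 197, 'head_dim': 64, 'mlp_hidden': 3072}
```
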

forward(x, register_blk=-1, prompt=None, q=None, train=False, task_id=None)
no_weight_decay()
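
A method named `no_weight_decay()` conventionally returns the names of parameters that should be excluded from weight decay. The sketch below shows how such a set might be turned into optimizer parameter groups. The returned names (`pos_embed`, `cls_token`), the `TinyModel` class, and the `build_param_groups` helper are all assumptions made for illustration; check the module source for what the method actually returns.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in exposing a no_weight_decay() method."""

    def __init__(self):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, 5, 8))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, 8))
        self.head = nn.Linear(8, 2)

    def no_weight_decay(self):
        # Assumed return value; the real method may list different names.
        return {"pos_embed", "cls_token"}


def build_param_groups(model, weight_decay=0.05):
    """Split trainable parameters into decayed and non-decayed groups."""
    skip = set(model.no_weight_decay())
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Skip listed names, and also biases and other 1-D parameters.
        if name in skip or param.ndim <= 1:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


model = TinyModel()
groups = build_param_groups(model)
optimizer = torch.optim.AdamW(groups, lr=1e-3)
```

Excluding 1-D parameters as well is a common convention rather than something this page specifies; drop that condition if only the listed names should be exempt.
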