Home
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Deformable DETR: Deformable Transformers for End-to-End Object Detection
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning
How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning
Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning
Near, far: Patch-ordering enhances vision foundation models' scene understanding
On the duality between contrastive and non-contrastive self-supervised learning
Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach
PatchRot: A Self-Supervised Technique for Training Vision Transformers
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Self-supervised learning of Split Invariant Equivariant representations
Self-supervised learning of intertwined content and positional features for object detection
Toward a Geometrical Understanding of Self-supervised Contrastive Learning
Variance-Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self-Distillation Networks
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Revealing the Utilized Rank of Subspaces of Learning in Neural Networks
Memorization Through the Lens of Curvature of Loss Function Around Samples
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation
A survey of quantization methods for efficient neural network inference
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
Parameter-Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Simultaneous linear connectivity of neural networks modulo permutation
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
A Hierarchy of Graph Neural Networks Based on Learnable Local Features
G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
On the Symmetries of Deep Learning Models and their Internal Representations
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Relaxed Octahedral Group Convolution for Learning Symmetry Breaking in 3D Physical Systems
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
An image is worth 16x16 words: Transformers for image recognition at scale
Approximation-Generalization Trade-offs under (Approximate) Group Equivariance
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Fast, Expressive SE(n) Equivariant Networks through Weight-Sharing in Position-Orientation Space
MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer
Relaxing Equivariance Constraints with Non-stationary Continuous Filters
Self-Supervised Detection of Perfect and Partial Input-Dependent Symmetries
Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups