Home
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self-Distillation Networks
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Revealing the Utilized Rank of Subspaces of Learning in Neural Networks
Memorization Through the Lens of Curvature of Loss Function Around Samples
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation
A Survey of Quantization Methods for Efficient Neural Network Inference
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
Parameter-Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Simultaneous linear connectivity of neural networks modulo permutation
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
A Hierarchy of Graph Neural Networks Based on Learnable Local Features
G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
On the Symmetries of Deep Learning Models and their Internal Representations
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Relaxed Octahedral Group Convolution for Learning Symmetry Breaking in 3D Physical Systems
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Approximation-Generalization Trade-offs under (Approximate) Group Equivariance
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Fast, Expressive SE(n) Equivariant Networks through Weight Sharing in Position Orientation Space
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Relaxing Equivariance Constraints with Non-stationary Continuous Filters
Self Supervised Detection of Perfect and Partial Input Dependent Symmetries
Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups