Second Brain
Neel Nanda
Properties

affiliation: Google DeepMind, Anthropic