MoSiC - Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning |
2025 |
- paper
- ssl
- dense_ssl
- computer_vision
|
|
NeoBabel - A Multilingual Open Tower for Visual Generation |
2025 |
|
|
KV Cache Steering for Inducing Reasoning in Small Language Models |
2025 |
- paper
- llm
- efficient_dl
- reasoning
|
|
Lost in Time - A New Temporal Benchmark for VideoLLMs |
2025 |
|
|
An Image is Worth More Than 16x16 Patches - Exploring Transformers on Individual Pixels |
2024 |
- paper
- dl_theory
- vit
- computer_vision
|
|
SimPLR - A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation |
2024 |
- paper
- object_detection
- computer_vision
- vit
|
|
Low-Resource Vision Challenges for Foundation Models |
2024 |
- paper
- efficient_dl
- foundation_models
- computer_vision
|
|
PIN - Positional Insert Unlocks Object Localisation Abilities in VLMs |
2024 |
- paper
- multimodal
- object_localisation
|
|
R-MAE - Regions Meet Masked Autoencoders |
2023 |
|
|
Learning Unseen Modality Interaction |
2023 |
|
|
Self-Guided Diffusion Models |
2023 |
- paper
- computer_vision
- diffusion
|
|
Unlocking Slot Attention by Changing Optimal Transport Costs |
2023 |
- paper
- computer_vision
- video
|
|
BoxeR - Box-Attention for 2D and 3D Transformers |
2021 |
- paper
- transformers
- object_detection
|
|