| Title | Year | Tags |
| --- | --- | --- |
| MoSiC - Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning | 2025 | paper, ssl, dense_ssl, computer_vision |
| NeoBabel - A Multilingual Open Tower for Visual Generation | 2025 | |
| KV Cache Steering for Inducing Reasoning in Small Language Models | 2025 | paper, llm, efficient_dl, reasoning |
| Lost in Time - A New Temporal Benchmark for VideoLLMs | 2025 | |
| An Image is Worth More Than 16x16 Patches - Exploring Transformers on Individual Pixels | 2024 | paper, dl_theory, vit, computer_vision |
| SimPLR - A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation | 2024 | paper, object_detection, computer_vision, vit |
| Low-Resource Vision Challenges for Foundation Models | 2024 | paper, efficient_dl, foundation_models, computer_vision |
| PIN - Positional Insert Unlocks Object Localisation Abilities in VLMs | 2024 | paper, multimodal, object_localisation |
| R-MAE - Regions Meet Masked Autoencoders | 2023 | |
| Learning Unseen Modality Interaction | 2023 | |
| Self-Guided Diffusion Models | 2023 | paper, computer_vision, diffusion |
| Unlocking Slot Attention by Changing Optimal Transport Costs | 2023 | paper, computer_vision, video |
| BoxeR - Box-Attention for 2D and 3D Transformers | 2021 | paper, transformers, object_detection |