This thesis explores how attention mechanisms can improve image matching, a fundamental task in computer vision. An analysis of different matching paradigms led to the design of two architectures, SAM and BEAMER, which incorporate novel structured-attention and beam-attention modules. These models accurately handle a wide range of visual perturbations while balancing matching performance against computational cost. Our work thus opens new perspectives for applications such as 3D scene reconstruction and understanding, augmented reality, and autonomous navigation.