Hyperspectral image classification using hybrid convolutional-based cross-patch retentive network

Arya R.K.; Peddi R.; Srivastava R.

doi:https://doi.org/10.1016/j.cviu.2025.104382

Abstract

The vision transformer (ViT) is widely used to capture long-distance dependencies and has demonstrated remarkable results in classifying hyperspectral images (HSIs). Nevertheless, self-attention, the fundamental component of ViT, struggles to balance global modeling against its high computational complexity over the entire input sequence. The Retentive Network (RetNet) was recently developed to address this issue and is claimed to be more scalable and efficient than standard transformers. However, like traditional transformers, RetNet struggles to capture local features. This paper proposes a novel RetNet-based hybrid convolutional-based cross-patch retentive network (HCCRN). The proposed HCCRN model comprises a hybrid convolutional-based feature extraction (HCFE) module, a weighted feature tokenization module, and a cross-patch retentive network (CRN) module. The HCFE architecture combines four 2D convolutional layers and residual connections with a 3D convolutional layer to extract high-level fused spatial–spectral information and capture low-level spectral features. This hybrid design addresses the vanishing gradient issue and comprehensively represents intricate spatial–spectral interactions by enabling hierarchical learning of spectral context and spatial dependencies. To further improve processing efficiency, the tokenization module transforms the extracted spatial–spectral features into semantic tokens and feeds them into the CRN module. The CRN module enriches feature representations and increases accuracy by using a multi-head cross-patch retention mechanism to capture multiple semantic relations between input tokens. Extensive experiments on three benchmark datasets show that the proposed HCCRN architecture significantly outperforms state-of-the-art methods. It reduces computation time and increases classification accuracy, demonstrating its generalizability and robustness in the HSI classification (HSIC) task. © 2025 Elsevier Inc.
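
The abstract does not specify the cross-patch retention used in the CRN module, so the following is only a minimal sketch of the generic single-head retention operation from RetNet, on which the efficiency claim rests. It assumes a fixed decay `gamma` and a causal mask, and omits the position rotation, scaling, and normalization of the full RetNet; the function names and toy inputs are hypothetical. It illustrates that the parallel form and the recurrent form produce the same output, with the recurrent form keeping only a small state per step.

```python
# Simplified single-head retention (after RetNet), in pure Python for clarity.
# Assumptions: fixed decay gamma in (0, 1), causal mask, no position rotation,
# no scaling or normalization. This is NOT the paper's cross-patch variant.

def retention_parallel(q, k, v, gamma):
    """Parallel form: out[n] = sum over m <= n of gamma**(n-m) * (q[n].k[m]) * v[m]."""
    n_tokens, d_v = len(q), len(v[0])
    out = []
    for n in range(n_tokens):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal mask: only tokens m <= n contribute
            score = gamma ** (n - m) * sum(a * b for a, b in zip(q[n], k[m]))
            for j in range(d_v):
                row[j] += score * v[m][j]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, then out[n] = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # d_k x d_v running state
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state and add the outer product k_n^T v_n.
        state = [[gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
                 for i in range(d_k)]
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


# Toy tokens: 3 tokens, key/query dimension 2, value dimension 3.
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
k = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
par = retention_parallel(q, k, v, gamma=0.9)
rec = retention_recurrent(q, k, v, gamma=0.9)
```

The parallel form costs time quadratic in the number of tokens, while the recurrent form updates a fixed-size state once per token; how HCCRN adapts this to patches of hyperspectral tokens is described in the full paper, not here.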