SL-Net: self-learning and mutual attention-based distinguished window for RGBD complex salient object detection
Abstract
Significant improvements in salient object detection have been achieved through multi-modal, cross-complementary fusion of depth and RGB features. However, the multi-modal feature-extraction backbones of existing networks cannot effectively extract features from complex RGB and depth images, which limits salient object detection performance in complex and challenging scenes. In this paper, a composite backbone network with a mutual attention-based distinguished window is proposed to enhance salient regions and suppress non-salient regions. The distinguished window, based on channel-wise, spatial, mutual, and feature-level attention, is inserted at each encoder stage to enhance saliency features. Finally, a novel self-learning-based decoder capable of utilizing multi-level features is designed to obtain accurate dense predictions, with the multi-level fusion guided by deep global localized features. In this way, the performance of salient object detection can be significantly enhanced. Extensive comparative and ablation experiments on the proposed framework have been conducted on seven publicly available visual saliency datasets. Experimental results illustrate the effectiveness of the proposed framework and show better performance than closely related state-of-the-art methods.
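The abstract describes the attention-based distinguished window only at a high level. As a rough illustration of how channel-wise, spatial, and mutual attention could be combined to fuse RGB and depth features at a single encoder stage, the PyTorch sketch below may help; the module name DistinguishedWindow, the shared channel-attention branch, the additive fusion, and the reduction parameter are assumptions for illustration and do not reproduce the paper's actual implementation.

```python
# Minimal sketch of a channel-wise + spatial + mutual attention block for
# RGB-D feature fusion. Names and design choices here are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class DistinguishedWindow(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel-wise attention: squeeze spatial dimensions, excite channels.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels into a per-pixel weight map.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Mutual attention (assumed form): each modality re-weights the other
        # channel-wise before fusion.
        rgb_att = rgb_feat * self.channel_fc(depth_feat)
        depth_att = depth_feat * self.channel_fc(rgb_feat)
        fused = rgb_att + depth_att
        # Spatial attention over the fused features to emphasize salient regions.
        avg_map = fused.mean(dim=1, keepdim=True)
        max_map, _ = fused.max(dim=1, keepdim=True)
        spatial_weight = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return fused * spatial_weight


# Usage: one window per encoder stage, applied to same-resolution RGB/depth features.
if __name__ == "__main__":
    window = DistinguishedWindow(channels=64)
    rgb = torch.randn(1, 64, 56, 56)
    depth = torch.randn(1, 64, 56, 56)
    out = window(rgb, depth)
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```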