This paper presents a convolutional neural network-based single-pixel imaging method that integrates a physics-driven fusion attention mechanism. A module combining channel attention and spatial attention is incorporated into a randomly initialized convolutional network, and the physical model of single-pixel imaging is used as a constraint to achieve high-quality image reconstruction. Specifically, the spatial and channel attention mechanisms are combined into a single module and introduced into several layers of a multi-scale U-net convolutional network. In the spatial attention mechanism, we use convolution to extract attention weights for each spatial region of the pooled feature map. In the channel attention mechanism, we pool the three-dimensional feature map into a one-dimensional per-channel signal and feed it into a two-layer fully connected network to obtain the attention weight for each channel. This approach exploits the weighting information that the attention mechanism provides over the three-dimensional data cube while retaining the feature extraction capability of the U-net across different spatial frequencies. As a result, the method can capture image details, suppress background noise, and improve reconstruction quality. In the experiments, we use a single-pixel imaging optical setup to acquire bucket signals for two target images, "snowflake" and "basket". An arbitrary noise image is fed into the randomly initialized network with the attention mechanism, and the mean square error between the simulated bucket signal and the measured bucket signal is used as the loss, so that the physical model constrains the convergence of the network. The network thus converges to a reconstructed image that is consistent with the physical model. 
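The attention module and the physical constraint described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`channel_attention`, `spatial_attention`, `fused_attention`, `bucket_mse`), the global average pooling, the ReLU and sigmoid activations, the channel-wise mean and max pooling before the spatial convolution, and the channel-then-spatial ordering are all assumptions made for illustration, and the weights here are random rather than learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Pool each channel to one value, then a two-layer
    fully connected network yields one weight per channel."""
    pooled = feat.mean(axis=(1, 2))          # (C,)  assumed: global average pooling
    hidden = np.maximum(w1 @ pooled, 0.0)    # (C_hidden,)  assumed: ReLU
    weights = sigmoid(w2 @ hidden)           # (C,)  weights in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat, kernel):
    """feat: (C, H, W), kernel: (2, k, k) with odd k. Pool across channels,
    then convolve to get one weight per spatial position."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W), assumed pooling
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    height, width = feat.shape[1:]
    logits = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]

def fused_attention(feat, w1, w2, kernel):
    """Assumed ordering: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)

def bucket_mse(image, patterns, measured):
    """Physical constraint: each simulated bucket value is the sum of the
    image multiplied by one illumination pattern; the loss is the mean
    square error against the measured bucket signal."""
    simulated = patterns.reshape(len(patterns), -1) @ image.ravel()
    return np.mean((simulated - measured) ** 2)
```

In an actual reconstruction, `fused_attention` would sit inside the U-net layers and `bucket_mse` would be minimized with respect to the network weights by an automatic-differentiation framework; the sketch only shows the forward computations.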
The experimental results show that, at low sampling rates, the scheme with the integrated attention mechanism not only reconstructs image details visibly better but also achieves significantly higher scores on quantitative metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), confirming its effectiveness and its potential for application in single-pixel imaging.
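For reference, PSNR can be computed as in the short sketch below. The function name `psnr` and the `data_range` parameter are illustrative choices, and the default assumes images scaled to [0, 1]; the evaluation settings actually used in the experiments are not stated here.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

A higher PSNR means a smaller mean square error relative to the image's dynamic range. SSIM is more involved because it compares local luminance, contrast, and structure; an existing implementation such as `skimage.metrics.structural_similarity` in scikit-image is usually preferable to writing one by hand.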