Surface noise generated by breaking waves is ubiquitous in ocean environments and strongly affects passive sonar performance. Because it originates from sources near the ocean surface, one of its physical features is that it is carried largely by the intermediate- and high-order modes. The array-level signal-to-noise ratio (SNR), which incorporates the array-sampled sound intensity, the background noise power, and the array gain, is the essential quantity characterizing array performance. This work investigates how the array-level SNR of a vertical line array (VLA) varies with source depth in downward-refracting shallow water as a consequence of the modal structure of the surface noise. Under the assumption that the modes are well sampled, it is shown theoretically that the variation of the SNR with source depth can be approximated as a linear combination of the lower-order mode-amplitude intensities as functions of depth. In particular, when the surface noise is strongly dominant and the water channel is highly downward refracting, this variation is nearly represented by the depth dependence of the 1st-order mode-amplitude intensity. This structure is meaningful in practice: it suggests that the SNR is inherently larger when the source is submerged than when it is near the ocean surface, and that it is maximized at a source depth slightly below the 1st-order mode’s peak across different source ranges. These assertions are demonstrated in a typical downward-refracting shallow-water channel, and the effects of the degree of surface-noise dominance, the sound speed gradient in the water column, and the array aperture are investigated numerically. 
The results suggest that: 1) under certain circumstances, the variation of the SNR with source depth is nearly independent of the source range; 2) when the surface noise is more dominant, the largest SNR at a given source range exceeds the SNR of a near-surface source by a wider margin, the corresponding source depth is closer to that of the 1st-order mode’s peak, and the variation of the SNR with source depth is increasingly independent of the source range; 3) a more strongly downward-refracting sound speed profile also enhances this SNR advantage, as well as the independence of the source range, but causes the source depth of the largest SNR to deviate further from the 1st-order mode’s peak; 4) although the structure is derived assuming the VLA spans the full water column, it is still observed when the VLA spans only part of the column but covers the main part of the low-order modes, whereas when the array aperture is insufficiently large the structure becomes approximately periodic in source range, with the source depth of the largest SNR fluctuating slightly and nearly periodically around the 1st-order mode’s peak.
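
As a rough illustration of the idea that the depth dependence of the SNR follows a weighted sum of low-order mode intensities, the sketch below computes vertical modes by finite differences for a hypothetical downward-refracting channel and locates the depth where that sum peaks. Everything specific here is an assumption made for illustration and is not taken from the paper: the 100 m water depth, the linear sound speed profile, the 200 Hz frequency, the idealized pressure-release condition at both boundaries, the function name `vertical_modes`, and above all the `weights`, which are placeholders rather than the coefficients the theory would supply.

```python
import numpy as np

def vertical_modes(c, h, freq, n_modes):
    """Finite-difference modes of psi'' + (omega^2/c(z)^2 - kr^2) psi = 0.

    Assumes psi = 0 at both the surface and the bottom (an idealization).
    Returns the n_modes largest kr^2 (mode 1 first) and the mode shapes.
    """
    omega = 2.0 * np.pi * freq
    main = -2.0 / h**2 + (omega / c) ** 2      # diagonal of the operator
    off = np.full(c.size - 1, 1.0 / h**2)      # off-diagonals
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    kr2, psi = np.linalg.eigh(A)               # eigenvalues in ascending order
    order = np.argsort(kr2)[::-1][:n_modes]    # largest kr^2 is mode 1
    psi = psi[:, order]
    psi /= np.sqrt(h * np.sum(psi**2, axis=0)) # normalize: integral of psi^2 dz = 1
    return kr2[order], psi

# Hypothetical environment (not from the paper).
depth = 100.0                      # water depth in metres
n = 400                            # interior grid points
h = depth / (n + 1)
z = h * np.arange(1, n + 1)        # depths of the interior grid points
c = 1540.0 - 0.3 * z               # sound speed decreasing with depth (m/s)

kr2, psi = vertical_modes(c, h, freq=200.0, n_modes=3)

# Placeholder weights for the linear combination of mode intensities.
weights = np.array([1.0, 0.3, 0.1])
snr_shape = (psi**2) @ weights     # unnormalized depth dependence

z_peak_mode1 = z[np.argmax(psi[:, 0] ** 2)]
z_peak_snr = z[np.argmax(snr_shape)]
print(f"mode-1 intensity peaks at {z_peak_mode1:.1f} m")
print(f"weighted sum peaks at {z_peak_snr:.1f} m")
```

In this toy setting the first mode is concentrated in the lower part of the water column, so the weighted sum is small near the surface and largest at depth, in line with the qualitative claim that a submerged source is favoured. Where the maximum sits relative to the first mode's peak depends on the weights, which here are arbitrary.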