Adaptive optics (AO) systems are indispensable for mitigating wavefront distortions induced by atmospheric turbulence in applications such as astronomical observation and free-space optical communication. Their foundational correction mechanism is the phase conjugation principle, in which a deformable mirror applies the conjugate of the measured aberration. In horizontal long-range laser propagation, however, turbulence is strong and distributed along the entire path, and together with diffraction it gives rise to strongly non-conjugate behavior. Under these conditions the measured wavefront often contains phase singularities. When the Fresnel number is small, conventional phase conjugation then degrades severely or even aggravates beam distortion, which fundamentally limits its effectiveness. To overcome this bottleneck, this paper proposes a machine learning-driven intelligent correction framework that searches for an optimal compensation wavefront capable of outperforming the ideal phase-conjugate solution. First, we systematically analyze the standard stochastic parallel gradient descent (SPGD) algorithm and identify its inherent limitations in the non-convex, multi-peaked optimization landscapes induced by strong turbulence, namely susceptibility to local optima, slow convergence, and sensitivity to initial conditions. To address these, we introduce a suite of enhancements: a high-order Zernike polynomial representation (up to 65 orders) that expands the solution space; a Kolmogorov-spectrum-based exponential-decay perturbation model that aligns exploration with physical priors; the Adam optimizer for adaptive moment estimation and learning-rate adjustment; an optimal-parameter rollback mechanism that ensures performance never degrades; and an early stopping strategy with adaptive learning-rate decay for computational efficiency.
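The enhanced SPGD loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy quadratic `toy_metric` stands in for a far-field image-quality metric. `decay_profile` and its `rate` are hypothetical choices for the Kolmogorov-motivated exponential decay of perturbation amplitude with mode index. The rollback and early-stopping rules (return to the best coefficients after `patience` non-improving iterations, halve the learning rate, stop once it falls below `min_lr`) are one plausible reading of the mechanisms named in the text.

```python
import math
import random


def decay_profile(n_modes, rate=0.05):
    """Hypothetical exponential-decay perturbation amplitude per Zernike mode.

    Assumption: low-order modes carry most Kolmogorov-turbulence energy, so they
    receive larger perturbations; `rate` is an illustrative constant.
    """
    return [math.exp(-rate * j) for j in range(n_modes)]


def improved_spgd(metric, n_modes, iters=500, lr=0.05, sigma=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8,
                  patience=50, lr_decay=0.5, min_lr=1e-4, seed=0):
    """Maximize metric(u) over Zernike coefficients u (sketch only)."""
    rng = random.Random(seed)
    scales = decay_profile(n_modes)
    u = [0.0] * n_modes
    best_u, best_j = list(u), metric(u)
    m, v, k = [0.0] * n_modes, [0.0] * n_modes, 0  # Adam state
    stall = 0
    for _ in range(iters):
        # Two-sided Bernoulli (+/-) perturbation shaped by the decay profile.
        delta = [sigma * s * rng.choice((-1.0, 1.0)) for s in scales]
        dj = (metric([a + d for a, d in zip(u, delta)])
              - metric([a - d for a, d in zip(u, delta)]))
        grad = [dj * d for d in delta]  # classic SPGD ascent direction
        # Adam update with bias-corrected first and second moments.
        k += 1
        for i in range(n_modes):
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1 - beta1 ** k)
            v_hat = v[i] / (1 - beta2 ** k)
            u[i] += lr * m_hat / (math.sqrt(v_hat) + eps)
        j = metric(u)
        if j > best_j:
            best_u, best_j, stall = list(u), j, 0
        else:
            stall += 1
        if stall >= patience:
            # Rollback: restart from the best coefficients seen so far,
            # reset the Adam state, and shrink the learning rate.
            u = list(best_u)
            m, v, k = [0.0] * n_modes, [0.0] * n_modes, 0
            lr *= lr_decay
            stall = 0
            if lr < min_lr:  # early stopping
                break
    return best_u, best_j


# Usage with a hypothetical toy metric (stand-in for a far-field quality metric).
TARGET = [0.5, -0.3, 0.2, 0.1, -0.05]


def toy_metric(u):
    return -sum((a - c) ** 2 for a, c in zip(u, TARGET))


coeffs, score = improved_spgd(toy_metric, n_modes=len(TARGET))
print(f"best metric: {score:.6f}")
```

Because the function only ever returns the best coefficients it has evaluated, the returned metric can never fall below the metric of the starting (zero-correction) state. This mirrors the non-degradation guarantee the rollback mechanism is meant to provide.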
Comprehensive simulations based on a multi-phase-screen wave-optics propagation model on the EasyLaser platform systematically evaluate the improved SPGD algorithm across a range of turbulence strengths and Fresnel numbers. At a Fresnel number of 1.2 in moderate-to-strong turbulence, the improved SPGD outperforms ideal phase conjugation by more than 20%, verifying that superior non-conjugate correction wavefronts exist. However, its margin of improvement narrows significantly in weaker diffraction regimes, exposing the inherent local-search limitations of the gradient-based SPGD framework. To achieve more robust global optimization, we further pioneer the application of deep reinforcement learning (RL) based on the Soft Actor-Critic (SAC) algorithm. By formulating phase correction as a Markov decision process and using convolutional neural network-based policy and value networks, the RL agent learns a direct mapping from high-dimensional wavefront and far-field intensity images to optimal Zernike correction coefficients. The maximum-entropy framework of SAC enables systematic global exploration that escapes the local optima trapping gradient-based methods. Notably, under strong turbulence in which both ideal phase conjugation and the improved SPGD algorithm fail (the improved SPGD reaches a performance enhancement factor of only 1.04), the RL-driven controller achieves a performance enhancement factor of 4.53 and restores a near-diffraction-limited far-field spot. This study establishes a new paradigm for turbulence correction that moves beyond the phase conjugation principle and confirms the practical feasibility and considerable potential of reinforcement learning for complex, high-dimensional, continuous wavefront control in adaptive optics.
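The maximum-entropy quantities that drive SAC's exploration can be sketched in a few lines. This is a generic, framework-free sketch of the standard SAC targets: the clipped double-Q soft Bellman target and the entropy-regularized actor loss with a tanh-squashed Gaussian policy. It is not the paper's CNN-based agent. Treating each action dimension as one normalized Zernike coefficient, the reward and Q values in the usage lines, and the values of `gamma` and `alpha` are all illustrative assumptions.

```python
import math
import random


def squashed_gaussian_sample(mu, log_std, rng):
    """Sample a tanh-squashed Gaussian action and its log-probability.

    Assumption: each dimension is one Zernike coefficient scaled to [-1, 1].
    """
    action, log_prob = [], 0.0
    for m, ls in zip(mu, log_std):
        std = math.exp(ls)
        eps = rng.gauss(0.0, 1.0)
        pre = m + std * eps
        a = math.tanh(pre)
        # Log-density of the pre-squash Gaussian sample.
        log_prob += -0.5 * eps ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # Change-of-variables correction for the tanh squashing.
        log_prob -= math.log(1 - a ** 2 + 1e-6)
        action.append(a)
    return action, log_prob


def soft_q_target(reward, done, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * (1 - d) * (min Q' - alpha * log pi)."""
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value


def actor_loss(q1, q2, log_prob, alpha=0.2):
    """Per-sample policy loss: alpha * log pi(a|s) - min(Q1, Q2)."""
    return alpha * log_prob - min(q1, q2)


# Usage with hypothetical values (a real agent would get mu/log_std from a CNN).
rng = random.Random(0)
action, logp = squashed_gaussian_sample([0.0, 0.1, -0.2], [-1.0] * 3, rng)
y = soft_q_target(reward=0.8, done=0.0, q1_next=1.5, q2_next=1.7,
                  log_prob_next=logp)
print(action, y)
```

The `-alpha * log_prob` term rewards high-entropy (more random) actions. This is what lets SAC keep exploring distant regions of the correction space rather than settling into the nearest local optimum, as a gradient-following method would.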