Photonic crystals have attracted widespread attention in photonics because their band structures, which arise from periodic dielectric arrangements, can be used to control the propagation of light. Accurate prediction of these band structures is crucial for designing and optimizing photonic devices. However, traditional numerical simulation methods, such as the plane wave expansion and finite element methods, are often limited by high computational complexity and long processing times. In this study, we explore the application of the vision transformer (ViT) model to predicting the band structures of photonic crystals efficiently and accurately. To further validate the superiority of the ViT model, we also conduct band structure prediction experiments with CNN and MLP models of the same scale. We first generate a dataset of photonic band structures using traditional numerical simulations and then train the ViT model on this dataset. The ViT model demonstrates excellent learning capability, with the training loss decreasing to as low as 4.42×10⁻⁶. On the test set, the ViT predictions have an average mean squared error (MSE) of 3.46×10⁻⁵ and a coefficient of determination (R²) of 0.9996, indicating high prediction accuracy and good generalization capability. In contrast, the CNN and MLP models, although trained on the same dataset with the same computational resources, show higher MSE values and lower R² scores, which highlights the superior performance of the ViT model on this task. Our study shows that the ViT model can effectively predict the band structures of photonic crystals, providing a new and efficient prediction tool for related research and applications. This work is expected to advance photonic device design by offering a rapid and accurate alternative to traditional methods.
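
As an illustration of the two metrics reported above, the following is a minimal sketch of how MSE and R² could be computed for one predicted band structure. The function names and the toy frequency values are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual evaluation code may differ.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the reference values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: a few normalized band frequencies (not from the paper).
reference = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.39]

mse = mean_squared_error(reference, predicted)
r2 = r2_score(reference, predicted)
print(f"MSE = {mse:.3e}, R^2 = {r2:.4f}")
```

Under the assumption that the reported test figures are averages over samples, these per-sample values would be computed for each test structure and then averaged over the whole test set.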