Implementation of a High-Efficiency, Lightweight Residual Spiking Neural Network Processor Based on FPGA

HOU Yue, XIANG Shuiying, ZOU Tao, HUANG Zhiquan, SHI Shangxuan, GUO Xingxing, ZHANG Yahui, ZHENG Ling, HAO Yue
As hardware deployment of spiking neural networks (SNNs) matures, SNN processors based on field-programmable gate arrays (FPGAs) have become a research focus because of their efficiency and flexibility. Existing approaches, however, rely on multi-timestep training and reconfigurable computing architectures, which increase computational and storage overhead and reduce deployment efficiency. This work designs and implements a high-efficiency, lightweight residual SNN hardware accelerator, using algorithm-hardware co-design to optimize energy efficiency during SNN inference. On the algorithm side, the network is trained with a single timestep and uses grouped convolutions together with batch normalization (BN) layer fusion, which compresses the network to 0.69 M parameters. Quantization-aware training (QAT) further restricts the network parameters to 8-bit precision. On the hardware side, intra-layer resource reuse improves FPGA resource utilization, a fully pipelined inter-layer architecture raises computational throughput, and block random access memory (BRAM) stores the network parameters and computation results to improve storage efficiency. Experiments show that the processor reaches 87.11% classification accuracy on the CIFAR-10 dataset, with an inference time of 3.98 ms per image and an energy efficiency of 183.5 FPS/W. This is more than twice the energy efficiency of mainstream graphics processing unit (GPU) platforms. Compared with other SNN processors, inference is at least 4× faster and energy efficiency is at least 5× higher.
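Two of the algorithm-side steps named in the abstract, BN layer fusion and 8-bit quantization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example values, and the symmetric per-tensor quantization scheme are assumptions chosen for illustration. Each output channel of a convolution is treated as a flat weight vector, since BN folding works the same way for any linear layer.

```python
import math


def fold_bn(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into the preceding layer.

    weights holds one flat weight list per output channel (for example a
    flattened convolution kernel). Returns new weights and biases such that
    the layer alone reproduces layer-followed-by-BN at inference time.
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        folded_w.append([wi * scale for wi in w])
        folded_b.append((b - m) * scale + bt)    # shifted and rescaled bias
    return folded_w, folded_b


def linear(weights, biases, x):
    """Dot product per output channel, as a convolution does on one patch."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fake_quantize(values, bits=8):
    """Symmetric per-tensor fake quantization (assumed scheme).

    Values are rounded onto a signed integer grid and rescaled back to
    floats, which is the kind of rounding a QAT forward pass simulates.
    """
    qmax = 2 ** (bits - 1) - 1                   # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


# Hypothetical example values: two output channels, three inputs each.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
biases = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

reference = batch_norm(linear(weights, biases, x), gamma, beta, mean, var)
fw, fb = fold_bn(weights, biases, gamma, beta, mean, var)
fused = linear(fw, fb, x)
print("layer + BN :", reference)
print("fused layer:", fused)
print("8-bit fake-quantized weights:", [fake_quantize(w) for w in fw])
```

After folding, the BN layer needs no separate hardware at inference time, which is one reason the technique suits resource-limited FPGA designs. Quantizing the folded weights rather than the original ones keeps the deployed parameters on a single 8-bit grid.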
Published online: 2025-05-27