Learning long sequences in spiking neural networks
