光合及目
1lux.xyz
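Founder interview · Nature (cover paper) · 2019-08-01

Towards artificial general intelligence with hybrid Tianjic chip architecture

Mingguo Zhao, as a co-first author, worked with Professor Luping Shi's team to publish the Tianjic chip results as a Nature cover paper. It was the first Nature cover article from China's AI field and was covered in 1,400 reports, 14 times the usual level.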

Data availability

The datasets that we used for benchmarks are publicly available, as described in the text and in the relevant references (refs 38, 41, 42, 44, 45 and 46). The training methods are provided in the relevant references (refs 36, 37, 47 and 54). The experimental setups for simulations and measurements are detailed in the text. Other data that support the findings of this study are available from the corresponding author on reasonable request.

Code availability

The codes used for the software tool chain and the bicycle demonstration are available from the corresponding author on reasonable request.

References

  1. Goertzel, B. Artificial general intelligence: concept, state of the art, and future prospects. J. Artif. Gen. Intell. 5, 1–48 (2014).

  2. Benjamin, B. V. et al. Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE 102, 699–716 (2014).

  3. Merolla, P. A. et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673 (2014).

  4. Furber, S. B. et al. The SpiNNaker project. Proc. IEEE 102, 652–665 (2014).

  5. Schemmel, J. et al. A wafer-scale neuromorphic hardware system for large-scale neural modeling. In Proc. 2010 IEEE Int. Symposium on Circuits and Systems 1947–1950 (IEEE, 2010).

  6. Davies, M. et al. Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 82–99 (2018).

  7. Chen, Y.-H. et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 52, 127–138 (2017).

  8. Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual Int. Symposium on Computer Architecture 1–12 (IEEE, 2017).

  9. Markram, H. The blue brain project. Nat. Rev. Neurosci. 7, 153–160 (2006).

  10. Izhikevich, E. M. Simple model of spiking neurons. IEEE Trans. Neural Netw. 14, 1569–1572 (2003).

  11. Eliasmith, C. et al. A large-scale model of the functioning brain. Science 338, 1202–1205 (2012).

  12. Song, S., Miller, K. D. & Abbott, L. F. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat. Neurosci. 3, 919–926 (2000).

  13. Gusfield, D. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology (Cambridge Univ. Press, 1997).

  14. Qiu, G. Modelling the visual cortex using artificial neural networks for visual image reconstruction. In Fourth Int. Conference on Artificial Neural Networks 127–132 (Institution of Engineering and Technology, 1995).

  15. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  16. Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach (Pearson Education, 2016).

  17. He, K. et al. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  18. Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29, 82–97 (2012).

  19. Young, T. et al. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13, 55–75 (2018).

  20. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  21. Lake, B. M. et al. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).

  22. Hassabis, D. et al. Neuroscience-inspired artificial intelligence. Neuron 95, 245–258 (2017).

  23. Marblestone, A. H., Wayne, G. & Kording, K. P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10, 94 (2016).

  24. Lillicrap, T. P. et al. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7, 13276 (2016).

  25. Roelfsema, P. R. & Holtmaat, A. Control of synaptic plasticity in deep cortical networks. Nat. Rev. Neurosci. 19, 166–180 (2018).

  26. Ullman, S. Using neuroscience to develop artificial intelligence. Science 363, 692–693 (2019).

  27. Xu, K. et al. Show, attend and tell: neural image caption generation with visual attention. In Int. Conference on Machine Learning (eds Bach, F. & Blei, D.) 2048–2057 (International Machine Learning Society, 2015).

  28. Zhang, B., Shi, L. & Song, S. in Brain-Inspired Robotics: The Intersection of Robotics and Neuroscience (eds Sanders, S. & Oberst, J.) 4–9 (Science/AAAS, 2016).

  29. Sabour, S., Frosst, N. & Hinton, G. E. Dynamic routing between capsules. Adv. Neural Inf. Processing Syst. 30, 3856–3866 (2017).

  30. Mi, Y. et al. Spike frequency adaptation implements anticipative tracking in continuous attractor neural networks. Adv. Neural Inf. Processing Syst. 27, 505–513 (2014).

  31. Herrmann, M., Hertz, J. & Prügel-Bennett, A. Analysis of synfire chains. Network 6, 403–414 (1995).

  32. London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).

  33. Imam, N. & Manohar, R. Address-event communication using token-ring mutual exclusion. In 2011 17th IEEE Int. Symposium on Asynchronous Circuits and Systems 99–108 (IEEE, 2011).

  34. Deng, L. et al. GXNOR-Net: training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework. Neural Netw. 100, 49–58 (2018).

  35. Han, S. et al. EIE: efficient inference engine on compressed deep neural network. In 2016 ACM/IEEE 43rd Annual Int. Symposium on Computer Architecture 243–254 (IEEE, 2016).

  36. Diehl, P. U. et al. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 Int. Joint Conference on Neural Networks 1–8 (IEEE, 2015).

  37. Wu, Y. et al. Spatio-temporal backpropagation for training high-performance spiking neural networks. Front. Neurosci. 12, 331 (2018).

  38. Orchard, G. et al. Converting static image datasets to spiking neuromorphic datasets using saccades. Front. Neurosci. 9, 437 (2015).

  39. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  40. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Int. Conference on Learning Representations; preprint at https://arxiv.org/pdf/1409.1556.pdf (2015).

  41. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  42. LeCun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

  43. Courbariaux, M., Bengio, Y. & David, J.-P. BinaryConnect: training deep neural networks with binary weights during propagations. Adv. Neural Inf. Processing Syst. 28, 3123–3131 (2015).

  44. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. MSc thesis, Univ. Toronto (2009).

  45. Merity, S. et al. Pointer sentinel mixture models. In Int. Conference on Learning Representations; preprint at https://arxiv.org/abs/1609.07843 (2017).

  46. Krakovna, V. & Doshi-Velez, F. Increasing the interpretability of recurrent neural networks using hidden Markov models. Preprint at https://arxiv.org/abs/1606.05320 (2016).

  47. Wu, S. et al. Training and inference with integers in deep neural networks. In Int. Conference on Learning Representations; preprint at https://arxiv.org/abs/1802.04680 (2018).

  48. Paszke, A. et al. Automatic differentiation in PyTorch. In Proc. NIPS Autodiff Workshop https://openreview.net/pdf?id=BJJsrmfCZ (2017).

  49. Narang, S. & Diamos, G. Baidu DeepBench. https://github.com/baidu-research/DeepBench (2017).

  50. Fowers, J. et al. A configurable cloud-scale DNN processor for real-time AI. In 2018 ACM/IEEE 45th Annual Int. Symposium on Computer Architecture 1–14 (IEEE, 2018).

  51. Xu, M. et al. HMM-based audio keyword generation. In Advances in Multimedia Information Processing – PCM 2004, Vol. 3333 (eds Aizawa, K. et al.) 566–574 (Springer, 2004).

  52. Mathis, A., Herz, A. V. & Stemmler, M. B. Resolution of nested neuronal representations can be exponential in the number of neurons. Phys. Rev. Lett. 109, 018103 (2012).

  53. Gerstner, W. et al. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition (Cambridge Univ. Press, 2014).

  54. Liang, D. & Indiveri, G. Robust state-dependent computation in neuromorphic electronic systems. In IEEE Biomedical Circuits and Systems Conference 1–4 (IEEE, 2017).

  55. Akopyan, F. et al. TrueNorth: design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Comput. Aided Des. Integrated Circ. Syst. 34, 1537–1557 (2015).

  56. Han, S. et al. ESE: efficient speech recognition engine with sparse LSTM on FPGA. In Proc. 2017 ACM/SIGDA Int. Symposium on Field-Programmable Gate Arrays 75–84 (ACM, 2017).

Acknowledgements

We thank B. Zhang, R. S. Williams, J. Zhu, J. Guan, X. Zhang, W. Dou, F. Zeng and X. Hu for thoughtful discussions; L. Tian, Q. Zhao, M. Chen, J. Feng, D. Wang, X. Lin, H. Cui, Y. Hu and Y. Yu for contributing to experiments; H. Xu for coordinating experiments; and MLink for design assistance. This work was supported by projects of the National Natural Science Foundation of China (NSFC; 61836004, 61327902 and 61475080); the Brain-Science Special Program of Beijing (grant Z181100001518006); and the Suzhou-Tsinghua innovation leading program (2016SZ0102).

Author information

Author notes

  1. These authors contributed equally: Jing Pei, Lei Deng, Sen Song, Mingguo Zhao, Youhui Zhang, Shuang Wu, Guanrui Wang

Authors and Affiliations

  1. Department of Precision Instruments, Center for Brain-Inspired Computing Research (CBICR), Optical Memory National Engineering Research Center, Tsinghua University, Beijing, China

    Jing Pei, Lei Deng, Shuang Wu, Guanrui Wang, Zhe Zou, Wei He, Yujie Wu, Zheyu Yang, Cheng Ma, Guoqi Li, Huanglong Li & Luping Shi

  2. Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China

    Jing Pei, Shuang Wu, Guanrui Wang, Zhe Zou, Wei He, Yujie Wu, Zheyu Yang, Cheng Ma, Guoqi Li, Huanglong Li & Luping Shi

  3. Laboratory of Brain and Intelligence, Department of Biomedical Engineering, CBICR, Tsinghua University, Beijing, China

    Sen Song

  4. IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China

    Sen Song

  5. Department of Automation, CBICR, Tsinghua University, Beijing, China

    Mingguo Zhao & Feng Chen

  6. Department of Computer Science and Technology, CBICR, Tsinghua University, Beijing, China

    Youhui Zhang & Wentao Han

  7. Lynxi Technologies, Beijing, China

    Zhenzhi Wu

  8. Institute of Microelectronics, CBICR, Tsinghua University, Beijing, China

    Ning Deng & Huaqiang Wu

  9. State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China

    Si Wu

  10. Department of Electronic Engineering, CBICR, Tsinghua University, Beijing, China

    Yu Wang

  11. Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore

    Rong Zhao

  12. Department of Electrical and Computer Engineering, University of California Santa Barbara, Santa Barbara, CA, USA

    Yuan Xie

Authors

  1. Jing Pei
  2. Lei Deng
  3. Sen Song
  4. Mingguo Zhao
  5. Youhui Zhang
  6. Shuang Wu
  7. Guanrui Wang
  8. Zhe Zou
  9. Zhenzhi Wu
  10. Wei He
  11. Feng Chen
  12. Ning Deng
  13. Si Wu
  14. Yu Wang
  15. Yujie Wu
  16. Zheyu Yang
  17. Cheng Ma
  18. Guoqi Li
  19. Wentao Han
  20. Huanglong Li
  21. Huaqiang Wu
  22. Rong Zhao
  23. Yuan Xie
  24. Luping Shi

Contributions

J.P., L.D., S.S., M.Z., Y.Z., Shuang Wu and G.W. were in charge of, respectively, the principles of chip design, chip design, the principles of neuron computing, the unmanned bicycle system, software, implementation of Tianjic in the unmanned bicycle system, and chip testing. J.P., L.D., G.W., Z.W. and Y.Z. carried out chip development. Shuang Wu, G.W., Z.Z., Z.Y. and Yujie Wu worked on the unmanned bicycle experiment. Y.Z. and W. Han worked on software development. Yujie Wu, Shuang Wu and G.L. developed the algorithm. J.P., L.D., S.S., Si Wu, C.M., F.C., W. He, R.Z. and L.S. contributed to the analysis and interpretation of results. All of the authors contributed to discussion of architecture design principles. L.D., W. He, R.Z., S.S., Z.W. and L.S. wrote the manuscript with input from all authors. L.S. proposed the concept of hybrid architecture and supervised the whole project.

Corresponding author

Correspondence to Luping Shi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peer review information: Nature thanks Meng-Fan (Marvin) Chang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Fig. 1 Overview of the FCore architecture.

We adopted a fully digital design. The axon module acts as a data buffer to store the inputs and the outputs. Synapses are designed to store on-chip weights and are pinned close to the dendrite for better memory locality. The dendrite is an integration engine that contains multipliers and accumulators. The soma is a computation unit for neuronal transformations. IntraFCore and interFCore communications are wired by a router, which supports arbitrary topology. Act_Fun, activation function; A/s, activation (ANN mode)/spike (SNN mode); B_L, bias/leakage; In BUF, input buffer; Inhibit Reg, inhibition register; In(Out)_L/E/W/S/N, local/eastern/western/southern/northern input(output); Lat_Acc, lateral accumulation; MEM, memory; MUX, multiplexer; Out_Copy, output copy; Out_Trans, output transmission; P_MEM, parameter memory; Spike_Gen, spike generator; V_MEM, membrane potential memory; X & Index, axon output and weight index. The numbers in or above memories indicate memory size; ‘b’ represents bit(s).
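The dataflow in this caption (buffered axon inputs, stored synaptic weights, dendritic integration, then a somatic transformation that yields either an activation or a spike) can be pictured with a small Python sketch. This is only an illustrative reading of the caption, not the chip's actual neuron equations: the ReLU activation, the subtractive leak, the reset-to-zero rule and all function names are assumptions made for the example.

```python
def dendrite(weights, axon_inputs):
    """Integrate: multiply-accumulate each neuron's weights with the buffered inputs."""
    return [sum(w * x for w, x in zip(row, axon_inputs)) for row in weights]


def soma_ann(integrated, bias):
    """ANN mode: add a bias and apply an activation (ReLU is an assumption here)."""
    return [max(0.0, d + b) for d, b in zip(integrated, bias)]


def soma_snn(integrated, v_mem, leak, threshold):
    """SNN mode: update the membrane potential, emit spikes, reset fired neurons.

    The subtractive leak and reset-to-zero rule are assumptions for illustration.
    """
    spikes, new_v = [], []
    for d, v in zip(integrated, v_mem):
        v = v + d - leak
        fired = v >= threshold
        spikes.append(1 if fired else 0)
        new_v.append(0.0 if fired else v)
    return spikes, new_v


# Hypothetical two-neuron core with two axon inputs.
weights = [[1.0, 0.5], [-1.0, 0.25]]
integrated = dendrite(weights, [1.0, 2.0])
activations = soma_ann(integrated, bias=[0.5, 0.0])
spikes, v_mem = soma_snn(integrated, v_mem=[0.0, 0.0], leak=0.1, threshold=1.5)
```

The same integration step feeds both modes in this sketch; only the somatic stage differs, which mirrors the A/s (activation or spike) output named in the caption.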

Extended Data Fig. 2 Fabrication of the Tianjic chip and testing boards.

a, Chip layout and images of the Tianjic chip. b, Testing boards equipped with a single Tianjic chip or a chip array (5 × 5 size).

Extended Data Fig. 3 Throughput-aware unfolded mapping and resource-aware folded mapping.

a, Unfolded mapping converts all topologies into a fully connected (FC) structure without reusing data. In CANN: Norm, normalization; r, firing rate; V, membrane potential. In LSTM: f/i/o, forget/input/output gate output; g, input activation; h/c, hidden/cell state; t, time step; x, external input. b, Folded mapping folds the network along the row dimension of feature maps (FMs) for resource reuse. We note that the weights are still unfolded along the column dimension to maintain parallelism, and wide FMs can be split into multiple slices, which are allocated into different FCores for concurrent processing. r0/1/2, row 0/1/2.

Extended Data Fig. 4 Chip measurements in different modes.

a, Power consumption in ANN-only mode at different voltages and frequencies. Here the ‘compute ratio’ is the duty ratio for computation, that is, the ratio of computation time/(computation time + idle time). The phase on the x-axis denotes the execution time phase of FCore. b, Power consumption in SNN-only mode with different rates of input spikes. c, Membrane potential of output neurons in SNN mode. Information was represented in a rate-coding scheme by counting the number of spikes during a given time period.
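Two quantities in this caption lend themselves to a short worked example: the compute ratio as defined above, and rate-coded readout by spike counting. The Python sketch below is a minimal illustration under assumed inputs; the function names, the list-of-flags spike layout and the example numbers are hypothetical and are not taken from the paper's measurement code.

```python
def compute_ratio(computation_time, idle_time):
    """Duty ratio for computation: computation time / (computation time + idle time)."""
    total = computation_time + idle_time
    if total <= 0:
        raise ValueError("total time must be positive")
    return computation_time / total


def decode_rate_code(spike_trains):
    """Count spikes per output neuron over a window and pick the most active one.

    spike_trains: one list per neuron holding 0/1 spike flags, one flag per
    time step in the counting window (a hypothetical layout).
    """
    counts = [sum(train) for train in spike_trains]
    return counts.index(max(counts)), counts


# Example values chosen only for illustration.
ratio = compute_ratio(computation_time=3.0, idle_time=1.0)
winner, counts = decode_rate_code([[0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 1]])
```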

Extended Data Fig. 5 Performance comparison and routing profiling.

a, FCore placements in six layers (split into seven execution layers); the numbers within the image denote the numbers of FCores used. b, Comparison of the performance of different neural network modes. Acc., accuracy. c, Power consumption for each layer. d, Average number of received routing packets per FCore in each layer. e, Average number of sent packets per FCore across time phases. f, Distribution of total transfer packets for each FCore. The oval with the arrow emphasizes the difference in packet amount between the SNN-only mode and the hybrid mode.

Extended Data Fig. 6 Overheads of the Tianjic chip during the bicycle experiment.

a, Placement of FCores in different network models. Numbers refer to the number of FCores used. b, Measured power consumption under different tasks and at different voltages. The Tianjic chip typically worked at 0.9 V during the bicycle demonstration, and the power consumption was about 400 mW.

Extended Data Fig. 7 Neural state machine.

a, State transition in the bicycle task. b, NSM architecture. The NSM is composed of three subgroups of neurons: state, transfer and output neurons. There are three matrices that determine the connections between different neurons: the trigger, state-transfer and output matrices.
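One plausible reading of this architecture can be sketched in Python: transfer neurons fire when a given state and a given input event are active together (trigger matrix), fired transfer neurons select the next state (state-transfer matrix), and the state drives the outputs (output matrix). The caption does not specify how the matrices are applied, so the update rule, the firing threshold of 2, and the two-state example with "start" and "stop" events are all assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def nsm_step(state, event, trigger, state_transfer, output):
    """One update of a toy neural state machine.

    state, event: one-hot lists for the current state and the input event.
    trigger: one row per transfer neuron over the concatenated [state + event];
             a transfer neuron fires only when its state and its event are both on.
    state_transfer: one row per state neuron over the transfer neurons.
    output: one row per output neuron over the state neurons.
    """
    drive = matvec(trigger, state + event)
    transfer = [1 if d >= 2 else 0 for d in drive]  # AND of state and event
    if any(transfer):
        state = matvec(state_transfer, transfer)    # move to the target state
    return state, matvec(output, state)


# Hypothetical machine: states (IDLE, FOLLOW), events (start, stop).
trigger = [[1, 0, 1, 0],   # IDLE and start
           [0, 1, 0, 1]]   # FOLLOW and stop
state_transfer = [[0, 1],  # IDLE is entered via the second transfer neuron
                  [1, 0]]  # FOLLOW is entered via the first transfer neuron
output = [[1, 0], [0, 1]]  # outputs simply mirror the state

s0 = [1, 0]
s1, out1 = nsm_step(s0, [1, 0], trigger, state_transfer, output)  # start
s2, out2 = nsm_step(s1, [1, 0], trigger, state_transfer, output)  # start again
s3, out3 = nsm_step(s2, [0, 1], trigger, state_transfer, output)  # stop
```

In this toy setup an event that has no matching transition from the current state leaves the state unchanged, which is a design choice of the sketch rather than something stated in the caption.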

Extended Data Table 1 A unified description of neural network models

Extended Data Table 2 Comparison of the Tianjic chip with existing specialized platforms

Extended Data Table 3 Model topologies and input/output descriptions for networks applied in the bicycle demonstration

Supplementary information

Supplementary Video 1

Unmanned bicycle equipped with Tianjic chip for real-time object detection, tracking, voice recognition, obstacle avoidance, and balance control. The video consists of two scenes. In Scene 1, the bicycle rides over a speed bump, then it follows the voice commands to change direction or adjust speed. In Scene 2, the bicycle detects and tracks a moving human, and avoids obstacles when necessary.

About this article

Cite this article

Pei, J., Deng, L., Song, S. et al. Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 572, 106–111 (2019). https://doi.org/10.1038/s41586-019-1424-8

  • Received: 20 May 2018

  • Accepted: 07 May 2019

  • Published: 31 July 2019

  • Version of record: 31 July 2019

  • Issue date: 01 August 2019
