Intel Lake Crest <br />
<strong>Yutaka Yasuda</strong>, <strong>Kyoto Sangyo University</strong>, <strong>2016/12/16</strong>
AI <br />
<strong>2016</strong>.3 AlphaGo vs Lee Sedol <br />
<strong>2016</strong>.9 Google (AI) <br />
2015 Google Photos <br />
“Google's AlphaGo AI Continues to Wallop Expert Human Go Player”, Popular Mechanics, 2016/3/10<br />
http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/
Deep Learning<br />
<br />
2014 ImageNet: Google (GoogLeNet)<br />
2012 Google
”Deep Visual-Semantic<br />
Alignments for<br />
Generating Image<br />
Descriptions”,<br />
Andrej Karpathy, Li Fei-Fei,<br />
Stanford <strong>University</strong>, CVPR 2015
Neural Network = a network of neurons <br />
https://en.wikipedia.org/wiki/Artificial_neural_network
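The slide above equates a neural network with a network of neurons. A single artificial neuron just computes a weighted sum of its inputs plus a bias, passed through a nonlinearity; a minimal Python sketch (the weights here are hand-picked for illustration, not from any trained model):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    squashed into (0, 1) by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# With these weights the neuron behaves roughly like an AND gate:
# output is near 1 only when both inputs are 1.
print(neuron([1.0, 1.0], [10.0, 10.0], -15.0))  # close to 1
print(neuron([0.0, 1.0], [10.0, 10.0], -15.0))  # close to 0
```

A full network is just layers of such units feeding each other, which is what makes the workload a mass of multiply-accumulate operations.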
“Introduction to multi gpu deep learning with DIGITS 2”, Mike Wang<br />
http://www.slideshare.net/papisdotio/introduction-to-multi-gpu-deep-learning-with-digits-2-mike-wang/6
https://www.youtube.com/watch?v=BMEffRAvnk4
Why nVIDIA?
Lake Crest
Intel Artificial Intelligence Day<br />
<strong>2016</strong>/11/17 12:30 PM PT, San Francisco
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Intel Nervana Engine
https://www.nervanasys.com/technology/engine/
ASIC
ASIC (Application-Specific Integrated Circuit): a chip designed for one particular use<br />
In contrast: general-purpose CPUs and GPUs (Wikipedia)<br />
Nervana Engine Web
2.5D<br />
<br />
Blazingly fast data access via high-bandwidth memory (HBM)
8GB HBM2 x4<br />
Processing Cluster x12 (3x4)<br />
ICL (Inter Chip Link) x12
HBM?
• HBM stacks DRAM dies vertically<br />
• The stack connects to the GPU through a silicon interposer <br />
• This style of packaging is called 2.5D integration <br />
An Introduction to HBM - High Bandwidth Memory - <br />
Stacked Memory and The Interposer <br />
http://www.guru3d.com/articles-pages/an-introduction-to-hbm-high-bandwidth-memory,2.html
GDDR5 vs HBM2<br />
Bus width: 32-bit vs 1024-bit<br />
Data rate: up to 1750 MHz (7 Gbps) vs 2 Gbps<br />
Bandwidth: up to 28 GB/s per chip vs 125 GB/s (2 Tb/s) per unit<br />
Voltage: 1.5 V vs 1.3 V
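The bandwidth figures follow from bus width times per-pin data rate; a quick arithmetic check in Python:

```python
def bandwidth_gb_per_s(bus_bits, gbps_per_pin):
    """Peak bandwidth = bus width (bits) x per-pin rate (Gbit/s),
    divided by 8 to convert bits to bytes."""
    return bus_bits * gbps_per_pin / 8

gddr5 = bandwidth_gb_per_s(32, 7.0)    # 28.0 GB/s per chip, as in the table
hbm2 = bandwidth_gb_per_s(1024, 2.0)   # 256.0 GB/s per stack (1024 x 2 Gbps = 2 Tb/s)
```

The wide-but-slow 1024-bit bus is the whole point of HBM: enormous aggregate bandwidth at a lower per-pin clock and voltage.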
LGA 2011: CPU socket with 2011 pins<br />
Xeon E5 1600/2600 v4 (Broadwell-EP) <br />
~2000 pins vs 1024 x4 HBM signals <br />
→ Wikipedia: LGA 2011
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Tensor
https://www.tensorflow.org<br />
“TensorFlow: Large-Scale Machine Learning on<br />
Heterogeneous Distributed Systems”, Abadi et al., 2015,<br />
https://arxiv.org/abs/1603.04467v2
https://en.wikipedia.org/wiki/<br />
Artificial_neural_network<br />
https://www.tensorflow.org/tutorials/mnist/beginners/<br />
or CPU
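The MNIST-for-beginners tutorial linked above trains a single softmax layer, y = softmax(Wx + b). A NumPy sketch of that forward pass (random placeholder weights, not a trained model):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max, exponentiate,
    then normalize into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(784)                        # one flattened 28x28 MNIST image
W = rng.standard_normal((784, 10)) * 0.01  # placeholder weights (untrained)
b = np.zeros(10)

y = softmax(x @ W + b)                     # probabilities over the 10 digit classes
```

Even this toy model is dominated by the 784x10 matrix multiply; scaling to deep networks multiplies that cost, which is what the dedicated tensor hardware targets.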
Nervana Engine <br />
ASIC <br />
Tensor <br />
HBM2 4 unit <br />
HBM 1024bit!<br />
2.5D
Nervana Engine <br />
<br />
12 links <br />
100Gbit/s each<br />
https://www.nervanasys.com/technology/engine/
100Gbit/s x12
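Totaling the links is plain arithmetic (assuming all 12 ICLs run at full rate):

```python
links = 12            # ICL (Inter Chip Link) count per chip
gbps_per_link = 100   # Gbit/s per link

total_gbps = links * gbps_per_link  # 1200 Gbit/s of off-chip bandwidth
total_gb_per_s = total_gbps / 8     # 150.0 GB/s aggregate
```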
Deep Learning GPU <br />
GPU
GPUs are SIMD machines<br />
http://logmi.jp/45705
GPU: SIMD<br />
GPUs compute in 32-bit floating point<br />
AI on GPU<br />
nVIDIA / CPU<br />
https://www.tensorflow.org/tutorials/mnist/beginners/
GPU<br />
Nervana Engine
Binary Neural Network <br />
GPU: 32-bit floating point<br />
BNN - Binarized Neural<br />
Network ( -1 / +1 )<br />
Nervana <br />
Accelerating Neural Networks with Binary Arithmetic<br />
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
“Accelerating Neural Networks with Binary Arithmetic”<br />
(blog post)<br />
These 32 bit floating point multiplications, however, are very expensive.<br />
In BNNs, floating point multiplications are supplanted with<br />
bitwise XNORs and left and right bit shifts.<br />
This is extremely attractive from a hardware perspective:<br />
binary operations can be implemented computationally efficiently at a low<br />
power cost.<br />
Nervana website (blog post)<br />
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
Nervana Engine <br />
GPU SIMD <br />
<br />
BNN (ASIC)<br />
XNOR <br />
-1 → 0, +1 → 1 <br />
<br />
Tensor
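Encoding −1 as bit 0 and +1 as bit 1, as on the slide above, turns the binarized dot product into a bitwise XNOR plus a population count; a Python sketch (real BNN kernels pack many weights per machine word):

```python
def binarized_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as n-bit integers
    (bit value 0 encodes -1, bit value 1 encodes +1).
    XNOR marks matching positions; each match contributes +1 and
    each mismatch -1, so dot = 2 * matches - n."""
    mask = (1 << n) - 1
    matches = bin((~(a_bits ^ b_bits)) & mask).count("1")
    return 2 * matches - n

# a = [+1, +1, -1, +1] -> 0b1011, b = [+1, -1, +1, +1] -> 0b1101 (bit i = element i)
print(binarized_dot(0b1011, 0b1101, 4))  # the two vectors agree in 2 of 4 positions
```

One XNOR plus a popcount replaces n floating-point multiplies and adds, which is why this maps so cheaply onto an ASIC.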
GPU nVIDIA
Intel: Xeon Phi<br />
http://www.4gamer.net/games/049/G004963/20161007061/
Intel: Nervana Engine
https://software.intel.com/en-us/blogs/2013/avx-512-instructions
Deep Learning <br />
nVIDIA GPU <br />
Deep Learning <br />
Nervana: Binarized + HBM2 <br />
nVIDIA: FP16 <br />
Intel: AVX-512 SIMD <br />
Google TPU (Tensor Processing Unit) 8bit CPU!
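The 8-bit approach mentioned for Google's TPU can be illustrated with plain linear quantization (a generic sketch, not the TPU's actual scheme): map floats in a known range onto 8-bit integer codes and back.

```python
def quantize(xs, lo, hi):
    """Linearly map floats in [lo, hi] onto 8-bit codes 0..255."""
    scale = (hi - lo) / 255
    return [round((x - lo) / scale) for x in xs], scale

def dequantize(qs, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + q * scale for q in qs]

qs, scale = quantize([-1.0, -0.5, 0.5, 1.0], lo=-1.0, hi=1.0)
xs = dequantize(qs, -1.0, scale)  # close to the original values
```

Trading precision (at most half a quantization step of error) for 8-bit arithmetic is the same hardware bargain the FP16 and binarized approaches above make, just at different points on the accuracy/cost curve.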
XNOR / <br />
CPU<br />
100Gbps<br />
<br />
SIMD
“Follow your heart”<br />
'You've got to find what you love,' Jobs says<br />
Steve Jobs, 2005, Stanford <strong>University</strong><br />
https://www.youtube.com/watch?v=UF8uR6Z6KLc