Intel Lake Crest <br />
<strong>Yutaka Yasuda</strong>, <strong>Kyoto Sangyo University</strong>, <strong>2016/12/16</strong>
AI <br />
<strong>2016</strong>.3 AlphaGo vs Lee Sedol <br />
<strong>2016</strong>.9 Google (AI) <br />
2015 Google Photos <br />
“Google's AlphaGo AI Continues to Wallop Expert Human Go Player”, Popular Mechanics, 2016/3/10<br />
http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/
Deep Learning<br />
<br />
2014 ImageNet: Google (GoogLeNet)<br />
2012 Google
”Deep Visual-Semantic<br />
Alignments for<br />
Generating Image<br />
Descriptions”,<br />
Andrej Karpathy, Li Fei-Fei,<br />
Stanford <strong>University</strong>, CVPR 2015
Neural Network = a network of neurons <br />
https://en.wikipedia.org/wiki/Artificial_neural_network
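The slide above equates a neural network with a network of neurons. A single artificial neuron just computes a weighted sum of its inputs plus a bias, passed through a nonlinearity; a minimal Python sketch (the weights here are hand-picked for illustration, not from any trained model):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    squashed into (0, 1) by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# With these weights the neuron behaves roughly like an AND gate:
# output is near 1 only when both inputs are 1.
print(neuron([1.0, 1.0], [10.0, 10.0], -15.0))  # close to 1
print(neuron([0.0, 1.0], [10.0, 10.0], -15.0))  # close to 0
```

A full network is just layers of such units feeding each other, which is what makes the workload a mass of multiply-accumulate operations.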
“Introduction to multi gpu deep learning with DIGITS 2”, Mike Wang<br />
http://www.slideshare.net/papisdotio/introduction-to-multi-gpu-deep-learning-with-digits-2-mike-wang/6
https://www.youtube.com/watch?v=BMEffRAvnk4
Why nVIDIA?
Lake Crest
Intel Artificial Intelligence Day<br />
<strong>2016</strong>/11/17 12:30 PM PT, San Francisco
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Intel Nervana Engine
https://www.nervanasys.com/technology/engine/
ASIC
ASIC (Application-Specific Integrated Circuit): a chip designed for one particular use<br />
In contrast: general-purpose CPUs and GPUs (Wikipedia)<br />
Nervana Engine Web
2.5D<br />
<br />
Blazingly fast data access via high-bandwidth memory (HBM)
8GB HBM2 x4<br />
Processing Cluster x12 (3x4)<br />
ICL (Inter Chip Link) x12
HBM?
• HBM stacks DRAM dies vertically<br />
• The stack connects to the GPU through a silicon interposer <br />
• This style of packaging is called 2.5D integration <br />
An Introduction to HBM - High Bandwidth Memory - <br />
Stacked Memory and The Interposer <br />
http://www.guru3d.com/articles-pages/an-introduction-to-hbm-high-bandwidth-memory,2.html
GDDR5 vs HBM2<br />
Bus width: 32-bit vs 1024-bit<br />
Data rate: up to 1750 MHz (7 Gbps) vs 2 Gbps<br />
Bandwidth: up to 28 GB/s per chip vs 125 GB/s (2 Tb/s) per unit<br />
Voltage: 1.5 V vs 1.3 V
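The bandwidth figures follow from bus width times per-pin data rate; a quick arithmetic check in Python:

```python
def bandwidth_gb_per_s(bus_bits, gbps_per_pin):
    """Peak bandwidth = bus width (bits) x per-pin rate (Gbit/s),
    divided by 8 to convert bits to bytes."""
    return bus_bits * gbps_per_pin / 8

gddr5 = bandwidth_gb_per_s(32, 7.0)    # 28.0 GB/s per chip, as in the table
hbm2 = bandwidth_gb_per_s(1024, 2.0)   # 256.0 GB/s per stack (1024 x 2 Gbps = 2 Tb/s)
```

The wide-but-slow 1024-bit bus is the whole point of HBM: enormous aggregate bandwidth at a lower per-pin clock and voltage.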
LGA 2011: CPU socket with 2011 pins<br />
Xeon E5 1600/2600 v4 (Broadwell-EP) <br />
~2000 pins vs 1024 x4 HBM signals <br />
→ Wikipedia: LGA 2011
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Tensor
https://www.tensorflow.org<br />
“TensorFlow: Large-Scale Machine Learning on<br />
Heterogeneous Distributed Systems”, Abadi et al., 2015,<br />
https://arxiv.org/abs/1603.04467v2
https://en.wikipedia.org/wiki/<br />
Artificial_neural_network<br />
https://www.tensorflow.org/tutorials/mnist/beginners/<br />
or CPU
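The MNIST-for-beginners tutorial linked above trains a single softmax layer, y = softmax(Wx + b). A NumPy sketch of that forward pass (random placeholder weights, not a trained model):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max, exponentiate,
    then normalize into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(784)                        # one flattened 28x28 MNIST image
W = rng.standard_normal((784, 10)) * 0.01  # placeholder weights (untrained)
b = np.zeros(10)

y = softmax(x @ W + b)                     # probabilities over the 10 digit classes
```

Even this toy model is dominated by the 784x10 matrix multiply; scaling to deep networks multiplies that cost, which is what the dedicated tensor hardware targets.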
Nervana Engine <br />
ASIC <br />
Tensor <br />
HBM2 4 unit <br />
HBM 1024bit!<br />
2.5D
Nervana Engine <br />
<br />
12 links <br />
100Gbit/s each<br />
https://www.nervanasys.com/technology/engine/
100Gbit/s x12
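Totaling the links is plain arithmetic (assuming all 12 ICLs run at full rate):

```python
links = 12            # ICL (Inter Chip Link) count per chip
gbps_per_link = 100   # Gbit/s per link

total_gbps = links * gbps_per_link  # 1200 Gbit/s of off-chip bandwidth
total_gb_per_s = total_gbps / 8     # 150.0 GB/s aggregate
```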
Deep Learning GPU <br />
GPU
GPUs are SIMD machines<br />
http://logmi.jp/45705
GPU: SIMD<br />
GPUs compute in 32-bit floating point<br />
AI on GPU<br />
nVIDIA / CPU<br />
https://www.tensorflow.org/tutorials/mnist/beginners/
GPU<br />
Nervana Engine
Binary Neural Network <br />
GPU: 32-bit floating point<br />
BNN - Binarized Neural<br />
Network ( -1 / +1 )<br />
Nervana <br />
Accelerating Neural Networks with Binary Arithmetic<br />
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
“Accelerating Neural Networks with Binary Arithmetic”<br />
(blog post)<br />
These 32 bit floating point multiplications, however, are very expensive.<br />
In BNNs, floating point multiplications are supplanted with<br />
bitwise XNORs and left and right bit shifts.<br />
This is extremely attractive from a hardware perspective:<br />
binary operations can be implemented computationally efficiently at a low<br />
power cost.<br />
Nervana website (blog post)<br />
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
Nervana Engine <br />
GPU SIMD <br />
<br />
BNN (ASIC)<br />
XNOR <br />
-1 → 0, +1 → 1 <br />
<br />
Tensor
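Encoding −1 as bit 0 and +1 as bit 1, as on the slide above, turns the binarized dot product into a bitwise XNOR plus a population count; a Python sketch (real BNN kernels pack many weights per machine word):

```python
def binarized_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as n-bit integers
    (bit value 0 encodes -1, bit value 1 encodes +1).
    XNOR marks matching positions; each match contributes +1 and
    each mismatch -1, so dot = 2 * matches - n."""
    mask = (1 << n) - 1
    matches = bin((~(a_bits ^ b_bits)) & mask).count("1")
    return 2 * matches - n

# a = [+1, +1, -1, +1] -> 0b1011, b = [+1, -1, +1, +1] -> 0b1101 (bit i = element i)
print(binarized_dot(0b1011, 0b1101, 4))  # the two vectors agree in 2 of 4 positions
```

One XNOR plus a popcount replaces n floating-point multiplies and adds, which is why this maps so cheaply onto an ASIC.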
GPU nVIDIA
Intel: Xeon Phi<br />
http://www.4gamer.net/games/049/G004963/20161007061/
Intel: Nervana Engine
https://software.intel.com/en-us/blogs/2013/avx-512-instructions
Deep Learning <br />
nVIDIA GPU <br />
Deep Learning <br />
Nervana: Binarized + HBM2 <br />
nVIDIA: FP16 <br />
Intel: AVX-512 SIMD <br />
Google TPU (Tensor Processing Unit) 8bit CPU!
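The 8-bit approach mentioned for Google's TPU can be illustrated with plain linear quantization (a generic sketch, not the TPU's actual scheme): map floats in a known range onto 8-bit integer codes and back.

```python
def quantize(xs, lo, hi):
    """Linearly map floats in [lo, hi] onto 8-bit codes 0..255."""
    scale = (hi - lo) / 255
    return [round((x - lo) / scale) for x in xs], scale

def dequantize(qs, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + q * scale for q in qs]

qs, scale = quantize([-1.0, -0.5, 0.5, 1.0], lo=-1.0, hi=1.0)
xs = dequantize(qs, -1.0, scale)  # close to the original values
```

Trading precision (at most half a quantization step of error) for 8-bit arithmetic is the same hardware bargain the FP16 and binarized approaches above make, just at different points on the accuracy/cost curve.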
XNOR / <br />
CPU<br />
100Gbps<br />
<br />
SIMD
“Follow your heart”<br />
'You've got to find what you love,' Jobs says<br />
Steve Jobs, 2005, Stanford <strong>University</strong><br />
https://www.youtube.com/watch?v=UF8uR6Z6KLc