
Yutaka Yasuda, Kyoto Sangyo University, 2016/12/16

20161216LunchTime



Intel Lake Crest

Yutaka Yasuda, Kyoto Sangyo University, 2016/12/16


AI

2016.3 AlphaGo vs. Lee Sedol

2016.9 Google (AI)

2015 Google Photos

“Google's AlphaGo AI Continues to Wallop Expert Human Go Player”, Popular Mechanics, 2016/3/10

http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/


Deep Learning

2014 ImageNet Google

2012 Google


“Deep Visual-Semantic Alignments for Generating Image Descriptions”, Andrej Karpathy, Li Fei-Fei, Stanford University, CVPR 2015


Neural Network = Neuron

https://en.wikipedia.org/wiki/Artificial_neural_network
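The neuron model behind the diagram is just a weighted sum of inputs followed by a nonlinearity; a minimal sketch (the weights, inputs, and the sigmoid choice here are illustration values, not from the slides):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation, output in (0, 1)

# Illustration values only
out = neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], bias=-0.3)
print(round(out, 3))  # 0.525
```

A network is nothing more than layers of such units, with each layer's outputs feeding the next layer's inputs.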


“Introduction to multi gpu deep learning with DIGITS 2”, Mike Wang

http://www.slideshare.net/papisdotio/introduction-to-multi-gpu-deep-learning-with-digits-2-mike-wang/6




https://www.youtube.com/watch?v=BMEffRAvnk4


Why nVIDIA?


Lake Crest


Intel Artificial Intelligence Day

2016/11/17 - 12:30 PM PT, San Francisco


http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html


Intel Nervana Engine


https://www.nervanasys.com/technology/engine/


ASIC


Wikipedia

CPU ASIC / GPU ASIC

“ASIC”

Nervana Engine Web


2.5D

Blazingly fast data access via high-bandwidth memory (HBM)


8GB HBM2 x4
Processing Cluster x12 (3x4)
ICL (Inter Chip Link) x12


HBM?


• HBM stacks DRAM dies vertically

• Connected to the GPU through a silicon interposer

• This packaging style is called 2.5D

An Introduction to HBM - High Bandwidth Memory - Stacked Memory and The Interposer
http://www.guru3d.com/articles-pages/an-introduction-to-hbm-high-bandwidth-memory,2.html


            GDDR5                     HBM2
Bus width   32-bit                    1024-bit
Data rate   up to 1750 MHz (7 Gbps)   2 Gbps
Bandwidth   up to 28 GB/s per chip    125 GB/s (2 Tb/s) per unit
Voltage     1.5 V                     1.3 V
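Peak bandwidth is simply bus width times per-pin data rate; a quick sanity check of the comparison (note that 1024 bit × 2 Gbps works out to 2 Tb/s, i.e. 256 GB/s per stack):

```python
def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak bandwidth in GB/s = bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_bits * gbps_per_pin / 8

gddr5 = bandwidth_gbs(32, 7)    # 32-bit bus at 7 Gbps  -> 28 GB/s per chip
hbm2  = bandwidth_gbs(1024, 2)  # 1024-bit bus at 2 Gbps -> 256 GB/s per stack
print(gddr5, hbm2)  # 28.0 256.0
```

The per-chip GDDR5 figure matches the table exactly; HBM gets there not by clocking pins faster but by making the bus 32 times wider.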


LGA 2011: CPU socket with 2011 pins

Xeon E5 1600/2600 v4 Broadwell-EP

2000 pins vs. 1024 x4

→ Wikipedia: LGA 2011


http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html


Tensor


https://www.tensorflow.org
“TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”, Abadi et al., 2015
https://arxiv.org/abs/1603.04467v2


https://en.wikipedia.org/wiki/Artificial_neural_network
https://www.tensorflow.org/tutorials/mnist/beginners/

GPU or CPU
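The MNIST tutorial cited above trains a softmax regression, y = softmax(Wx + b); a plain-Python sketch of just the forward pass (the shapes follow the tutorial, but the weights here are random illustration values):

```python
import math
import random

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]   # subtract max for numerical stability
    s = sum(e)
    return [v / s for v in e]

random.seed(0)
x = [random.random() for _ in range(784)]  # one flattened 28x28 "image"
W = [[random.gauss(0, 0.01) for _ in range(784)] for _ in range(10)]
b = [0.0] * 10

logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
          for row, b_k in zip(W, b)]
y = softmax(logits)                        # probabilities for digits 0-9
print(len(y), round(sum(y), 6))
```

Every step is a tensor operation (matrix-vector product, elementwise exp), which is exactly the workload that GPUs, and chips like Lake Crest, are built to accelerate.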


Nervana Engine
• ASIC
• Tensor
• HBM2 x4 units
• HBM 1024-bit!
• 2.5D


Nervana Engine

12 inter-chip links, 100 Gbit/s each

https://www.nervanasys.com/technology/engine/


100 Gbit/s × 12
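The aggregate chip-to-chip bandwidth follows directly from the link count and per-link speed on the slide:

```python
links = 12            # ICL (Inter Chip Link) count per chip
per_link_gbps = 100   # Gbit/s per link

total_gbps = links * per_link_gbps
print(total_gbps, "Gbit/s =", total_gbps / 8, "GB/s")  # 1200 Gbit/s = 150.0 GB/s
```

150 GB/s of off-chip fabric is what lets many Nervana Engines be ganged together for model- and data-parallel training.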


Deep Learning GPU

GPU

GPU SIMD

http://logmi.jp/45705


GPU SIMD

GPU 32bit

AI GPU

nVIDIA CPU

https://www.tensorflow.org/tutorials/mnist/beginners/


GPU

Nervana Engine


Binary Neural Network

GPU 32bit

BNN - Binarized Neural Network ( -1 / +1 )

Nervana

“Accelerating Neural Networks with Binary Arithmetic”
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/


“Accelerating Neural Networks with Binary Arithmetic” (blog post)

“These 32 bit floating point multiplications, however, are very expensive. In BNNs, floating point multiplications are supplanted with bitwise XNORs and left and right bit shifts. This is extremely attractive from a hardware perspective: binary operations can be implemented computationally efficiently at a low power cost.”

Nervana website (blog post)
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
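The XNOR trick can be tried directly: encode -1 as bit 0 and +1 as bit 1, and a dot product of two ±1 vectors reduces to XNOR plus popcount. A sketch of the idea (not Nervana's actual implementation):

```python
def binary_dot(x, y, n):
    """Dot product of two n-element {-1,+1} vectors packed as n-bit ints
    (bit 0 encodes -1, bit 1 encodes +1): dot = 2*popcount(XNOR) - n."""
    mask = (1 << n) - 1
    xnor = ~(x ^ y) & mask        # bit is 1 wherever the signs agree
    agree = bin(xnor).count("1")  # popcount
    return 2 * agree - n          # agreements minus disagreements

# a = [+1, -1, +1, -1] -> 0b1010,  b = [+1, +1, -1, -1] -> 0b1100
print(binary_dot(0b1010, 0b1100, 4))  # 1 - 1 - 1 + 1 = 0
```

One XNOR processes a whole machine word of multiplications at once, which is why binarization is so attractive to build into an ASIC.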


Nervana Engine

GPU SIMD

BNN (ASIC)

XNOR

-1 → 0, +1 → 1

Tensor


GPU nVIDIA


Intel Xeon Phi
http://www.4gamer.net/games/049/G004963/20161007061/


Intel Nervana Engine


https://software.intel.com/en-us/blogs/2013/avx-512-instructions


Deep Learning

nVIDIA GPU

Deep Learning

Nervana: Binarized + HBM2
nVIDIA: FP16
Intel: AVX-512 SIMD
Google: TPU (Tensor Processing Unit), 8bit, CPU!
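The 8-bit arithmetic on the TPU line rests on quantizing floats down to small integers; a sketch of the common linear (symmetric, per-tensor) scheme — the function names and example values are my own, not from any TPU source:

```python
def quantize(values, num_bits=8):
    """Linearly map floats onto signed num_bits integers (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]  # integers in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

vals = [0.5, -1.27, 0.02]
q, scale = quantize(vals)
print(q)  # small ints; multiply by scale to recover approximate floats
```

Trading 32-bit floats for 8-bit integers cuts memory traffic 4x and lets far more multipliers fit in the same silicon, the same bet BNNs push to its 1-bit extreme.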


Google


XNOR /

CPU

100Gbps

SIMD


“Follow your heart”

'You've got to find what you love,' Jobs says

Steve Jobs, 2005, Stanford University

https://www.youtube.com/watch?v=UF8uR6Z6KLc
