
Slides - jtres 2012


Dalvik VM (DVM): the VM for executing Java on the Android platform
• Runs the Java code in applications, the framework, and the core libraries
• Executes dex files instead of the class files of the Java VM (JVM)
• DX (class-to-dex) converts class files to dex files
• A dex file has a different bytecode ISA
Virtual Machine & Optimization Lab


Higher performance requires just-in-time compilation (JITC), which translates bytecode to native code at runtime
Both VMs employ adaptive compilation
• Interpret initially; when a hot spot is found, compile it
DVM’s JIT compilation unit is a hot path called a trace, while the JVM’s is a hot method
• For a lower memory footprint, yet competitive performance
• But, the reality is …
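The adaptive compilation described above (interpret first, compile once a hot spot is found) can be sketched as a toy counter-based model. This is only an illustration under stated assumptions: the class name, the `HOT_THRESHOLD` value, and the string-keyed "code units" are hypothetical and do not reflect the actual DVM or JVM implementation. A unit here stands for either a trace head (DVM) or a method (JVM).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of counter-based adaptive compilation (illustrative only).
class AdaptiveCompilationSketch {
    // Hypothetical threshold, not a real Dalvik or JVM setting.
    static final int HOT_THRESHOLD = 3;

    // Execution count per code unit while it is still interpreted.
    private final Map<String, Integer> counters = new HashMap<>();
    // Units that have already been "compiled" to native code.
    private final Set<String> compiled = new HashSet<>();

    // Executes one code unit and reports which path was taken.
    public String execute(String unit) {
        if (compiled.contains(unit)) {
            return "native";
        }
        int count = counters.merge(unit, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            // Stand-in for translating the unit's bytecode to native code.
            compiled.add(unit);
        }
        return "interpreted";
    }

    public static void main(String[] args) {
        AdaptiveCompilationSketch vm = new AdaptiveCompilationSketch();
        for (int i = 1; i <= 5; i++) {
            System.out.println("run " + i + ": " + vm.execute("loopBody"));
        }
    }
}
```

In this model a unit that never reaches the threshold is interpreted forever. That is the case for code with few hot spots.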


Java Source Code

    public static int factorial() {
        int result = 1;
        for (int i = 1; i < 10000; i++) {
            result = result * i;
        }
        return result;
    }

Java Bytecode

    0000: iconst_1
    0001: istore_0
    0002: iconst_1
    0003: istore_1
    0004: iload_1
    0005: sipush 10000
    0008: if_icmpge
    0011: iload_0
    0012: iload_1
    0013: iadd
    0014: istore_0
    0015: iinc 1 1
    0018: goto
    0021: iload_0
    0022: ireturn

Generated Machine Code (8 instructions generated)

    L2:
        // sipush 10000
        LDR v8, [pc, #+0]    @const 10000
        // if_icmpge
        CMP v4, v8 LSL #0
        BGE L1
        // iload_0
        // iload_1
        // iadd
        ADD v3, v3, v4 LSL #0
        // istore_0
        STR v3, [rJFP, #-8]
        // iinc 1 1
        ADD v4, v4, #1
        STR v4, [rJFP, #-4]
        // goto
        B L2
    L1:
        ……


Tablet PC with an ARM Cortex-A8 and 1 GB of memory
Android 2.3 Gingerbread on Linux 2.6.35
PhoneME Advanced JVM (HotSpot) on Linux 2.6.32
EEMBC GrinderBench
The DVM JITC generates Thumb2 code, while the JVM JITC generates ARM code
• Thumb2 reduces code size by 15% and performance by 6%


[Chart: JVM Interpreter, DVM Interpreter, and DVM C Interpreter on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 2.5]
The DVM assembly interpreter is faster than the JVM’s, but its C interpreter is similar


[Chart: JVM Dynamic Bytecode Count vs. DVM Dynamic Bytecode Count on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 1.2]
DVM executes 40% fewer bytecode instructions


[Chart: JVM Dynamic Bytecode Size vs. DVM Dynamic Bytecode Size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 2.5]
DVM requires a 60% larger program than the JVM to do the same job


[Chart: JVM JITC vs. DVM JITC on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 20]
DVM with JITC is three times slower than JVM with JITC


[Chart: JVM Compiled Bytecode Size vs. DVM Compiled Bytecode Size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 1.8]
DVM compiles a smaller amount of bytecode because of its trace-based JITC


[Chart: JVM Generated Code Size vs. DVM Generated Code Size on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 2.5]
DVM generates 35% larger machine code than the JVM


How many times is a Dalvik bytecode translated redundantly?

         Chess   kXML   Parallel   PNG    RegEx   Avg.
Ratio    1.18    1.08   1.15       1.15   1.13    1.13


How many instructions are generated for 1 byte of bytecode?
[Chart: instructions per byte of bytecode on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0 to 4; legend includes chaining cell overhead]
• JVM: ~1.3 instructions per byte of JVM bytecode
• DVM: ~2.7 instructions per byte of DVM bytecode = ~4.5 instructions per byte of JVM bytecode
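The slide does not spell out how the two DVM figures are related. One plausible reading, offered here as an assumption, is that they differ by the ratio of DVM to JVM bytecode size for the same program. The symbol $r$ is introduced only for this illustration:

```latex
\[
  2.7\ \frac{\text{instructions}}{\text{DVM byte}} \times r\ \frac{\text{DVM bytes}}{\text{JVM byte}}
  \approx 4.5\ \frac{\text{instructions}}{\text{JVM byte}}
  \quad\Longrightarrow\quad
  r \approx \frac{4.5}{2.7} \approx 1.67
\]
\[
  \frac{4.5}{1.3} \approx 3.5
\]
```

Under this reading, $r \approx 1.67$ is roughly in line with the larger dynamic bytecode size reported for the DVM. The second line says the DVM JITC emits about 3.5 times as many instructions as the JVM JITC for the same amount of JVM bytecode.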


[Charts: JVM Compile Time vs. DVM Compile Time (axis range 0 to 8) and JVM Compile Overhead vs. DVM Compile Overhead (axis range 0.00% to 6.00%) on Chess, kXML, Parallel, PNG, RegEx, and Geomean]
DVM compilation time is 4 times longer


[Chart: DVM Original, DVM Trace Extension, and DVM Trace Extension (Opt) on Chess, kXML, Parallel, PNG, RegEx, and Geomean; axis range 0.8 to 1.2]
Even if we extend the trace and add more optimizations, the impact is not high


Low code quality due to short traces and low optimization
• Expanding the trace would not help much
Little difference for the Jelly Bean JITC
• A preliminary implementation of a naïve method-based JITC is included (but currently disabled)
One question: how come Android apps work fine?


Profile results based on OProfile
• DVM portion (interpreter and JITC code)
• Native portion (kernel+library and native app)
Run the apps for ~5 sec (since EEMBC runs ~5 sec)

Application          Category         Running Details
AngryBirds           Game             Load the stage 1-1
DoodleJump           Game             Play for 5 seconds
Seesmic              SNS              Refresh Facebook feed
Twitter              SNS              Refresh timeline
Astro FileManager    File Navigator   Search file system
Google Sky Map       Navigation       Navigate constellations


[Chart: profile breakdown, 0% to 100%; series: Native, Native app, DVM]
Fortunately, the DVM portion is much smaller, so the slower DVM matters much less


[Chart: breakdown, 0% to 100%; series: Interpreter (except GC), GC, JITC]


The garbage collection (GC) portion is way too high
• GC for the benchmarks takes less than 2%
• GC might be too frequent or take too long
The JITC portion is much smaller than the interpreter’s: why?
• Fewer hot spots than in the benchmarks?
• Lower reuse of JITC-generated code?


[Chart: log scale, 1 to 1,000,000]
App loops iterate far fewer times than benchmark loops


[Chart: log scale, 1,000 to 10,000,000]
App methods are called far fewer times than benchmark methods


[Chart: log scale, 1 to 1,000,000]
App traces are executed far fewer times than benchmark traces


[Chart: axis range 0 to 500]
Apps generate many more traces than the benchmarks


Apps generate more traces, yet app traces are executed far fewer times than benchmark traces
• Perhaps not even enough to justify the JITC overhead
Is JITC really useful for app performance?
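Whether a trace is executed often enough to justify the JITC overhead can be framed as a simple break-even calculation. The sketch below is a toy cost model, not something taken from the slides. The class, the method names, and the cost values are all hypothetical, costs are in arbitrary units, and executions that happen before the trace is detected as hot are ignored.

```java
// Toy cost model for deciding whether JIT compilation pays off (illustrative only).
class JitBreakEven {
    // Total cost of running a code unit n times in the interpreter.
    static double interpretedCost(long n, double interpCostPerRun) {
        return n * interpCostPerRun;
    }

    // Total cost of compiling once and then running the native code n times.
    static double compiledCost(long n, double compileCost, double nativeCostPerRun) {
        return compileCost + n * nativeCostPerRun;
    }

    // Smallest n for which compiling is strictly cheaper, or -1 if it never is.
    static long breakEvenExecutions(double compileCost,
                                    double interpCostPerRun,
                                    double nativeCostPerRun) {
        double savedPerRun = interpCostPerRun - nativeCostPerRun;
        if (savedPerRun <= 0) {
            return -1; // native code is no faster, so compilation never pays off
        }
        return (long) Math.floor(compileCost / savedPerRun) + 1;
    }

    public static void main(String[] args) {
        // Hypothetical costs: compile = 1000, interpreted run = 10, native run = 2.
        long n = breakEvenExecutions(1000.0, 10.0, 2.0);
        System.out.println("break-even executions: " + n);
    }
}
```

In this model, a trace that runs fewer times than the break-even count costs more with JITC than without it. A higher compile cost or a smaller per-run saving pushes the break-even point further out.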


[Chart: loading time only; Interpreter vs. JITC on AngryBirds, DoodleJump, Seesmic, Twitter, Astro FileManager, Google Sky Map, and Geomean; axis range 0.7 to 1.2]
App performance goes down when we turn on the JIT compiler


We believe Dalvik’s trace-based JITC has a severe performance problem in its current form
We do not experience any critical problem in running Android apps, though
• The Dalvik portion of the total running time is not dominant
Android apps lack hot spots, unlike benchmarks
• This calls for faster warm spot detection or ahead-of-time compilation
