Software Analytics in Practice: Approaches and Experiences
Dongmei Zhang
Senior Researcher/Research Manager
Software Analytics group, Microsoft Research Asia
December 16, 2012
Outline
• Software Analytics
• Experience sharing in practicing Software Analytics
• Example projects
ISHCS 2011
New Era… New Opportunities…
Software itself is changing…
The way people use software is changing…
How software is built and operated is changing…
Scope of software development & tools has naturally expanded…
Software Analytics Group @ MSRA
Utilize a data-driven approach to help create high-performing, user-friendly, and efficiently developed and operated software and services.
Research Topics (vertical): Software Users, Software Development Process, Software Systems
Technology Pillars (horizontal): Information Visualization, Analysis Algorithms, Large-scale Computing
Software Analytics
Software analytics is to enable software practitioners
to perform data exploration and analysis in order to
obtain insightful and actionable information for data-driven
tasks around software and services.
Software Analytics in Practice
• Getting real
– Working on real data
– Addressing real problems
– Building real tools
– Making real impact
• Experience sharing
– Engagement of practitioners
– Walking the last mile
– Combination of expertise
Engagement of Practitioners
• Broad range of practitioners
– Developers, testers, program managers, UI designers, customer support, operators…
• Solving their problems
• Champions in product teams
• Timing
• Culture
Walking the Last Mile
• Targeting real scenarios
• Trying out tools has a cost
• “It works” is not enough
– Performance
– Usability
– Customizability
– Predictability
• Feedback & improvement -> iterative process
• Getting engineering support
Combination of Expertise
• Research capabilities
• Engineering skills to build systems
• Visualization & design lead to ease of use
• Project management
• Communication
Example Projects
• Code Clone Analysis
– Yingnong Dang, Song Ge, Gong Cheng, Weipeng Liu, Dongmei Zhang
• StackMine
– Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang
XIAO – Code Clone Analysis
For forty years I have painted bamboo branches,
wielding the brush by day and pondering by night;
trimming away all that is superfluous to leave the lean and clear,
for when the painting feels unfamiliar again, mastery has arrived.
- Zheng Banqiao (Qing dynasty)
“削” (XIAO)
• 削 (XIAO) means “trimming” in Chinese
• Similar spirit between bamboo painting and programming
XIAO: Code Clone Analysis
• Motivation
– Copy-and-paste is a common developer behavior
– A real tool widely adopted at Microsoft
• XIAO enables code clone analysis with
– High tunability
– High scalability
– High compatibility
– High explorability
[IWSC’11 Dang et al.]
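As a rough illustration of what a tunable clone detector can look like, the sketch below compares normalized token sequences and reports pairs whose similarity meets a threshold. It is not XIAO's algorithm, which the slides do not describe; the tokenizer, the keyword set, the function names, and the default threshold of 0.8 are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical tokenizer: identifiers, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Small illustrative keyword set; a real tool would use a language parser.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}


def normalize(snippet):
    """Tokenize a snippet, mapping identifiers to ID and numbers to NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(snippet):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def similarity(a, b):
    """Similarity in [0, 1] between the normalized token sequences."""
    matcher = SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()


def find_clones(snippets, threshold=0.8):
    """Return (name_a, name_b, score) for pairs scoring at or above threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(snippets.items(), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs


# Made-up snippets: "a" and "b" differ only in identifier names.
SNIPPETS = {
    "a": "int total = 0; for (int i = 0; i < n; i++) { total += a[i]; }",
    "b": "int sum = 0; for (int j = 0; j < len; j++) { sum += data[j]; }",
    "c": "return x * 2;",
}

if __name__ == "__main__":
    print(find_clones(SNIPPETS))
```

Lowering the threshold admits clones with more inserted or changed statements, which is one simple way to expose tunability to the user. The pairwise comparison here is quadratic and would not scale to large codebases without an index.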
Comprehensive Solution
Online code clone search
Offline code clone analysis
Development and testing
• Similar issue check before check-in
• Reference info for code reviewer
• Supporting tool for bug triage
Quality gates at milestones
• Architecture refactoring
• Code clone clean up
• Bug fixing
Post-release maintenance
• Security bug investigation
• Bug investigation for sustained engineering
Adoption in Microsoft
• More than 900 downloads
• Gaining overall understanding of copy-and-paste clones in a codebase
• Finding potential bugs & refactoring opportunities
• Adding custom parsers
More Secure Microsoft Products
Code Clone Search service integrated into the workflow of Microsoft Security Response Center
Over 400 million lines of code indexed across multiple products
Real security issues proactively identified and addressed
Benefiting Developer Community
Available in Visual Studio vNext
Searching for similar snippets to fix a bug once
Finding refactoring opportunities
StackMine: Towards Flawless OS Performance
OS performance in the real world
• One of the top user complaints
• Impacting a large number of users every day
• High impact on usability and productivity
Challenges
• Large-scale trace data
• Highly complex performance analysis at OS level
• Combination of machine learning and domain knowledge
Problems
• Unknown issue discovery
• Issue prioritization
• Scalability to a large number of traces
Technical Highlights
• Machine learning for system domain
– Formulate the discovery of problematic execution patterns as callstack mining & clustering
– Systematic mechanism to incorporate domain knowledge
• Interactive performance analysis system
– Parallel mining infrastructure based on HPC + MPI
– Visualization aided interactive exploration
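To make the idea of callstack clustering concrete, the sketch below greedily groups call stacks that share a long common prefix and ranks the groups by total wait time, which loosely mirrors issue discovery and prioritization. It is a toy under stated assumptions, not StackMine's mining algorithm: the prefix-ratio similarity, the 0.6 threshold, the `Cluster` type, and the frame names in the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of similar call stacks with their accumulated wait time."""
    representative: tuple
    stacks: list = field(default_factory=list)
    total_ms: float = 0.0


def common_prefix_ratio(a, b):
    """Share of the longer stack covered by the frames both stacks start with."""
    if not a or not b:
        return 0.0
    shared = 0
    for frame_a, frame_b in zip(a, b):
        if frame_a != frame_b:
            break
        shared += 1
    return shared / max(len(a), len(b))


def cluster_stacks(samples, threshold=0.6):
    """Greedily group (stack, wait_ms) samples; rank clusters by total wait."""
    clusters = []
    for stack, wait_ms in samples:
        for cluster in clusters:
            if common_prefix_ratio(stack, cluster.representative) >= threshold:
                break  # reuse the first sufficiently similar cluster
        else:
            cluster = Cluster(representative=stack)
            clusters.append(cluster)
        cluster.stacks.append(stack)
        cluster.total_ms += wait_ms
    return sorted(clusters, key=lambda c: c.total_ms, reverse=True)


# Made-up samples: outermost frame first, with the observed wait in ms.
SAMPLES = [
    (("main", "OpenFile", "ReadDisk", "WaitForIO"), 120.0),
    (("main", "OpenFile", "ReadDisk", "AcquireLock"), 300.0),
    (("main", "Render", "DrawFrame"), 40.0),
    (("main", "OpenFile", "ReadDisk", "WaitForIO"), 80.0),
]

if __name__ == "__main__":
    for c in cluster_stacks(SAMPLES):
        print(c.representative, len(c.stacks), c.total_ms)
```

Domain knowledge could enter such a scheme through the similarity function, for example by ignoring frames an analyst considers uninformative. The single-pass greedy loop is chosen for brevity; it is order-dependent and does not reflect the parallel infrastructure mentioned on the slide.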
Impact
“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”
- from Development Manager in Windows
Highly effective new issue discovery on Windows mini-hang
Continuous impact on future Windows versions
Suggested Actions
• Get research problems from real practice
• Get feedback from real practice
• Collaborate across disciplines
• Collaborate with industry
Summary
• Scope of software development & tools has naturally expanded
• Software Analytics – Insightful & actionable
• Experience sharing
– Engagement of practitioners
– Walking the last mile
– Combination of expertise
Q & A
http://research.microsoft.com/groups/sa/