
Software Analytics in Practice: Approaches and Experiences

Dongmei Zhang
Senior Researcher/Research Manager
Software Analytics group, Microsoft Research Asia
December 16, 2012


Outline
• Software Analytics
• Experience sharing in practicing Software Analytics
• Example projects

ISHCS 2011


New Era … New Opportunities …
• Software itself is changing …
• The way people use software is changing …
• How software is built and operated is changing …
• Scope of software development & tools has naturally expanded …



Software Analytics Group @ MSRA
Utilize a data-driven approach to help create high-performing, user-friendly, and efficiently developed and operated software and services.

[Diagram, with axes labeled Vertical and Horizontal]
Research Topics: Software Users, Software Development Process, Software Systems
Technology Pillars: Information Visualization, Analysis Algorithms, Large-scale Computing



Software Analytics
Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.



Software Analytics in Practice
• Getting real
– Working on real data
– Addressing real problems
– Building real tools
– Making real impact
• Experience sharing
– Engagement of practitioners
– Walking the last mile
– Combination of expertise



Engagement of Practitioners
• Broad range of practitioners
– Developers, testers, program managers, UI designers, customer support, operators …
• Solving their problems
• Champions in product teams
• Timing
• Culture



Walking the Last Mile
• Targeting real scenarios
• Trying out tools has a cost
• “It works” is not enough
– Performance
– Usability
– Customizability
– Predictability
• Feedback and improvement: an iterative process
• Getting engineering support



Combination of Expertise
• Research capabilities
• Engineering skills to build systems
• Visualization & design lead to ease of use
• Project management
• Communication



Example Projects
• Code Clone Analysis
– Yingnong Dang, Song Ge, Gong Cheng, Weipeng Liu, Dongmei Zhang
• StackMine
– Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang



XIAO – Code Clone Analysis
For forty years I have painted bamboo branches,
wielding the brush by day and pondering by night;
trimming away all that is superfluous, leaving only what is lean and clear,
and only when the painting feels raw again has it become truly practiced.
- Zheng Banqiao (Qing dynasty)
• 削 (XIAO) means “trimming” in Chinese
• Bamboo painting and programming share a similar spirit



XIAO: Code Clone Analysis
• Motivation
– Copy-and-paste is a common developer behavior
– A real tool widely adopted at Microsoft
• XIAO enables code clone analysis with the following properties
– High tunability
– High scalability
– High compatibility
– High explorability
[IWSC’11, Dang et al.]
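
To make the idea of tunable clone detection concrete, here is a minimal Python sketch of near-miss clone matching. It is not XIAO's algorithm, which the slides do not describe. It assumes a simple token-normalization approach: identifiers and numbers are replaced by placeholders, and two snippets count as clones when their token sequences are similar enough. The names `normalize`, `clone_similarity`, `is_clone`, the tiny `KEYWORDS` set, and the default threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny keyword set; a real tool would use a language parser.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}
# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(snippet):
    """Tokenize a snippet, mapping identifiers to ID and numbers to NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(snippet):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def clone_similarity(a, b):
    """Similarity in [0, 1] between the normalized token sequences of a and b."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def is_clone(a, b, threshold=0.8):
    """The threshold is the tuning knob: lower values admit looser near-miss clones."""
    return clone_similarity(a, b) >= threshold


original = "for (int i = 0; i < n; i++) total += price[i];"
renamed = "for (int j = 0; j < count; j++) sum += cost[j];"
unrelated = "return open(path).read()"
print(is_clone(original, renamed), is_clone(original, unrelated))
```

Raising or lowering `threshold` trades precision against recall, which is one plausible reading of the "tunability" bullet.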



Comprehensive Solution
Online code clone search
Offline code clone analysis
Quality gates at milestones
• Architecture refactoring
• Code clone cleanup
• Bug fixing
Post-release maintenance
• Security bug investigation
• Bug investigation for sustained engineering
Development and testing
• Similar issue check before check-in
• Reference info for code reviewers
• Supporting tool for bug triage



Adoption in Microsoft
• More than 900 downloads
• Gaining overall understanding of copy-and-paste clones in a codebase
• Finding potential bugs & refactoring opportunities
• Adding custom parsers



More Secure Microsoft Products
• Code Clone Search service integrated into the workflow of the Microsoft Security Response Center
• Over 400 million lines of code indexed across multiple products
• Real security issues proactively identified and addressed



Benefiting Developer Community
Available in Visual Studio vNext
• Searching for similar snippets so a bug is fixed once
• Finding refactoring opportunities



StackMine: Towards Flawless OS Performance
OS performance in the real world
• One of the top user complaints
• Impacting a large number of users every day
• High impact on usability and productivity
Challenges
• Large-scale trace data
• Highly complex performance analysis at OS level
• Combination of machine learning and domain knowledge
Problems
• Unknown issue discovery
• Issue prioritization
• Scalable to a large number of traces



Technical Highlights
• Machine learning for system domain
– Formulate the discovery of problematic execution patterns as callstack mining & clustering
– Systematic mechanism to incorporate domain knowledge
• Interactive performance analysis system
– Parallel mining infrastructure based on HPC + MPI
– Visualization-aided interactive exploration
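
The "callstack mining & clustering" formulation can be illustrated with a small Python sketch. This is not StackMine's actual algorithm, which the slides do not detail. It assumes each trace yields a callstack plus a wait cost, groups stacks whose frame sets overlap (Jaccard similarity), and ranks the groups by total cost as a stand-in for issue prioritization. The frame names, the `cluster_stacks` helper, and the threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the frame sets of two callstacks."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cluster_stacks(traces, threshold=0.5):
    """Greedy single-pass clustering of (frames, cost_ms) pairs.

    Each stack joins the first cluster whose representative stack is similar
    enough; otherwise it starts a new cluster. Clusters are returned sorted
    by total cost, most expensive first.
    """
    clusters = []
    for frames, cost in traces:
        for cluster in clusters:
            if jaccard(frames, cluster["rep"]) >= threshold:
                cluster["members"].append(frames)
                cluster["cost"] += cost
                break
        else:
            clusters.append({"rep": frames, "members": [frames], "cost": cost})
    return sorted(clusters, key=lambda c: c["cost"], reverse=True)


# Hypothetical traces: two UI hangs share a file-scan path, one is GPU-bound.
traces = [
    (("app!OnClick", "fs!ReadFile", "av!ScanFile", "disk!Wait"), 900),
    (("app!OnOpen", "fs!ReadFile", "av!ScanFile", "disk!Wait"), 700),
    (("app!OnPaint", "gfx!Present", "gpu!Wait"), 300),
]
ranked = cluster_stacks(traces)
for cluster in ranked:
    print(cluster["cost"], len(cluster["members"]), cluster["rep"])
```

Domain knowledge could enter such a sketch through the similarity function, for example by weighting frames from known-problematic components more heavily; that extension is likewise an assumption.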



Impact
“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”
- from Development Manager in Windows
• Highly effective new issue discovery on Windows mini-hang
• Continuous impact on future Windows versions



Suggested Actions
• Get research problems from real practice
• Get feedback from real practice
• Collaborate across disciplines
• Collaborate with industry



Summary
• Scope of software development & tools has naturally expanded
• Software Analytics – Insightful & actionable
• Experience sharing
– Engagement of practitioners
– Walking the last mile
– Combination of expertise



Q & A

http://research.microsoft.com/groups/sa/
