Application Performance Analysis - Sharkfest - Wireshark
Application Performance Analysis - Sharkfest - Wireshark Application Performance Analysis - Sharkfest - Wireshark
1Mike CanneyPrincipal Network Analystgetpackets.com
- Page 2 and 3: My contact infocontactMike Canney,P
- Page 4 and 5: Commercial vs. Free Capturecapture
- Page 6 and 7: Capture to Disk Appliance (on a bud
- Page 8 and 9: So why did I write multiple 2 Gig t
- Page 10 and 11: Instructor DemodemoTroubleshooting
- Page 12 and 13: Much better and Still on a Budget
- Page 14 and 15: So why focus on the Application?foc
- Page 16 and 17: Scenario XYour company has a remote
- Page 18 and 19: The Application QA Lifecycleqa cycl
- Page 20 and 21: Top Causes for Poor Application Per
- Page 22 and 23: 25When are TCP ACKs sent?
- Page 24 and 25: TCP Window Sizesize• The TCP Wind
- Page 26 and 27: Scenario VII - The Slow File Transf
- Page 28 and 29: Application Turnsturns• An Applic
- Page 30 and 31: Example in WiresharkDisplay Filter:
- Page 32 and 33: So what causes all these App Turns?
- Page 34 and 35: Hint…• Forget about the bandwid
- Page 36 and 37: Remember to Calculate BDP• Bandwi
- Page 38 and 39: TCP Inflight Data in WiresharkThe B
- Page 40 and 41: TCP Inflight Data in WiresharkGraph
- Page 42 and 43: TCP Retransmissionstcp flow• Exce
- Page 44 and 45: ULPs (upper layer protocols)Case in
- Page 46 and 47: CIFS/SMB Quiz• What is faster usi
- Page 48 and 49: 56ULPs (upper layer protocols)
- Page 50 and 51: CIFS/SMB Tuningtuning• SMB Maximu
1Mike CanneyPrincipal Network Analystgetpackets.com
My contact infocontactMike Canney,Principal Network Analyst, getpackets.comcanney@getpackets.com319.389.11372
Capture StrategiescaptureCapture to Disk Appliance (CDA)6
Commercial vs. Free Capturecapture• Define your capture strategy– Data Rates– What are my goals? Troubleshooting vs.Statistical information.– Do I need to capture every packet?7
SPAN vs. TAPSPAN8
Capture to Disk Appliance (on a budget)budget• What is needed?– dumpcap is a command line utility includedwith the <strong>Wireshark</strong> download to enable ringbuffer captures.– Use an inexpensive PC or laptop (best tohave 2 NICs or more).– Basic batch file to initiate capture.– Cascade Pilot9
Dumpcap Exampleexamplecd \program files (x86)\wiresharkdumpcap -i 1 -s 128 -b files:100 -b filesize:2000000 –w c:\traces\internet\headersonly1.pcapThis is a basic batch file that will capture offof interface 1, slice the packets to 128bytes, write 100 trace files of ~2 Gigabytes,and write the trace file out to a pcap file.10
So why did I write multiple 2 Gig trace files?trace file• Pilot!• Pilot can easily read HUGE trace files.• This allows us to utilize our CDA in waysno other analyzer can.• I personally have sliced and diced 100GB trace files in Pilot in a matter ofseconds.11
So how does this all work together?practice• Directory full of 2GB trace files, all timestamped based on when they were writtento disk.• User calls in and complains that “thenetwork” is slow.• Locate that trace file based on time anddate and launch Pilot.12
Instructor DemodemoTroubleshooting user“Network Issue”13
Think about what you just sawhmmm• From a 2 GB trace file we were able to:– Look at the total Network throughput.– See what applications were consuming thebandwidth.– Identify the user that was responsible for consumingthe bandwidth.– Identify the URI’s the user was hitting and what theresponse times were.– Drill down to the packets involved in the slow webresponse time in <strong>Wireshark</strong>.• All in a matter of a few seconds.14
Much better and Still on a Budget…• vShark– Cascade Virtual Shark– Low cost, up to 2 TB of capture capability– Can analyze within a ESX server as well as build astand alone CDA– Great for 1 GB SPANs and below– My new, CDA of choice15
<strong>Application</strong> <strong>Analysis</strong>AppsUsing Pilot and <strong>Wireshark</strong> totroubleshoot <strong>Application</strong> issues16
So why focus on the <strong>Application</strong>?focus• In many cases it is the Network Engineers thathave the tool set to help pinpoint where theproblem exists.• “It’s not the Network!” - The Network is guiltyuntil proven innocent.• <strong>Application</strong> performance issues can impactyour business/customers ability to makemoney.• User Response time is “Relative”.• Intermittent performance issues (movingtarget).17
The “moving target”target• Analyzer placement - Two options– Move the analyzers as needed– Capture anywhere and everywhere• To defend the Network multiple capturepoints of the problem is often the bestsolution.18
Scenario XYour company has a remote site that is connected back tothe Data Center via a T1. There are 25-30 users at thissite and they are complaining that the Network is slow.Not only is is slow but it is down. When you check yourbandwidth usage you see that it is only ~300Kbps. Youdecide to take a trace file at the Data Center and at theRemote site to see what is going on.19
Why are there so many application issues?help• <strong>Application</strong>s are typically developed in a“golden” environment– Fastest PCs– High Bandwidth/low latency• When applications move from test (LAN)to production (WAN) the phone startsringing with complaints coming in.20
The <strong>Application</strong> QA Lifecycleqa cycle• In most organizations, applications go through aQA process• Typical QA/App developers test the following:– Functional tests– Regression tests– Stress tests (server)– Rinse and Repeat• What is often missing is “Networkability” testing• All QA Lifecycles should include Networkabilitytesting21
<strong>Application</strong> Networkablility Testingtesting• Identify key business transactions, number ofusers and network conditions the applicationwill be deployed in.• Simulation vs. Emulation– Simulation is very quick, often gives you roughnumbers of how an application will perform overdifferent network conditions.– Emulation is the only way to determine when anapplication will “fail” under those conditions.• A Combination of both is recommended.22
Top Causes for Poor <strong>Application</strong> <strong>Performance</strong>Top 6• TCP• <strong>Application</strong> Turns• Misconfigured Devices• Layer 7 Bottlenecks• Congestion (network)• Processing Delay23
Causes for Slow <strong>Application</strong> <strong>Performance</strong>TCPtcp24
25When are TCP ACKs sent?
Server Client TCP Hands 5,840 Bytes Up to the applica@on 5,840 Byte Block 5,840 Bytes ACKed 5,840 Byte Block 26
TCP Window Sizesize• The TCP Window Size defines the host’sreceive buffer.• Large Window Sizes can sometimeshelp overcome the impact of latency.• Depending on how the application waswritten, advertised TCP Window Sizemay not have an impact at all (more onthis later).29
TCP Inflight Datainflight• The amount of unacknowledged TCP datathat is on the wire at any given time.• TCP inflight data in limited by thefollowing:– TCP Retransmissions– TCP Window Size– <strong>Application</strong> block size• The amount of TCP inflight data will neverexceed the receiving devices advertisedTCP Window Size.30
Scenario VII – The Slow File Transfer• A customer is having an issue with their FTP to one oftheir customers. They had recently installed adedicated FTP server for this specific customer. Theyhad a 10Mbps MPLS connection and as expected, theNetwork was being blamed for the poor file transfers.When we checked the Bandwidth usage on the MPLSconnection all looked well. In fact, they were hardlyusing any of their bandwidth. The customer wasplanning to upgrade the MPLS connection from10Mbps to 20Mbps in hopes to speed up the filetransfers but decided to let us help them take a look atthe issue before making the purchase of the additionalbandwidth.31
Causes for Slow <strong>Application</strong> <strong>Performance</strong>turns<strong>Application</strong> Turns32
<strong>Application</strong> Turnsturns• An <strong>Application</strong> Turn is a request/responsepair• For each “turn” the application must waitthe full round trip delay.• The greater the number of turns, theworse the application will perform over aWAN (latency bound).33
App TurnBegin End 34
Example in <strong>Wireshark</strong>Display Filter:882 <strong>Application</strong> Turns in this trace35
App Turns and Latencylatency• It is fairly easy to determine App Turnsimpact on end user response time– Multiply the number of App Turns by theround trip delay:• 10,000 turns * .050 ms delay = 500 seconds dueto latency• Note, this has nothing to do withBandwidth or the Size of the WAN Circuit36
So what causes all these App Turns?cause• Size of a fetch in a Data Base call• Number of files that are being accessed• Loading single images in a Web Pageinstead of using an image map• Number of bytes being retrieved and howthey are being retrieved (block size)37
Pop Quiz!• We have an application that uses a 16,384 KB blocksize• This application is going to be rolled out over a DS3with 40 ms of round trip latency.• Users are complaining that the network is slow andthey need to upgrade the DS3 to a 1 Gbpsdedicated link.• What kind of throughput should I expect?39
Hint…• Forget about the bandwidth, it has nothing to dowith the throughput40
Answer• Very simple calculation• Throughput = Blocksize / Round Trip Delay• (16,384 / .04) * 8 = 3,278,800 bits per second• Again, note, this has NOTHING to do withbandwidth it’s all on the application.41
Remember to Calculate BDP• Bandwidth Delay Product• BW * RT Latency /8 = offered load to fill the pipe• (44,000,000 * .04) /8 = 220,000 bits per second tofill the DS342
43TCP Inflight Data in <strong>Wireshark</strong>
TCP Inflight Data in <strong>Wireshark</strong>The Bytes in Flight Column shows us how much payload is in each packet.44
47Easier way for SMB/CIFS
TCP Inflight Data in <strong>Wireshark</strong>Graphed in Excel48
TCP Retransmissionstcp• Every time a TCP segment is sent, aretransmission timer is started.• When the Acknowledgement for thatsegment is received the timer is stopped.• If the retransmission timer expires beforethe Acknowledgement is received, theTCP segment is retransmitted.49
TCP Retransmissionstcp flow• Excessive TCP Retransmissions can havea huge impact on application performance.• Not only does the data have to get resent,but TCP flow control (Slow Start) kicks intoaction.50
ULPs (upper layer protocols)ulp• TCP often gets blamed for the ULPs problem.– The application hands down to TCP amount of data to go retrieve(application block size)– TCP then is responsible for reliably getting that data back to theapplication layer• TCP has certain parameters in which to work with and can usuallybe tuned based on bandwidth and latency• Many times too much focus is put on “tuning” TCP as the fix forpoor performance in the network• If the TCP advertised receive window is set to 64K and the application isonly handing down to TCP requests for 16K, where is the bottleneck?51
ULPs (upper layer protocols)Case in point: CIFS/SMBulp52
Troubleshooting CIFS/SMBCifs/smb• Arguably the most common File Transfermethod used in businesses today.• SMB was NOT developed with the WAN inmind.• One of the most “chatty” protocols/applications I run into (with the exception ofpoorly written SQL).53
CIFS/SMB Quiz• What is faster using MS File Sharing?– Pushing a file to a file server?– Pulling a file from a file server?quiz54
55ULPs (upper layer protocols)
56ULPs (upper layer protocols)
CIFS/SMBcifs/smb• What is faster using MS File Sharing?– Pushing a file to a file server?– Pulling a file from a file server?• SMB Write (Pushing the file) can almost be 2X asfast as pulling (SMB Read)• Depends on the Latency57
CIFS/SMB Tuningtuning• SMB Maximum Transmit Buffer Size– Negotiated MaxBufferSize in the NegotiateProtocol response– Default for Windows servers is typically16644 (dependent upon physical memory)– Client default typically 435658
59CIF/SMB Tuning
CIFS/SMB Tuningtuning• Caveat:– SMB is extremely dependent upon the API• Even though you set the max buffer size to 64K,windows “share” data will always get truncated to60K (61440) even though the server can support64K60
CIFS/SMB Tuning• Custom SMB APIstuning– The Windows limitation can be exceeded byprograms written to use SMB as they filetransfer protocol61
CIFS/SMB TuningNote the SMB writes of 65,536 This is a file transfer using a custom API on a Windows XP machine 62
Instructor Demo of SMB ProfilesdemoDemo of SMB Tracefiles64
65My personal SMB Profile
Take Away Pointspoints• Building your own CDA is easy to do andmay fit in a majority of the areas you needto capture from• Pilot, Pilot, Pilot, it’s not just a fancyreporting engine for <strong>Wireshark</strong>!• Test your applications “Networkability”before they hit production.• Use the <strong>Wireshark</strong> Profiles, they will saveyou a ton of time.67
68Mike CanneyPrincipal Network Analystcanney@getpackets.com