10.07.2015 Views

Design and Linux Server Implementation Experiences

Design and Linux Server Implementation Experiences

Design and Linux Server Implementation Experiences

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Sessions Overview✽Correctness✽✽✽✽Exactly Once SemanticsExplicit negotiation of bounds✽Clients make best use of available resources1 client, many sessions✽✽/usr/bin (read only, no cache, many small requests)/home (read/write, cache, fewer, larger requests)Client-initiated back channel✽✽Eliminates firewall woesCan share connection, no need to keep alive


Example of 4.0 ComplexitySETCLIENTID implementation discussion from RFC 3530The server has previously recorded a confirmed { u, x, c, l, s }record such that v != u, l may or may not equal k, <strong>and</strong> recordedan unconfirmed { w, x, d, m, t } record such that c != d, t != s, mmay or may not equal k, m may or may not equal l, <strong>and</strong> k may ormay not equal l. Whether w == v or w != v makes no difference.The server simply removes the unconfirmed { w, x, d, m, t }record <strong>and</strong> replaces it with an unconfirmed { v, x, e, k, r }record, such that e != d, e != c, r != t, r != s.The server returns { e, r }.The server awaits confirmation of { e, k } viaSETCLIENTID_CONFIRM { e, r }.


Sessions Overview (continued)✽✽Simplicity✽✽CREATECLIENTID, CREATESESSION✽Eliminate callback informationDuplicate Request Cache✽✽✽Explicit part of protocolNew metadata eases implementation; RPC independentSee implementation discussionSupport for RDMA✽✽✽Reduce CPU overheadIncrease throughputSee NFS/RDMA talks for more


Draft Issues✽✽✽False Starts✽✽Channels & Client/Session RelationshipChainingOpen Issues✽✽Lifetime of client stateManagement of RDMA-specific parametersFuture Directions✽✽“Smarter” clients & serversBack channel implementation


Channels✽✽✽Originally, sessionid ⋲ clientid;1 session, many channelsDirect correspondence to transport instance✽✽Back & operations channels are similarSame BINDCHANNEL operationProtocol Layering Violation✽✽✽ULP should be insulated from transportKiller use case: <strong>Linux</strong> RPC auto-reconnectsLesson: layering violations & LLP assumptions


Channels (continued)✽ Now clientid:sessionid is 1:N✽✽✽✽Per-channel control replaced by per-sessionSessions can be accessed by any connection✽Facilitates trunking, failoverNo layering violations on forward channelBack channel still bound to transport✽✽✽Only way to achieve client-initiated channelLayering violation, not required featureNot yet implemented, possibly more to learn


Chaining ExampleNFS v4.0Allows COMPOUND procedures to contain anarbitrary number of operationsNFS v4.1 SessionsSince the maximum size of a COMPOUND is negotiated,arbitrarily large compounds are not allowed. InsteadCOMPOUNDS are “chained” together to preserve stateCOMPOUNDOPERATION 1OPERATION kCOMPOUND 1CHAIN: BEGINOPERATION 1OPERATION iCOMPOUND mCHAIN: CONTINUEOPERATION i + 1OPERATION jCOMPOUND nCHAIN: ENDOPERATION j + 1OPERATION k


Chaining✽✽✽Max request size limits COMPOUND✽✽4.0 places no limit on size or # of operationsFile h<strong>and</strong>les live in COMPOUND scopeOriginally sessions proposed chaining facility✽✽Preserve COMPOUND scope across requestsChain flags in SEQUENCEChaining eliminated✽✽✽✽Ordering issues across connections problematicAnnoying to implement <strong>and</strong> of dubious valueLarge COMPOUNDS on 4.0 get errors anywaySessions can still be tailored for large COMPOUNDS


<strong>Implementation</strong> Challenges✽✽✽✽✽Constantly changing specification✽✽Problem for me, but not for youTime implementing dead-end conceptsFast pace of <strong>Linux</strong> kernel development✽Difficulty merging changes from 4.0 developmentLack of packet analysis toolsSEQUENCE operation✽✽Unlike other v4 operationsRequires somewhat special h<strong>and</strong>lingDuplicate Request Cache


Duplicate Request Cache✽ No real DRC in 4.0; Compare to v3.0(on <strong>Linux</strong>)✽✽✽Global scope✽✽All client replies saved in same poolUnfair to less busy clientsSmall✽✽Unlikely to retain replies long enoughNo strong semantics govern cache evictionGeneral DRC Problems✽✽Nonst<strong>and</strong>ard <strong>and</strong> undocumentedDifficult to identify replay with IP & XID


4.1 Sessions Cache Principles✽✽✽✽✽✽✽✽✽Actual part of the protocolClients can depend on behaviorIncreases reliability <strong>and</strong> interoperabilityReplies cached at session scopeMaximum number of concurrent requests &maximum sizes negotiatedCache access <strong>and</strong> entry retirementReplays unambiguously identifiedNew identifiers obviate caching of request dataEntries retained until explicit client overwrite


DRC Initial <strong>Design</strong>✽✽✽Statically allocated buffers based on limitsnegotiated at session creationHow to save reply?✽✽Tried to provide own buffers to RPC, no can doStart simple, copy reply before sendingKiller problem: can’t predict response size✽✽✽If reply is too large, it can’t be saved in cacheMust not do non-idempotent non-cacheable opsOperations with unbounded reply size: GETATTR,LOCK, OPEN…


DRC Redesign✽✽✽No statically allocated reply buffersAdd reference to XDR reply pages✽✽✽✽Tiny cache footprintNo copies, modest increase in memory usageLayering? This is just one implementation;<strong>Linux</strong> RPC is inexorably linked to NFS anyway1 pernicious bug: RPC status pointerLarge non-idempotent replies still a problem✽✽Truly hard to solve, given current operationsIn practice, not a problem at all (rsize,wsize)


DRC StructuresSession Statestruct nfs4_session {/* other fields omitted */u32 se_maxreqsize;u32 se_maxrespsize;u32 se_maxreqs;struct nfs4_cache_entry *se_drc;};SEQUENCE Argumentsstruct nfsd4_sequence {sessionid_t se_sessionid;u32 se_sequenceid;u32 se_slotid;};Slot IDSequence IDStatusXDR Reply011complete0xBEEFBE101286in-progress0xDECAFBADmaxreqs - 10available0x00000000


DRC Fringe Benefit✽✽4.0 Bug: Operations that generate upcalls✽✽✽Execution is deferred & revisited (pseudo-drop)Partial reply state not savedNon-idempotent operations may be repeatedSessions Solution✽When execution is deferred retain state in DRC✽ Only additions are file h<strong>and</strong>les & operation #✽Revisit triggers DRC hit, execution resumes


DRC Future✽✽✽Refinement, stress testingCompare performance to v3Quantify benefits over stateful operationcaching in 4.0✽ Backport to v4.0✽✽✽No session scope, will use client scopeNo unique identifiers, must use IP, port & XIDMore work, but significant benefit over v3


<strong>Implementation</strong> Delights✽✽Draft changes made for better code✽✽DRC & RPC uncoupledSETCLIENTID & SETCLIENTID_CONFIRMRelatively little code✽✽✽✽CREATECLIENTIDCREATESESSIONDESTROYSESSIONSEQUENCE (Duplicate Request Cache)


Conclusions✽✽✽Basic sessions additions are positive✽✽Reasonable to implementDefinite improvements: correctness, simplicityLayering violations✽✽Avoid in protocolCan be leveraged in implementationFurther additions require moreinvestigation✽✽Back channelRDMA


Questions & Other Issues✽✽✽Open Issues✽✽Lifetime of client stateManagement of RDMA-specific parametersFuture Directions✽✽“Smarter” clients & serversBack channel implementationRDMA/Sessions Draft✽✽Under NFSv4 Drafts at IETF sitehttp://ietf.org/internet-drafts/draft-ietf-nfsv4-sess-01.txt

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!