12.07.2015 Views

Best Practices for Sun Solaris Containers and VMware Infrastructure

Best Practices for Sun Solaris Containers and VMware Infrastructure

Best Practices for Sun Solaris Containers and VMware Infrastructure

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

AgendaWhat is OS Virtualization ?What are <strong>Solaris</strong> <strong>Containers</strong> <strong>and</strong> how do they work ?Is it “<strong>VMware</strong> ESX or <strong>Containers</strong>” or “<strong>VMware</strong> ESX <strong>and</strong> <strong>Containers</strong>” ?Examination of common use cases


BackgroundAssumptions:You already underst<strong>and</strong> <strong>VMware</strong> <strong>Infrastructure</strong> componentsYou have heard of <strong>Solaris</strong> <strong>Containers</strong> or ZonesObservations:<strong>Solaris</strong> <strong>Containers</strong> <strong>and</strong> <strong>VMware</strong> <strong>Infrastructure</strong> (ESX) technologies arecomplementaryEach provides a unique set of capabilities <strong>and</strong> efficiencies that can beleveraged togetherThe key to success is knowing when to use each technology


<strong>Solaris</strong> <strong>Containers</strong> <strong>and</strong> OS VirtualizationMultiple isolated execution environments within one <strong>Solaris</strong> instanceIncludes resource management, security, failure isolationLightweight, flexible, efficientMore than 8,000 zones per system (or dynamic system domain)One operating system to manageDevice configuration details hiddenComponents:Workload identification & accounting, process aggregationResource management (CPU, memory, ...)Security/namespace isolation (zones)Features can be used separately or in combination


Evolution of <strong>Solaris</strong> <strong>Containers</strong><strong>Solaris</strong> <strong>Containers</strong> prior to <strong>Solaris</strong> 10Introduced in <strong>Solaris</strong> 2.6 as SRM 1.0 [aka “Share II” scheduler ]Integrated into <strong>Solaris</strong> 9; new comm<strong>and</strong>sRedesigned Fair Share SchedulerResource Capping DaemonIntroduced Extended AccountingBetter integration with Processor Pools/SetsNew In <strong>Solaris</strong> 10:Partitioning <strong>and</strong> Isolation with ZonesDynamic Control of PoolsMore Dynamic Resource Controls• Trend is to move away from /etc/system


<strong>Solaris</strong> <strong>Containers</strong> ComponentsWorkload identificationProcess aggregation via tasks, projectsResource usage log with extended accountingResource Management ToolsGuarantee minimum CPU use (FSS)Limit maximum CPU use (pools, processor sets)Limit physical memory use (resource capping daemon)Limit virtual memory use (projects)Limit network b<strong>and</strong>width use (ipqos)Workload isolation featuresPrivilegesZones


OS Virtualization through <strong>Solaris</strong> ZonesVirtualizes OS layer: file system, devices, network, processesProvides:Privacy: can't see outside zoneSecurity: can't affect activity outside zoneFailure isolation: application or service failure in one zone doesn'taffect othersLightweight, granular, efficientNetworkComplements resource managementNo porting; ABI/APIs are the sameRequires no special hardware assist


Exampleglobal zone (v1280-room3-rack12-2; 129.76.4.24)global zone root: /web zonezone root: /zone/webapp_server zonezone root: /zone/appdatabase zonezone root: /zone/mysqlsystem services(patrol)audit services(auditd)security services(login, BSM)15105web service project(Apache 1.3.22)crypto project(ssl)proxy project(proxy)60020jes project(j2se)app users proj(sh, bash, prstat)system project(inetd, sshd)702010mysql project(mysqld)dba users proj(sh, bash, prstat)system project(inetd, sshd)ApplicationEnvironment/usrconsolehme0ce0default pool(1 CPU; 4GB)ce110/usrzconshme0:1ce0:160/usrzconshme0:2zoneadmdzoneadmdpool1 (7 CPU; 3GB), FSSCe1:1/usrzconshme0:3ce0:3zoneadmdpool2 (4 CPU; 5GB), FSSVirtualPlat<strong>for</strong>mzone management(zonecfg, zoneadm, zlogin)core services(inetd, rpcbind, sshd, ...)remote admin/monitoring(SNMP, <strong>Sun</strong>MC, WBEM)plat<strong>for</strong>m administration(syseventd, devfsadm, ifconfig,...)network device(hme0)network device(ce0)network device(ce1)storage complex


<strong>Solaris</strong> SecurityUser Rights ManagementLimit access to privileged comm<strong>and</strong>s <strong>and</strong> operationsManage “who can do what” centrallyAudit <strong>and</strong> report privileged comm<strong>and</strong> useProcess Rights ManagementGrant or revoke fine-grained privileges to individual processes <strong>and</strong>applicationsImplement “Least Privilege”• Applications can only do exactly what they require to operateUsually removes the need to run as root


<strong>Solaris</strong> Security (cont)More than 40 specific rights historically associated with UID 0 (root)For legacy compatibility, UID 0 has all rights by defaultBasic users have very few default rightsSelectable privilege inheritanceRole-based access control framework enables:Privileges to be assigned to a roleSpecific users to temporarily take on a role, gaining its privilegesKernel en<strong>for</strong>ces rules based on uid <strong>and</strong> current privilegesNo more “if (uid==0)”


<strong>Solaris</strong> Zones <strong>and</strong> SecurityEach zone has a security boundaryRuns with subset of privileges(5)A compromised zone cannot escalate its privilegesImportant name spaces are isolatedProcesses running in a zone are unable to affect activity in other zonesZone-aware audit:Global zone administrator can specify whether auditing should beglobal or per-zoneIf per-zone, each zone administrator can configure <strong>and</strong> process theiraudit trails independently<strong>Solaris</strong> 10 11/06 introduces configurable privileges


<strong>Solaris</strong> Zones Security Limitscontract_eventRequest reliable delivery of eventsproc_lock_memoryLock pages in physical memorycontract_observerObserve contract events <strong>for</strong> other usersproc_ownerSee/modify other process statescpc_cpuAccess to per-CPU perf countersproc_priocntlIncrease priority/sched classdtrace_kernelDTrace kernel tracingproc_sessionSignal/trace other session processdtrace_procDTrace process-level tracingproc_setidSet process UIDdtrace_userDTrace user-level tracingproc_taskidAssign new task IDfile_chownChange file's owner/group IDsproc_zoneSignal/trace processes in other zonesfile_chown_selfGive away (chown) filessys_acctManage accounting system (acct)file_dac_executeOverride file's execute permssys_adminSystem admin tasks (e.g. domain name)file_dac_readOverride file's read permssys_auditControl audit systemfile_dac_searchOverride dir's search permssys_configManage swapfile_dac_writeOverride (non-root) file's write permssys_devicesOverride device restricts (exclusive)file_link_anyCreate hard links to diff uid filessys_ipc_configIncrease IPC queuefile_ownerNon-owner can do misc owner opssys_linkdirLink/unlink directoriesfile_setidSet uid/gid (non-root) to diff idsys_mountFilesystem admin (mount,quota)ipc_dac_readOverride read on IPC, Shared Mem permssys_net_configConfig net interfaces,routes,stackipc_dac_writeOverride write on IPC, Shared Mem permssys_nfsBind NFS ports <strong>and</strong> use syscallsipc_ownerOverride set perms/owner on IPCsys_res_configAdmin processor sets, res poolsnet_icmpaccessSend/Receive ICMP packetssys_resourceModify res limits (rlimit)net_privaddrBind to privilege port (


ProcessesCertain system calls are not permitted or have restricted scope inside azoneFrom the global zone, all processes can be seen but control is privilegedFrom within a zone, only processes in the same zone can be seenor affectedproc(4) has been virtualized to only show processes in the same zone# prstat -ZPID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP1344 root 8956K 8108K sleep 59 0 0:00:04 2.0% svc.configd/141342 root 7312K 6456K sleep 59 0 0:00:01 0.4% svc.startd/121460 root 3824K 2932K sleep 59 0 0:00:00 0.1% inetd/4ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE1 23 78M 46M 4.5% 0:00:05 2.8% zone1


Networking <strong>and</strong> Interprocess CommunicationSingle TCP/IP stack <strong>for</strong> the system (today) so that zones can beshielded from configuration details <strong>for</strong> devices, routing <strong>and</strong> IPMPEach zone can be assigned IPv4/IPv6 addresses <strong>and</strong> has its own portspaceApplications can bind to INADDR_ANY <strong>and</strong> will only get traffic <strong>for</strong> thatzoneZones cannot see the traffic of othersGlobal zone can snoop traffic of all zonesExpected IPC mechanisms such as System V IPC, STREAMS, sockets,libdoor(3LIB) <strong>and</strong> loopback transports are available inside a zoneKey name spaces virtualized per zoneInter-zone communication is available using st<strong>and</strong>ard network interfacesover a private memory channel.Global zone can setup rendezvous too, although this is not commonlyneeded


Devices <strong>and</strong> FilesystemsUnlike chroot(2), processes cannot escape out of a zone'sfilesystemsAdditional directories can be mounted read-writeExample /usr/localFilesystems mounted by zoneadmd at zone boot time.Global zone managed filesystems also supportedThird party filesystems also work (ex: VxFS)Zones see a subset of “safe” pseudo devices in their /dev directoryDevices like /dev/r<strong>and</strong>om are safe but others like/dev/ip are notZones can modify the permissions of their devices but cannotmknod(2)Physical device files like those <strong>for</strong> raw disks can be put in a zone withcautionOften unnecessary due to on-disk filesystem support in zonecfg


Zones <strong>and</strong> <strong>Solaris</strong> Dynamic Tracing (DTrace)Zonename variable availableExample: Count syscalls by zone:# dtrace -n 'syscall:::/zonename==”red”/{@[probefunc=count()}'Also available: curpsinfo->pr_zoneidDTrace can be useful <strong>for</strong> tracing multiple application tiers in conjunctionwith zonesEliminates complexities such as clock skew<strong>Solaris</strong> 10 11/06 configurable privileges will allow dtrace_user <strong>and</strong>dtrace_proc to be granted to a zoneAllows tracing of processes (pid) <strong>and</strong> system calls (syscall)


Zones, Resources <strong>and</strong> LimitsBy default, all zones use all CPUsAlso, tools like prstat base %'s on all CPUsRestricted view is enabled automatically when resource pools areenabledvirtualized view based on the pool (pset) bindingAffects iostat(1M), mpstat(1M), prstat(1M), psrinfo(1M),sar(1), etc.sysconf(3C) (when detecting number of processors) <strong>and</strong>getloadavg(3C)numerous kstat(3KSTAT) values from the cpu, cpu_info <strong>and</strong>cpu_stat publishersOracle licensing to pool size


Resource PoolsGlobal ZoneTwilightZoneSchoolZoneNoParkingZonecpu0cpu1cpu2cpu3 cpu4 cpu5cpu6cpu7Default Resource PoolResource Pool AResource Pool B


Zones <strong>and</strong> the Fair Share Scheduler (FSS)13twilightdropfractureglobal1Shares Allocatedto Zones2DatabaseProject64354Shares Allocated byZone Administrator2(3+1+2+1)x6(4+5+4+3+6) = 2 76x = 622 77 ~ 7.8%


Sparse vs Whole Root ZonesEach zone is assigned its own root file system <strong>and</strong> cannot see that ofothersThe default file system configuration is called a “sparse-root” zoneThe zone contains its own writable /etc, /var, /proc, /devInherited file systems (/usr, /lib, /plat<strong>for</strong>m, /sbin) are read-onlymounted via a loopback file system (LOFS)/opt is a good c<strong>and</strong>idate <strong>for</strong> inheritingA zone can be created as a “whole-root” zoneThe zone gets its own writable copy of all <strong>Solaris</strong> file systemsAdvantages of a sparse root zoneFaster patching <strong>and</strong> installation due to inheritance of /usr <strong>and</strong> /libRead-only access prevents trojan horse attacks against other zonesLibraries shared across all zones reducing VM footprint


Packages <strong>and</strong> PatchesZones can add <strong>and</strong> remove own packages <strong>and</strong> patches (i.e. database)Assuming packages don't conflict with global zone packages(or allzone packages)System PatchesApplied in global zoneThen in each non-global zones (zone will automatically boot -s toapply patch)Package typesSUNW_PKG_HOLLOW: Package info exists (to satisfy dependencies)but its contents are not present.SUNW_PKG_ALLZONES: Package will be kept consistent between theglobal zone <strong>and</strong> all non-global zones (e.g. kernel drivers).SUNW_PKG_THISZONE: If true, package installs only in the currentzone (like pkgadd -G). If installed in the global zone, it will not bemade available to future zones.


Zone Administrationzonecfg(1M) is used to specify resources (e.g. IP interfaces) <strong>and</strong>properties (e.g. resource pool binding)zoneadm(1M) is used to per<strong>for</strong>m administrative steps <strong>for</strong> a zone suchas list, install, (re)boot, haltInstallation creates a root file system with factory-default editable filesA zone can be cloned very quickly using ZFSA zone can be moved to another system with detach/attachzlogin(1) is used to access a zonezlogin -C to access the zone console


Zone Installation ProcessBy default, all of the files that are packaged in the global zone are storedin the new zonePackaged files are copied directly out of the global zone's root filesystem except <strong>for</strong> those that are editable or volatile (see pkgmap(4))Editable <strong>and</strong> volatile files are copied from the sparse-root packagearchiveholds factory default copies of filesA properly configured sparse-root zone is typically about 70-100MB; awhole-root zone is 3-5GB depending on installed packages


When to Use a Whole Root ZoneUse full root zones when writes into /usr or /lib cannot be containedWritable loopback mounts <strong>for</strong> individual directories (such as /usr/java)can be used <strong>for</strong> sparse root zonesSometimes this is not practical (example: /usr/bin)Use of writable loopback mounts makes /opt a good c<strong>and</strong>idate <strong>for</strong>inheritanceRequirement to patch <strong>Solaris</strong> user components individuallyThird party software typically installed in /optUse a sparse root zone <strong>for</strong> all other situations


Single or Multiple Applications ZonesSingle application zonesLow overhead (administrative <strong>and</strong> per<strong>for</strong>mance) makes this arecommended practiceAll configuration files are in the default locationVirtualized IP space allows applications to reside on well known portsPatching is simplified due to applications being where they areexpectedMultiple application zonesWhen applications require or can benefit from shared memory


<strong>VMware</strong> <strong>and</strong> <strong>Solaris</strong> <strong>Containers</strong> General ApproachUse <strong>VMware</strong> whenUsing heterogeneous or multiple (incompatible) versions of operating systemsConsolidated privileged applications are unstableOperating system maintenance windows become unmanageableRequiring live migrationRunning obsolete operating systems on current hardwareUse <strong>Solaris</strong> <strong>Containers</strong> whenFine grained control of resource limitsLeveraging advanced <strong>Solaris</strong> features such as DTrace, Fault Management(FMA), ZFSResource sharing between environments can reduce plat<strong>for</strong>m costsDeploying extremely heavy or very light services• Applications require high I/O throughput (databases)Combine generously as real world conditions are never sim


<strong>Solaris</strong> <strong>Containers</strong> <strong>Best</strong> <strong>Practices</strong>Use sparse root zones where possibleMaximize sharing of componentsMinimize memory footprint (shared libs, binaries)Use full root zones only where neededExtensive writing into /usrCore component patch testingUse of ZFS clones will make this much more attractiveGroup applications into zonesBy shared memory requirementsBy user credential domainUse loopback file system mounts to share dataUse NFS to share data <strong>for</strong> zones that will be migrated


<strong>Solaris</strong> <strong>Containers</strong> <strong>Best</strong> <strong>Practices</strong> (cont)File system backup can be run in the global zoneNon-global zones have no private file system data that is not visible tothe global zoneRun backup clients in the non-global zone when there is someapplication state that needs to be captured or modifiedRun a minimum number of services in the global zonesshIntrusion detection <strong>and</strong> auditingHardware monitoringAccountingBackup


Use Case 1: You are in a maze of web servers, all alikeConsiderationsWeb servers prefer to live on well known portsServer utilization can be very lowMany configurations are basicConsolidating in a single operating system can become very complexClassic partitioning problem in disguiseRecommendationOne web server instance per <strong>Solaris</strong> zoneVery few operating system dependenciesConfiguration files are all in their well known locationPatch automation is simplifiedSeparate content creation <strong>for</strong> a more secure solutionCan leverage <strong>Solaris</strong> least privilege


Use Case 2: Web 2.0ConsiderationsOperating system dependencies more complicatedAvoid unintended application linkages that make future updates orredeployment difficultLeverage operating system hardening <strong>and</strong> privilege minimizationRequire fine grain control over resource utilizationRecommendationUse <strong>Solaris</strong> zones with one application instance per zoneDeploy only the OS components necessary to support the serviceUse configurable privileges to limit access to memory, networkinterfaces <strong>and</strong> kernel modules.ExceptionsServices that have dependencies on kernel modulesHeterogeneous operating system requirements


Use Case 3: ERP in a boxConsiderationsDifferent than typical ERP l<strong>and</strong>scape• Trade off database per<strong>for</strong>mance considerations <strong>for</strong> reduced footprintNot all features or applications run on all operating systemsWill require a combination of virtualization <strong>and</strong> partitioningDesire fine grain control of resourcesObservability <strong>and</strong> security are desired features, especially indevelopmentRecommendationUse <strong>VMware</strong> ESX server to host multiple operating systemsRun database in one zone <strong>and</strong> application logic in separate zonesbased on software scalability features• <strong>Solaris</strong> Dynamic Tracing can be used across tiersHost additional guest virtual machines <strong>for</strong> interfaces <strong>and</strong> applicationfeatures not available on <strong>Solaris</strong>


Use Case 4: Enterprise Java Application DevelopmentConsiderationsLeverage advanced development tools such as DTrace• Java Virtual Machine DTrace provider is very h<strong>and</strong>yIsolate to minimize impact on other developersDevelop in same environment as deploymentRapidly provision complete software stacksRecommendationCreate development zones that mirror production <strong>and</strong> testenvironmentsUse zone privilege limits to safely delegate administrative roles todevelopersExceptionsHeterogeneous plat<strong>for</strong>m developmentDevelop on multiple operating system versions


Conclusion<strong>Solaris</strong> <strong>Containers</strong> <strong>and</strong> <strong>VMware</strong> <strong>Infrastructure</strong> (ESX) technologies arecomplementary• Each provides a unique set of capabilities <strong>and</strong> efficiencies that canbe leveraged togetherThe key to success is knowing when to use each technology


References <strong>and</strong> Additional ReadingZones BigAdmin site:http://www.sun.com/bigadmin/content/zones<strong>Solaris</strong> Zones: Operating System Support <strong>for</strong> Server Consolidation.(LISA 2004, available from BigAdmin)<strong>Solaris</strong> <strong>Containers</strong> Blueprint:http://www.sun.com/blueprints/0505/819-2679.html<strong>Solaris</strong> Kernel Engineering <strong>and</strong> Field Technical Weblogshttp://blogs.sun.com/comayhttp://blogs.sun.com/dphttp://blogs.sun.com/jclinganhttp://blogs.sun.com/joostpZones/<strong>Containers</strong> FAQ on opensolaris.orgzones-interest@opensolaris.org mailing list<strong>Solaris</strong> 10 global zone with 3 containers (web, app, <strong>and</strong> dba)http://www.vmware.com/vmtn/appliances/directory/227


Thank you!


Presentation DownloadPlease remember to complete yoursession evaluation <strong>for</strong>m<strong>and</strong> return it to the room monitorsas you exit the sessionThe presentation <strong>for</strong> this session can be downloaded athttp://www.vmware.com/vmtn/vmworld/sessions/Enter the following to download (case-sensitive):Username: cbv_repPassword: cbv<strong>for</strong>9v9r

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!