
Performance Tuning for Oracle WebCenter Content 11g:
Strategies & Tactics

Chris Rothwell, Fishbowl Solutions
Paul Heupel, Fishbowl Solutions

Introduction

Oracle WebCenter Content 10g functionality was effectively contained in one container: the Content Server. This fact alone made it easy to deploy, administer, and customize. However, for all of this ease, the product was somewhat lacking in scalability and performance. With Oracle WebCenter Content 11g, and the inclusion of Oracle WebLogic Server, SOA and BPM, the product has expanded its functionality to achieve best-in-class performance and scalability. The tradeoff is that these additional infrastructure components have created a layer of complexity that often leads to delayed deployments and non-optimized systems. The good news is that with the right tuning strategies and appropriate use of reverse proxies and load balancing you can truly optimize WebCenter Content 11g and maximize your technology investment.

Tuning WebCenter Content

Oracle’s ECM solution has its roots in the Stellent Content Management offering. From Xpedio 4.5 to Oracle 10gR3, the content management system’s core was a Java Standard Edition based solution. WebCenter Content 11g is deployed as a Fusion Middleware solution. Optimization techniques that held true from the Xpedio days to UCM 10gR3 may no longer apply to the latest incarnation of Fusion Middleware’s Enterprise Content Management (ECM) system.

[Figure: Content Server architecture]

Memory and Java Virtual Machine (JVM) Tuning

Memory is still one of the most significant performance tuning areas with Oracle WebCenter Content. As part of the Fusion Middleware stack, WebCenter Content requires a Java Enterprise Edition container. Prior versions of the Content Server ran as a Java Standard Edition application. Under UCM 10gR3 and earlier, you could specify JVM tuning in the $UCM_HOME/bin/intradoc.cfg configuration file. Tuning was somewhat limited because custom JAVA_OPTIONS parameters were appended to values the server computed on its own.

© 2012. Fishbowl Solutions, Inc.


In WebCenter Content 11g, the content management solution is deployed inside WebLogic Server (WLS), with any JVM tuning performed on the application server. You have full control, either by modifying the managed server using the administrative console or by modifying the USER_MEM_ARGS environment variable in the startup scripts.

Oracle’s documentation suggests the following on Unix and Windows with the JRockit JVM:

-Xms256m -Xmx1024m -XnoOpt

The -Xmx flag specifies the maximum heap size; this example specifies 1 GB of memory. Best practice is to keep your JVM heap settings under 75-80% of the available physical RAM, within limits for machines with very large amounts of memory. As heap size increases, CPU load also increases because garbage collections grow larger. Under 32-bit operating systems, 1.5 GB is the practical maximum, assuming other services are not consuming resources.

The -Xms flag specifies the minimum heap size on initial startup. Growing the heap takes considerable time, so it is best to set the -Xms and -Xmx parameters to the same size. For example:

-Xms1024m -Xmx1024m -XnoOpt -XgcPrio:throughput
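The sizing guidance above can be turned into a small calculation. The following is a minimal sketch, not an Oracle tool: the function name `jrockit_mem_args`, the default fraction of 0.75, and the optional `cap_mb` ceiling are assumptions made for illustration, and the flags it emits are only the ones already shown in this section.

```python
def jrockit_mem_args(physical_ram_mb, fraction=0.75, is_32bit=False, cap_mb=None):
    """Build a memory argument string with identical -Xms and -Xmx values.

    Hypothetical helper: the name, defaults, and cap_mb parameter are
    illustrative assumptions, not part of any Oracle product.
    """
    if not 0 < fraction <= 0.8:
        raise ValueError("fraction should stay at or below 0.8 of physical RAM")
    heap_mb = int(physical_ram_mb * fraction)
    if is_32bit:
        # The text cites about 1.5 GB as the practical ceiling on 32-bit systems.
        heap_mb = min(heap_mb, 1536)
    if cap_mb is not None:
        # Optional site-specific ceiling for machines with very large RAM.
        heap_mb = min(heap_mb, cap_mb)
    # Same value for -Xms and -Xmx so the heap never has to grow at runtime.
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m -XnoOpt -XgcPrio:throughput"


if __name__ == "__main__":
    print(jrockit_mem_args(4096))
```

The result would typically be pasted into USER_MEM_ARGS and then validated under load, since the right fraction depends on what else runs on the host.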

On x86 and x64 hardware, JRockit should be the preferred JVM. JRockit is a Java virtual machine optimized for x86 hardware that was purchased by BEA and later acquired by Oracle. The JRockit JVM performs significantly faster on x86 or x64 Windows and Linux architectures than Sun’s architecturally neutral JVM implementation.

An example of JVM tuning, from another Oracle whitepaper, started with:

-Xms3g
-Xmx3g
-XX:PermSize=512m
-XX:MaxPermSize=512m
-XX:+UseParallelGC
-XX:ParallelGCThreads=8
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:NewRatio=3
-XX:+UseAdaptiveSizePolicy
-XX:+AggressiveHeap
-XX:+DisableExplicitGC
-Xnoclassgc
-Xloggc:

and continued to tune WebCenter as:

-d64
-server
-Xms3g
-Xmx3g
-XX:PermSize=512m
-XX:MaxPermSize=1024m
-XX:+AggressiveOpts
-XX:+UseParallelGC
-XX:ParallelGCThreads=16
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:NewRatio=4
-Xnoclassgc
-Xloggc:
-Dweblogic.threadpool.MinPoolSize=72
-Dweblogic.threadpool.MaxPoolSize=72
-Dweblogic.SocketReaders=12
-Djps.auth.debug=false

Operating system architecture does not on its own provide enough information to properly tune the Content Server.

As seen in the above example, repeated tuning and testing was required to find an optimum configuration. The content repository has the additional complexity of requiring different performance configurations for contribution and consumption environments. A heavy ingestion pattern will benefit from -XgcPrio:throughput garbage collection, while searching may benefit from other GC models.

Confirm your capitalization is correct. In many cases, command-line options are case sensitive unless explicitly stated otherwise. An improperly set configuration flag may be ignored or cause unintended consequences.
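A capitalization check like the one recommended above can be automated before a startup script is deployed. This is a minimal sketch under stated assumptions: `KNOWN_FLAGS` is a hand-maintained list seeded with flags quoted in this paper, and `find_case_mismatches` is a hypothetical helper, not part of WebLogic or any JVM.

```python
# Reference spellings, taken from the flags quoted in this paper.
# Extend this list by hand for your own environment (assumption).
KNOWN_FLAGS = [
    "-XnoOpt",
    "-XgcPrio:throughput",
    "-XX:+UseParallelGC",
    "-XX:+DisableExplicitGC",
    "-Xnoclassgc",
]


def find_case_mismatches(flags, known=KNOWN_FLAGS):
    """Return (given, expected) pairs for flags that differ only by letter case."""
    by_lower = {k.lower(): k for k in known}
    mismatches = []
    for flag in flags:
        expected = by_lower.get(flag.lower())
        if expected is not None and expected != flag:
            mismatches.append((flag, expected))
    return mismatches


if __name__ == "__main__":
    for given, expected in find_case_mismatches(["-xnoopt", "-XX:+UseParallelGC"]):
        print(f"{given} should be written as {expected}")
```

Flags that are not in the reference list, such as heap sizes, are simply passed over; the check only catches spellings that differ by case.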

Disk Usage

WebCenter Content, like the earlier versions of the content repository, has a variety of disk mounting options, with implications for what type of storage may be appropriate for each area. Directories within the content repository may have different service level agreements and performance requirements. Using a single storage system does not produce an optimal balance of performance and cost.

In the latest incarnation of the Oracle content repository, a shared file system is still required for clustering. The ECM services run as Java processes. Prior to 11g, these services took the strategy of keeping a memory cache, writing to a shared file system or database, and having the other nodes update their local cache. All content management services continue to be stateless and use the same concurrency mechanism even though they now live in a Java Enterprise Edition world.


[Figure: Content Server in a clustered configuration]

High-performance, low-latency shared disk space is critical for performance in the shared directory.

When a file is ingested into the content repository, it is placed in the /user_projects/domains//ucm/cs/vault/~temp directory. From that directory, the file is copied to any refineries, copied for full-text indexing, any necessary transformations are created, and the file is moved to the appropriate vault and weblayout locations. File I/O is key to that ~temp directory, with five or more read operations as part of a standard check-in.

All other subdirectories within the vault hold the ‘native’ or original file checked into the repository. The vault directory is a long-term archive for the asset and should be viewed from a disaster recovery perspective. A copy of the file, or a version intended for heavy consumption, is typically placed in the weblayout directory. Any file in the weblayout directory could be recreated, so the emphasis there should be on performance rather than reliability.

In 10gR4 and below, Content IDs (dDocNames) required optimizations like the Fast Checkin component to get around row locking on the counter tables under heavy ingestion. The 11g repository changed the way the identifiers are generated by caching a block of content identifiers. There may be minor gaps in the sequence of content identifiers, which can be ignored.
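Block caching of identifiers, and why it leaves gaps, can be illustrated with a toy allocator. This is a sketch of the general technique only, not Oracle's implementation: the class `BlockIdAllocator`, the block size of 10, and the use of `itertools.count` as a stand-in for the database counter table are all assumptions.

```python
import itertools


class BlockIdAllocator:
    """Hands out IDs from a locally cached block (illustrative, hypothetical)."""

    def __init__(self, counter, block_size=10):
        self._counter = counter        # stand-in for the shared database counter
        self._block_size = block_size
        self._next = 0
        self._end = 0                  # exclusive upper bound of the cached block

    def next_id(self):
        if self._next >= self._end:
            # One trip to the shared counter reserves a whole block, so the
            # contended counter row is touched once per block, not once per ID.
            block_index = next(self._counter)
            self._next = block_index * self._block_size + 1
            self._end = self._next + self._block_size
        value = self._next
        self._next += 1
        return value


shared_counter = itertools.count()
node_a = BlockIdAllocator(shared_counter)
node_b = BlockIdAllocator(shared_counter)
ids = [node_a.next_id(), node_a.next_id(), node_b.next_id(), node_a.next_id()]
```

In this run, node A holds identifiers 1 through 10 and node B holds 11 through 20. If node A restarted after issuing 3, identifiers 4 through 10 would never be used, which is the kind of harmless gap the text describes.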

Prior to 11g, a typical installation would have data, search, shared, and weblayout directories that were typically excluded from virus scanning. These directories still exist in 11g, but are now found in the domain directory rather than the base UCM path. For example, in 10g:

/server/weblayout

became

/user_projects/domains//ucm/cs/weblayout

WebLogic logging directories should also be excluded from virus scanning in version 11g and later.


Logging

WebCenter Content 11g uses WebLogic logging. The severity of information sent to the logging system ranges from TRACE, DEBUG, INFO, NOTICE, WARNING, ERROR, CRITICAL, and ALERT to EMERGENCY. In a production environment, change the logging level to ERROR. You can either edit /user_projects/domains//config/servers/UCM_server1/logging.xml or modify the logging levels using the WebLogic administrative console.
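The effect of an ERROR threshold can be seen by treating the severities as an ordered list. This is an illustrative sketch of threshold filtering in general, not WebLogic's logging code; the function names are hypothetical and only the severity names come from the text.

```python
# Severities in ascending order, as listed in the text.
SEVERITIES = ["TRACE", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]


def is_logged(message_severity, threshold="ERROR"):
    """True when a message is at or above the configured threshold."""
    return SEVERITIES.index(message_severity) >= SEVERITIES.index(threshold)


def suppressed_levels(threshold="ERROR"):
    """Severities that a given threshold filters out."""
    return SEVERITIES[:SEVERITIES.index(threshold)]
```

With the threshold at ERROR, five of the nine severities are dropped, which is where the reduction in log I/O on a busy production server comes from.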

File Store Providers

Oracle’s ECM solution moved to a File Store Provider model to accommodate different usage patterns. The default file store provider in 11g continues to use the vault/weblayout file structure.

Classically, the Oracle ECM solution would store relational data in a database and files in a file system. As the number of managed assets increased, some scalability issues became apparent. Three metadata fields (dDocType, dSecurityGroup, and dSecurityAccount) were used to spread the assets out across multiple directory structures. There is a limit to how many files can go into a directory structure, and as the number of assets grew into the tens of millions, hundreds of millions, and eventually billions, inode issues and disk management became a bottleneck. UCM updated the default file store provider to add dispersion directories that spread out the files.
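The idea behind dispersion directories can be sketched with a hash-based layout. This is explicitly not the file store provider's actual path rule: the function `dispersed_path`, the use of SHA-256, and the two-level, two-character bucket scheme are assumptions chosen only to show how files can be spread evenly across many directories.

```python
import hashlib


def dispersed_path(doc_type, security_group, content_id, levels=2, width=2):
    """Build a relative path with hash-derived dispersion directories.

    Hypothetical layout: group/type/<bucket>/<bucket>/content_id.
    """
    digest = hashlib.sha256(content_id.encode("utf-8")).hexdigest()
    # Each bucket is `width` hex characters, so 2 characters give 256
    # possible directories per level.
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    parts = [security_group.lower(), doc_type.lower(), *buckets, content_id.lower()]
    return "/".join(parts)


example = dispersed_path("Document", "Public", "ID_000123")
```

With two levels of 256 buckets, a billion assets average roughly 15,000 files per leaf directory instead of piling into a handful of directories keyed only on type and security group.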

A database file store provider was added, in which the assets are persisted in the database rather than a file system. The Oracle 11gR2 Database SecureFiles API improved performance by over 40% compared to the 10g implementation. Performance matches, and in some cases exceeds, major networked file systems. In addition to the I/O gains, repositories that have Database Compression will automatically have de-duplication performed against content stored in the repository.

When content is uploaded to the repository, a temporary file is placed in the vault/~temp location, with a cache cleanup eventually clearing out that disk space. The current version allows that cache to be limited to one day, so care must be taken when ingesting very large volumes of content. Content must also be indexed before that temporary area becomes a candidate for cleanup.

Virtualization

Oracle differentiates between hard and soft partitioning from a licensing perspective. With hard partitioning in use, one only licenses the CPUs used by the virtual machine. Soft partitioning requires licensing for all CPUs in the host machine. Oracle VM can be configured to qualify as hard partitioning, but EMC VMware is considered soft partitioning. Hardware prices are trivial compared to software, so optimize the virtual hosts to consolidate licenses. Typically, multiple smaller instances perform better than fewer larger instances. Attempt to optimize CPU utilization, adding CPUs to the host servers as needed.

While CPU architecture, sockets, and cores impact the licensing costs, memory does not. A physical CPU may be shared among multiple virtual machines, but memory should not be a pooled resource.
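The licensing difference between the two partitioning models comes down to simple arithmetic. The sketch below only restates the rule as described in this section; `cpus_to_license` is a hypothetical helper, and real license counts depend on Oracle's current partitioning policy and processor core factors, which are not modeled here.

```python
def cpus_to_license(host_cpus, vm_cpus, hard_partitioned):
    """CPUs that need a license under the rule described in the text.

    Hard partitioning: only the CPUs assigned to the virtual machine.
    Soft partitioning: every CPU in the host machine.
    """
    if vm_cpus > host_cpus:
        raise ValueError("a VM cannot be assigned more CPUs than the host has")
    return vm_cpus if hard_partitioned else host_cpus
```

For a 4-CPU virtual machine on a 16-CPU host, the function returns 4 under hard partitioning and 16 under soft partitioning, which is why consolidating licensed workloads onto dedicated hosts matters.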

Services and Components

WebCenter Content continues the service-based architecture introduced in earlier versions of the content repository. Services that return search results, metadata, or actual assets can be extended or overridden. GET_SEARCH_RESULTS, for example, can return a large amount of data if a repository has many custom metadata fields. The content repository will cache the search results, but network traffic can be significantly reduced by creating a template that returns only the fields and result sets needed.
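The payload saving from returning only the needed fields can be estimated with a small experiment. This sketch does not call the Content Server: the sample row, the `x`-prefixed custom field names, and the `trim_results` helper are hypothetical, and JSON length is used only as a rough proxy for response size.

```python
import json


def trim_results(rows, wanted_fields):
    """Keep only the requested fields in each result row."""
    return [{f: row[f] for f in wanted_fields if f in row} for row in rows]


# Hypothetical search result row with several custom metadata fields.
rows = [{
    "dDocName": "ID_000123",
    "dDocTitle": "Q1 report",
    "xDepartment": "Finance",
    "xRegion": "EMEA",
    "xComments": "x" * 200,
}]

trimmed = trim_results(rows, ["dDocName", "dDocTitle"])
full_size = len(json.dumps(rows))
trimmed_size = len(json.dumps(trimmed))
```

Multiplied across a page of results and many concurrent users, the difference between `full_size` and `trimmed_size` is the network traffic a slimmer template avoids.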


Idoc Script includes can be cached, so the HTML is rendered dynamically once and then placed in session scope for a specific user or in application scope for all users. The cacheInclude function takes the include name, scope, and life span (in seconds) as required parameters. For example, the following caches the std_page_begin include for ten minutes for each user:

<$cacheInclude("std_page_begin", "session", 600)$>
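The caching behavior described above (render once, then reuse per user or for all users until the life span expires) can be modeled with a small time-to-live cache. This is a conceptual sketch, not the Content Server's implementation: the `IncludeCache` class, its `render` callback, and the injectable `clock` are assumptions made so the example is self-contained.

```python
import time


class IncludeCache:
    """Toy model of scoped include caching (hypothetical, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (include_name, scope_key) -> (expires_at, html)

    def cache_include(self, name, scope, life_span, render, user=None):
        if scope not in ("session", "application"):
            raise ValueError("scope must be 'session' or 'application'")
        if scope == "session" and user is None:
            raise ValueError("session scope needs a user")
        # Session scope keeps one copy per user; application scope shares one.
        scope_key = user if scope == "session" else None
        key = (name, scope_key)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render()  # the expensive dynamic rendering step
        self._entries[key] = (now + life_span, html)
        return html
```

Session scope trades memory for personalization, since every user gets a private copy; application scope is cheaper but only suitable for includes that render identically for everyone.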

WebCenter Content 11g still does not return a default success status code or message with every service. Content Server services typically indicate an error by setting the StatusCode property to a non-zero number. CIS, RIDC, and several other integration methods may throw an exception when there are problems, but will always throw an exception if you look for the StatusCode property and it is missing. You can either trust that the service will throw an exception on failure and otherwise assume it worked, or modify the Content Server to set a default success status code.
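A client-side alternative is to treat a missing StatusCode as success instead of reading the property directly. The sketch below works on a plain dictionary standing in for a service response; it does not use the RIDC or CIS APIs, and `check_status`, `ServiceError`, and the use of a StatusMessage field for the error text are assumptions for illustration.

```python
class ServiceError(Exception):
    """Raised when a service response carries a non-zero StatusCode."""


def check_status(response, default_code=0):
    """Return the status code, treating a missing StatusCode as success.

    `response` is a plain dict standing in for the local data of a
    service response (hypothetical representation).
    """
    # .get() with a default avoids the failure described in the text when
    # the property is absent on a successful call.
    code = int(response.get("StatusCode", default_code))
    if code != 0:
        message = response.get("StatusMessage", f"service failed with StatusCode {code}")
        raise ServiceError(message)
    return code
```

This keeps the success path quiet for services that set nothing, while still surfacing any service that reports a non-zero code.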

Reverse Proxy

When architecting a website for high performance using WebCenter Content, a reverse proxy can be used to improve both performance and security for your site. A reverse proxy functions as a gateway to your network and adds a layer of caching for site visitors that helps improve page response time, particularly under heavy load.

Typically, a reverse proxy will reside in the DMZ of your network and will be the entry point for users accessing your site. The standard process flow for a user accessing a site behind a reverse proxy is as follows:

1. A user enters http://www.mysite.com in a browser.

2. DNS directs the user to the reverse proxy server that is sitting in your DMZ.

3. The reverse proxy determines if the request is being made for static content or dynamic content.

4. If static content is being requested, the reverse proxy will check its cache and will return the cached page to the user if the page is found in the cache.

5. If dynamic content is being requested or if the reverse proxy does not have the page in its cache, it will send a request through the firewall to a web server inside your network to retrieve the requested page and will return that page to the user.
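Steps 3 through 5 of the flow above can be sketched as a small decision routine. This is a simplified model, not the behavior of any particular proxy product: the `ReverseProxy` class, the suffix-based static check in `STATIC_SUFFIXES`, and the `fetch_from_origin` callback are assumptions, and real proxies usually decide cacheability from HTTP headers instead.

```python
# Assumed rule for "static" content: decided purely by file suffix.
STATIC_SUFFIXES = (".html", ".css", ".js", ".png", ".jpg", ".gif")


class ReverseProxy:
    """Toy reverse proxy that caches static responses (illustrative only)."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # call through the firewall to the web server
        self._cache = {}
        self.origin_requests = 0

    def handle(self, path):
        # Step 3: classify the request as static or dynamic.
        is_static = path.lower().endswith(STATIC_SUFFIXES)
        # Step 4: serve static content from cache when it is present.
        if is_static and path in self._cache:
            return self._cache[path]
        # Step 5: dynamic content or a cache miss goes to the origin server.
        self.origin_requests += 1
        body = self._fetch(path)
        if is_static:
            self._cache[path] = body
        return body
```

Counting `origin_requests` against total requests gives the cache hit ratio, which is the number that determines how much load the proxy actually removes from the Content Server.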

The performance gains from caching at the reverse proxy level are contingent on a number of factors, including the number of static resources and pages that users request, the frequency at which those items are accessed, and the hardware and network infrastructure being used. One popular reverse proxy caching application, Varnish Cache, claims to improve delivery by a factor of 300 to 1000x, depending on architecture, when serving a page from cache (www.varnish-cache.org).

Besides the caching advantage of this model, your site also gains a level of security by implementing a reverse proxy. All requests made to your site are filtered through the reverse proxy server, which keeps an end user from discovering server names or other network architecture information that could potentially be used to compromise your systems. Additionally, there is only a single entry point through your firewall, namely between your proxy server and your web server, so network administrators have considerably more control over the traffic that is allowed past the firewall.


Under a contribution-consumption site architecture model, using a reverse proxy allows your network administrator to keep both the contribution and the consumption Content Server instances inside the firewall. The network architecture diagram below demonstrates using multiple reverse proxies with a load balancer for added performance in a contribution-consumption Site Studio web site model.
