Performance Tuning for Oracle WebCenter Content 11g - Fishbowl ...
Performance Tuning for Oracle WebCenter Content 11g:<br />
Strategies & Tactics<br />
Chris Rothwell, Fishbowl Solutions<br />
Paul Heupel, Fishbowl Solutions<br />
Introduction<br />
Oracle WebCenter Content 10g functionality was effectively contained in one container - the Content Server. This<br />
fact alone made it easy to deploy, administer, and customize. However, for all of this ease of use, the product<br />
was somewhat lacking in scalability and performance. With Oracle WebCenter Content 11g, and the<br />
inclusion of Oracle WebLogic Server, SOA, and BPM, the product has expanded its functionality to achieve<br />
best-in-class performance and scalability. The tradeoff is that these additional infrastructure components have created<br />
a layer of complexity that often leads to delayed deployments and non-optimized systems. The good news is that<br />
with the right tuning strategies and appropriate use of reverse proxies and load balancing you can truly optimize<br />
WebCenter Content 11g and maximize your technology investment.<br />
Tuning WebCenter Content<br />
Oracle’s ECM solution has its roots in the Stellent Content Management offering. From Xpedio 4.5 to Oracle 10gR3<br />
the content management system’s core was a Java Standard Edition based solution. WebCenter Content 11g is<br />
deployed as a Fusion Middleware solution. Optimization techniques that held true from the Xpedio days to UCM<br />
10gR3 may no longer apply to the latest incarnation of Fusion Middleware’s Enterprise Content Management (ECM)<br />
system.<br />
Content Server architecture<br />
Memory and Java Virtual Machine (JVM) Tuning<br />
Memory is still one of the most significant performance tuning areas with Oracle WebCenter Content. As part of the<br />
Fusion Middleware stack, WebCenter Content requires a Java Enterprise Edition container. Prior versions of the<br />
Content Server ran as a Java Standard Edition application. Under UCM 10gR3 and earlier, you could specify JVM<br />
tuning in the $UCM_HOME/bin/intradoc.cfg configuration file. Tuning was somewhat limited because<br />
JAVA_OPTIONS appended the custom parameters to computed values.<br />
© 2012. Fishbowl Solutions, Inc.
In WebCenter Content 11g, the content management solution is deployed inside WebLogic Server (WLS), with any<br />
JVM tuning performed on the application server. You have full control, either by modifying the managed server<br />
using the administrative console or by modifying the USER_MEM_ARGS environment variable in the startup scripts.<br />
Oracle’s documentation suggests the following on Unix and Windows with the JRockit JVM:<br />
-Xms256m -Xmx1024m -XnoOpt<br />
The -Xmx flag specifies the maximum heap size; this example specifies 1GB of memory. Best practice is to<br />
keep your JVM heap settings under 75-80% of the available physical RAM, within reason on machines with<br />
very large amounts of memory. As heap size increases, CPU load also increases because garbage collections are larger.<br />
Under 32-bit operating systems, 1.5GB is the practical maximum, assuming other services are not consuming<br />
resources.<br />
The -Xms flag specifies the minimum heap size on initial startup. Growing the heap takes considerable time, so it<br />
is best to set the Xms and Xmx parameters to the same size. For example:<br />
-Xms1024m -Xmx1024m -XnoOpt -XgcPrio:throughput<br />
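As a rough illustration of the sizing guidance above, the following Python sketch derives matching -Xms and -Xmx values from the physical RAM on a host. The function name, the 75% default fraction, and the optional cap are assumptions made for this example; only the JRockit flags come from the text above.

```python
def jrockit_mem_args(physical_ram_mb, fraction=0.75, cap_mb=None):
    """Build a USER_MEM_ARGS-style string with equal minimum and maximum heap.

    physical_ram_mb: physical RAM on the host, in megabytes.
    fraction: share of RAM given to the heap (the text suggests staying under 75-80%).
    cap_mb: optional hard ceiling, e.g. 1536 on a 32-bit operating system.
    """
    if physical_ram_mb <= 0:
        raise ValueError("physical_ram_mb must be positive")
    if not 0 < fraction <= 0.8:
        raise ValueError("fraction should stay at or below 0.8")
    heap_mb = int(physical_ram_mb * fraction)
    if cap_mb is not None:
        heap_mb = min(heap_mb, cap_mb)
    # Same value for -Xms and -Xmx so the heap never has to grow at runtime.
    return "-Xms{0}m -Xmx{0}m -XnoOpt -XgcPrio:throughput".format(heap_mb)


if __name__ == "__main__":
    print(jrockit_mem_args(4096))
    print(jrockit_mem_args(8192, cap_mb=1536))
```

The result is only a starting point; as the paper notes later, repeated tuning and testing is still needed to find a good configuration.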
On x86 and x64 hardware, JRockit should be the preferred JVM. JRockit is a Java virtual machine optimized for<br />
Intel x86 hardware; it was purchased by BEA, which was in turn acquired by Oracle. The JRockit JVM performs significantly faster<br />
on x86 or x64 Windows and Linux architectures than Sun’s architecturally neutral JVM implementation.<br />
An example of JVM tuning, from another Oracle whitepaper, started with:<br />
-Xms3g<br />
-Xmx3g<br />
-XX:PermSize=512m<br />
-XX:MaxPermSize=512m<br />
-XX:+UseParallelGC<br />
-XX:ParallelGCThreads=8<br />
-verbose:gc<br />
-XX:+PrintGCDetails<br />
-XX:+PrintGCTimeStamps<br />
-XX:NewRatio=3<br />
-XX:+UseAdaptiveSizePolicy<br />
-XX:+AggressiveHeap<br />
-XX:+DisableExplicitGC<br />
-Xnoclassgc<br />
-Xloggc:<br />
and continued to tune WebCenter as:<br />
-d64<br />
-server<br />
-Xms3g<br />
-Xmx3g<br />
-XX:PermSize=512m<br />
-XX:MaxPermSize=1024m<br />
-XX:+AggressiveOpts<br />
-XX:+UseParallelGC<br />
-XX:ParallelGCThreads=16<br />
-verbose:gc<br />
-XX:+PrintGCDetails<br />
-XX:+PrintGCTimeStamps<br />
-XX:NewRatio=4<br />
-Xnoclassgc<br />
-Xloggc:<br />
-Dweblogic.threadpool.MinPoolSize=72<br />
-Dweblogic.threadpool.MaxPoolSize=72<br />
-Dweblogic.SocketReaders=12<br />
-Djps.auth.debug=false<br />
Operating system architecture does not on its own provide enough information to properly<br />
tune the Content Server.<br />
As seen in the above example, repeated tuning and testing was required to find an optimum configuration. The<br />
content repository has the additional complexity of requiring different performance configurations for contribution<br />
and consumption environments. A heavy ingestion pattern will benefit from -XgcPrio:throughput garbage<br />
collection, while searching may benefit from other GC models.<br />
Confirm your capitalization is correct. In many cases, command-line options are case sensitive unless explicitly<br />
stated. A configuration flag improperly set may be ignored, or cause unintended consequences.<br />
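One way to catch capitalization slips before a restart is to compare the configured flags against a reference list while ignoring case, and to report anything that matches only when case is ignored. The sketch below is a hypothetical helper, not an Oracle tool. Its KNOWN_FLAGS list is a small hand-picked subset of the flags quoted in this paper, and flags that carry a value (such as -Xms1024m) are skipped.

```python
# Hand-picked subset of value-less flags quoted in this paper; extend as needed.
KNOWN_FLAGS = (
    "-XX:+UseParallelGC",
    "-XX:+UseAdaptiveSizePolicy",
    "-XX:+AggressiveHeap",
    "-XX:+AggressiveOpts",
    "-XX:+DisableExplicitGC",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCTimeStamps",
    "-XnoOpt",
    "-Xnoclassgc",
)


def check_flag_case(flags, known=KNOWN_FLAGS):
    """Return (given, expected) pairs for flags whose spelling differs only in case."""
    by_lower = {name.lower(): name for name in known}
    problems = []
    for flag in flags:
        expected = by_lower.get(flag.lower())
        if expected is not None and expected != flag:
            problems.append((flag, expected))
    return problems


if __name__ == "__main__":
    configured = ["-Xms1024m", "-Xmx1024m", "-xnoopt", "-XX:+UseParallelGC"]
    for given, expected in check_flag_case(configured):
        print("check capitalization: {0} should be {1}".format(given, expected))
```

A check like this only finds case errors. Misspelled flags that match nothing in the list pass through silently, so they still need a manual review.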
Disk Usage<br />
WebCenter Content, like the earlier versions of the content repository, has a variety of disk mounting options, with<br />
implications for what type of storage may be appropriate for each area. Directories within the content repository<br />
may have different service level agreements and performance requirements. Using a single storage system for all of them does<br />
not produce the best balance of performance and cost.<br />
In the latest incarnation of the Oracle Content Repository, a shared file system is still required for clustering. The<br />
ECM services run as Java processes. Prior to 11g, these services took the strategy of keeping a memory cache,<br />
writing to a shared file system or database, and having the other nodes update their local cache. All content<br />
management services continue to be stateless and utilize the same concurrency mechanism even though they are<br />
living in a Java Enterprise Edition world.<br />
Content Server in a clustered configuration<br />
High-performance, low-latency shared disk space is critical for performance in the shared directory.<br />
When a file is ingested into the content repository, it is placed in the<br />
/user_projects/domains//ucm/cs/vault/~temp<br />
directory. From that directory, the file is copied to any refineries, copied for full-text indexing, any necessary<br />
transformations are created, and the file is moved to the appropriate vault and weblayout locations. File I/O is key to that ~temp<br />
directory, with five or more read operations as part of a standard check-in.<br />
All other subdirectories within the vault hold the ‘native’ or original file checked into the repository. The vault<br />
directory is a long-term archive for the asset, and should be viewed from a disaster recovery perspective. A copy of<br />
the file, or a version intended for heavy consumption, is typically placed in the weblayout directory. Any file in the<br />
weblayout directory could be recreated, so emphasis should be on performance rather than reliability.<br />
In 10gR4 and below, Content IDs or dDocNames required optimizations like the Fast Checkin component to get<br />
around row locking on the counter tables under heavy ingestion. The 11g repository changed the way the identifiers<br />
are generated, caching a block of content identifiers. There may be minor gaps in the sequence of content<br />
identifiers, which can be ignored.<br />
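The general technique of caching a block of identifiers can be sketched as follows. This is an illustration of the idea only, not Oracle's implementation: the class names, the block size of 10, and the in-memory SharedCounter standing in for the database counter row are all assumptions, and the sketch is single-threaded.

```python
class SharedCounter:
    """Stand-in for the shared counter row that all nodes would update."""

    def __init__(self, start=1):
        self._value = start

    def reserve(self, size):
        """Reserve `size` consecutive ids and return the first one."""
        first = self._value
        self._value += size
        return first


class BlockIdAllocator:
    """Hands out ids from a locally cached block, touching the shared counter once per block."""

    def __init__(self, reserve_block, block_size=10):
        self._reserve_block = reserve_block
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the cached block

    def next_id(self):
        if self._next >= self._end:
            start = self._reserve_block(self._block_size)
            self._next, self._end = start, start + self._block_size
        value = self._next
        self._next += 1
        return value


if __name__ == "__main__":
    counter = SharedCounter()
    node_a = BlockIdAllocator(counter.reserve)
    node_b = BlockIdAllocator(counter.reserve)
    print(node_a.next_id(), node_b.next_id(), node_a.next_id())
```

Each node contends for the shared counter only once per block rather than once per check-in. If a node stops before using its whole block, the unused identifiers are never issued, which is where the harmless gaps in the sequence come from.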
Prior to 11g, a typical installation would have data, search, shared, and weblayout directories that were typically<br />
excluded from virus scanning. These directories still exist in 11g, but are now found in the domain directory rather<br />
than the base UCM path. For example, in 10g:<br />
/server/weblayout<br />
became<br />
/user_projects/domains//ucm/cs/weblayout<br />
WebLogic logging directories should also be excluded from virus scanning in version 11g and later.<br />
Logging<br />
11g uses WebLogic logging. The granularity of information sent to the logging system ranges from<br />
TRACE, DEBUG, INFO, NOTICE, WARNING, ERROR, CRITICAL, and ALERT to EMERGENCY. In a production<br />
environment, change the logging level to ERROR. One could modify the<br />
/user_projects/domains//config/servers/UCM_server1/logging.xml<br />
or modify the logging levels using the WebLogic administrative console.<br />
File Store Providers<br />
Oracle’s ECM solution moved to a File Store Provider to accommodate different usage patterns. The default file store<br />
provider in 11g continues to use the vault/weblayout file structure.<br />
Classically, the Oracle ECM solution would store relational data in a database and files in a file system. As the<br />
number of managed assets increased, some scalability issues became apparent. Three metadata fields – dDocType,<br />
dSecurityGroup, and dSecurityAccount – were used to spread the assets out to multiple directory structures. There<br />
is a limit to how many files can go into a directory structure, and as the number of assets grew into the tens of<br />
millions, hundreds of millions, and eventually billions, inode issues and disk management became a bottleneck. UCM<br />
updated the default file store provider to add additional dispersion directories to spread out the files.<br />
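To show why extra dispersion directories help, here is a minimal sketch of one common way to spread files: derive a few short directory names from a hash of the content identifier. This is not the file store provider's actual dispersion rule. The function name, the use of SHA-256, and the two-level, two-character layout are assumptions chosen for illustration.

```python
import hashlib
import posixpath


def dispersed_path(root, security_group, doc_type, content_id, levels=2, width=2):
    """Return a storage path with hash-derived dispersion directories.

    With levels=2 and width=2 (hex characters) each group/type folder fans out
    into up to 256 * 256 leaf directories, keeping any single directory small.
    """
    digest = hashlib.sha256(content_id.encode("utf-8")).hexdigest()
    dispersion = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return posixpath.join(root, security_group, doc_type, *dispersion, content_id)


if __name__ == "__main__":
    # "ID_000123" is a made-up content identifier.
    print(dispersed_path("/vault", "Public", "Document", "ID_000123"))
```

Because the directory names depend only on the identifier, the same asset always resolves to the same location, while a large repository is spread evenly across many directories.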
A database file store provider was added where the assets are persisted in the database rather than a file system.<br />
The Oracle 11gR2 Database SecureFiles API improved performance by over 40% compared to the 10g<br />
implementation. Performance matches, and in some cases exceeds, major networked file systems. In addition to<br />
the I/O gains, repositories that have Database Compression will automatically have de-duplication performed<br />
against content stored in the repository.<br />
When content is uploaded to the repository, a temporary file is placed in the vault/~temp location with a cache<br />
cleanup eventually clearing out that disk space. The current version allows that cache to be limited to one day, so<br />
care must be taken when ingesting very large volumes of content. Content must also be indexed before that<br />
temporary area becomes a candidate for cleanup.<br />
Virtualization<br />
Oracle differentiates between hard and soft partitioning from a licensing perspective. With hard partitioning in use,<br />
one only licenses the CPUs used by the virtual machine. Soft partitioning requires licensing for all CPUs in the host<br />
machine. Oracle VM can be configured to qualify as hard partitioning, but EMC VMware is considered soft<br />
partitioning. Hardware prices are trivial compared to software, so optimize the virtual hosts to consolidate licenses.<br />
Typically, multiple smaller instances perform better than fewer larger instances. Attempt to optimize CPU<br />
utilization, adding additional CPUs to the host servers as needed.<br />
While CPU architecture, sockets, and cores impact the licensing costs, memory does not. A physical CPU may be<br />
shared among multiple virtual machines, but memory should not be a pooled resource.<br />
Services and Components<br />
WebCenter Content continues the service-based architecture introduced in earlier versions of the content<br />
repository. Services that return search results, metadata, or actual assets can be extended or overridden.<br />
GET_SEARCH_RESULTS, for example, can return a large amount of data if a repository has many custom metadata<br />
fields. The content repository will cache the search results, but network traffic can be significantly reduced by<br />
creating a template that returns only the fields and result sets needed.<br />
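The effect of returning only the needed fields can be illustrated outside the Content Server with a simple projection over result rows. The sketch below is not a Content Server template. The function name is hypothetical, the rows are plain dictionaries standing in for a search result set, and xCustomNotes is a made-up custom metadata field.

```python
def trim_result_set(rows, wanted_fields):
    """Keep only the requested fields in each result row."""
    return [
        {field: row[field] for field in wanted_fields if field in row}
        for row in rows
    ]


if __name__ == "__main__":
    rows = [
        {"dDocName": "ID_000123", "dDocTitle": "Quarterly report",
         "xCustomNotes": "long free-text value the caller never displays"},
    ]
    print(trim_result_set(rows, ["dDocName", "dDocTitle"]))
```

In a repository with dozens of custom metadata fields, dropping the unused columns from every row is where most of the reduction in response size would come from.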
IDOC script includes can be cached, so the HTML will be dynamically rendered and then placed in session scope for a<br />
specific user or application scope for all users. The cacheInclude method takes the include name, scope, and life<br />
span as required parameters. For example, the std_page_begin include would be cached for ten minutes for each<br />
user.<br />
<$cacheInclude("std_page_begin", "session", 600)$><br />
11g continues to lack a default success status code or message returned with all services. Content Services<br />
typically indicate an error by setting the StatusCode property to a non-zero number. CIS, RIDC, and several other<br />
integration methods will potentially throw an exception when there are problems, but will absolutely throw an<br />
exception if you look for the StatusCode property and it is missing. One can either trust that the service will throw an<br />
exception on failure and assume success otherwise, or modify the content server to set a default success status code.<br />
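A third option on the client side is to read StatusCode defensively and treat a missing value as success. The sketch below models a service response as a plain dictionary of local data. It does not use the RIDC or CIS APIs, and the helper names, the ServiceError class, and the StatusMessage lookup are assumptions made for illustration.

```python
class ServiceError(Exception):
    """Raised when a service response carries a non-zero StatusCode."""


def status_code(local_data, default=0):
    """Return StatusCode as an int, or `default` when the property is absent or empty."""
    raw = local_data.get("StatusCode")
    if raw is None or str(raw).strip() == "":
        return default
    return int(raw)


def ensure_success(local_data):
    """Return the response unchanged on success, raise ServiceError otherwise."""
    code = status_code(local_data)
    if code != 0:
        message = local_data.get("StatusMessage", "")
        raise ServiceError("service failed with StatusCode {0}: {1}".format(code, message))
    return local_data


if __name__ == "__main__":
    print(ensure_success({"dDocName": "ID_000123"}))  # no StatusCode: treated as success
    try:
        ensure_success({"StatusCode": "-1", "StatusMessage": "Unable to check in"})
    except ServiceError as err:
        print(err)
```

This keeps the calling code from failing merely because the property is absent, while still surfacing genuine errors.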
Reverse Proxy<br />
When architecting a website for high performance using WebCenter Content, a reverse proxy can be used to<br />
improve both performance and security for your site. A reverse proxy functions as a gateway to your network and<br />
adds an additional layer of caching for site visitors that will help to improve page response time, particularly under<br />
heavy load.<br />
Typically, a reverse proxy will reside in the DMZ of your network and will be the entry point for users accessing your<br />
site. The standard process flow for a user accessing a site behind a reverse proxy is as follows:<br />
1. A user enters http://www.mysite.com in a browser.<br />
2. DNS directs the user to the reverse proxy server that is sitting in your DMZ.<br />
3. The reverse proxy determines if the request is being made for static content or dynamic content.<br />
4. If static content is being requested, the reverse proxy will check its cache and will return the cached page to<br />
the user if the page is found in the cache.<br />
5. If dynamic content is being requested or if the reverse proxy does not have the page in its cache, it will send<br />
a request through the firewall to a web server inside your network to retrieve the requested page and will<br />
return that page to the user.<br />
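Steps 3 through 5 of the flow above can be sketched as a small decision routine. This is a toy model, not how any particular proxy product is implemented. The class name, the suffix list used to recognize static content, and the choice to store static pages in the cache after a miss are assumptions, and cache expiry is left out entirely.

```python
# Assumed rule for "static" content: the path ends in one of these suffixes.
STATIC_SUFFIXES = (".html", ".css", ".js", ".png", ".jpg", ".gif")


class ReverseProxy:
    """Serves static pages from a local cache and forwards everything else."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable(path) -> page body from the web server
        self._cache = {}

    def handle(self, path):
        is_static = path.lower().endswith(STATIC_SUFFIXES)   # step 3
        if is_static and path in self._cache:
            return self._cache[path]                          # step 4: cache hit
        body = self._fetch(path)                              # step 5: go through the firewall
        if is_static:
            self._cache[path] = body                          # remember for the next visitor
        return body


if __name__ == "__main__":
    origin_calls = []

    def origin(path):
        origin_calls.append(path)
        return "body of " + path

    proxy = ReverseProxy(origin)
    for request in ["/index.html", "/index.html", "/search?q=x", "/search?q=x"]:
        proxy.handle(request)
    print(origin_calls)
```

In this run the second request for the static page never reaches the web server, while both dynamic requests do, which is the load reduction the list describes.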
The performance gains from caching at the reverse proxy level are contingent on a number of factors<br />
including the number of static resources and pages that users are requesting, the frequency at which those items<br />
are accessed, and the hardware and network infrastructure that is being used. One popular reverse proxy caching application,<br />
Varnish Cache, claims to improve delivery by a factor of 300 – 1000x depending on architecture when serving a<br />
page from cache (www.varnish-cache.org).<br />
Besides the caching advantage of this model, your site also gains an additional level of security by implementing a<br />
reverse proxy. All requests that are made to your site are filtered through the reverse proxy server, which<br />
limits an end user’s ability to discover server names or other network architecture information that could<br />
potentially be used to compromise your systems. Additionally, there is only a single entry point through your<br />
firewall, namely between your proxy server and your web server, so network administrators have considerably more<br />
control over limiting the traffic that is allowed past the firewall.<br />
Under a contribution-consumption site architecture model, utilizing a reverse proxy allows your network<br />
administrator to keep both the contribution and the consumption Content Server instances inside the firewall. The<br />
network architecture diagram below demonstrates using multiple reverse proxies with a load balancer for added<br />
performance in a contribution-consumption Site Studio web site model.<br />