12.07.2015 Views

Outline of High Speed Ingestion with Data Partitioning in Content ...


© Copyright 2007 EMC Corporation. All rights reserved.

Direct Path Loading
• Oracle Direct Path Load (ODPL) allows object metadata to be loaded significantly faster than conventional transactional methods
  – Bypasses normal transaction logic
  – Bypasses the Oracle server, formats pages, and stores them directly to file
  – Hundreds of millions per day can easily be ingested
• Caveats:
  – Indexes on tables are made unusable by ODPL
  – Indexes need to be rebuilt
• Data can come from ASCII files


Other "traditional" limitations of Direct Path Loading
• Many of the tables that had to be generated have lots of Content Server internal columns
  – Subject to change in a release
  – Not documented
  – Example: data ticket
• Direct Path Loading provided a back door that wouldn't allow BOF customizations to be applied to objects


Getting around the ODPL limitations
• Model simplifications
  – Lightweight objects:
    • Allow us to avoid loading dm_sysobject_s & dm_sysobject_r
  – External store objects:
    • Stored where the application decides. Avoids having to reverse engineer data tickets
• Data Partitioning:
  – ODPL can be run on a schema-identical partition on a separate offline table
  – Once the indexes are rebuilt, the offline partition can be swapped with the corresponding empty online one with little overhead
  – Requires potentially only a few minutes of application downtime


Database layout for an example normal sysobject
• An object's metadata is spread over several tables joined by r_object_id
[Diagram: tables Dm_sysobject_s and _r, Dmi_object_type, Dmr_content_s & _r, Dm_document_s, Mytype_s, Myparent_type_s]


Lightweight objects consume less space per object
• Each lightweight object will not have a corresponding separate set of entries in:
  – Dm_sysobject_s & r
  – Dm_document_s
  – The parent custom type tables
• Lightweight objects share those with other lightweight objects
[Diagram: tables Dmi_object_type, Dmr_content_s & _r, Mytype_s]


Background: Leveraging RDBMS Partitioning
• Scheme
• Hash Partitioning vs. Range Partitioning
• Local Indexes


Database Management Challenges at a Billion Objects
• Large underlying DCTM table
• Common operations become slow and difficult:
  – Index creation
  – Statistics update


Range Partition on D6.5 partition_id
• One tablespace per partition
• Ranges shown: 0 < ID < 10; 10 ≤ ID < 20; 20 ≤ ID < 30; ...; 1,000 ≤ ID < 1,100; 1,100 ≤ ID < MAXVALUE
[Diagram: the ranges above drawn as partitions, with labels P0, P2, P3, P101]
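
The range scheme on this slide can be sketched as a lookup from partition_id to a partition. This is a minimal illustration, not Documentum or Oracle code: the bounds list covers only the first three ranges from the slide, and the partition names (`p_0_10` and so on) are hypothetical.

```python
from bisect import bisect_right

# Exclusive upper bounds of the first three ranges from the slide.
# Everything at or above the last bound falls into an open-ended
# partition, mirroring "ID < MAXVALUE".
UPPER_BOUNDS = [10, 20, 30]

# Hypothetical partition names, one per range plus the catch-all.
PARTITION_NAMES = ["p_0_10", "p_10_20", "p_20_30", "p_max"]


def partition_for(partition_id: int) -> str:
    """Return the name of the range partition that holds partition_id."""
    # bisect_right counts how many upper bounds are <= partition_id,
    # which is the index of the first range whose bound exceeds it.
    return PARTITION_NAMES[bisect_right(UPPER_BOUNDS, partition_id)]


if __name__ == "__main__":
    for pid in (5, 10, 29, 30, 5000):
        print(pid, "->", partition_for(pid))
```

Because the bounds are sorted, adding a new range only means appending one bound and one name, which loosely matches how a new range partition is added at the high end of the table.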


Partitioning Scheme
• Partitioning on partition_id for the tables associated with sysobjects:
  – Dm_sysobject_s, dm_sysobject_r, dm_document_s
  – dmi_object_type
  – Dmr_content_s and dmr_content_r
  – Aspect tables
  – ACL tables
  – Your custom type tables
• Partition id is an integer value


Some Advantages of Partitioning
• Indexes can be managed on a partition basis ("local")
• Statistics computed on a partition basis
• Partition-only lookups are fast
• Partition exchange for high speed ingest
• Compression of cold partitions
• Mass drop for partition-aligned data


Partition Key Index lookups
SELECT …. FROM … WHERE partition_id = '3'
• Database can take advantage of the key to limit the search to a single partition


Non-Partition Key Index lookups
SELECT …. FROM … WHERE OBJECT_NAME = 'FOO'
• Database must iterate through each partition to locate results
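
The difference between the two lookups can be sketched with a toy model in which every partition has its own local index on object_name. The class and method names here are hypothetical and only count how many local indexes a query has to probe; a real optimizer does far more.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table with one local index (object_name -> rows) per partition."""

    def __init__(self):
        # partition_id -> {object_name -> [row, ...]}
        self.local_indexes = defaultdict(lambda: defaultdict(list))

    def insert(self, partition_id, object_name, r_object_id):
        row = {
            "partition_id": partition_id,
            "object_name": object_name,
            "r_object_id": r_object_id,
        }
        self.local_indexes[partition_id][object_name].append(row)

    def find_in_partition(self, partition_id, object_name):
        """Partition-key lookup: only one local index is probed."""
        index = self.local_indexes.get(partition_id, {})
        return list(index.get(object_name, [])), 1

    def find_by_name(self, object_name):
        """Non-partition-key lookup: every local index is probed."""
        rows, probes = [], 0
        for index in self.local_indexes.values():
            probes += 1
            rows.extend(index.get(object_name, []))
        return rows, probes


if __name__ == "__main__":
    table = PartitionedTable()
    table.insert(1, "FOO", "obj-a")
    table.insert(2, "BAR", "obj-b")
    table.insert(3, "FOO", "obj-c")
    print(table.find_in_partition(3, "FOO"))  # one probe
    print(table.find_by_name("FOO"))          # one probe per partition
```

In this model the probe count for `find_by_name` equals the number of partitions, which is the cost that grows as partitions are added.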


"Global" indexes span the entire set of partitions
SELECT …. FROM … WHERE OBJECT_NAME = 'FOO'
• Faster lookups than non-partition key local ones, but the maintenance cost is high


Index notes
• A partitioned set of Documentum tables will only leverage local indexes
  – Partition exchange can't be done with a table that has a global index
• Limits on partitions:
  – Oracle: 64K – 1 partitions per table/index
  – SQL Server 2005: 1000 partitions per table/index
• Non-partition key lookups will get more expensive as the number of partitions grows
  – However, parallel index scans over partitions are possible


Range vs. Hash Partitioning
[Diagram: objects with ID = 1, ID = 2, ID = 3]
• Consecutive objects are mapped to the same partition in range partitioning


Hash Partitioning
[Diagram: objects with ID = 1, ID = 2, ID = 3]
• Consecutive objects go to different partitions
• Less complicated table setup and maintenance
• However, poor data management qualities
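
The contrast between the two slides can be sketched in a few lines. Both functions are illustrative assumptions: the range width of 10 is borrowed from the earlier range example, and a plain modulo stands in for the database's internal hash function, which is not specified in the deck.

```python
def range_partition(object_id: int, width: int = 10) -> int:
    """Range scheme: IDs in the same block of `width` share a partition."""
    return object_id // width


def hash_partition(object_id: int, num_partitions: int = 4) -> int:
    """Hash scheme stand-in: modulo spreads consecutive IDs around."""
    return object_id % num_partitions


if __name__ == "__main__":
    for object_id in (1, 2, 3):
        print(
            object_id,
            "range ->", range_partition(object_id),
            "hash ->", hash_partition(object_id),
        )
```

Under range partitioning a freshly loaded batch of consecutive IDs lands in one partition, which is what makes partition exchange and mass drop practical; under hashing the same batch is scattered.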


Partition Exchange Example
Content Server = UP

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 0              Num recs = 0
Partition 3    Num recs = 0              Num recs = 0


Direct Path Loading into offline table
Content Server = UP

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 0              Num recs = 10M (Direct Path Load)
Partition 3    Num recs = 0              Num recs = 0


Indexes rebuilt in offline table
Content Server = UP

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 0              Num recs = 10M (Index Rebuild)
Partition 3    Num recs = 0              Num recs = 0


Content Server needs to be brought down
Content Server = DOWN

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 0              Num recs = 10M
Partition 3    Num recs = 0              Num recs = 0


Partition Exchange
Content Server = UP

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 10M            Num recs = 0
Partition 3    Num recs = 0              Num recs = 0

SQL> ALTER TABLE DMR_CONTENT_S
       EXCHANGE PARTITION 2 WITH TABLE DMR_CONTENT_S_offline
       INCLUDING INDEXES WITHOUT VALIDATION;
Table altered.


Content Server can be brought back up
Content Server = DOWN → UP

Partition      Dmr_content_r (online)    Dmr_content_r_offline
Partition 1    Num recs = 1,000          Num recs = 0
Partition 2    Num recs = 10M            Num recs = 0
Partition 3    Num recs = 0              Num recs = 0
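
The walkthrough in the preceding slides (load offline, rebuild indexes, stop the server, exchange, restart) can be sketched as a small state model. Everything here is a hypothetical simulation that follows the slide diagrams: the `Table` class, the function names, and the `server_up` flag are illustrative and are not Documentum or Oracle APIs.

```python
class Table:
    """Toy table: rows and an index-usable flag per partition number."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = {p: [] for p in range(1, num_partitions + 1)}
        self.index_usable = {p: True for p in self.partitions}


def direct_path_load(table, partition, rows):
    """Bulk-append rows; as on the caveats slide, the index becomes unusable."""
    table.partitions[partition].extend(rows)
    table.index_usable[partition] = False


def rebuild_index(table, partition):
    """Mark the partition's local index as usable again."""
    table.index_usable[partition] = True


def exchange_partition(online, offline, partition, server_up):
    """Swap one partition between the online and offline tables."""
    if server_up:
        raise RuntimeError("Content Server must be down during the exchange")
    if not offline.index_usable[partition]:
        raise RuntimeError("rebuild the offline index before exchanging")
    # The swap moves no rows one by one; the two tables simply trade
    # ownership of the partition's data and index state.
    online.partitions[partition], offline.partitions[partition] = (
        offline.partitions[partition],
        online.partitions[partition],
    )
    online.index_usable[partition], offline.index_usable[partition] = (
        offline.index_usable[partition],
        online.index_usable[partition],
    )


if __name__ == "__main__":
    online = Table("dmr_content_r", 3)
    offline = Table("dmr_content_r_offline", 3)
    direct_path_load(offline, 2, [f"row-{i}" for i in range(10)])
    rebuild_index(offline, 2)
    exchange_partition(online, offline, 2, server_up=False)
    print(len(online.partitions[2]), len(offline.partitions[2]))
```

The two guard checks encode the ordering the slides imply: indexes are rebuilt while the server is still up, and only the swap itself needs downtime.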


Other notes for partition exchange
• Indexes associated with partitioned tables will be "local" indexes
• Content Server will generate scripts to exchange a partition with one in an offline table
  – The apply method determines the tables involved and generates a script
• The Content Server must be shut down before (and during) the execution of the scripts
• Note: dmi_object_type should be range partitioned to support this (currently not done by default)
  – A new getObjectWithOptions() method will be provided to allow the caller to supply the object type and partition id in order to make a fetch inexpensive
  – Otherwise a fetch will require a local index lookup per partition


Best Use-cases for partitions
• Ingest of legacy data for an online system
• Massive ingest of a few input streams in a short load window
• Mass drop of partition-aligned data
