Collect, Extract, Transform, Load - Practical Computer Applications


by admin-Andrei - Sunday, September 26, 2010
http://www.practicaldb.com/blog/collect-extract-transform-load/

Part of the PCA process is business discovery, which includes the customer's DataFlow and WorkFlow. It starts with Data Collection, Extraction, Transformation and Loading: the data operations that process data and move it into the appropriate databases, such as Data Sourcing and Staging Datastores, Data Warehouses and Data Marts.

Data Collection is the starting step (or steps) in most Dataflows and can be implemented as a combination of various Data Collectors: Data Collecting Client (e.g. Smart Client with Desktop UI as Data Collector) and Agent (Remote API) Applications, WebForms, Meters (hardware and/or software based automated collectors), and Data Replication and Scraping Software (based on triggers, schedules and/or on-demand).

Almost 99% of data in the world is collected automatically, like electricity usage by meters, GHG emissions by environmental monitoring devices or tax collections by payment applications. However, much important data is collected by human Data Collectors who survey other humans (like Census 2010), observe natural, manufacturing or logistical processes (e.g. sales, inventory, purchasing, customer support, product registration, compliance, HR, shipping, customer feedback), or scan objects (with scanners or just … eyes) and then enter the data through a Web or Desktop Data Collector (application/form) into data collecting database file(s).

Extract, transform, and load (ETL) are the three data processes that follow data collection.

Extraction takes data collected in data sources with different structures and formats, like flat files, databases (relational, hierarchical etc.), transactional datastores and semi-structured repositories (e.g. email systems or document libraries), pre-validates the extracted data and parses the valid data to a destination (e.g. a staging database).

Transformation takes extracted data and applies predefined rules and functions to it, including selection (e.g. ignore or remove NULLs), data cleansing, encoding (e.g. mapping "Male" to "M"), deriving (e.g. calculating a designated value as the product of an extracted value and a predefined constant), sorting, joining data from multiple sources (e.g. lookup or merge), aggregation (e.g. a summary for each month), transposing (columns to rows or vice versa), splitting, disaggregation, lookups (e.g. validation through dictionaries), predefined validation etc., any of which may lead to rejection of some data. Transformed data can be stored in a Data Warehouse (DW).

Load takes transformed data and places it into the end target, in most cases called a Data Mart (sometimes it is called a Data Warehouse too). Load can append, refresh and/or overwrite preexisting data, apply constraints and execute appropriate triggers (to enforce data integrity, uniqueness and mandatory fields, provide a log etc.) and may start additional processes, like data backup or replication.

Andrei Pandre, Ph.D., VP of Data Visualization, Practical Computer Applications, Inc.
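The Extract, Transform and Load steps described above can be illustrated with a small sketch in Python. This is only one possible arrangement, using the standard library, and it is not taken from the PCA process itself. The sample data, the column names (`month`, `gender`, `amount`), the constant `RATE` and the table name `monthly_summary` are all assumptions made up for the example.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical flat-file source; columns and values are invented for illustration.
RAW_CSV = """month,gender,amount
2010-08,Male,10
2010-08,Female,5
2010-09,Male,
2010-09,Female,7
,Male,3
2010-09,Male,abc
"""

GENDER_CODES = {"Male": "M", "Female": "F"}  # encoding rule, e.g. "Male" -> "M"
RATE = 2.0  # hypothetical predefined constant used for the derived value


def extract(text):
    """Parse the flat file and pre-validate: keep only rows that have a month."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["month"]]


def transform(rows):
    """Select, validate, encode, derive and aggregate the extracted rows."""
    totals = defaultdict(float)
    rejected = []
    for row in rows:
        if row["amount"] == "":  # selection: ignore NULL (empty) amounts
            continue
        code = GENDER_CODES.get(row["gender"])  # lookup through a dictionary
        try:
            amount = float(row["amount"])  # predefined validation: must be numeric
        except ValueError:
            amount = None
        if code is None or amount is None:
            rejected.append(row)  # validation may lead to rejection of some data
            continue
        # deriving: product of the extracted value and a predefined constant;
        # aggregation: summary for each month (and gender code)
        totals[(row["month"], code)] += amount * RATE
    records = sorted((month, code, total) for (month, code), total in totals.items())
    return records, rejected


def load(conn, records):
    """Place transformed records into the target table, overwriting existing keys."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_summary ("
        " month TEXT NOT NULL, gender TEXT NOT NULL, total REAL NOT NULL,"
        " PRIMARY KEY (month, gender))"  # constraints: mandatory fields, uniqueness
    )
    conn.executemany(
        "INSERT OR REPLACE INTO monthly_summary (month, gender, total)"
        " VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    records, rejected = transform(extract(text))
    load(conn, records)
    return conn, rejected
```

Calling `run_pipeline()` returns the database connection together with the rows rejected during transformation, so the rejects can be logged or reviewed separately. The in-memory SQLite database stands in for a Data Mart here. A real target would more likely be a server database with its own constraints and triggers.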


