IBM InfoSphere DataStage v11.5 – Advanced Data Processing (KM423G) – Outline

Detailed Course Outline

Unit 1 – Accessing databases
Topic 1: Connector stage overview
• Use Connector stages to read from and write to relational tables
• Working with the Connector stage properties
Topic 2: Connector stage functionality
• Before / After SQL
• Sparse lookups
• Optimize insert/update performance
Topic 3: Error handling in Connector stages
• Reject links
• Reject conditions
Topic 4: Multiple input links
• Designing jobs using Connector stages with multiple input links
• Ordering records across multiple input links
Topic 5: File Connector stage
• Read and write data to Hadoop file systems
Demonstration 1: Handling database errors
Demonstration 2: Parallel jobs with multiple Connector input links
Demonstration 3: Using the File Connector stage to read and write HDFS files

Unit 2 – Processing unstructured data
Topic 1: Using the Unstructured Data stage in DataStage jobs
• Extract data from an Excel spreadsheet
• Specify a data range for data extraction in an Unstructured Data stage
• Specify document properties for data extraction
Demonstration 1: Processing unstructured data

Unit 3 – Data masking
Topic 1: Using the Data Masking stage in DataStage jobs
• Data masking techniques
• Data masking policies
• Applying policies for masquerading context-aware data types
• Applying policies for masquerading generic data types
• Repeatable replacement
• Using reference tables
• Creating custom reference tables
Demonstration 1: Data masking

Unit 4 – Using data rules
Topic 1: Introduction to data rules
• Using the Data Rules Editor
• Selecting data rules
• Binding data rule variables
• Output link constraints
• Adding statistics and attributes to the output information
Topic 2: Use the Data Rules stage to validate foreign key references in source data
Topic 3: Create custom data rules
Demonstration 1: Using data rules

Unit 5 – Processing XML data
Topic 1: Introduction to the Hierarchical stage
• Hierarchical stage Assembly editor
• Use the Schema Library Manager to import and manage XML schemas
Topic 2: Composing XML data
• Using the HJoin step to create parent-child relationships between input lists
• Using the Composer step
Topic 3: Writing hierarchical data to a relational table
Topic 4: Using the Regroup step
Topic 5: Consuming XML data
• Using the XML Parser step
• Propagating columns
Topic 6: Transforming XML data
• Using the Aggregate step
• Using the Sort step
• Using the Switch step
• Using the H-Pivot step
Demonstration 1: Importing XML schemas
Demonstration 2: Compose hierarchical data
Demonstration 3: Consume hierarchical data
Demonstration 4: Transform hierarchical data

Unit 6 – Updating a star schema database
Topic 1: Surrogate keys
• Design a job that creates and updates a surrogate key source key file from a dimension table
Topic 2: Slowly Changing Dimensions (SCD) stage
• Star schema databases
• SCD stage Fast Path pages
• Specifying purpose codes
• Dimension update specification
• Design a job that processes a star schema database with Type 1 and Type 2 slowly changing dimensions
Demonstration 1: Build a parallel job that updates a star schema database with two dimensions