IBM InfoSphere Advanced DataStage – Parallel Framework v11.5 (KM404G) – Outline

Detailed Course Outline

1: Introduction to the parallel framework architecture
• Describe the parallel processing architecture
• Describe pipeline and partition parallelism
• Describe the role of the configuration file
• Design a job that creates robust test data

2: Compiling and executing jobs
• Describe the main parts of the configuration file
• Describe the compile process and the OSH that the compilation process generates
• Describe the role and the main parts of the Score
• Describe the job execution process

3: Partitioning and collecting data
• Understand how partitioning works in the Framework
• View partitioners in the Score
• Select partitioning algorithms
• Generate sequences of numbers (surrogate keys) in a partitioned, parallel environment

4: Sorting data
• Sort data in the parallel framework
• Find inserted sorts in the Score
• Reduce the number of inserted sorts
• Optimize Fork-Join jobs
• Use Sort stages to determine the last row in a group
• Describe sort key and partitioner key logic in the parallel framework

5: Buffering in parallel jobs
• Describe how buffering works in parallel jobs
• Tune buffers in parallel jobs
• Avoid buffer contention

6: Parallel framework data types
• Describe virtual data sets
• Describe schemas
• Describe data type mappings and conversions
• Describe how external data is processed
• Handle nulls
• Work with complex data

7: Reusable components
• Create a schema file
• Read a sequential file using a schema
• Describe Runtime Column Propagation (RCP)
• Enable and disable RCP
• Create and use shared containers

8: Balanced Optimization
• Enable Balanced Optimization functionality in Designer
• Describe the Balanced Optimization workflow
• List the different Balanced Optimization options
• Push stage processing to a data source
• Push stage processing to a data target
• Optimize a job that accesses the Hadoop Distributed File System (HDFS)
• Understand the limitations of Balanced Optimization