CR-CDAT

Classroom Training

Modality: G

Duration: 4 days

Price: on request

Cloudera Data Analyst Training (CDAT) – Outline

Detailed Course Outline

1. Introduction

About this Course
About Cloudera
Course Logistics
Introductions

2. Hadoop Fundamentals

The Motivation for Hadoop
Hadoop Overview
HDFS
MapReduce
The Hadoop Ecosystem
Lab Scenario Explanation
Hands-On Exercise: Data Ingestion and Processing with Hadoop Tools

3. Introduction to Pig

What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig

4. Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions
Hands-On Exercise: Using Pig for ETL Processing

5. Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-in Functions for Complex Data
Iterating Grouped Data
Hands-On Exercise: Analyzing Ad Campaign Data with Pig

6. Multi-Dataset Operations with Pig

Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
Hands-On Exercise: Analyzing Disparate Data Sets with Pig

7. Extending Pig

Adding Flexibility with Parameters
Macros and Imports
UDFs
Contributed Functions
Using Other Languages to Process Data with Pig
Accessing Pig from Other Languages
Hands-On Exercise: Extending Pig with Streaming and UDFs

8. Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Optional Demo: Troubleshooting a Failed Job with the Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs

9. Introduction to Hive

What Is Hive?
Hive Schema and Data Storage
Comparing Hive to Traditional Databases
Hive vs. Pig
Hive Use Cases
Interacting with Hive

10. Relational Data Analysis with Hive

Hive Databases and Tables
Basic HiveQL Syntax
Data Types
Joining Data Sets
Common Built-in Functions
Hands-On Exercise: Running Hive Queries on Retail Data

11. Hive Data Management

Hive Data Formats
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with Views
Storing Query Results
Controlling Access to Data
Hands-On Exercise: Data Management with Hive

12. Text Processing with Hive

Overview of Text Processing
Important String Functions
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Hands-On Exercise: Gaining Insight with Sentiment Analysis

13. Hive Optimization

Understanding Query Performance
Controlling the Job Execution Plan
Partitioning
Bucketing
Indexing Data

14. Extending Hive

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries
Hands-On Exercise: Data Transformation with Hive

15. Introduction to Impala

What Is Impala?
How Impala Differs from Relational Databases
How Impala Differs from Hive and Pig
Using the Impala Shell
Limitations and Future Directions

16. Analyzing Data with Impala

Basic Syntax
Data Types
Filtering, Sorting, and Limiting Results
Joining and Grouping Data
Improving Impala Performance
Hands-On Exercise: Interactive Analysis with Impala

17. Interoperability and Workflows

Picking the Best Tool for the Job
Tips for Better Interoperability
Database and Tool Integration
Managing Recurring Workflows

18. Conclusion

Essential Points
Next Steps
