Name: Cloudera Administrator Training for Apache Hadoop
Brand: Cloudera
Price: 2235 USD

Who should attend

System Administrators
IT Managers
Application Support Professionals

Prerequisites

Basic Linux experience
Prior knowledge of Apache Hadoop is not required

Course Objectives

By the end of this course, you will have learned:

Cloudera Manager features that make managing your clusters easier, such as aggregated logging, configuration management, resource management, reports, alerts, and service management
The internals of YARN, MapReduce, Spark, and HDFS
Determining the correct hardware and infrastructure for your cluster
Proper cluster configuration and deployment to integrate with the data center
How to load data into the cluster from dynamically generated files using Flume and from relational databases using Sqoop
Configuring the Fair Scheduler to provide service-level agreements for multiple users of a cluster
Best practices for preparing and maintaining Apache Hadoop in production
Troubleshooting, diagnosing, tuning, and solving Hadoop issues

Product Description

Cloudera University’s four-day administrator training course for Apache Hadoop provides you with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.

Outline

Module 1: The Case for Apache Hadoop

Why Hadoop?
Fundamental Concepts
Core Hadoop Components

Module 2: Hadoop Cluster Installation

Rationale for a Cluster Management Solution
Cloudera Manager Features
Cloudera Manager Installation
Hadoop (CDH) Installation

Module 3: The Hadoop Distributed File System (HDFS)

HDFS Features
Writing and Reading Files
NameNode Memory Considerations
Overview of HDFS Security
Web UIs for HDFS
Using the Hadoop File Shell

Module 4: MapReduce and Spark on YARN

The Role of Computational Frameworks
YARN: The Cluster Resource Manager
MapReduce Concepts
Apache Spark Concepts
Running Computational Frameworks on YARN
Exploring YARN Applications Through the Web UIs and the Shell
YARN Application Logs

Module 5: Hadoop Configuration and Daemon Logs

Cloudera Manager Constructs for Managing Configurations
Locating Configurations and Applying Configuration Changes
Managing Role Instances and Adding Services
Configuring the HDFS Service
Configuring Hadoop Daemon Logs
Configuring the YARN Service

Module 6: Getting Data Into HDFS

Ingesting Data From External Sources With Flume
Ingesting Data From Relational Databases With Sqoop
REST Interfaces
Best Practices for Importing Data

Module 7: Planning Your Hadoop Cluster

General Planning Considerations
Choosing the Right Hardware
Virtualization Options
Network Considerations
Configuring Nodes

Module 8: Installing and Configuring Hive, Impala, and Pig

Hive
Impala
Pig

Module 9: Hadoop Clients Including Hue

What Are Hadoop Clients?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization

Module 10: Advanced Cluster Configuration

Advanced Configuration Parameters
Configuring Hadoop Ports
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability

Module 11: Hadoop Security

Why Hadoop Security Is Important
Hadoop’s Security System Concepts
What Kerberos Is and How It Works
Securing a Hadoop Cluster With Kerberos
Other Security Concepts

Module 12: Managing Resources

Configuring cgroups with Static Service Pools
The Fair Scheduler
Configuring Dynamic Resource Pools
YARN Memory and CPU Settings
Impala Query Scheduling

Module 13: Cluster Maintenance

Checking HDFS Status
Copying Data Between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Directory Snapshots
Cluster Upgrading

Module 14: Cluster Monitoring and Troubleshooting

Cloudera Manager Monitoring Features
Monitoring Hadoop Clusters
Troubleshooting Hadoop Clusters
Common Misconfigurations
