Detailed Course Outline
Introduction
The Case for Apache Hadoop
- Why Hadoop?
- A Brief History of Hadoop
- Core Hadoop Components
- Fundamental Concepts
HDFS
- HDFS Features
- Writing and Reading Files
- NameNode Considerations
- Overview of HDFS Security
- Using the NameNode Web UI
- Using the Hadoop File Shell
Getting Data into HDFS
- Ingesting Data from External Sources with Flume
- Ingesting Data from Relational Databases with Sqoop
- REST Interfaces
- Best Practices for Importing Data
MapReduce
- What Is MapReduce?
- Features of MapReduce
- Basic Concepts
- Architectural Overview
- MapReduce Version 2
- Failure Recovery
- Using the JobTracker Web UI
Planning Your Hadoop Cluster
- General Planning Considerations
- Choosing the Right Hardware
- Network Considerations
- Configuring Nodes
- Planning for Cluster Management
Hadoop Installation and Initial Configuration
- Deployment Types
- Installing Hadoop
- Specifying the Hadoop Configuration
- Performing Initial HDFS Configuration
- Performing Initial MapReduce Configuration
- Log File Locations
Installing and Configuring Hive, Impala, and Pig
- Hive
- Impala
- Pig
Hadoop Clients
- What Is a Hadoop Client?
- Installing and Configuring Hadoop Clients
- Installing and Configuring Hue
- Hue Authentication and Configuration
Cloudera Manager
- The Motivation for Cloudera Manager
- Cloudera Manager Features
- Standard and Enterprise Versions
- Cloudera Manager Topology
- Installing Cloudera Manager
- Installing Hadoop Using Cloudera Manager
- Performing Basic Administration Tasks
- Using Cloudera Manager
Advanced Cluster Configuration
- Advanced Configuration Parameters
- Configuring Hadoop Ports
- Explicitly Including and Excluding Hosts
- Configuring HDFS for Rack Awareness
- Configuring HDFS High Availability
Hadoop Security
- Why Hadoop Security Is Important
- Hadoop’s Security System Concepts
- What Kerberos Is and How It Works
- Securing a Hadoop Cluster with Kerberos
Managing and Scheduling Jobs
- Managing Running Jobs
- Scheduling Hadoop Jobs
- Configuring the FairScheduler
Cluster Maintenance
- Checking HDFS Status
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- NameNode Metadata Backup
- Cluster Upgrading
Cluster Monitoring and Troubleshooting
- General System Monitoring
- Managing Hadoop’s Log Files
- Monitoring Hadoop Clusters
- Common Troubleshooting Issues
Conclusion