Cloudera Training for Apache HBase

Who should attend

  • Developers
  • Administrators
  • Engineers

Prerequisites

  • Prior experience with databases and data modeling is helpful
  • Knowledge of Java
  • Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.

Course Objectives

In this course, you will learn:

  • The use cases for HBase, Hadoop, and RDBMS, and when to choose each
  • How to use the HBase shell to manipulate HBase tables directly
  • How to design optimal HBase schemas for efficient data storage and retrieval
  • How to connect to HBase using the Java API to insert and retrieve data in real time
  • Best practices for identifying and resolving performance bottlenecks

Product Description

Cloudera University’s three-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Through instructor-led discussion and interactive, hands-on exercises, you will learn to navigate the Hadoop ecosystem.

Outline

Module 1: Introduction to Hadoop and HBase

  • What Is Big Data?
  • Introducing Hadoop
  • Hadoop Components
  • What Is HBase?
  • Why Use HBase?
  • Strengths of HBase
  • HBase in Production
  • Weaknesses of HBase

Module 2: HBase Tables

  • HBase Concepts
  • HBase Table Fundamentals
  • Thinking About Table Design

Module 3: The HBase Shell

  • Creating Tables with the HBase Shell
  • Working with Tables
  • Working with Table Data

Module 4: HBase Architecture Fundamentals

  • HBase Regions
  • HBase Cluster Architecture
  • HBase and HDFS Data Locality

Module 5: HBase Schema Design

  • General Design Considerations
  • Application-Centric Design
  • Designing HBase Row Keys
  • Other HBase Table Features

Module 6: Basic Data Access with the HBase API

  • Options to Access HBase Data
  • Creating and Deleting HBase Tables
  • Retrieving Data with Get
  • Retrieving Data with Scan
  • Inserting and Updating Data
  • Deleting Data

Module 7: More Advanced HBase API Features

  • Filtering Scans
  • Best Practices
  • HBase Coprocessors

Module 8: HBase on the Cluster

  • How HBase Uses HDFS
  • Compactions and Splits

Module 9: HBase Reads and Writes

  • How HBase Writes Data
  • How HBase Reads Data
  • Block Caches for Reading

Module 10: HBase Performance Tuning

  • Column Family Considerations
  • Schema Design Considerations
  • Configuring for Caching
  • Dealing with Time Series and Sequential Data
  • Pre-Splitting Regions

Module 11: HBase Administration and Cluster Management

  • HBase Daemons
  • ZooKeeper Considerations
  • HBase High Availability
  • Using the HBase Balancer
  • Fixing Tables with hbck
  • HBase Security

Module 12: HBase Replication and Backup

  • HBase Replication
  • HBase Backup
  • MapReduce and HBase Clusters

Module 13: Using Hive and Impala with HBase

  • Using Hive and Impala with HBase

Module 14: Appendix A: Accessing Data with Python and Thrift

  • Thrift Usage
  • Working with Tables
  • Getting and Putting Data
  • Scanning Data
  • Deleting Data
  • Counters
  • Filters

Module 15: Appendix B: OpenTSDB

E-Learning
Price (excl. tax)
  • US$ 1,815

Subscription duration: 180 days