Who should attend
- Developers
- Administrators
- Engineers
Prerequisites
- Prior experience with databases and data modeling is helpful
- Knowledge of Java
- Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course
Course Objectives
By the end of this course, you will know:
- When to use HBase, Hadoop, or an RDBMS
- How to use the HBase shell to manipulate HBase tables directly
- How to design optimal HBase schemas for efficient data storage and recovery
- How to connect to HBase using the Java API to insert and retrieve data in real time
- Best practices for identifying and resolving performance bottlenecks
Product Description
Cloudera University’s three-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. Through instructor-led discussion and interactive, hands-on exercises, you will learn to navigate the Hadoop ecosystem.
Outline
Module 1: Introduction to Hadoop and HBase
- What Is Big Data?
- Introducing Hadoop
- Hadoop Components
- What Is HBase?
- Why Use HBase?
- Strengths of HBase
- HBase in Production
- Weaknesses of HBase
Module 2: HBase Tables
- HBase Concepts
- HBase Table Fundamentals
- Thinking About Table Design
Module 3: The HBase Shell
- Creating Tables with the HBase Shell
- Working with Tables
- Working with Table Data
Module 4: HBase Architecture Fundamentals
- HBase Regions
- HBase Cluster Architecture
- HBase and HDFS Data Locality
Module 5: HBase Schema Design
- General Design Considerations
- Application-Centric Design
- Designing HBase Row Keys
- Other HBase Table Features
Module 6: Basic Data Access with the HBase API
- Options to Access HBase Data
- Creating and Deleting HBase Tables
- Retrieving Data with Get
- Retrieving Data with Scan
- Inserting and Updating Data
- Deleting Data
Module 7: More Advanced HBase API Features
- Filtering Scans
- Best Practices
- HBase Coprocessors
Module 8: HBase on the Cluster
- How HBase Uses HDFS
- Compactions and Splits
Module 9: HBase Reads and Writes
- How HBase Writes Data
- How HBase Reads Data
- Block Caches for Reading
Module 10: HBase Performance Tuning
- Column Family Considerations
- Schema Design Considerations
- Configuring for Caching
- Dealing with Time Series and Sequential Data
- Pre-Splitting Regions
Module 11: HBase Administration and Cluster Management
- HBase Daemons
- ZooKeeper Considerations
- HBase High Availability
- Using the HBase Balancer
- Fixing Tables with hbck
- HBase Security
Module 12: HBase Replication and Backup
- HBase Replication
- HBase Backup
- MapReduce and HBase Clusters
Module 13: Using Hive and Impala with HBase
- Using Hive and Impala with HBase
Module 14: Appendix A: Accessing Data with Python and Thrift
- Thrift Usage
- Working with Tables
- Getting and Putting Data
- Scanning Data
- Deleting Data
- Counters
- Filters