The Hadoop open-source software framework is widely used for reliable, scalable distributed computing on a cluster of machines.
This competency area includes implementing advanced parallelism, implementing Counters, and performing basic queries and subqueries in Hive, among others.
- Implement advanced parallelism in MapReduce using a Combiner - Understand the difference between a reducer and a combiner, and use custom Writable data types. Applicable for Developer.
- Use Partitioners to control how keys are distributed across reducers - Configure the right partitioning for the use case. Applicable for Developer.
- Implement Counters - Log mapper and reducer statistics, and define custom statistics using counters in code. Applicable for Developer.
- Configure Map, Shuffle/Reduce, and Job parameters - Optimize disk space, memory, and other resource usage. Applicable for Administration, Developer.
- Configure High Availability for the NameNode using the Quorum Journal Manager (QJM) - Configure machines to run the JournalNodes for HA. Applicable for Administration, Developer.
- Install and set up Hive for data warehousing with Hadoop - Set up Hive to work with a Hadoop installation. Applicable for Operations, Developer.
- Perform basic queries and subqueries in Hive - Run basic queries using the Beeline or HCatalog CLI. Applicable for Analyst, Developer.
- Execute windowing and analytic functions and aggregations in Hive - Perform joins, window operations, grouping, and rollup. Applicable for Analyst, Developer.
- Configure Hadoop Ozone for Object Storage with Hadoop - Work with Ozone using the command line and programming libraries. Applicable for Operations, Developer.
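
The Combiner, Partitioner, and Counter competencies listed above can be illustrated with a small word-count simulation. This is a minimal sketch in plain Python, not Hadoop's Java API: the function names (`map_words`, `combine`, `partition`, `run_job`) are hypothetical, the CRC32-based partitioning is an assumed stand-in for a hash partitioner, and the counter names are borrowed only as labels.

```python
from collections import Counter, defaultdict
import zlib


def map_words(line, counters):
    """Mapper: emit (word, 1) for each token and count emitted records."""
    pairs = []
    for word in line.lower().split():
        pairs.append((word, 1))
        counters["MAP_OUTPUT_RECORDS"] += 1
    return pairs


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def partition(key, num_reducers):
    """Partitioner: map a key to a reducer index with a stable hash.

    CRC32 is an assumption chosen for determinism in this sketch.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    """Run map, combine, partition, and reduce over a list of input splits."""
    counters = Counter()
    # One bucket of {word: [partial counts]} per simulated reducer.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]

    for split in splits:  # each split stands in for one mapper's input
        mapped = []
        for line in split:
            mapped.extend(map_words(line, counters))
        combined = combine(mapped)
        counters["COMBINE_OUTPUT_RECORDS"] += len(combined)
        for word, count in combined:
            reducer_inputs[partition(word, num_reducers)][word].append(count)

    # Reducer: sum the partial counts that arrived for each key.
    totals = {}
    for groups in reducer_inputs:
        for word, counts in groups.items():
            totals[word] = sum(counts)
    return totals, counters
```

Calling `run_job([["a b a", "b a"], ["a c"]])` returns the word totals together with the counters. The combiner is safe here because summation is associative and commutative, so pre-aggregating on the map side does not change the reducer's result; it only reduces the number of records sent through the shuffle. The partitioner decides which reducer receives each key, while the number of reducers is a separate job setting (`num_reducers` in this sketch).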