Useful MapReduce features

09 Jan 2024

MapReduce Features

Counters are the best way to get metrics for your Hadoop jobs. There are a few variants: built-in task and job counters, plus user-defined ones.
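
A user-defined counter can be sketched as follows, assuming the job runs under Hadoop Streaming with a Python mapper (neither is stated in these notes). Streaming treats any stderr line of the form reporter:counter:<group>,<counter>,<amount> as a counter increment. The group and counter names ("Quality", "MALFORMED") and the tab-separated input layout are hypothetical.

```python
import sys


def counter_line(group, counter, amount=1):
    """Format a Hadoop Streaming counter update (written to stderr)."""
    return f"reporter:counter:{group},{counter},{amount}"


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    """Emit 'first field<TAB>1' per record and count malformed records."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            # Hypothetical group/counter names, chosen for illustration.
            print(counter_line("Quality", "MALFORMED", 1), file=err)
            continue
        print(f"{fields[0]}\t1", file=out)


# In a real mapper script, finish with: map_lines(sys.stdin)
```

The totals should then show up alongside the built-in counters in the job's summary, which makes this a cheap way to track things like bad-record rates.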

Using MapReduce to sort data is actually quite useful, since the shuffle already sorts map output by key.
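
Sorting mostly comes down to choosing the key. The sketch below assumes a Hadoop Streaming job where keys are compared as text (the usual default, as far as I know), so a numeric key is zero-padded to make text order match numeric order. The record layout and the sortable_key helper are hypothetical, and a local sorted() call stands in for the shuffle.

```python
def sortable_key(value, width=10):
    """Zero-pad a non-negative integer so text order matches numeric order."""
    if value < 0 or len(str(value)) > width:
        raise ValueError("value must be non-negative and fit in width digits")
    return str(value).zfill(width)


def map_record(record):
    """Emit 'key<TAB>record', keyed on the record's first (integer) field."""
    value = int(record.split("\t")[0])
    return f"{sortable_key(value)}\t{record}"


# Local stand-in for the shuffle: sort the map output by its key text.
records = ["100\tc", "9\ta", "25\tb"]
shuffled = sorted(map_record(r) for r in records)
ordered = [line.split("\t", 1)[1] for line in shuffled]
```

Without the padding, a plain text sort would put "100" before "25" and "9". Negative numbers or floats would need a different encoding.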

You can join large datasets, but you should probably use a framework like Pig, Hive, Cascading, Crunch or Spark.
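
To give a feel for what those frameworks take off your hands, here is a rough sketch of a reduce-side inner join done by hand. The map step tags each record with its source, and the reduce step pairs records that share a key. The datasets (stations, readings) and the "L"/"R" tags are made up, and sorted() stands in for the shuffle.

```python
from itertools import groupby


def tag(source, lines):
    """Map step: emit (join key, source tag, rest of the record)."""
    for line in lines:
        key, rest = line.split("\t", 1)
        yield key, source, rest


def join(tagged):
    """Reduce step: per key, pair every left record with every right one."""
    # Sorting by (key, tag, value) stands in for the shuffle.
    for key, group in groupby(sorted(tagged), key=lambda t: t[0]):
        left, right = [], []
        for _, source, rest in group:
            (left if source == "L" else right).append(rest)
        for l in left:
            for r in right:
                yield key, l, r


stations = ["s1\tOslo", "s2\tBergen"]   # hypothetical left dataset
readings = ["s1\t12", "s1\t15", "s3\t7"]  # hypothetical right dataset
joined = list(join(list(tag("L", stations)) + list(tag("R", readings))))
```

Keys that appear on only one side (s2 and s3 here) drop out, which is what makes this an inner join. A real job would also have to think about skewed keys and memory per key, which is a good part of why the frameworks are the safer choice.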

Side data is extra read-only data you need during your tasks. There are a few ways to distribute it:

Job config: you can set small key-value pairs on the job configuration (JobConf) and read them back in your tasks.
Distributed cache: you can pass metadata files with the -files flag; they are copied to your nodes at the start of the job and can be read during your tasks.
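
Both routes can be sketched from the task side, again assuming Hadoop Streaming with Python tasks. Streaming exposes job properties to tasks as environment variables with the dots turned into underscores, and a file shipped with -files shows up in the task's working directory under its own name. The property name my.app.threshold and the file lookup.txt are hypothetical.

```python
import os


def conf_value(name, default=None, env=os.environ):
    """Look up a job property the way Streaming exposes it to tasks:
    an environment variable with non-alphanumeric characters replaced by '_'."""
    env_name = "".join(
        c if (c.isascii() and c.isalnum()) else "_" for c in name
    )
    return env.get(env_name, default)


def load_lookup(path):
    """Read a small tab-separated key/value file shipped with -files."""
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            key, value = line.rstrip("\n").split("\t", 1)
            table[key] = value
    return table
```

With a job submitted using -D my.app.threshold=5 -files lookup.txt, a task could call conf_value("my.app.threshold") and load_lookup("lookup.txt"). The job config suits a handful of small values; anything bigger belongs in a file.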

Hadoop also provides prebuilt mappers and reducers to do basic stuff like select and map.
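
One prebuilt piece you can reach from Hadoop Streaming is the aggregate reducer, selected with -reducer aggregate. The mapper prefixes each key with the name of the aggregation it wants, such as LongValueSum, and the library does the summing. Below is a minimal word-count mapper along those lines, assuming whitespace-separated text input.

```python
def aggregate_lines(text_lines):
    """Emit one 'LongValueSum:<word><TAB>1' record per word."""
    for line in text_lines:
        for word in line.split():
            yield f"LongValueSum:{word}\t1"


# In a real mapper script:
#   import sys
#   for record in aggregate_lines(sys.stdin):
#       print(record)
```

No reducer code is needed in this setup, since the prebuilt reducer adds up the 1s for each word.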
