How do Hadoop ChainMapper and ChainReducer reduce disk IO?

- July 15, 2012

I want to chain multiple MapReduce jobs, i.e. the previous MapReduce job's output is the input of the next MapReduce job. Because the output is large and the disk IO overhead is heavy, I am looking for an alternative solution that reduces the IO bottleneck. I found the ChainMapper/ChainReducer API. Its documentation mentions the following property:

"Using the ChainMapper and the ChainReducer classes is possible to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. And immediate benefit of this pattern is a dramatic reduction in disk IO."

But I don't quite understand why using ChainMapper/ChainReducer reduces disk IO. And to reduce IO, how should I use these two APIs?

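As per my understanding, even though you have multiple mappers, ChainMapper treats them as a single task. Until that task completes, there is no intermediate write. By contrast, when each step runs as a separate job, every job writes its full output to HDFS and the next job has to read it back.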

See the statement below from the Javadoc:

"The Mapper classes are invoked in a chained (or piped) fashion, the output of the first becomes the input of the second, and so on until the last Mapper, the output of the last Mapper will be written to the task's output."
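To make the "chained (or piped)" idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `ChainSketch`, `Result`, `runAsSeparateJobs` and `runChained` are hypothetical names invented for illustration, and a counter stands in for records written to and re-read from disk between steps. It only models the data flow described in the Javadoc, assuming each step is a function from one record to zero or more records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class ChainSketch {

    /** Final records plus the number of records materialized between stages. */
    static final class Result {
        final List<String> records;
        final int intermediateRecordsStored;

        Result(List<String> records, int intermediateRecordsStored) {
            this.records = records;
            this.intermediateRecordsStored = intermediateRecordsStored;
        }
    }

    /** Models one job per stage: each stage's full output is stored before the next stage starts. */
    static Result runAsSeparateJobs(List<String> input,
                                    List<Function<String, List<String>>> stages) {
        List<String> current = input;
        int stored = 0;
        for (int i = 0; i < stages.size(); i++) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                next.addAll(stages.get(i).apply(record));
            }
            if (i < stages.size() - 1) {
                stored += next.size(); // stand-in for a write to disk and a later read
            }
            current = next;
        }
        return new Result(current, stored);
    }

    /** Models a chain: each record is piped through every stage before the next record is read. */
    static Result runChained(List<String> input,
                             List<Function<String, List<String>>> stages) {
        List<String> output = new ArrayList<>();
        for (String record : input) {
            List<String> inFlight = Collections.singletonList(record);
            for (Function<String, List<String>> stage : stages) {
                List<String> next = new ArrayList<>();
                for (String r : inFlight) {
                    next.addAll(stage.apply(r));
                }
                inFlight = next;
            }
            output.addAll(inFlight); // only the last stage's output reaches the task output
        }
        return new Result(output, 0); // nothing is stored between stages by construction
    }

    public static void main(String[] args) {
        Function<String, List<String>> tokenize = line -> Arrays.asList(line.split(" "));
        Function<String, List<String>> upper = w -> Collections.singletonList(w.toUpperCase());
        List<Function<String, List<String>>> stages = Arrays.asList(tokenize, upper);
        List<String> input = Arrays.asList("a b", "c");

        Result separate = runAsSeparateJobs(input, stages);
        Result chained = runChained(input, stages);
        System.out.println(separate.records + " stored=" + separate.intermediateRecordsStored);
        System.out.println(chained.records + " stored=" + chained.intermediateRecordsStored);
    }
}
```

Both methods produce the same final records. The difference is that the separate-jobs version holds the whole intermediate data set between steps, while the chained version only ever holds the records derived from one input record. In Hadoop itself the corresponding wiring is done in the job driver with `ChainMapper.addMapper` and `ChainReducer.setReducer` / `ChainReducer.addMapper`. A reduce step still needs the usual shuffle, so the saving applies to the map steps chained before or after it.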
