Map Reduce, does reducer automatically sorts?
There is something I don't fully understand about the overall functioning of a MapReduce programming environment.
Suppose 1k random, unsorted words in the form (word, 1) come out of one (or more) mappers, and I want the reducer to save them all into a single, huge, sorted file. How does that work? Does the reducer itself sort all the words automatically? What should the reducer function do? And what if I have just one reducer with limited RAM and disk?
hadoop mapreduce reduce
asked Nov 8 at 18:46
rollotommasi
3618
1 Answer
By the time the reducer receives the data, it has already been sorted on the map side.
The process works like this:
Map side:
1. Each InputSplit is processed by a map task, and the map output is temporarily placed in a circular in-memory buffer (the shuffle buffer; 100 MB by default, controlled by the io.sort.mb property). When the buffer is about to overflow (by default, at 80% of its capacity), a spill file is created on the local file system.
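As a rough illustration (not Hadoop source code, and the variable names here are made up), the buffer-then-spill behavior in step 1 looks something like this:

```python
# Simulate a map task buffering output and spilling a sorted run to "disk"
# whenever the buffer passes a threshold (Hadoop's default is 80% of io.sort.mb).
buffer_limit = 10          # stand-in for the io.sort.mb buffer capacity
spill_threshold = 0.8      # spill when the buffer is 80% full

buffer, spills = [], []
for i in range(25):
    buffer.append(("word%02d" % i, 1))
    if len(buffer) >= spill_threshold * buffer_limit:
        spills.append(sorted(buffer))  # each spill file is written sorted by key
        buffer = []
if buffer:                             # flush whatever remains at end of input
    spills.append(sorted(buffer))
# spills now holds several sorted runs that will later be merged into one file
```

The key point is that no single spill ever has to hold all the output; each one is a small, independently sorted run.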
2. Before writing to disk, the thread first divides the data into partitions, one per reduce task, so that each reduce task receives the data of exactly one partition. This avoids some reduce tasks being assigned huge amounts of data while others get none. The data within each partition is sorted, and if a Combiner is set, it runs on the sorted result.
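A minimal sketch of step 2 (the helper name `partition_and_sort` is invented; Hadoop's default HashPartitioner does essentially `hash(key) % numReduceTasks`):

```python
def partition_and_sort(records, num_reducers):
    """Assign each (key, value) pair to a partition by key hash,
    then sort each partition by key, as the map side does before spilling."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        # all records with the same key land in the same partition,
        # so one reduce task sees every occurrence of that key
        partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(p) for p in partitions]

records = [("banana", 1), ("apple", 1), ("cherry", 1), ("apple", 1)]
parts = partition_and_sort(records, num_reducers=2)
```

Because partitioning is by key hash, both `("apple", 1)` records are guaranteed to end up in the same partition, which is what makes per-key reduction possible.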
3. By the time the map task emits its last record, there may be many spill files, and these need to be merged. Sorting and combining are performed repeatedly during the merge, for two reasons: (1) to minimize the amount of data written to disk each time, and (2) to minimize the amount of data transferred over the network during the subsequent copy phase. The result is a single partitioned, sorted file. To reduce network traffic further, you can compress the map output here by setting mapred.compress.map.out to true.
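The merge in step 3 is a k-way merge of already-sorted runs; Python's `heapq.merge` can stand in for it here (again, purely illustrative, not Hadoop's implementation):

```python
import heapq

# Three sorted spill files produced by the same map task
spill_1 = [("apple", 1), ("cherry", 1)]
spill_2 = [("banana", 1), ("banana", 1)]
spill_3 = [("apple", 1), ("date", 1)]

# k-way merge: streams through the runs without loading everything at once,
# which is why merging many sorted spills is cheap compared to a full re-sort
merged = list(heapq.merge(spill_1, spill_2, spill_3))
# merged is one globally sorted run: apple, apple, banana, banana, cherry, date
```

Because each input run is sorted, the merge only ever compares the heads of the runs, so huge spill sets can be combined with very little memory.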
4. The data from each partition is copied to the corresponding reduce task.
Reduce side:
1. The reducer receives data from different map tasks, and the data arriving from each map is already sorted. If the amount of data received is small enough, it is kept in memory; once it exceeds a certain fraction of the buffer size, it is merged and written to disk.
2. As the number of spill files grows, a background thread merges them into larger, still-sorted files. In fact, on both the map side and the reduce side, MapReduce repeatedly performs sorting and merging.
3. The merge process generates many intermediate files on disk, but MapReduce keeps the amount written to disk as small as possible, and the result of the final merge is not written to disk at all: it is fed directly into the reduce function.
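This is why the reducer sees keys in sorted order and never has to sort them itself: because the merged input is sorted by key, the framework can group values per key in a single linear pass and call reduce once per key. A sketch of that grouping, using `itertools.groupby` as a stand-in (the reducer function here is a hypothetical word count):

```python
from itertools import groupby
from operator import itemgetter

# The merged, already-sorted stream arriving at a single reduce task
merged_input = [("apple", 1), ("apple", 1), ("banana", 1), ("cherry", 1)]

def reduce_word_count(key, values):
    # the user-written reduce: sum the counts for one key
    return key, sum(values)

# groupby only works because the input is sorted by key; one linear pass
# yields each key exactly once, with all of its values together
results = [reduce_word_count(k, (v for _, v in grp))
           for k, grp in groupby(merged_input, key=itemgetter(0))]
# results: [("apple", 2), ("banana", 1), ("cherry", 1)]
```

Since the keys come out in sorted order, a single reducer that simply writes its output as it goes produces exactly the single sorted file the question asks about, and the external merge described above is what keeps that workable even with limited RAM and disk.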
answered Nov 10 at 7:03
HbnKing
6021315