This question should be clarified a bit to say how many workers there are. Otherwise both true and false can be correct. If there are fewer workers than partitions, then fewer maps will be running at any given moment. If at least as many workers as partitions are available, then all of them can run at once.
In an ideal implementation for an input of size M, it would be true, but it obviously isn't always true, so the answer is false. We don't have an infinite number of processors.
The question is wrong IMHO. The number of workers is totally independent of the number of data partitions. You can have mn.
The question should be "the number of workers is typically...", "you can have M workers", "you must have at least M workers", "you can't have more than M workers", "if you have M workers, you get better performance than M/2 workers" or something like that...
True, but from a logical perspective, the maximum number of physical workers on your problem is going to be M. I presume one's operating system would hide this, no?
False. There are generally fewer worker nodes than partitions.
Assuming I have my terminology correct, this is simply because nobody owns an infinite number of processors, and therefore if the problem is large enough, we will have fewer workers than partitions?
Very nice for beginners in this domain.
The question should be a little clearer about how many workers we have, instead of just giving the number of input splits and asking how many mappers could work in parallel. If that is not given, we can assume the ideal scenario in which we have as many workers as input splits, and the answer comes out TRUE instead of FALSE, which the system has tagged as the correct answer.
The number of mappers is related to the number of data blocks.
The number of mappers equals the number of input splits.
The answer is true.
i love it (sniper)
"If there are M partitions of the input, there are M map workers running simultaneously. True or False?" Why is it false? For n splits, won't n map tasks be spawned?
I'm not an expert but my guess is it's possible that as soon as a worker finishes a task it then starts another. In this way if only one worker is available it could work through all tasks by itself.
Typically you have n < m mappers running in parallel, and each of the n mappers eventually gets through about m/n shards.
Hi Vijaygn, the number of map tasks that run simultaneously is equal to the number of input splits, not the number of partitions. That's the reason the answer is false.
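To put numbers on that: with m shards and n workers running in parallel, each worker ends up handling about m/n shards, and the map phase takes roughly ceil(m/n) rounds if all shards take similar time. A quick sketch of the arithmetic (the function name `map_waves` is just made up for illustration, and the equal-time assumption is mine):

```python
import math


def map_waves(num_shards, num_workers):
    """Return (average shards per worker, number of rounds needed).

    Assumes every shard takes about the same time, so workers
    proceed in lockstep rounds. Real jobs are less regular.
    """
    shards_per_worker = num_shards / num_workers
    rounds = math.ceil(num_shards / num_workers)
    return shards_per_worker, rounds


# 10 shards on 4 workers: 2.5 shards each on average, 3 rounds.
print(map_waves(10, 4))
```

Only when n >= m does everything run in a single round, which is the case the original question seems to assume.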
A more useful way to think about partitions is as chunks of work to be done by workers. When a map worker has finished mapping a partition, it can grab a new one from a coordinator and start working on that. This approach has a number of advantages. One is workload balance: a really fast worker that finishes before the others picks up more work instead of sitting idle. Another is that workers don't have to pull in enormous chunks of data that they would need to swap in and out of disk while working. By partitioning the data into smaller chunks, the workers can keep more of the data in memory, which is much faster.
I won't disclose the answer, but is there any interesting source that somebody can cite here to further understanding in this domain?
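Here is a rough sketch of that "grab the next partition" idea, using Python threads as stand-in workers and a queue as a stand-in coordinator. This is only a toy model, not how Hadoop or any real framework implements scheduling, and `run_map_phase` and its parameters are names I invented for the example:

```python
import queue
import threading


def run_map_phase(partitions, num_workers, map_fn):
    """Toy coordinator: workers pull partitions until none remain."""
    tasks = queue.Queue()
    for idx, part in enumerate(partitions):
        tasks.put((idx, part))

    results = [None] * len(partitions)
    handled_by = [None] * len(partitions)

    def worker(worker_id):
        while True:
            try:
                # Ask the "coordinator" for the next unprocessed partition.
                idx, part = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do, worker exits
            results[idx] = map_fn(part)
            handled_by[idx] = worker_id

    threads = [
        threading.Thread(target=worker, args=(w,)) for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled_by


# Four partitions, but only two workers: every partition still gets mapped.
results, handled_by = run_map_phase([[1, 2], [3], [4, 5, 6], [7]], 2, sum)
print(results)
```

Note that the number of workers never has to match the number of partitions. Even with a single worker, all partitions get processed, just one after another.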
This is a nice example
Alright, will definitely check that out. One more thing: is it more beneficial/challenging to learn Apache Hadoop/Spark in Python or in Java?
Java is the most commonly used language in this area.