hadoop - TestDFSIO benchmarking on CDH 5.8.0


environment details:

OS: CentOS 7.2
CDH: 5.8.0
Hosts: 11 (2 masters, 4 DN+NM, 5 NM-only)
yarn.nodemanager.resource.memory-mb: 32074 MB (NodeManager group 1), 82384 MB (NodeManager group 2)

I have a Hadoop cluster of 11 nodes: 2 masters, 4 slaves running both the DataNode and NodeManager daemons, and 5 nodes running only the NodeManager daemon. On this cluster I am running a TestDFSIO benchmarking job with an 8 TB load: 10000 files with a file size of 800 MB each. I have noticed a few things that I do not understand properly.
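
For this load, a typical TestDFSIO write invocation looks something like the sketch below. The jar path is an assumption based on the usual CDH parcel layout, and some Hadoop versions spell the size flag -fileSize instead of -size:

    # write phase of the benchmark: 10000 files of 800 MB each
    hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
        TestDFSIO -write -nrFiles 10000 -size 800MB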

1) The number of splits for the job is shown as 10000. How come there are only 10000 splits? dfs.blocksize is 128 MB, so going by that setting the number of splits should be more than 10000, right?
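
The expectation behind question 1 is block-based splitting arithmetic, which for this load would work out as follows (illustrative only; as the answer below notes, TestDFSIO does not split its files by block):

    # If splits followed dfs.blocksize, each file would contribute:
    #   ceil(800 MB / 128 MB) = 7 splits
    # so the job would have roughly:
    #   7 splits/file * 10000 files = 70000 splits
    # Instead the job reports exactly 10000 splits, one per file.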

2) In the ResourceManager web UI, I saw that 32 map tasks ran on each of the 5 compute nodes (the nodes running only the NodeManager), while the remaining map tasks ran on the 4 DN+NM nodes. Why is this happening? I have allocated the 9 slave nodes to 2 node groups: the 4 DN+NM nodes are in NodeManager group 1 and the other 5 slaves are in NodeManager group 2. yarn.nodemanager.resource.memory-mb is 32074 MB for the slaves in group 1 and 82384 MB for the slaves in group 2. I think that, ideally, the 5 slave nodes in NodeManager group 2 should take more map tasks. Why is that not happening?
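
For context, yarn.nodemanager.resource.memory-mb only caps how many containers a node can run at once; it does not by itself make the scheduler prefer that node. A rough memory-based bound, assuming a 1 GB map container (the container size here is an assumption, check mapreduce.map.memory.mb in your configuration):

    # Memory-based ceiling on concurrent map containers per node
    # (yarn.nodemanager.resource.cpu-vcores may impose a lower limit):
    #   group 1: floor(32074 MB / 1024 MB) = 31 containers
    #   group 2: floor(82384 MB / 1024 MB) = 80 containers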

  1. AFAIR, TestDFSIO allocates one map task per file. That is why you end up with the same number of map tasks as files, even though the block size is smaller than the file size.

  2. How is data locality configured? Mappers prefer nodes where their data is local, which explains why more tasks run on the nodes where the DataNodes are, since the data is local there (one way to check block locations is sketched below).
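
One way to confirm the locality explanation is to look at where the benchmark's blocks actually live; they can only sit on the 4 DN+NM nodes, since those are the only nodes running DataNodes. A quick check (the path is TestDFSIO's default base directory and is an assumption if you overrode test.build.data):

    # Show files, blocks and the DataNodes holding each replica
    hdfs fsck /benchmarks/TestDFSIO -files -blocks -locations | head -n 100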

