Abstract:
Predicting the amount of memory that a MapReduce job needs is difficult. Because the Java virtual machine (JVM) divides the heap managed by its garbage collector into young and old generations, we propose a generational memory prediction method. We first build a function that models the relationship between the size of the young generation and the total garbage collection time, and then use a constrained nonlinear optimization model to find a reasonable young generation size. Next, we establish memory models for the map phase and the reduce phase of a MapReduce job, relate the performance (runtime) of map and reduce tasks to the size of the old generation, and from this relationship obtain a reasonable old generation size. Experimental results show that the proposed approach accurately predicts the memory size of the map and reduce tasks of a MapReduce job. Compared with the default configuration, the proposed approach improves performance by a factor of 6.
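As an illustration of the young generation step only, the following minimal sketch minimizes a garbage collection time function over a bounded range of young generation sizes. The cost function `gc_time`, its coefficients `a` and `b`, and the size bounds are all hypothetical, since the abstract does not state the fitted function or the constraints; golden-section search stands in here for whatever constrained nonlinear solver is actually used.

```python
import math


def gc_time(young_mb, a=48000.0, b=0.03):
    """Hypothetical total GC time (seconds) for a young generation of young_mb MB.

    Assumed shape, not taken from the paper:
      a / young_mb  -> fewer minor collections as the young generation grows
      b * young_mb  -> each collection costs more as the young generation grows
    """
    return a / young_mb + b * young_mb


def minimize_bounded(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi].

    The bounds [lo, hi] play the role of the constraints, e.g. a minimum
    usable young generation size and the memory available to the task.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # The minimum lies in [lo, x2]; reuse x1 as the new upper probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            # The minimum lies in [x1, hi]; reuse x2 as the new lower probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


if __name__ == "__main__":
    # Loose bound: the interior optimum sqrt(a / b) is feasible.
    print(minimize_bounded(gc_time, 64.0, 4096.0))
    # Tight bound: the optimum is pushed onto the upper constraint.
    print(minimize_bounded(gc_time, 64.0, 1024.0))
```

With these assumed coefficients the unconstrained optimum is sqrt(a / b), about 1265 MB, so the first call returns an interior point while the second returns the upper bound of 1024 MB. In practice the coefficients would come from fitting measured garbage collection times rather than from fixed constants.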