一种Mapreduce作业内存精确预测方法

An Innovative Memory Prediction Approach for Mapreduce Job

  • 摘要: 针对准确预测mapreduce作业内存资源需求困难的问题,根据Java虚拟机(JVM)的分代(JVM将堆内存划分为年轻代和年长代)内存管理特点,该文提出一种分代内存预测方法。建立年轻代大小与垃圾回收时间的模型,将寻找合理年轻代大小的问题转换为一个受约束的非线性优化问题,并设计搜索算法求解该优化问题。建立mapreduce作业的map任务和reduce任务性能与内存的关系模型,求解最佳性能的内存需求,从而获得map任务和reduce任务的年长代内存大小。实验结果表明,本文提出的方法能准确预测作业的内存需求;与默认配置相比,能提供平均6倍的性能提升。

     

    Abstract: It is difficult to predict the amount of memory for a mapreduce job. Based on the fact that Java virtual machine (JVM) divides the heap space managed by the JVM garbage collector into young and old generations, a generational memory prediction method is put forward. We build up a function that models the relationship between the amount of young generation and the total garbage collection time, and then we use a constrained nonlinear optimization model to find the rational footprint of young generation. The memory model for the map phase is established, the phase of a mapreduce job is reduced, then a relationship between map/reduce tasks' performance (runtime of a task) and the amount of memory of the old generation is set up, and finally, the reasonable old generation memory size is obtained. The experimental results show that the proposed approach can accurately predict the memory size of map and reduce the tasks of a mapreduce job. In comparison with the default configuration, the proposed approach can give us 6 times performance improvement than default settings.

     

/

返回文章
返回