Implementation of a Cost-Sensitive GEP Classification Algorithm
Cost-Sensitive Classification by Gene Expression Programming
Abstract: In data mining, classification algorithms are usually evaluated by their classification accuracy. This criterion rests on the assumption that misclassifying any instance as any other class incurs the same cost. In many real-world applications this assumption does not hold, and applying traditional classification methods directly then fails to produce good classification and prediction results. To address this problem, this paper proposes CSC-GEP, a cost-sensitive classification algorithm based on Gene Expression Programming (GEP), which improves the encoding/decoding methods and integrates the different misclassification costs of samples into the fitness function. The algorithm was tested on three UCI datasets, and the experimental results show that CSC-GEP is an effective cost-sensitive classification algorithm.
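The abstract states that CSC-GEP integrates misclassification costs into the fitness function but does not give the formula. The sketch below illustrates the general idea under loudly stated assumptions: a two-class cost matrix with illustrative values, and a hypothetical fitness mapping `scale / (1 + average cost)`. Neither the cost values, the formula, nor the function names (`accuracy`, `total_cost`, `cost_sensitive_fitness`) come from the paper. The example also shows why accuracy alone is insufficient: two classifiers with identical accuracy can differ tenfold in total cost.

```python
"""Illustrative cost-sensitive fitness for a GEP-style classifier.

Assumptions (not from the paper): the cost matrix values, the fitness
formula, and all function names below are hypothetical.
"""
from typing import Sequence

# COST[i][j]: cost of predicting class j when the true class is i.
# Class 1 is the "expensive" class: missing it costs 10, a false alarm costs 1.
COST = [
    [0.0, 1.0],   # true class 0
    [10.0, 0.0],  # true class 1
]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correctly classified instances (the cost-blind criterion)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               cost: Sequence[Sequence[float]]) -> float:
    """Sum of misclassification costs over all instances."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def cost_sensitive_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                           cost: Sequence[Sequence[float]],
                           scale: float = 1000.0) -> float:
    """Hypothetical fitness: lower average cost gives higher fitness, at most `scale`."""
    avg_cost = total_cost(y_true, y_pred, cost) / len(y_true)
    return scale / (1.0 + avg_cost)


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    # Classifier A misses two class-1 instances (false negatives, cost 10 each).
    pred_a = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
    # Classifier B raises two false alarms on class-0 instances (cost 1 each).
    pred_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    for name, pred in (("A", pred_a), ("B", pred_b)):
        print(name,
              f"accuracy={accuracy(y_true, pred):.2f}",
              f"cost={total_cost(y_true, pred, COST):.1f}",
              f"fitness={cost_sensitive_fitness(y_true, pred, COST):.1f}")
```

Both classifiers reach 80% accuracy, so a cost-blind criterion cannot separate them, while the cost-weighted fitness strongly prefers B (total cost 2 versus 20). In an evolutionary search, a fitness of this kind steers selection toward individuals that avoid the expensive errors, which is the effect the abstract attributes to integrating costs into the GEP fitness function.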