Research on Click-Through Rate Prediction in Display Advertising Based on Machine Learning

Hans Journal of Data Mining
Vol. 09 No. 02 (2019), Article ID: 29882, 8 pages
DOI: 10.12677/HJDM.2019.92008

Zhiyue Zhang, Hao Huang

College of Information, University of International Business and Economics, Beijing

Received: Apr. 5th, 2019; accepted: Apr. 18th, 2019; published: Apr. 25th, 2019

ABSTRACT

1. Introduction

2. Related Work

3. Experimental Procedure

3.1. Experimental Data and Description

3.2. Data Preprocessing

Table 2. Comparison of the effects of label encoding and one-hot encoding

Table 3. Feature list after preprocessing

3.3. Ad Click-Through Rate Prediction Models

3.3.1. Single-Model CTR Prediction Algorithms

1) Logistic regression model

$z={W}_{0}+{W}_{1}{X}_{1}+\cdots +{W}_{n}{X}_{n}$ (1)

$f\left(z\right)=\frac{1}{1+{\text{e}}^{-z}}$ (2)

$p\left(C|X\right)=f\left(z\right)=\frac{1}{1+{\text{e}}^{-z}}$ (3)
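The logistic regression prediction in Equations (1) to (3) can be sketched in a few lines of Python: a linear score over the features is passed once through the logistic function to give a click probability. This is a minimal illustration, not the authors' implementation; the function names, weights, bias, and feature values are hypothetical.

```python
import math


def linear_score(weights, bias, features):
    """Equation (1): W0 + W1*X1 + ... + Wn*Xn, with `bias` playing the role of W0."""
    return bias + sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Equation (2): the logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_click_probability(weights, bias, features):
    """Equation (3): estimated probability of a click given the features."""
    return sigmoid(linear_score(weights, bias, features))


# Hypothetical weights and one-hot style features, chosen only for illustration.
weights = [0.8, -1.2, 0.5]
bias = -0.3
features = [1.0, 0.0, 1.0]

p_click = predict_click_probability(weights, bias, features)
print(round(p_click, 4))  # linear score is 1.0, so this prints 0.7311
```

In practice the weights would be learned from labelled impressions, for example by minimizing log loss; the sketch covers only the prediction step.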

2) Decision tree model

$\text{Gini}\left(D\right)=1-\sum _{k=1}^{K}{\left(\frac{|{C}_{k}|}{|D|}\right)}^{2}$ (4)

$\text{Gini}\left(D,A\right)=\frac{|{D}_{1}|}{|D|}\text{Gini}\left({D}_{1}\right)+\frac{|{D}_{2}|}{|D|}\text{Gini}\left({D}_{2}\right)$ (5)

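Using the criterion above, CART repeatedly selects the optimal splitting feature to grow a classification tree, which is then used to classify samples and make predictions.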
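The split selection in Equation (5) can be illustrated with a small Python sketch. It assumes the standard Gini index (one minus the sum of squared class proportions) as the impurity of a sample set, and it considers only binary splits of the form "feature equals value" versus "feature differs from value". The toy rows, the labels, and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a label set: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(rows, labels, feature, value):
    """Equation (5): size-weighted Gini index after splitting D into D1 and D2."""
    d1 = [y for row, y in zip(rows, labels) if row[feature] == value]
    d2 = [y for row, y in zip(rows, labels) if row[feature] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(rows, labels):
    """Return the (feature, value) pair with the lowest weighted Gini index."""
    candidates = {(feature, row[feature]) for row in rows for feature in row}
    # The pair itself is the tie-breaker, so the result is deterministic.
    return min(candidates, key=lambda fv: (gini_split(rows, labels, *fv), fv))


# Toy impressions: here the click label depends only on the device type.
rows = [
    {"device": "mobile", "slot": "top"},
    {"device": "mobile", "slot": "side"},
    {"device": "desktop", "slot": "top"},
    {"device": "desktop", "slot": "side"},
]
labels = [1, 1, 0, 0]

print(gini(labels))                                 # 0.5
print(gini_split(rows, labels, "slot", "top"))      # 0.5, no impurity reduction
print(gini_split(rows, labels, "device", "mobile")) # 0.0, a pure split
print(best_split(rows, labels)[0])                  # device
```

A full CART implementation would apply `best_split` recursively to each child subset until a stopping condition is met; the sketch shows a single level only.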

3.3.2. Ensemble-Learning CTR Prediction Algorithms

1) Bootstrap aggregating (Bagging)

2) Boosting

3.4. Model Evaluation Criteria

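For a useful model, the AUC lies between 0.5 and 1. An AUC of 0.5 means the model performs no better than random guessing, so the AUC should generally exceed 0.5. Above that threshold, the larger the AUC, the better the model.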
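One way to make these values concrete is the pairwise interpretation of AUC: it equals the probability that a randomly chosen clicked sample receives a higher score than a randomly chosen non-clicked sample, with ties counted as one half. The sketch below computes AUC directly from that definition. The function name and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]

print(auc(labels, [0.9, 0.8, 0.4, 0.3]))  # 0.75: three of four pairs are ordered correctly
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: constant scores carry no ranking information
print(auc(labels, [0.9, 0.1, 0.8, 0.3]))  # 1.0: every click is scored above every non-click
```

The second call shows why 0.5 corresponds to random guessing: a model that cannot separate the two classes wins half of the pairwise comparisons on average.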

4. Experimental Results

4.1. Effect of Different Feature Combinations on CTR Prediction

Table 4. AUC and accuracy with different feature combinations

4.2. Comparative Analysis of Different Machine Learning Models

Table 5. Comparative analysis of different machine learning models

Table 6. Time consumed by different machine learning models on the test set

5. Conclusion

Research on Click-Through Rate Prediction in Display Advertising Based on Machine Learning[J]. Hans Journal of Data Mining, 2019, 09(02): 60-67. https://doi.org/10.12677/HJDM.2019.92008
