Advances in Applied Mathematics, 2022, 11(12), 9026-9038
Published Online December 2022 in Hans. http://www.hanspub.org/journal/aam
https://doi.org/10.12677/aam.2022.1112952
An Adaptive and Momental Bound Method for Stochastic Variance Reduced Gradient

Guiyong Zhu¹, Jialin Lei²*

¹Xingzhi College, Zhejiang Normal University, Jinhua, Zhejiang
²Zhejiang Normal University, Jinhua, Zhejiang

Received: Nov. 26th, 2022; accepted: Dec. 21st, 2022; published: Dec. 29th, 2022

*Corresponding author.

Article citation: Zhu, G.Y. and Lei, J.L. (2022) An Adaptive and Momental Bound Method for Stochastic Variance Reduced Gradient. Advances in Applied Mathematics, 11(12), 9026-9038. DOI: 10.12677/aam.2022.1112952
Abstract

The training process of many machine learning models can be reduced to solving an optimization problem, and the convergence speed of the traditional gradient descent algorithm and its variants is not satisfactory. In this paper, based on the stochastic variance reduction gradient algorithm, we combine the characteristics of adaptive learning rate and moment limitation, and use the batch idea to select large batch samples instead of all samples for the calculation of the global gradient and to select small batch samples for the iterative update of parameters. We then propose the batch adaptive variance reduction algorithm with moment limitation and illustrate the convergence of the algorithm. The effectiveness of the algorithm is verified through MNIST-based numerical experiments, and the influence of the experimental parameters on the stability of the algorithm is explored.
Keywords

Machine Learning, Adaptive Learning Rate, Momental Bound, Mini-Batch, SVRG
Copyright © 2022 by author(s) and Hans Publishers Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
1. Introduction

Consider the following unconstrained minimization problem [1]:

$$\min_w P(w), \quad P(w) = \frac{1}{n}\sum_{i=1}^{n}\psi_i(w), \tag{1}$$

where $\psi_i: \mathbb{R}^d \to \mathbb{R}$, $i = 1, \ldots, n$. Models of this form are very common in machine learning; for example, in supervised learning, $n$ denotes the number of samples and $\psi_i$ denotes the loss function of sample $i$ in a classification or regression model.
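To make the setting concrete, the following sketch instantiates (1) as ℓ2-regularized logistic regression; this particular loss is an illustration chosen here, not a model prescribed by the paper.

```python
import numpy as np

def psi_i(w, x_i, y_i, lam=1e-3):
    """Loss of a single sample: logistic loss plus l2 regularization."""
    return np.log(1.0 + np.exp(-y_i * (x_i @ w))) + 0.5 * lam * (w @ w)

def P(w, X, y):
    """Finite-sum objective (1): the average of the per-sample losses."""
    return np.mean([psi_i(w, X[i], y[i]) for i in range(len(y))])
```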
For an objective of this form, gradient descent methods are typically used [2-5]. Gradient descent has three variants, namely full gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent. They differ in how much data is used to compute the gradient of the objective: depending on the amount of data, a trade-off is made between the accuracy of the parameter update and the time required to perform the update.
Full gradient descent [6] (FGD) computes the gradients of all $n$ samples for every iterative update. It obtains the best descent direction and comes with convergence rate guarantees, but it cannot update the model online, i.e., it cannot exploit newly arriving samples. Its update rule is

$$w_t = w_{t-1} - \frac{\alpha_t}{n}\sum_{i=1}^{n}\nabla\psi_i(w_{t-1}). \tag{2}$$
FGD performs redundant computation on large datasets, because before every parameter update it recomputes gradients for samples similar to those already seen. Stochastic gradient descent [7] (SGD) removes this redundancy by selecting a single sample to compute an approximate gradient for each parameter update:

$$w_t = w_{t-1} - \alpha_t\nabla\psi_{i_t}(w_{t-1}). \tag{3}$$
SGD Oޤ=•FGD 1/n,Óž•Œ^u3‚ÆS,d uÀ‘Å5Ú\•,
—8I¼ê3ÂñL§¥ÑyìÅÄ.
‘ÑyU?1þ‘ÅFÝeüŽ{[8](Mini-BatchStochasticGradientDescent,
Mini-BatchSGD)
w
t
= w
t−1
−
α
t
|B|
X
i∈|B|
∇ψ
i
(w
t−1
),(4)
(4) ª¿©(ÜFÝeüŽ{Ú‘ÅFÝeüŽ{A:,=3zgëê•#žl¥‘Å
ÀÜ©?1FÝO,3˜½§ÝþQyFÝeü••,qü$‘ÅÀ•.
ƒuFÝeüó,1þ‘ÅFÝeü•U‘?Œ5ÅìÆS?Ö‡¦,E,‘ÖØ
ÙÂñ„Ý…ú":.
(4) ª¿©(ÜFÝeüŽ{Ú‘ÅFÝeüŽ{A:,=3zgëê•#žl¥
‘ÅÀÜ©?1FÝO,3˜½§ÝþQyFÝeü••,qü$‘ÅÀ
•.ƒuFÝeüó,1þ‘ÅFÝeü•U‘?Œ5ÅìÆS?Ö‡¦, E,‘
ÖØÙÂñ„Ý…ú":.
In recent years, stochastic gradient descent has become a focus of machine learning, and of deep learning in particular. With the continuing exploration of descent directions and iteration schemes, researchers have proposed many improved algorithms based on SGD. The Momentum algorithm [9] adds a momentum term on top of traditional SGD; based on the idea of accumulating gradient momentum, it combines historical gradients through an exponential moving average to avoid oscillation and accelerate convergence toward the optimum. The Nesterov accelerated gradient algorithm [10,11] builds on the Momentum algorithm by additionally performing a look-ahead update of the parameters before the gradient is evaluated.
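As a brief sketch of the two schemes (beta and lr are illustrative values, and grad_fn is an assumed gradient helper):

```python
def momentum_step(w, v, grad_fn, lr=0.1, beta=0.9):
    # accumulate an exponential moving average of past gradients
    v = beta * v + grad_fn(w)
    return w - lr * v, v

def nesterov_step(w, v, grad_fn, lr=0.1, beta=0.9):
    # evaluate the gradient at the look-ahead point w - lr*beta*v
    v = beta * v + grad_fn(w - lr * beta * v)
    return w - lr * v, v
```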
During gradient descent, different optimization parameters influence the objective to different extents, so a constant learning rate can cause the gradient iteration to diverge or converge slowly. It is therefore worth considering whether an optimizer can adapt the learning rate automatically. The adaptive gradient algorithm [12] (AdaGrad) adapts the learning rate by accumulating squared gradients; root mean square propagation [13] (RMSProp) divides the learning rate by an exponentially decaying average of squared gradients, which counters AdaGrad's problem of an overly fast learning rate decay; adaptive moment estimation [14] (Adam) can be viewed as a combination of the Momentum and RMSProp algorithms, using first and second moment estimates of the gradient to constrain the learning rate; the adaptive and momental bound method [15] (AdaMod) uses long-term memory to limit excessively large learning rates, reducing Adam's sensitivity to the learning rate.
Meanwhile, the variance that random sampling introduces into stochastic gradient descent does not vanish as the iterations accumulate, which prevents SGD from converging linearly. Researchers have therefore proposed a series of variance-reduction-based stochastic descent algorithms. For example, the stochastic average gradient method [16] (SAG) follows a "new gradient replaces old gradient" scheme, making full use of historical gradients while gradually reducing the computational load; stochastic dual coordinate ascent [17] (SDCA) maintains dual variables during the parameter updates, which reduces the variance of the gradients and accelerates convergence; the stochastic variance reduction gradient algorithm [18] (SVRG) is built on the core idea of using full-gradient information to correct the gradient used in the parameter update. Theoretical analysis shows that these methods achieve linear convergence under suitable conditions.

Section 2 of this paper builds on adaptive learning rate algorithms and variance reduction algorithms and proposes the batch adaptive variance reduction algorithm with momental bound (An Adaptive and Momental Bound Method for Stochastic Variance Reduced Gradient, AmbSVRG), which combines adaptive learning rates with a momental bound and uses the batching idea for the parameter iteration; Section 3 gives the convergence analysis of the algorithm; Section 4 verifies the effectiveness of the algorithm on the MNIST dataset and explores the influence of the experimental parameters on its stability; Section 5 concludes the paper.
2. Algorithm Description

2.1. Adaptive Learning Rate Algorithms

Choosing a suitable learning rate for gradient descent is difficult: a learning rate that is too small slows convergence, while one that is too large makes the parameters oscillate around the minimum or even diverge. Moreover, different datasets call for different learning rates adapted to their characteristics. In plain gradient descent all parameter updates use one constant learning rate, yet the individual parameters do not converge at the same speed, so the learning rate should be set per parameter according to its convergence behavior in order to accelerate the algorithm.
éu½Ú•ëê•#úª•
w
t
= w
t−1
−αg
t
.
é½Ú•,AdaGrad òzg•#FÝ\\,¦ÆSÇ·Aëê,éDÕêâ8‰1
Œ•#!éÈ—êâ8‰1•#
w
t
= w
t−1
−
α
√
G
t
+
g
t
,
CþG
t
•;{¤Fݲ•Ú,´•“Ž©1•0,ÏLéÆSǦ±ù˜‡©1‘5?U
z‡ëêS“L§¥ÆSºÝ,lžØÃÄNÆSÇI‡.AdaGrad Ž{̇":
DOI:10.12677/aam.2022.11129529029A^êÆ?Ð
Á?]§X[
´duz˜‘G
t
Ñ´,©1¬‘XÔöL§ØäO\,‡L5—ÆSÇ5ªCu0,
lªŽS“.•)ûAdaGrad ÆSÇ:ìeü¯K,RMSPropÚ\•ê\²þê
$$E\left[g^2\right]_t = \beta E\left[g^2\right]_{t-1} + (1-\beta)g_t^2,$$
$$w_t = w_{t-1} - \frac{\eta}{\sqrt{E\left[g^2\right]_t} + \epsilon}\, g_t,$$

where $\beta$ is the exponential decay coefficient; the sum of squared gradients in AdaGrad is replaced by an exponentially decayed average of the squared historical gradients.
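A compact sketch of the two per-parameter scalings (eps and the decay coefficients follow common defaults and are illustrative):

```python
import numpy as np

def adagrad_step(w, G, g, lr=0.01, eps=1e-8):
    # G accumulates the squared gradients over the whole run
    G = G + g * g
    return w - lr * g / (np.sqrt(G) + eps), G

def rmsprop_step(w, Eg2, g, lr=0.001, beta=0.9, eps=1e-8):
    # Eg2 is an exponentially decaying average of squared gradients
    Eg2 = beta * Eg2 + (1 - beta) * g * g
    return w - lr * g / (np.sqrt(Eg2) + eps), Eg2
```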
Adam enjoys the advantages of both AdaGrad and RMSProp: not only does it adapt the per-parameter learning rate from a second moment estimate as RMSProp does, it also makes full use of a first moment estimate of the gradient. Concretely,

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t,$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2,$$

where $m_t$ and $v_t$ are the first and second moment estimates of the gradient. Because both are biased toward zero in the early iterations, they need to be bias-corrected:

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}.$$

Using these quantities for the iterative update gives the Adam update rule

$$w_t = w_{t-1} - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t.$$
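A minimal sketch of the Adam recursion just described (the hyperparameter defaults follow common practice and are not prescribed by the paper):

```python
import numpy as np

def adam_step(w, m, v, g, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g          # first moment estimate
    v = b2 * v + (1 - b2) * g * g      # second moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```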
Although adaptive learning rate methods perform well in many situations, and Adam is regarded as the default algorithm in deep learning frameworks, Adam still suffers from stability problems: unstable and extremely large learning rates. AdaMod applies a dynamic bound to restrain extreme learning rates: during training it computes an exponential moving average of the adaptive learning rate and uses this average to clip excessively large learning rates:

$$\eta_t = \alpha_t / \left(\sqrt{\hat{v}_t} + \epsilon\right),$$
$$S_t = \beta_3 S_{t-1} + (1-\beta_3)\eta_t,$$
$$\hat{\eta}_t = \min(\eta_t, S_t).$$

The third equation defines the relation between the current learning rate and its "long-term memory". Clearly, when $\beta_3 = 0$, AdaMod reduces to Adam. By computing the long-term average and taking the smaller of it and the learning rate $\eta_t$ produced by Adam, the appearance of excessively large learning rates is avoided.
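The clipping step can be added on top of the Adam sketch above in a few lines (again a sketch; the value of b3 is illustrative):

```python
import numpy as np

def adamod_lr(v_hat, S, lr=0.001, b3=0.999, eps=1e-8):
    eta = lr / (np.sqrt(v_hat) + eps)   # Adam's element-wise step size
    S = b3 * S + (1 - b3) * eta         # long-term memory of step sizes
    eta_hat = np.minimum(eta, S)        # clip extreme learning rates
    return eta_hat, S                   # the update is then w - eta_hat * m_hat
```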
At the same time, the variance introduced into SGD by random sampling means that, with a fixed step size, SGD cannot attain a linear convergence rate. The literature [18] points out that the accuracy of a stochastic algorithm is related to the sampling variance: only when the variance tends to 0 can the error of the algorithm tend to 0. The question of how to reduce the variance of the stochastic gradient within a stochastic algorithm, so that SGD attains a linear convergence rate as full gradient descent does, has led researchers to propose a class of variance reduction algorithms.
2.2. Variance Reduction Algorithms

In recent years, variance-reduction-based stochastic gradient descent has become a hot topic in optimization research, and algorithms such as SAG and SDCA were proposed in succession. The SAG update rule is

$$d = d - y_i + \nabla\psi_i(w),$$
$$w_t = w_{t-1} - \frac{\alpha}{n}\, d.$$

SAG keeps an old gradient in memory for every sample and randomly selects one sample to update the direction $d$; in the update, the new gradient of the chosen sample replaces its old stored gradient $y_i$, so each iteration only involves two gradients. SDCA, in essence, performs variance reduction by maintaining dual variables: according to the relation between the primal variable $w$ and the dual variable $\alpha$, the update of $w$ can be expressed as

$$w_t = w_{t-1} - \gamma\left(\nabla f_j(w_{t-1}) + \lambda n \alpha_j^{(k)}\right).$$

As the iteration proceeds, $(w, \alpha)$ converges to $(w^*, \alpha^*)$, so $(\nabla f_j(w) + \lambda n \alpha_j) \to 0$ and the variance of $\nabla f_j(w) + \lambda n \alpha_j$ also tends to 0, which achieves the goal of variance reduction. Both algorithms attain a linear convergence rate, but both have to keep all the gradients (or dual variables) in memory during the parameter updates, which places a certain demand on memory. SVRG avoids the large storage requirement of SAG and SDCA:
$$\tilde{\mu} = \frac{1}{n}\sum_{i=1}^{n}\nabla\psi_i(\tilde{w}),$$
$$w_t = w_{t-1} - \alpha\left(\nabla\psi_{i_t}(w_{t-1}) - \nabla\psi_{i_t}(\tilde{w}) + \tilde{\mu}\right).$$
Its core idea is to use full-gradient information to correct the gradient used in the parameter update: one full gradient is computed after each pass through the inner loop, and every update computes only two extra component gradients. The convergence analysis of SGD has to assume that the gradient variance has a constant bound, and it is exactly this bound that prevents SGD from converging linearly; SVRG's special update term ensures that the variance has a bound which keeps shrinking, achieving the goal of variance reduction, and the algorithm therefore converges linearly.
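A sketch of one SVRG epoch under these update rules (psi_grad(w, i), returning the gradient of ψ_i, is an assumed helper; the step size and loop lengths are illustrative):

```python
import numpy as np

def svrg_epoch(w_snap, psi_grad, n, m, lr=0.05, rng=np.random.default_rng()):
    # full gradient at the snapshot point
    mu = np.mean([psi_grad(w_snap, i) for i in range(n)], axis=0)
    w = w_snap.copy()
    for _ in range(m):
        i = rng.integers(n)
        # variance-reduced gradient: correct the stochastic gradient with mu
        g = psi_grad(w, i) - psi_grad(w_snap, i) + mu
        w = w - lr * g
    return w   # becomes the next snapshot
```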
The batching SVRG algorithm [19] (Batching SVRG) uses the gradient of a subset of the samples to approximate the full gradient, which reduces the computational load of SVRG to a certain extent. SVRG and Batching SVRG both optimize with a fixed learning rate, and choosing a suitable learning rate is rather important for an optimization algorithm. In addition, when the parameters of the previous iteration are far from the current ones, or when the current parameters are near the optimum, SVRG and Batching SVRG can make excessive use of historical gradient information to correct the current descent direction, causing the algorithm to oscillate near the optimum and lowering its convergence speed.

To address the fixed learning rate and the excessive use of historical gradient information, this paper builds on SVRG, computes an adaptive learning rate for each parameter from the first and second moment estimates of the gradient, and proposes the batch adaptive variance reduction algorithm with momental bound (An Adaptive and Momental Bound Method for Stochastic Variance Reduced Gradient, AmbSVRG).
2.3. The AmbSVRG Algorithm

For problem (1), this paper combines the first and second moment estimates of AdaMod with the gradient update scheme of Batching SVRG and proposes the adaptive and momental bound method for stochastic variance reduced gradient.
The AmbSVRG algorithm:

Input: global learning rate $\alpha_t$, update frequency $m$, moment coefficients $\beta_1, \beta_2, \beta_3 \in [0,1)$;
Step 0: initialize $\tilde{w}_0 = 0$, $m_0 = 0$, $v_0 = 0$, $S_0 = 0$;
Step 1: for the outer loop $s = 1, 2, \ldots, T$, set $\tilde{w} = \tilde{w}_{s-1}$;
Step 2: select the large batch $B_s$ for the global gradient update, $|B_s| = B$, and compute the global gradient $\tilde{\mu} = \frac{1}{B}\sum_{i=1}^{B}\nabla\psi_i(\tilde{w})$;
Step 3: set $w_0 = \tilde{w}$, $s := s + 1$;
Step 4: for the inner loop $t = 1, 2, \ldots, m$, select the mini-batch $B_t$ for the parameter update, $|B_t| = b$, and compute the gradient term $g_t = \frac{1}{b}\sum_{i=1}^{b}\left[\nabla\psi_i(w_{t-1}) - \nabla\psi_i(\tilde{w})\right] + \tilde{\mu}$;
Step 5: compute the first and second moments: $m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$;
Step 6: apply bias correction: $\hat{m}_t = \frac{m_t}{1-\beta_1^t}$, $\hat{v}_t = \frac{v_t}{1-\beta_2^t}$;
Step 7: exponentially average the step size to bound excessive learning rates: $\eta_t = \alpha_t/\left(\sqrt{\hat{v}_t} + \epsilon\right)$, $S_t = \beta_3 S_{t-1} + (1-\beta_3)\eta_t$, $\hat{\eta}_t = \min(\eta_t, S_t)$;
Step 8: update the parameters: $w_t = w_{t-1} - \hat{\eta}_t \hat{m}_t$;
Step 9: set $t := t + 1$; if $t < m$, go to Step 4; if $t = m$, go to Step 1.
In the AmbSVRG algorithm, Step 2 incorporates the batching idea of the Batching SVRG algorithm and approximates the global gradient with the gradient of a subset of the samples, which under certain conditions reduces the computational load of the algorithm and speeds up convergence. In Step 4, the batching idea is also applied to the gradient update rule of SVRG to estimate the gradient, which lowers the gradient variance during the iteration and increases the stability of the convergence process. Step 5 again uses the exponential moving average idea, accumulating momentum over the gradients and squared gradients so as to make full use of the gradient information, and Step 6 bias-corrects $m_t$ and $v_t$, which are biased toward zero in the early iterations, so that the algorithm can adapt the per-parameter learning rate automatically on different datasets, greatly improving its stability and training speed. Step 7 computes the exponential moving average of the adaptive learning rate and substitutes the smaller learning rate into the parameter iteration, restraining the appearance of excessively large learning rates and further increasing the stability of the algorithm.
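Putting the steps together gives the following sketch (psi_grad is the same assumed per-sample gradient helper as above; the batch sizes and coefficients are illustrative, and the snapshot update w̃_s = w_m at the end of each inner loop is the standard SVRG choice assumed here):

```python
import numpy as np

def ambsvrg(w0, psi_grad, n, T=10, m=50, B=500, b=32,
            lr=0.001, b1=0.9, b2=0.999, b3=0.999, eps=1e-8,
            rng=np.random.default_rng()):
    w_snap, mt, vt, S = w0.copy(), 0.0, 0.0, 0.0
    for s in range(T):
        big = rng.choice(n, size=B, replace=False)          # Step 2
        mu = np.mean([psi_grad(w_snap, i) for i in big], axis=0)
        w = w_snap.copy()                                   # Step 3
        for t in range(1, m + 1):                           # Step 4
            small = rng.choice(n, size=b, replace=False)
            g = np.mean([psi_grad(w, i) - psi_grad(w_snap, i)
                         for i in small], axis=0) + mu
            mt = b1 * mt + (1 - b1) * g                     # Step 5
            vt = b2 * vt + (1 - b2) * g * g
            m_hat = mt / (1 - b1 ** t)                      # Step 6
            v_hat = vt / (1 - b2 ** t)
            eta = lr / (np.sqrt(v_hat) + eps)               # Step 7
            S = b3 * S + (1 - b3) * eta
            w = w - np.minimum(eta, S) * m_hat              # Step 8
        w_snap = w                                          # next snapshot
    return w_snap
```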
3. Convergence Analysis

This section discusses the convergence of the AmbSVRG algorithm. We first state the basic assumptions that problem (1) is required to satisfy.

Assumption 1. For the unconstrained minimization problem (1), the objective function satisfies:

(1) each $\psi_i$ is convex;
(2) $P$ attains its minimum at $w^*$, i.e., $w^* = \arg\min_w P(w)$;
(3) each $\psi_i$ has a Lipschitz continuous gradient, i.e.,

$$\psi_i(w) - \psi_i(w^*) - \frac{L}{2}\|w - w^*\|^2 \leq \nabla\psi_i(w^*)^{\mathrm{T}}(w - w^*), \tag{5}$$

where $L > 0$ is the Lipschitz constant.
b2P(w)´µ-rà¼ê
P(w)−P(w
∗
)−
µ
2
kw−w
∗
k
2
>∇P(w
∗
)
T
(w−w
∗
).(6)
Theorem 3.1. Under Assumptions 1 and 2, let $\mu > 0$ and let $m$ be sufficiently large so that

$$\alpha = \frac{b}{\mu m R (b + 2LR)} + \frac{2LR}{b + 2LR} < 1,$$

where $R > 0$ is the uniform bound on the clipped step sizes $\hat{\eta}_t$ established in the proof. Then $E[P(\tilde{w}_s) - P(w^*)] \leq \alpha^s E[P(\tilde{w}_0) - P(w^*)]$; that is, the AmbSVRG algorithm has a linear convergence rate in expectation.
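To get a feel for the rate, the convergence factor α can be evaluated numerically; the constants below are illustrative placeholders, not values from the paper:

```python
def conv_factor(b, m, R, L, mu):
    """Convergence factor alpha from Theorem 3.1."""
    return b / (mu * m * R * (b + 2 * L * R)) + 2 * L * R / (b + 2 * L * R)

# e.g. b=32, m=1000, R=0.1, L=1.0, mu=1.0  ->  alpha ~ 0.016 < 1
print(conv_factor(b=32, m=1000, R=0.1, L=1.0, mu=1.0))
```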
Proof. For any $i$, let $g_t$ denote the search direction at inner step $t$:

$$g_t = \frac{1}{b}\sum_{i=1}^{b}\left[\nabla\psi_i(w_{t-1}) - \nabla\psi_i(\tilde{w})\right] + \frac{1}{B}\sum_{i=1}^{B}\nabla\psi_i(\tilde{w}).$$

From the literature [18] we know

$$\frac{1}{n}\sum_{i=1}^{n}\left\|\nabla\psi_i(w) - \nabla\psi_i(w^*)\right\|_2^2 \leq 2L\left[P(w) - P(w^*)\right]. \tag{7}$$
Taking expectations of $\|g_t\|_2^2$,

$$\begin{aligned}
E\|g_t\|_2^2 &= E\Big\|\frac{1}{b}\sum_{i=1}^{b}\left[\nabla\psi_i(w_{t-1}) - \nabla\psi_i(\tilde{w})\right] + \frac{1}{B}\sum_{i=1}^{B}\nabla\psi_i(\tilde{w})\Big\|_2^2 \\
&\leq 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w_{t-1}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*)\Big\|_2^2 + 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(\tilde{w}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*) - \nabla P(\tilde{w})\Big\|_2^2 \\
&= 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w_{t-1}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*)\Big\|_2^2 \\
&\quad + 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(\tilde{w}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*) - E\left[\nabla\psi_i(\tilde{w}) - \nabla\psi_i(w^*)\right]\Big\|_2^2 \\
&\leq 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w_{t-1}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*)\Big\|_2^2 + 2E\Big\|\frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(\tilde{w}) - \frac{1}{b}\sum_{i=1}^{b}\nabla\psi_i(w^*)\Big\|_2^2 \\
&\leq \frac{4L}{b}\left[P(w_{t-1}) - P(w^*) + P(\tilde{w}) - P(w^*)\right].
\end{aligned} \tag{8}$$
1˜‡Øª¦^ka+bk
2
2
62kak
2
2
+2kbk
2
2
,^˜µ=
1
B
P
B
i=1
∇ψ
i
(˜w) “O∇P(˜w),…∇P(˜w)=
E[∇ψ
i
(˜w)−∇ψ
i
(w
∗
)];1‡Øª¦^é?¿ξ, ÷vEkξ−Eξk
2
2
= Ekξk
2
2
−kEξk
2
2
6Ekξk
2
2
;
1n‡Øª¦^(7)ª.
Inequality (8) shows that $\|g_t\|$ has an upper bound; denote it by $G$. In Step 5 of the AmbSVRG algorithm, with $m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$ and $m_0 = 0$, expanding the recursion gives

$$m_t = (1-\beta_1)\left(g_t + \beta_1 g_{t-1} + \beta_1^2 g_{t-2} + \cdots + \beta_1^{t-1} g_1\right),$$

so that

$$\begin{aligned}
E\|\hat{m}_t\|_2^2 &= E\Big\|\frac{1-\beta_1}{1-\beta_1^t}\left(g_t + \beta_1 g_{t-1} + \beta_1^2 g_{t-2} + \cdots + \beta_1^{t-1} g_1\right)\Big\|_2^2 \\
&= \frac{1-\beta_1}{1-\beta_1^t}\, E\left\|g_t + \beta_1 g_{t-1} + \beta_1^2 g_{t-2} + \cdots + \beta_1^{t-1} g_1\right\|_2^2 \\
&\leq \frac{1-\beta_1}{1-\beta_1^t}\left(1 + \beta_1 + \beta_1^2 + \cdots + \beta_1^{t-1}\right) G = G.
\end{aligned} \tag{9}$$
From (9), $\|\hat{m}_t\|$ is bounded; by the same reasoning, $\|\hat{v}_t\|$ in Step 6 and $\|\hat{\eta}_t\|$ in Step 7 are also bounded, i.e., there exists $R > 0$ such that $\|\hat{\eta}_t\| \leq R$. Then

$$\begin{aligned}
E\|w_t - w^*\|_2^2 &= E\|w_{t-1} - \hat{\eta}_t\hat{m}_t - w^*\|_2^2 \\
&\leq \|w_{t-1} - w^*\|_2^2 - 2R\,(w_{t-1} - w^*)^{\mathrm{T}}E[\hat{m}_t] + R^2 E\|\hat{m}_t\|_2^2 \\
&\leq \|w_{t-1} - w^*\|_2^2 - 2R\,(w_{t-1} - w^*)^{\mathrm{T}}\nabla P(w_{t-1}) + \frac{4LR^2}{b}\left[P(w_{t-1}) - P(w^*) + P(\tilde{w}) - P(w^*)\right] \\
&\leq \|w_{t-1} - w^*\|_2^2 - 2R\left[P(w_{t-1}) - P(w^*)\right] + \frac{4LR^2}{b}\left[P(w_{t-1}) - P(w^*) + P(\tilde{w}) - P(w^*)\right] \\
&= \|w_{t-1} - w^*\|_2^2 - 2R\Big(1 + \frac{2LR}{b}\Big)\left[P(w_{t-1}) - P(w^*)\right] + \frac{4LR^2}{b}\left[P(\tilde{w}) - P(w^*)\right],
\end{aligned}$$

where the first inequality follows from $E[\hat{m}_t] \leq E[g_t] = \nabla P(w_{t-1})$ together with (8), and the second inequality uses the convexity of $P(w)$:

$$P(w_{t-1}) - P(w^*) \leq (w_{t-1} - w^*)^{\mathrm{T}}\nabla P(w_{t-1}).$$
Summing the above inequality over $t = 1, 2, \ldots, m$ gives

$$\begin{aligned}
E\|w_m - w^*\|_2^2 &+ 2mR\Big(1 + \frac{2LR}{b}\Big)E\left[P(\tilde{w}_s) - P(w^*)\right] \\
&\leq E\|w_0 - w^*\|_2^2 + \frac{4mLR^2}{b}E\left[P(\tilde{w}) - P(w^*)\right] \\
&= E\|\tilde{w} - w^*\|_2^2 + \frac{4mLR^2}{b}E\left[P(\tilde{w}) - P(w^*)\right] \\
&\leq \frac{2}{\mu}E\left[P(\tilde{w}) - P(w^*)\right] + \frac{4mLR^2}{b}E\left[P(\tilde{w}) - P(w^*)\right] \\
&= 2\Big(\mu^{-1} + \frac{2mLR^2}{b}\Big)E\left[P(\tilde{w}) - P(w^*)\right],
\end{aligned}$$
é1‡Øª¦^(7)ª,ddŒ±µ
E[P(˜w
s
)−P(w
∗
)] 6
"
b
µmR(b+2LR)
+
2LR
b+2LR
#
E[P(˜w
s−1
)−P(w
∗
)],
lŽ{Âñ5µE[P(˜w
s
)−P(w
∗
)] ≤α
s
E[P(˜w
0
)−P(w
∗
)]y.
The convergence analysis shows that the convergence rate of the algorithm is related to the parameters $b$ and $m$. The influence of the batch size $b$ and the update frequency $m$ on the convergence speed and stability of the algorithm is examined in the numerical experiments.
4. Numerical Experiments

The algorithm in this paper is implemented in Python 3.7; the network structure follows the literature [20], with the ReLU function as the activation function. The experiments were run on a CPU: Intel(R) Core(TM) i5-6200U 2.40 GHz, with 4.00 GB of memory. The mainstream MNIST dataset is used to verify the effectiveness of the algorithm; its main parameters are shown in Table 1:

Table 1. Dataset introduction

Dataset   Image type   Image size   Training set size   Test set size
MNIST     grayscale    28×28        20000               10000
The numerical experiments consist of two parts: Section 4.1 compares the algorithm of this paper with the current mainstream algorithms; Section 4.2 examines the influence of the algorithm parameters on its stability.

4.1. Comparison with Mainstream Algorithms

Based on the Python 3.7 implementation of the AmbSVRG algorithm, we compare it with the mainstream algorithms Mini-Batch SGD, Batching SVRG, AdaGrad, and RMSProp.
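The comparison protocol can be mimicked on a small scale with the update sketches given earlier; the following self-contained toy substitutes a synthetic least-squares problem for MNIST and the network of [20], reusing the minibatch_step sketch from Section 1, purely to illustrate how the per-epoch curves of Figure 1 are produced:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10)

def grad(w, idx):
    # least-squares gradient averaged over an index set
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(10)
for epoch in range(20):
    for _ in range(200 // 32):
        w = minibatch_step(w, grad, n=200, b=32, lr=0.1, rng=rng)
    print(epoch, loss(w))   # one point of a convergence curve
```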
From Figure 1 the following can be seen:

(1) In terms of initial accuracy, AmbSVRG already reaches a test accuracy above 60% in the first epoch, Batching SVRG and RMSProp reach 53% and 48% respectively, and Mini-Batch SGD and AdaGrad only reach about 40%; in initial accuracy, AmbSVRG is clearly superior to the other algorithms.

(2) In terms of the convergence process, apart from AdaGrad, whose convergence is not apparent, the other algorithms all converge steadily toward the optimum; judging by the slopes of the test-accuracy curves, AmbSVRG rises most sharply in the early iterations, showing that its convergence speed is the fastest.

(3) In terms of final accuracy, the AmbSVRG algorithm reaches 91.4%, RMSProp is about 0.8 percentage points lower, and Batching SVRG reaches 85.7%, which is still better than the AdaGrad and Mini-Batch SGD algorithms.

Figure 1. Accuracy comparison of mainstream algorithms
4.2. Stability Exploration

Since the stability of the algorithm is mainly affected by the batch size $b$ and the update frequency $m$, we examine the test accuracy of the algorithm under different parameter settings: batches of size 100, 300, and 500 are drawn from the MNIST dataset, and update frequencies of 10, 30, and 50 are substituted into the algorithm. The training time and test accuracy under each setting are recorded; the experimental parameters and the corresponding evaluation metrics are shown in Table 2. To keep the experiment scientifically sound, each configuration was repeated three times on the same data under identical parameter conditions, and the averages of the three runs are taken as the final results.

Table 2. The influence of experimental parameters on the convergence effect of the algorithm

Update frequency m    b = 100                  b = 300                  b = 500
                      train time  test acc.    train time  test acc.    train time  test acc.
m = 10                19.34 s     85.2%        22.26 s     88.9%        23.86 s     91.4%
m = 30                24.22 s     87.5%        28.69 s     89.1%        35.12 s     89.5%
m = 50                32.05 s     86.0%        36.71 s     87.2%        46.89 s     86.7%
From Table 2 we can see that, in terms of training time, the computational cost of the algorithm grows as the batch size $b$ and the update frequency $m$ increase, though the corresponding training time only increases gradually. In terms of test accuracy, with $b = 500$ the overall test accuracy is noticeably better than with $b = 300$ or $b = 100$; this shows that the batch size $b$ has a direct influence on the test accuracy, roughly in proportion. As for the relation between $b$ and $m$, when $m = 10$ or $m = 30$, the test accuracy increases steadily with $b$, while when $m = 50$, increasing $b$ can instead lower the accuracy, possibly because a larger $m$ lets the gradient at the inner-loop parameter updates drift far from the global gradient of the outer loop, which affects both the convergence speed and the final accuracy.

In summary, Section 4.1 shows that the convergence speed and accuracy of the AmbSVRG algorithm are outstanding: its accuracy is comparable to that of the RMSProp algorithm, and its early convergence is faster than that of the other algorithms. Section 4.2 explains the influence of the batch size $b$ and the update frequency $m$ on the convergence speed and accuracy of the algorithm, which offers useful guidance for parameter tuning.
5. Conclusion

Based on the stochastic variance reduction gradient algorithm, this paper combines the characteristics of adaptive learning rates and the momental bound and uses the batching idea, selecting large batches in place of all samples for the global gradient computation and small batches for the parameter iteration. This removes the high variance that random sampling introduces into stochastic gradient descent, as well as the drawbacks of a constant learning rate in gradient descent, namely oscillation near the optimum and slow convergence, while also restraining the excessively large learning rates of adaptive learning rate algorithms. The AmbSVRG algorithm therefore enjoys good stability and a fast convergence rate, and the experimental results show that the effectiveness of AmbSVRG is confirmed in comparison with the mainstream algorithms.

Limited by time, we have not yet verified the effectiveness of the algorithm on more complex network structures; moreover, since the choice of parameters has a certain influence on the computational efficiency of the algorithm, how to select parameters that improve this efficiency is also of practical significance.
References

[1] Bottou, L. (2010) Large-Scale Machine Learning with Stochastic Gradient Descent. In: Lechevallier, Y. and Saporta, G., Eds., Proceedings of COMPSTAT'2010, Physica-Verlag HD, 177-186. https://doi.org/10.1007/978-3-7908-2604-3_16
[2] Ruder, S. (2016) An Overview of Gradient Descent Optimization Algorithms. Computer Science Machine Learning, 1-12.
[3] Lecun, Y. and Bottou, L. (1998) Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791
[4] Bubeck, S. (2015) Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 8, 231-357. https://doi.org/10.1561/2200000050
[5] Bottou, L., Curtis, F.E. and Nocedal, J. (2018) Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60, 223-311. https://doi.org/10.1137/16M1080173
[6] Nesterov, Y. (2013) Gradient Methods for Minimizing Composite Functions. Mathematical Programming, 140, 125-161. https://doi.org/10.1007/s10107-012-0629-5
[7] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. Annals of Mathematical Statistics, 22, 400-407. https://doi.org/10.1214/aoms/1177729586
[8] Li, M., Zhang, T., Chen, Y., et al. (2014) Efficient Mini-Batch Training for Stochastic Optimization. Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York, 24-27 August 2014, 661-670.
[9] Qian, N. (1999) On the Momentum Term in Gradient Descent Learning Algorithms. Neural Networks, 12, 145-151. https://doi.org/10.1016/S0893-6080(98)00116-6
[10] Nesterov, Y. (1983) A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k²). Soviet Mathematics Doklady, 27, 372-376.
[11] Botev, A., Lever, G. and Barber, D. (2017) Nesterov's Accelerated Gradient and Momentum as Approximations to Regularised Update Descent. 2017 International Joint Conference on Neural Networks, Anchorage, AK, 14-19 May 2017, 1899-1903.
[12] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[13] Tieleman, T. and Hinton, G. (2012) Lecture 6.5-RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning, 4, 26-30.
[14] Kingma, D. and Ba, J. (2015) Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations, San Diego, 7-9 May 2015, 1-13.
[15] Ding, J., Ren, X., Luo, R., et al. (2019) An Adaptive and Momental Bound Method for Stochastic Learning. arXiv:1910.12249
[16] Schmidt, M., Le Roux, N. and Bach, F. (2017) Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming, 162, 83-112. https://doi.org/10.1007/s10107-016-1030-6
[17] Shalev-Shwartz, S. and Zhang, T. (2013) Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. Journal of Machine Learning Research, 14, 567-599.
[18] Johnson, R. and Zhang, T. (2013) Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. Proceedings of the 27th Neural Information Processing Systems, Lake Tahoe, 5-10 December 2013, 315-323.
[19] Babanezhad, R., Ahmed, M.O., Virani, A., et al. (2015) Stop Wasting My Gradients: Practical SVRG. Proceedings of the 28th International Conference on Neural Information Processing Systems, 2, 2251-2259.
[20] Qiu, X.P. (2020) Neural Networks and Deep Learning. China Machine Press, Beijing, 160-169.