Advances in Applied Mathematics, 2022, 11(7), 4248-4267
Published Online July 2022 in Hans. http://www.hanspub.org/journal/aam
https://doi.org/10.12677/aam.2022.117452
¦)Ãå`z¯K‘Ån‘
ÝFÝ{
444ZZZ§§§ÅÅÅûûû
∗
“ŒÆ§êƆÚOÆ§ìÀ“
ÂvFϵ2022c64F¶¹^Fϵ2022c629F¶uÙFϵ2022c76F
Á‡
•¦)Ãå‘Å`z¯K§·‚JÑ˜«‘• ~‘Ån‘ÝFÝ{(STCGVR)§
d•{Œ±^5)ûšà‘ůK"3Ž{zgSÌ‚S“m©ž§n‘ÝFÝ••±•„
e ü••-#m©S“§k/JpÂñ„Ý"3·^‡e§?ØTŽ{5ŸÚÂñ
5"êŠ(JL²§·‚•{éu¦)ÅìÆS¯KäkãŒdå"
'…c
‘ÅCq§²ºx•z§n‘ÝFݧÅìÆS§• ~
AStochasticThree-TermConjugate
GradientMethodforUnconstrained
OptimizationProblems
LeiLiu,DanXue
∗
SchoolofMathematicsandStatistics,QingdaoUniversity,QingdaoShandong
Received:Jun.4
th
,2021;accepted:Jun.29
th
,2022;published:Jul.6
th
,2022
∗ÏÕŠö"
©ÙÚ^:4Z,Åû.¦)Ãå`z¯K‘Ån‘ÝFÝ{[J].A^êÆ?Ð,2022,11(7):4248-4267.
DOI:10.12677/aam.2022.117452
4Z§Åû
Abstract
To solve unconstrained stochastic optimization problems, a stochastic three-term conjugate gradient method with variance reduction (STCGVR) is proposed, which can be used to solve nonconvex stochastic problems. At the beginning of each inner loop, the three-term conjugate gradient iteration is restarted along the steepest descent direction, which effectively improves the convergence speed. The properties and convergence of the algorithm are discussed under appropriate conditions. The numerical results demonstrate that our method has considerable potential for machine learning problems.
Keywords
Stochastic Approximation, Empirical Risk Minimization, Three-Term Conjugate Gradient, Machine Learning, Variance Reduction
Copyright © 2022 by author(s) and Hans Publishers Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
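1. Introduction
We consider the following stochastic optimization problem
$$\min_{x \in \mathbb{R}^d} f(x) = \mathbb{E}[F(x, \xi)], \tag{1}$$
where $F: \mathbb{R}^n \times \mathbb{R}^d \to \mathbb{R}$ is continuously differentiable and possibly nonconvex, $\xi$ is a random variable, $\mathbb{E}[\cdot]$ denotes the expectation with respect to $\xi$, and $f(x) = \mathbb{E}[F(x, \xi)]$ is called the average function. In many practical situations the distribution function $P$ is unknown, or the function $F(\cdot, \xi)$ is not given explicitly; the probability distribution $P$ is supported on a set $\Theta \subseteq \mathbb{R}^d$. To obtain a good estimate of the objective function value, practical problems use the empirical distribution of $\xi$ in place of the true distribution. We generate random samples $\xi_1, \xi_2, \ldots, \xi_n$ and let $f_i(x) = F(x, \xi_i)$ $(i = 1, \ldots, n)$, which yields the empirical risk minimization problem that often arises in machine learning:
$$\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x), \tag{2}$$
where $f_i(x)$ denotes the loss function corresponding to the $i$-th data sample and $n$ denotes the number of data samples. Problem (2) often arises in machine learning [1-6] and in optimal resource allocation in wireless systems [7,8].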
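A challenge in solving problem (2) is that $n$ may be very large. Since exact gradient information is hard to obtain, methods based on the exact gradient are impractical. To overcome this difficulty, the stochastic gradient descent (SGD) method [9] was proposed, which approximates the gradient from the data and is regarded as the main method for large-scale unconstrained optimization. In high-dimensional problems, however, the number of iterations needed to approximate the optimal parameters may be very large, so the practical appeal of SGD remains limited. Several accelerated variants of SGD have therefore been proposed. For example, the stochastic average gradient (SAG) [10,11] and SAGA [12] methods achieve faster convergence by keeping a memory of previous gradient values, and they usually outperform existing SGD methods. The stochastic variance reduced gradient (SVRG) method [13-15] has two loops: the full gradient is computed in the outer loop (each outer iteration is called an epoch), and a variance-reduced stochastic gradient is computed in the inner loop. S2GD [16,17] runs a random number of stochastic gradient steps in each epoch, drawn according to a geometric law. In addition, methods such as AdaGrad [18], RMSprop [19] and Adam [20] have been shown to be effective in the stochastic setting.
To deal with the curvature of the objective function, many stochastic quasi-Newton algorithms have been proposed, in particular BFGS-type algorithms. For strongly convex problems, Mokhtari and Ribeiro [21] proposed a regularized stochastic BFGS (RES) method and gave its convergence analysis. In [22], Byrd et al. proposed a stochastic limited-memory BFGS (L-BFGS) [23] method based on stochastic approximation and proved its convergence for strongly convex problems. Moritz et al. [24] introduced a stochastic variant of L-BFGS that incorporates the idea of variance reduction, so it converges linearly for strongly convex problems. In [25], Gower, Goldfarb and Richtarik proposed a variance-reduced block L-BFGS method that converges linearly for convex functions. However, limited-memory stochastic quasi-Newton methods often need $m$ vector pairs to compute the product $H \nabla f$ efficiently (where $H$ is the inverse Hessian approximation). With limited memory, this can be very difficult for large-scale machine learning problems.
The conjugate gradient (CG) method has a simple structure and low memory requirements, so it is widely used for large-scale optimization problems [26-28]. Fletcher and Reeves (FR) [26] first showed how to extend the linear conjugate gradient method to nonlinear functions, which is known as the FR method. In [29], Dai and Liao proposed the Dai-Liao three-term conjugate gradient method, which combines the quasi-Newton technique with the conjugacy property and obtains better convergence results. In addition, based on the quasi-Newton condition, Babaie-Kafaki and Ghanbari [30] and Andrei [31] obtained a family of three-term conjugate gradient methods that are globally convergent for strongly convex functions. In [32], Yao et al. proposed an improved Dai-Liao three-term conjugate gradient method. Building on [32], this paper proposes a stochastic three-term conjugate gradient method with variance reduction (STCGVR), which combines the improved Dai-Liao three-term conjugate gradient method with stochastic variance reduction to solve unconstrained stochastic optimization problems.
Our contributions in this paper are as follows:
1. Based on the SVRG update, we propose the STCGVR method for the stochastic optimization problem (2) and prove its linear convergence for strongly convex smooth functions.
2. At the beginning of each inner loop of STCGVR, the iteration is restarted along the steepest descent direction, which effectively improves the convergence speed.
3. Numerical experiments on several machine learning problems show that STCGVR is very effective compared with SVRG.
The rest of the paper is organized as follows. Section 2 introduces the improved Dai-Liao three-term conjugate gradient method, the SVRG algorithm, and the stochastic three-term conjugate gradient method with variance reduction. Section 3 proves the convergence of the new algorithm under appropriate conditions. Section 4 reports preliminary numerical results. Finally, Section 5 gives some conclusions.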
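2. The STCGVR Algorithm for Unconstrained Optimization
2.1. Three-Term Conjugate Gradient
Because of its simplicity and low storage, the conjugate gradient method is widely used for large-scale optimization problems. It generates a sequence of iterates
$$x_{k+1} = x_k + \alpha_k d_k, \tag{3}$$
where the step size $\alpha_k$ is determined by the following Wolfe line search:
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k g_k^T d_k, \tag{4}$$
$$g_{k+1}^T d_k \ge c_2 g_k^T d_k, \tag{5}$$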
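where $0 < c_1 < c_2 < 1$, and the search direction $d_k$ is determined by
$$d_k = \begin{cases} -g_0, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{6}$$
where $\beta_k$ is a parameter and $g_k = \nabla f(x_k)$ is the gradient of the objective function $f(x)$ at $x_k$. The defining feature of the conjugate gradient method is conjugacy, that is, the search directions generated by (6) should satisfy the conjugacy condition
$$d_{k+1}^T y_k = 0, \quad k \ge 1, \tag{7}$$
where $y_k = g_{k+1} - g_k$. In recent years the conjugacy condition has been a focus of research. Dai and Liao obtained a notable result in [29], where they use the standard secant equation
$$B_{k+1} s_k = y_k, \tag{8}$$
where $s_k = x_{k+1} - x_k$ and $B_{k+1}$ is a symmetric matrix approximating the Hessian of $f(x)$. The conjugacy condition (7) is then generalized to the Dai-Liao conjugacy condition
$$d_{k+1}^T y_k = -t_1 g_{k+1}^T s_k, \tag{9}$$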
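where $t_1$ is a nonnegative parameter. Based on the Dai-Liao conjugacy condition (9) and the quasi-Newton technique, Yao et al. proposed a symmetric modified Dai-Liao matrix
$$Q_{k+1}^{MP} = I + \eta_k Q_2^{k+1} + Q_1^{k+1}, \tag{10}$$
where
$$Q_2^{k+1} = -\frac{s_k y_k^T - y_k s_k^T}{s_k^T y_k}, \qquad Q_1^{k+1} = \frac{s_k s_k^T}{s_k^T y_k}, \tag{11}$$
and $\eta_k$ is a parameter to be determined. The search direction is generated by
$$d_{k+1} = -Q_{k+1}^{MP} g_{k+1}, \quad k \ge 1. \tag{12}$$
By the definition of $Q_{k+1}^{MP}$, the search direction generated by (12) depends on the parameter $\eta_k$ at each iteration. The principle of the method is that the search direction generated by (12) should satisfy the Dai-Liao conjugacy condition (9). Following this principle and combining (10), (11) and (12), we obtain
$$\eta_k = \frac{g_{k+1}^T y_k + (1 - t_1) g_{k+1}^T s_k}{g_{k+1}^T y_k - \dfrac{\|y_k\|^2}{s_k^T y_k} g_{k+1}^T s_k}, \tag{13}$$
where $t_1$ is the Dai-Liao parameter in (9).
On the other hand, the definition of $\eta_k$ shows that its value may be very large, or even tend to infinity. To obtain global convergence of the algorithm, $\eta_k$ is restricted as follows: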
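$$\eta_k = \min\left\{ \left| \frac{g_{k+1}^T y_k + (1 - t_1) g_{k+1}^T s_k}{g_{k+1}^T y_k - \dfrac{\|y_k\|^2}{s_k^T y_k} g_{k+1}^T s_k} \right|, M_1 \right\}, \tag{14}$$
where $M_1$ is a constant. In fact, the direction generated by (10)-(12) can be rewritten as a typical three-term conjugate gradient direction:
$$d_{k+1} = -g_{k+1} + \beta_k d_k + \delta_k y_k, \tag{15}$$
where $t_1$, $\beta_k$ and $\delta_k$ are given by
$$t_1 = \frac{\|y_k\|^2}{s_k^T y_k}, \tag{16}$$
$$\beta_k = \max\left\{ \frac{\eta_k g_{k+1}^T y_k - g_{k+1}^T s_k}{d_k^T y_k}, 0 \right\}, \tag{17}$$
$$\delta_k = -\eta_k \frac{g_{k+1}^T s_k}{y_k^T s_k}, \tag{18}$$
and the parameter $\eta_k$ is determined by (14).
The improved three-term conjugate gradient method combines the conjugacy condition with the quasi-Newton technique and effectively improves the efficiency of the traditional conjugate gradient method. The method therefore has great potential for large-scale optimization problems.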
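The direction (15) with the coefficients (14) and (16)-(18) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation (their experiments use Matlab): the names `dot` and `three_term_direction` are hypothetical, vectors are plain lists, and falling back to $\eta_k = M_1$ when the denominator of (13) vanishes is an assumption that the text does not spell out.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def three_term_direction(g_new, d, s, y, M1=1.0):
    """Return d_{k+1} = -g_{k+1} + beta_k d_k + delta_k y_k, eq. (15).

    g_new: g_{k+1}; d: d_k; s: s_k = x_{k+1} - x_k; y: y_k = g_{k+1} - g_k.
    Assumes s^T y > 0 and d^T y != 0, as a Wolfe step would give.
    """
    sy = dot(s, y)
    gy, gs = dot(g_new, y), dot(g_new, s)
    t1 = dot(y, y) / sy                                    # (16)
    denom = gy - t1 * gs                                   # denominator of (13)
    if denom == 0.0:
        eta = M1                                           # assumption: use the cap
    else:
        eta = min(abs((gy + (1.0 - t1) * gs) / denom), M1)  # (14)
    beta = max((eta * gy - gs) / dot(d, y), 0.0)           # (17)
    delta = -eta * gs / sy                                 # (18)
    return [-gi + beta * di + delta * yi
            for gi, di, yi in zip(g_new, d, y)]


# Small example in which neither the cap M1 nor the max{., 0} is active.
g_new, d, s, y = [-1.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]
d_next = three_term_direction(g_new, d, s, y)
t1 = dot(y, y) / dot(s, y)
```

In this example `d_next` is `[2.5, -0.5]`. It is a descent direction ($g_{k+1}^T d_{k+1} = -3$) and it satisfies the Dai-Liao condition (9), since $d_{k+1}^T y_k = 2 = -t_1 g_{k+1}^T s_k$.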
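2.2. The Stochastic Variance Reduced Gradient (SVRG) Algorithm
In SGD, the gradient of a single sample is an unbiased estimate of the average gradient over all samples, but the variance of the gradient keeps accumulating as the iterations proceed. This slows down the convergence of SGD and prevents it from reaching a linear rate, so we introduce a variance reduction strategy. The variance reduction strategy constructs a special stochastic gradient estimator so that the variance at each iteration has an upper bound that keeps decreasing, which speeds up convergence.
We present the SVRG algorithm for problem (2) and describe it in Algorithm 1.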
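Algorithm 1. SVRG
Initialization: Given an initial point $\tilde{x}_0 \in \mathbb{R}^n$, fix the step size $\alpha$, and choose constants $0 < c_1 < c_2 < 1$, $M_1 > 0$.
1: for $k = 0, 1, 2, \ldots$ do
2:   $x_0^{k+1} = \tilde{x}_k$.
3:   Compute the full gradient $\nabla f(\tilde{x}_k) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{x}_k)$.
4:   for $t = 0, 1, \ldots, m-1$ do
5:     Randomly draw one index $i_t$ from $\{1, 2, \ldots, n\}$.
6:     Compute the stochastic gradient $g_t^{k+1} = \nabla f_{i_t}(x_t^{k+1}) - (\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k))$.
7:     Compute $x_{t+1}^{k+1} = x_t^{k+1} - \alpha g_t^{k+1}$.
8:   end for
9:   $\tilde{x}_{k+1} = \frac{1}{m} \sum_{t=1}^{m} x_t^{k+1}$.
10: end for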
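Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `svrg`, the callback `grad_i(i, x)`, the `seed` argument, and the four-sample least-squares toy problem are all hypothetical. Caching the component gradients at the snapshot is a convenience of this sketch and is equivalent to recomputing $\nabla f_{i_t}(\tilde{x}_k)$ in step 6.

```python
import random


def svrg(grad_i, n, x0, alpha, m, epochs, seed=0):
    """Sketch of Algorithm 1; grad_i(i, x) returns the gradient of f_i at x."""
    rng = random.Random(seed)
    snap = list(x0)                                   # snapshot point
    dim = len(snap)
    for _ in range(epochs):
        # Step 3: full gradient at the snapshot (component gradients cached).
        snap_grads = [grad_i(i, snap) for i in range(n)]
        full = [sum(g[j] for g in snap_grads) / n for j in range(dim)]
        x = list(snap)                                # step 2
        running_sum = [0.0] * dim
        for _ in range(m):
            i = rng.randrange(n)                      # step 5
            gi = grad_i(i, x)
            # Step 6: variance-reduced stochastic gradient.
            g = [a - b + c for a, b, c in zip(gi, snap_grads[i], full)]
            x = [xj - alpha * gj for xj, gj in zip(x, g)]        # step 7
            running_sum = [r + xj for r, xj in zip(running_sum, x)]
        snap = [r / m for r in running_sum]           # step 9: average of x_1..x_m
    return snap


# Hypothetical toy data: f_i(x) = 0.5 * (a_i^T x - b_i)^2 with b = A x_true,
# x_true = [2, -1], so the minimum value of the average loss is 0.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 3.0]


def grad_i(i, x):
    r = sum(aj * xj for aj, xj in zip(A[i], x)) - b[i]
    return [r * aj for aj in A[i]]


def loss(x):
    return sum(0.5 * (sum(aj * xj for aj, xj in zip(a, x)) - bi) ** 2
               for a, bi in zip(A, b)) / len(A)


x_out = svrg(grad_i, len(A), [0.0, 0.0], alpha=0.05, m=100, epochs=20)
```

The step size and inner-loop length here are chosen only so that the toy run is comfortably within the regime where SVRG contracts; they are not the settings used in Section 4.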
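Algorithm 1 has two loops. In the outer loop, the full gradient $\nabla f(\tilde{x}_k)$ is computed. The point $\tilde{x}_k$ is updated every $m$ iterations as a "snapshot" and is denoted by $x_0^{k+1}$. In the inner loop, we randomly select one sample from the data set to generate the stochastic gradient. That is, $\tilde{x}_k$ is the $k$-th snapshot point, and we need to compute the full gradient at $\tilde{x}_k$:
$$\nabla f(\tilde{x}_k) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{x}_k). \tag{19}$$
In the subsequent iterations, the direction $g_t^{k+1}$ is used as the update direction:
$$g_t^{k+1} = \nabla f_{i_t}(x_t^{k+1}) - (\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k)), \tag{20}$$
where $i_t \in \{1, 2, \ldots, n\}$ is drawn at random. Note that the stochastic gradient $g_{t+1}^{k+1}$ is an unbiased estimate of the gradient $\nabla f(x_{t+1}^{k+1})$, that is,
$$\mathbb{E}[g_{t+1}^{k+1} \mid x_{t+1}^{k+1}] = \nabla f(x_{t+1}^{k+1}).$$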
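The unbiasedness of (20) can be checked numerically by averaging the variance-reduced gradient over all indices, which is the expectation under uniform sampling of $i_t$. The sketch below assumes a hypothetical four-sample least-squares problem; `vr_gradient`, `grad_i` and `full_grad` are illustrative names, not part of the paper.

```python
def vr_gradient(i, x, snap, grad_i, full_at_snap):
    """Variance-reduced gradient (20) for sample index i."""
    gx, gs = grad_i(i, x), grad_i(i, snap)
    return [a - b + c for a, b, c in zip(gx, gs, full_at_snap)]


# Hypothetical data with f_i(x) = 0.5 * (a_i^T x - b_i)^2.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 3.0]


def grad_i(i, x):
    r = sum(aj * xj for aj, xj in zip(A[i], x)) - b[i]
    return [r * aj for aj in A[i]]


def full_grad(x):
    comps = [grad_i(i, x) for i in range(len(A))]
    return [sum(c[j] for c in comps) / len(A) for j in range(len(x))]


x, snap = [0.5, 0.25], [1.0, -2.0]
full_snap = full_grad(snap)
samples = [vr_gradient(i, x, snap, grad_i, full_snap) for i in range(len(A))]
# Expectation over a uniformly drawn index = plain average over all indices.
mean_vr = [sum(g[j] for g in samples) / len(A) for j in range(len(x))]
```

Up to rounding, `mean_vr` coincides with `full_grad(x)`: the correction term $\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k)$ has zero mean, so it changes the variance of the estimator but not its expectation.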
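2.3. The STCGVR Algorithm
The goal of this paper is to design a method that reduces the variance of the gradient while keeping storage requirements low. To this end, we combine SVRG with the improved Dai-Liao three-term conjugate gradient method. Algorithm 2 summarizes the STCGVR algorithm. In Algorithm 2, the search direction $d_{t+1}$ is computed by the iteration
$$d_{t+1} = -g_{t+1}^{k+1} + \beta_t d_t + \delta_t y_t, \tag{21}$$
where $t_1$, $\beta_t$ and $\delta_t$ are given by
$$t_1 = \frac{\|y_t\|^2}{s_t^T y_t}, \tag{22}$$
$$\beta_t = \max\left\{ \frac{\eta_t (g_{t+1}^{k+1})^T y_t - (g_{t+1}^{k+1})^T s_t}{d_t^T y_t}, 0 \right\}, \tag{23}$$
$$\delta_t = -\eta_t \frac{(g_{t+1}^{k+1})^T s_t}{y_t^T s_t}, \tag{24}$$
$$\eta_t = \min\left\{ \left| \frac{(g_{t+1}^{k+1})^T y_t + (1 - t_1)(g_{t+1}^{k+1})^T s_t}{(g_{t+1}^{k+1})^T y_t - \dfrac{\|y_t\|^2}{s_t^T y_t} (g_{t+1}^{k+1})^T s_t} \right|, M_1 \right\}, \tag{25}$$
where $M_1$ is a constant, $s_t = x_{t+1}^{k+1} - x_t^{k+1}$ and $y_t = g_{t+1}^{k+1} - g_t^{k+1}$.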
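Algorithm 2. STCGVR
Initialization: Given an initial point $\tilde{x}_0$, an initial step size $\alpha_0$, an update frequency $m$, iterates $\{x_t^{k+1}: t = 0, \ldots, m-1; k = 0, 1, 2, \ldots\}$, and constants $0 < c_1 < c_2 < 1$, $M_1 > 0$.
1: $h_0 = \nabla f(\tilde{x}_0)$.
2: for $k = 0, 1, 2, \ldots$ do
3:   Compute the full gradient $\nabla f(\tilde{x}_k) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{x}_k)$.
4:   Set $x_0^{k+1} = \tilde{x}_k$, $g_0^{k+1} = h_k$, $d_0 = -g_0^{k+1}$.
5:   for $t = 0, 1, \ldots, m-1$ do
6:     Compute $\alpha_t$ by a line search satisfying (4) and (5).
7:     Compute $x_{t+1}^{k+1} = x_t^{k+1} + \alpha_t d_t$.
8:     Randomly draw $i_t \in \{1, 2, \ldots, n\}$.
9:     Compute the stochastic gradient $g_{t+1}^{k+1} = \nabla f_{i_t}(x_{t+1}^{k+1}) - (\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k))$.
10:    Compute $d_{t+1}$ by (21)-(25).
11:  end for
12:  $h_{k+1} = g_m^{k+1}$, $\tilde{x}_{k+1} = \frac{1}{m} \sum_{t=1}^{m} x_t^{k+1}$.
13: end for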
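Like SVRG, Algorithm 2 consists of two loops. In the outer loop, the outer iterate $\tilde{x}_k \in \mathbb{R}^n$ and the full gradient $\nabla f(\tilde{x}_k)$ are computed. In the inner loop, the SVRG update is used to estimate the gradient $g_t^{k+1}$. In addition, at the beginning of each inner loop we restart the iteration with a steepest descent step; see, for example, [33,34]. The restart strategy periodically refreshes the algorithm and discards information that may no longer be useful. STCGVR can therefore be used for large-scale unconstrained stochastic optimization problems and has good prospects.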
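Step 6 of Algorithm 2 asks for a step size that satisfies the Wolfe conditions (4) and (5). The check itself is easy to write down; the sketch below is only a predicate on a candidate step, not a line search, and the names `dot` and `satisfies_wolfe` are hypothetical. The default values of `c1` and `c2` are the ones reported in Section 4.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.1):
    """Return True if the step alpha along d satisfies both (4) and (5)."""
    slope0 = dot(grad(x), d)                 # g_k^T d_k, negative for descent
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0   # (4)
    curvature = dot(grad(x_new), d) >= c2 * slope0                 # (5)
    return sufficient_decrease and curvature


# One-dimensional illustration with f(x) = x^2 and d = -f'(x) at x = 1.
f = lambda x: x[0] ** 2
grad = lambda x: [2.0 * x[0]]
x0, d0 = [1.0], [-2.0]
ok_exact = satisfies_wolfe(f, grad, x0, d0, 0.5)    # lands on the minimizer
ok_tiny = satisfies_wolfe(f, grad, x0, d0, 1e-6)    # too short: (5) fails
ok_long = satisfies_wolfe(f, grad, x0, d0, 1.0)     # too long: (4) fails
```

Condition (4) rules out steps that are too long, and condition (5) rules out steps that are too short; together they also give $s_t^T y_t > 0$, which the coefficients (22)-(25) divide by.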
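3. Convergence Analysis
In this section, we prove that the sequence of iterates generated by Algorithm 2 converges linearly.
We first state the assumptions used in this paper.
Assumption 1. The level set $Z = \{x \mid f(x) \le f(x_0)\}$ is bounded. In addition, the function $f(x)$ is bounded on $Z$.
Assumption 2. Each $f_i: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $\nabla f$ is globally Lipschitz continuous with Lipschitz constant $L$, that is, for all $x, y \in \mathbb{R}^n$,
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|. \tag{26}$$
Assumption 3. The step size $\alpha_t$ in the STCGVR algorithm satisfies $\alpha_t \in [\alpha_l, \alpha_r]$ $(0 < \alpha_l < \alpha_r)$.
Assumption 4. Since the stochastic gradient $g_t^{k+1}$ is an unbiased estimate of $\nabla f(x_t^{k+1})$, that is, $\mathbb{E}[g_t^{k+1} \mid x_t^{k+1}] = \nabla f(x_t^{k+1})$, there exists a constant $H$ such that for all $t = 0, 1, \ldots, m-1$; $k = 0, 1, 2, \ldots$,
$$\|\nabla f(x_t^{k+1}) - g_t^{k+1}\| \le H. \tag{27}$$
Assumption 5. There exist two constants $\underline{\kappa}$ and $\bar{\kappa}$ such that
$$\underline{\kappa} I \preceq Q_t^{MP} \preceq \bar{\kappa} I, \quad \forall t, \tag{28}$$
where the notation $A \preceq B$ for $A, B \in \mathbb{R}^{n \times n}$ means that $A - B$ is negative semidefinite.
Assumption 6. For all $t = 0, 1, \ldots, m-1$; $k = 0, 1, 2, \ldots$, the stochastic gradient $g_t^{k+1}$ is bounded, that is,
$$\|g_t^{k+1}\| \le \Lambda. \tag{29}$$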
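Lemma 3.1. Suppose that $d_t$ is generated by (21)-(25). If the step size $\alpha_t$ is generated by the Wolfe line search conditions (4) and (5), then the sufficient descent property holds for all $t = 0, \ldots, m-1$; $k = 0, 1, 2, \ldots$; that is, there exists a constant $\rho_1$ such that
$$-(g_t^{k+1})^T d_t \ge \rho_1 \|g_t^{k+1}\|^2. \tag{30}$$
Proof: Since $(g_0^{k+1})^T d_0 = -\|g_0^{k+1}\|^2$, the sufficient descent condition holds for $t = 0$. From (21)-(25) we obtain
$$\begin{aligned}
d_{t+1}^T g_{t+1}^{k+1} &= (-g_{t+1}^{k+1} + \beta_t d_t + \delta_t y_t)^T g_{t+1}^{k+1} \\
&= -\|g_{t+1}^{k+1}\|^2 + \beta_t (g_{t+1}^{k+1})^T d_t + \delta_t (g_{t+1}^{k+1})^T y_t \\
&= -\|g_{t+1}^{k+1}\|^2 + \frac{\eta_t (g_{t+1}^{k+1})^T y_t - (g_{t+1}^{k+1})^T s_t}{s_t^T y_t} (g_{t+1}^{k+1})^T s_t - \eta_t \frac{(g_{t+1}^{k+1})^T s_t}{y_t^T s_t} (g_{t+1}^{k+1})^T y_t \\
&= -\|g_{t+1}^{k+1}\|^2 - \frac{((g_{t+1}^{k+1})^T s_t)^2}{s_t^T y_t}.
\end{aligned} \tag{31}$$
Moreover, the Wolfe line search conditions (4) and (5) guarantee $s_t^T y_t > 0$. Equation (31) therefore implies that the sufficient descent condition holds with $\rho_1 = 1$.
The sufficient descent property of the search direction is indispensable in the convergence analysis of STCGVR. Lemma 3.1 shows that the search directions generated by Algorithm 2 have this property under the Wolfe line search.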
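Lemma 3.2. Suppose that $d_t$ is generated by (21)-(25). If the step size $\alpha_t$ is determined by the Wolfe line search conditions (4) and (5) and $f(x)$ satisfies Assumptions 2 and 4, then
$$\alpha_t \ge \frac{(c_2 - 1)(g_t^{k+1})^T d_t - 2H \|d_t\|}{L \|d_t\|^2}. \tag{32}$$
Proof: By Assumptions 2 and 4, we have
$$\begin{aligned}
\|y_t\| &= \|g_{t+1}^{k+1} - g_t^{k+1}\| \\
&= \|g_{t+1}^{k+1} - \nabla f(x_{t+1}^{k+1}) + \nabla f(x_{t+1}^{k+1}) - \nabla f(x_t^{k+1}) + \nabla f(x_t^{k+1}) - g_t^{k+1}\| \\
&\le \|g_{t+1}^{k+1} - \nabla f(x_{t+1}^{k+1})\| + \|\nabla f(x_{t+1}^{k+1}) - \nabla f(x_t^{k+1})\| + \|\nabla f(x_t^{k+1}) - g_t^{k+1}\| \\
&\le 2H + L \|s_t\|.
\end{aligned} \tag{33}$$
Combining the Lipschitz inequality (26) with the Wolfe condition (5), we can derive
$$\begin{aligned}
(c_2 - 1)(g_t^{k+1})^T d_t &\le (g_{t+1}^{k+1} - g_t^{k+1})^T d_t = y_t^T d_t \le \|y_t\| \|d_t\| \\
&\le (2H + L \|s_t\|) \|d_t\|.
\end{aligned} \tag{34}$$
This proves the lemma.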
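Lemma 3.3. Suppose that $d_t$ is generated by (21)-(25). If the step size $\alpha_t$ is determined by the Wolfe line search conditions (4) and (5) and $f(x)$ satisfies Assumptions 1 and 2, then the following Zoutendijk condition holds:
$$\sum_{t \ge 1} \frac{((g_t^{k+1})^T d_t)^2}{\|d_t\|^2} < \infty. \tag{35}$$
Proof: From the Wolfe condition (4),
$$f(x_t^{k+1}) - f(x_{t+1}^{k+1}) \ge -c_1 \alpha_t (g_t^{k+1})^T d_t. \tag{36}$$
Combining this with (32), we have
$$\begin{aligned}
f(x_t^{k+1}) - f(x_{t+1}^{k+1}) &\ge -c_1 \frac{(c_2 - 1)(g_t^{k+1})^T d_t - 2H \|d_t\|}{L \|d_t\|^2} (g_t^{k+1})^T d_t \\
&\ge \frac{c_1 (1 - c_2)((g_t^{k+1})^T d_t)^2 + 2 c_1 H \|d_t\| (g_t^{k+1})^T d_t}{L \|d_t\|^2}.
\end{aligned} \tag{37}$$
Taking absolute values on both sides of (37) and summing, we get
$$\begin{aligned}
\sum_{t \ge 1} |f(x_t^{k+1}) - f(x_t^{k+1} + \alpha_t d_t)| &\ge \sum_{t \ge 1} \left| \frac{c_1 (1 - c_2)((g_t^{k+1})^T d_t)^2}{L \|d_t\|^2} + \frac{2 c_1 H \|d_t\| (g_t^{k+1})^T d_t}{L \|d_t\|^2} \right| \\
&\ge \sum_{t \ge 1} \left( \left| \frac{c_1 (1 - c_2)((g_t^{k+1})^T d_t)^2}{L \|d_t\|^2} \right| - \left| \frac{2 c_1 H \|g_t^{k+1}\|}{L} \right| \right).
\end{aligned} \tag{38}$$
Summing both sides of (38) and using Assumption 1 together with the boundedness of $\|g_t^{k+1}\|$, the Zoutendijk condition holds in the stochastic setting:
$$\sum_{t \ge 1} \frac{((g_t^{k+1})^T d_t)^2}{\|d_t\|^2} < \infty. \tag{39}$$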
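Lemma 3.4. Suppose that $d_t$ is generated by (21)-(25). If the step size $\alpha_t$ is determined by the Wolfe line search conditions (4) and (5), then for strongly convex functions the norm of the sequence $d_t$ is bounded, that is, there exists $M > 0$ such that
$$\|d_t\| \le M. \tag{40}$$
Proof: From (10)-(12) and Assumptions 5 and 6, we obtain
$$\|d_t\| = \|-Q_t^{MP} g_t^{k+1}\| \le \bar{\kappa} \|g_t^{k+1}\| \le \bar{\kappa} \Lambda = M. \tag{41}$$
Theorem 3.1. Suppose that $d_t$ is generated by (21)-(25). If the step size $\alpha_t$ is determined by the Wolfe line search conditions (4) and (5), and the objective function $f(x)$ is strongly convex and satisfies Assumption 1, then
$$\lim_{t \to \infty} \|g_t^{k+1}\| = 0. \tag{42}$$
Proof: By Lemma 3.4, we have $\|d_t\| \le M$. Using the sufficient descent condition $-(g_t^{k+1})^T d_t \ge \rho_1 \|g_t^{k+1}\|^2$ together with Lemma 3.3, we have
$$\infty > \sum_{t \ge 1} \frac{((g_t^{k+1})^T d_t)^2}{\|d_t\|^2} \ge \sum_{t \ge 1} \frac{((g_t^{k+1})^T d_t)^2}{M^2} \ge \frac{\rho_1^2}{M^2} \sum_{t \ge 1} \|g_t^{k+1}\|^4, \tag{43}$$
from which (42) follows. This completes the proof.
Theorem 3.1 shows that the proposed algorithm is globally convergent for strongly convex functions.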
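Lemma 3.5. Suppose that Assumption 2 holds and $x^*$ is the unique minimizer of $f(x)$. Then for any $x \in \mathbb{R}^n$ we have
$$\frac{1}{2L} \|\nabla f(x)\|^2 \le f(x) - f(x^*). \tag{44}$$
Proof: Since $x^*$ is the unique minimizer of $f(x)$, Assumption 2 gives
$$f(x^*) \le f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2. \tag{45}$$
Fix $x$. Since the inequality above holds for every $y$, we can minimize its right-hand side over $y$, which is attained at $y = x - \frac{1}{L} \nabla f(x)$:
$$f(x^*) \le \min_{y} \left\{ f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2 \right\} = f(x) - \frac{1}{2L} \|\nabla f(x)\|^2. \tag{46}$$
Hence inequality (44) holds.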
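Lemma 3.6. Suppose that $x^*$ is the unique minimizer of $f(x)$, Assumption 2 holds, and $g_t^{k+1} = \nabla f_{i_t}(x_t^{k+1}) - (\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k))$ is the variance-reduced stochastic gradient. Taking the expectation with respect to $i_t$, we obtain
$$\mathbb{E}[\|g_t^{k+1}\|^2] \le 4L \left( \mathbb{E}[f(x_t^{k+1}) - f(x^*)] + \mathbb{E}[f(\tilde{x}_k) - f(x^*)] \right). \tag{47}$$
Proof: From the update formula of $g_t^{k+1}$, we obtain
$$\begin{aligned}
\mathbb{E}[\|g_t^{k+1}\|^2] &= \mathbb{E}[\|\nabla f_{i_t}(x_t^{k+1}) - \nabla f_{i_t}(\tilde{x}_k) + \nabla f(\tilde{x}_k)\|^2] \\
&= \mathbb{E}[\|\nabla f_{i_t}(x_t^{k+1}) - \nabla f_{i_t}(\tilde{x}_k) + \nabla f(\tilde{x}_k) + \nabla f_{i_t}(x^*) - \nabla f_{i_t}(x^*)\|^2] \\
&\le 2\mathbb{E}[\|\nabla f_{i_t}(x_t^{k+1}) - \nabla f_{i_t}(x^*)\|^2] + 2\mathbb{E}[\|\nabla f_{i_t}(\tilde{x}_k) - \nabla f(\tilde{x}_k) - \nabla f_{i_t}(x^*)\|^2].
\end{aligned} \tag{48}$$
Next, we construct the auxiliary function
$$\Phi_i(x) = f_i(x) - f_i(x^*) - \nabla f_i(x^*)^T (x - x^*). \tag{49}$$
Note that $\Phi_i(x)$ is a convex function and that $\nabla f$ is globally Lipschitz continuous with Lipschitz constant $L$. Combining this with (44), we have
$$\frac{1}{2L} \|\nabla \Phi_i(x)\|^2 \le \Phi_i(x) - \Phi_i(x^*). \tag{50}$$
Substituting the expressions for $\Phi_i(x)$ and $\nabla \Phi_i(x)$, we obtain
$$\|\nabla f_i(x) - \nabla f_i(x^*)\|^2 \le 2L [f_i(x) - f_i(x^*) - \nabla f_i(x^*)^T (x - x^*)]. \tag{51}$$
Summing (51) over $i$ from 1 to $n$ and noting that $\nabla f(x^*) = 0$, we obtain
$$\frac{1}{n} \sum_{i=1}^{n} \|\nabla f_i(x) - \nabla f_i(x^*)\|^2 \le 2L [f(x) - f(x^*)], \quad \forall x. \tag{52}$$
Therefore, we have
$$\begin{aligned}
\mathbb{E}[\|\nabla f_{i_t}(x_t^{k+1}) - \nabla f_{i_t}(x^*)\|^2] &= \mathbb{E}[\mathbb{E}[\|\nabla f_{i_t}(x_t^{k+1}) - \nabla f_{i_t}(x^*)\|^2]] \\
&= \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} \|\nabla f_i(x_t^{k+1}) - \nabla f_i(x^*)\|^2 \right] \\
&\le 2L \mathbb{E}[f(x_t^{k+1}) - f(x^*)],
\end{aligned} \tag{53}$$
and
$$\mathbb{E}[\|\nabla f_{i_t}(\tilde{x}_k) - \nabla f_{i_t}(x^*)\|^2] \le 2L \mathbb{E}[f(\tilde{x}_k) - f(x^*)]. \tag{54}$$
Combining (48), (53) and (54), we have
$$\mathbb{E}[\|g_t^{k+1}\|^2] \le 4L \left( \mathbb{E}[f(x_t^{k+1}) - f(x^*)] + \mathbb{E}[f(\tilde{x}_k) - f(x^*)] \right). \tag{55}$$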
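Theorem 3.2. Suppose that Assumptions 1-4 hold and that $f(x)$ is strongly convex with strong convexity parameter $u$. Let $x^*$ be the unique minimizer of $f(x)$, and assume that $m$ is sufficiently large so that
$$\rho = \frac{\frac{2}{u} + 4L \alpha_r^2 m}{2 \alpha_l m (\underline{\kappa} + 2 \alpha_l L \bar{\kappa}^2)} < 1. \tag{56}$$
Then for all $k \ge 0$ we have
$$\mathbb{E}[f(\tilde{x}_k) - f(x^*)] \le \rho^k \mathbb{E}[f(\tilde{x}_0) - f(x^*)]. \tag{57}$$
Proof: Define $\Delta_t = \|x_t^{k+1} - x^*\|$. From (12), Assumption 2 and Assumption 5, we have
$$\begin{aligned}
\mathbb{E}[\Delta_{t+1}^2] &= \mathbb{E}[\|x_{t+1}^{k+1} - x^*\|^2] \\
&= \mathbb{E}[\|x_t^{k+1} - \alpha_t Q_t^{MP} g_t^{k+1} - x^*\|^2] \\
&= \mathbb{E}[\Delta_t^2] - 2 \alpha_t \mathbb{E}[\langle Q_t^{MP} g_t^{k+1}, x_t^{k+1} - x^* \rangle] + \alpha_t^2 \mathbb{E}[\|Q_t^{MP} g_t^{k+1}\|^2] \\
&= \mathbb{E}[\Delta_t^2] - 2 \alpha_t \mathbb{E}[\langle Q_t^{MP} \nabla f(x_t^{k+1}), x_t^{k+1} - x^* \rangle] + \alpha_t^2 \|Q_t^{MP}\|^2 \mathbb{E}[\|g_t^{k+1}\|^2] \\
&\le \mathbb{E}[\Delta_t^2] - 2 \alpha_t \underline{\kappa} [f(x_t^{k+1}) - f(x^*)] + \alpha_t^2 \bar{\kappa}^2 \mathbb{E}[\|g_t^{k+1}\|^2].
\end{aligned} \tag{58}$$
Combining (47) and Lemma 3.6, we obtain
$$\begin{aligned}
\mathbb{E}[\Delta_{t+1}^2] &\le \mathbb{E}[\Delta_t^2] - 2 \alpha_t \underline{\kappa} [f(x_t^{k+1}) - f(x^*)] + 4 \alpha_t^2 \bar{\kappa}^2 L \left( \mathbb{E}[f(x_t^{k+1}) - f(x^*)] + \mathbb{E}[f(\tilde{x}_k) - f(x^*)] \right) \\
&= \mathbb{E}[\Delta_t^2] - (2 \alpha_t \underline{\kappa} - 4 \alpha_t^2 L \bar{\kappa}^2) [f(x_t^{k+1}) - f(x^*)] + 4 \alpha_t^2 \bar{\kappa}^2 L \mathbb{E}[f(\tilde{x}_k) - f(x^*)].
\end{aligned} \tag{59}$$
Summing over $t$ from $0$ to $m-1$ and using $x_1^{k+1} = \tilde{x}_{k-1}$, we have
$$\mathbb{E}[\Delta_{m+1}^2] + 2 \alpha_t (\underline{\kappa} + 2 \alpha_t L \bar{\kappa}^2) \sum_{t=1}^{m} \mathbb{E}[f(x_t^{k+1}) - f(x^*)] \le \mathbb{E}[\|\tilde{x}_{k-1} - x^*\|^2] + 4 \alpha_t^2 \bar{\kappa}^2 L m \mathbb{E}[f(\tilde{x}_{k-1}) - f(x^*)]. \tag{60}$$
Since $f(x)$ is strongly convex, we obtain
$$\mathbb{E}[\Delta_{m+1}^2] + 2 \alpha_t (\underline{\kappa} + 2 \alpha_t L \bar{\kappa}^2) \sum_{t=1}^{m} \mathbb{E}[f(x_t^{k+1}) - f(x^*)] \le \frac{2}{u} \mathbb{E}[f(\tilde{x}_{k-1}) - f(x^*)] + 4 \alpha_t^2 \bar{\kappa}^2 L m \mathbb{E}[f(\tilde{x}_{k-1}) - f(x^*)]. \tag{61}$$
Since $\tilde{x}_{k+1} = \frac{1}{m} \sum_{t=1}^{m} x_t^{k+1}$, we have
$$\begin{aligned}
\mathbb{E}[f(\tilde{x}_k) - f(x^*)] &\le \frac{1}{m} \sum_{t=1}^{m} \mathbb{E}[f(x_t^{k+1}) - f(x^*)] \\
&\le \frac{1}{2 \alpha_t (\underline{\kappa} + 2 \alpha_t L \bar{\kappa}^2) m} \left( \frac{2}{u} + 4L \bar{\kappa}^2 m \alpha_t^2 \right) \mathbb{E}[f(\tilde{x}_{k-1}) - f(x^*)] \\
&\le \frac{1}{2 \alpha_l (\underline{\kappa} + 2 \alpha_l L \bar{\kappa}^2) m} \left( \frac{2}{u} + 4L \bar{\kappa}^2 m \alpha_r^2 \right) \mathbb{E}[f(\tilde{x}_{k-1}) - f(x^*)].
\end{aligned} \tag{62}$$
By induction, we have
$$\mathbb{E}[f(\tilde{x}_k) - f(x^*)] \le \rho^k \mathbb{E}[f(\tilde{x}_0) - f(x^*)], \tag{63}$$
where
$$\rho = \frac{\frac{2}{u} + 4L \alpha_r^2 m}{2 \alpha_l m (\underline{\kappa} + 2 \alpha_l L \bar{\kappa}^2)}. \tag{64}$$
Requiring $\rho < 1$ gives
$$m \ge \frac{1}{(\alpha_l \underline{\kappa} + 2 \alpha_l^2 L \bar{\kappa}^2 - 2 \alpha_r^2 L \bar{\kappa}^2) u}. \tag{65}$$
Theorem 3.2 implies that STCGVR converges linearly, in the sense of the expected function value, with respect to the reference points $\tilde{x}_k$.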
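Before continuing, we recall Markov's inequality (for more details, see [35]).
Definition 3.1. (Markov's inequality) If $X$ is a nonnegative random variable, then for any real number $a > 0$ we have
$$P(X \ge a) \le \frac{\mathbb{E}[X]}{a}. \tag{66}$$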
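Markov's inequality (66) also holds for the empirical distribution of any finite sample of nonnegative numbers, which makes it easy to illustrate. The following sketch uses a hypothetical helper `markov_sides` and made-up sample values.

```python
def markov_sides(samples, a):
    """Return (empirical P(X >= a), empirical E[X] / a) for nonnegative samples."""
    tail = sum(1 for v in samples if v >= a) / len(samples)
    bound = (sum(samples) / len(samples)) / a
    return tail, bound


# Two of the five values are >= 3, and the sample mean is 3.2.
tail, bound = markov_sides([0.0, 1.0, 2.0, 3.0, 10.0], 3.0)
```

Here `tail` is 0.4 while `bound` is roughly 1.07. The bound is loose for a single threshold, but it is all that is needed to turn convergence of an expectation to zero into convergence in probability.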
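Theorem 3.3. Suppose that the assumptions of Theorem 3.2 and (57) hold. Then $f(\tilde{x}_k) - f(x^*)$ converges to 0 in probability as $k \to \infty$, that is, for any $\varepsilon > 0$ we have
$$\lim_{k \to \infty} P(f(\tilde{x}_k) - f(x^*) \ge \varepsilon) = 0. \tag{67}$$
Proof: By Theorem 3.2, we obtain
$$\mathbb{E}[f(\tilde{x}_k) - f(x^*)] \le \rho^k \mathbb{E}[f(\tilde{x}_0) - f(x^*)]. \tag{68}$$
Since $\rho < 1$ and $\mathbb{E}[f(\tilde{x}_0) - f(x^*)] \le M_f$, we have $\mathbb{E}[f(\tilde{x}_k) - f(x^*)] \to 0$ as $k \to \infty$. Moreover, since $f(\tilde{x}_k) - f(x^*)$ is a nonnegative random variable, Markov's inequality (66) gives
$$P(f(\tilde{x}_k) - f(x^*) \ge \varepsilon) \le \frac{\mathbb{E}[f(\tilde{x}_k) - f(x^*)]}{\varepsilon} \to 0. \tag{69}$$
Theorem 3.3 shows that the objective function values converge in probability.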
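4. Numerical Experiments
In this section, we solve several machine learning tasks, namely ridge regression, logistic regression and support vector machine problems, and compare STCGVR with SVRG. The first two are smooth, strongly convex optimization models, and the third is a smooth, nonconvex optimization model. All programs in this paper were run in Matlab R2016a.
Because both STCGVR and SVRG need to compute the full gradient, every epoch requires a pass over all the data. To account for this cost, we plot the loss value against the number of passes through the data. That is, the horizontal axis shows the number of effective passes through the data and the vertical axis shows the value of the loss function. The maximum number of epochs is set to 20. In both methods we set $c_1 = 10^{-4}$, $c_2 = 0.1$ and $M_1 = 1$.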
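4.1. Ridge Regression
In our first experiment, we use the ridge regression problem to test our algorithm. Ridge regression, also known as Tikhonov regularization, is an important learning model in machine learning. The goal of ridge regression is to minimize the cost function
$$\min_{x} \frac{1}{n} \sum_{i=1}^{n} (b_i - a_i^T x)^2 + \lambda \|x\|_2^2, \tag{70}$$
where $a_i \in \mathbb{R}^n$ and $b_i \in \{-1, 1\}$ are the feature vector and the target value of the $i$-th example, respectively, and $\lambda > 0$ is the regularization parameter.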
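For readers who want to reproduce the objective (70), here is a minimal Python sketch of the loss and its gradient. The gradient formula is obtained by differentiating (70) and is not stated in the paper; the function names, the three-sample data set and the finite-difference helper `numeric_grad` are hypothetical.

```python
def ridge_loss(x, A, b, lam):
    """Objective (70): mean squared residual plus lam * ||x||^2."""
    fit = sum((bi - sum(aj * xj for aj, xj in zip(a, x))) ** 2
              for a, bi in zip(A, b)) / len(A)
    return fit + lam * sum(xj * xj for xj in x)


def ridge_grad(x, A, b, lam):
    """Gradient of (70): (2/n) * sum_i (a_i^T x - b_i) a_i + 2 * lam * x."""
    grad = [2.0 * lam * xj for xj in x]
    for a, bi in zip(A, b):
        r = sum(aj * xj for aj, xj in zip(a, x)) - bi
        for j, aj in enumerate(a):
            grad[j] += 2.0 * r * aj / len(A)
    return grad


def numeric_grad(f, x, h=1e-6):
    """Central finite differences, used only to cross-check the gradient."""
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp) - f(xm)) / (2.0 * h))
    return out


A = [[0.5, -0.2], [-0.3, 0.4], [0.1, 0.1]]   # made-up feature vectors
b = [1.0, -1.0, 1.0]                          # targets in {-1, 1}
lam = 1e-4                                    # value of lambda used in this section
x = [0.3, -0.7]
g_exact = ridge_grad(x, A, b, lam)
g_fd = numeric_grad(lambda z: ridge_loss(z, A, b, lam), x)
```

The per-sample gradient needed by SVRG and STCGVR is the same expression with the average over $i$ replaced by a single index $i_t$.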
3¢¥§·‚STCGVRÚSVRG3¦)¯K(70)žêŠ(J"·‚òü«•{
Щ:˜•x
1
=5¯x
1
§Ù¥¯x
1
´õ‘IO•þ§ÙäkŒ10%š"ƒ"·‚±e
•ª)¤Ôö8ÚÿÁ8(a,b)"·‚Äk)¤˜‡‘Å•þa,§l[−0.5,0.5]
n
þþ!©Ù¥Ä
§,•˜lþ!©Ù¥Äa∈R
n
˜I\b∈{−1,1}§b= sign(h˜x,ai)3[−1,1]þ"d
§·‚˜Kzëêλ= 10
−4
§¿ÀJ‘ÅFÝÚêm= n/5"
ã1'SVRGÚSTCGVRS“ǧ٥î‹IL«ÛkS“gê§p‹IL«
›”¼êŠ"S“LJN8I¼ê›”Š‘S“gêO\¥yÑCzª³"lã1Œ±
DOI:10.12677/aam.2022.1174524261A^êÆ?Ð
4Z§Åû
wѧéuŽ{SVRG§·‚˜Ú•α=0.005¶éuSTCGVR§·‚¦^Wolfe‚|¢5ÀJ
Ü·Ú•"†SVRGƒ'§STCGVRŽ{•I‡Œ4g̂Ҍ±¯„%C¼ê•›
”Š10
−3
§=Œ400gS“Âñ"(JL²§STCGVRŽ{3• ~Ä:þV\n‘
ÝFݧÏdU34S“gêS%C•`)"
Figure1.TraininglossofSVRGandSTCGVRfortheridgeregression
problem
ã1.SVRGÚSTCGVRéu*£8¯KÔö›”
4.2.Ü6£8¯K
31‡¢¥§·‚•Ä`
2
Ü6£8(LR)¯Kµ
min
x
1
n
n
X
i=1
ln(1+exp(−b
i
a
T
i
x))+λ||x||
2
,(71)
Ù¥λ>0´Kzëê§a∈R
n
L«A•þ§b∈{−1,1}´•ƒAI\"
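The logistic objective (71) can overflow if $\exp(-b_i a_i^T x)$ is evaluated naively for large margins. The sketch below shows one common way to evaluate the loss and its gradient stably; it is an illustration, not the authors' Matlab code, and the function names, the toy data and the finite-difference helper are hypothetical. The gradient is obtained by differentiating (71).

```python
import math


def logistic_loss(x, A, b, lam):
    """Objective (71), with ln(1 + exp(-z)) evaluated without overflow."""
    total = 0.0
    for a, bi in zip(A, b):
        z = bi * sum(aj * xj for aj, xj in zip(a, x))
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(A) + lam * sum(xj * xj for xj in x)


def logistic_grad(x, A, b, lam):
    """Gradient of (71): -(1/n) * sum_i b_i * sigma(-z_i) * a_i + 2 * lam * x."""
    grad = [2.0 * lam * xj for xj in x]
    for a, bi in zip(A, b):
        z = bi * sum(aj * xj for aj, xj in zip(a, x))
        if z >= 0.0:                       # sigma(-z) = 1 / (1 + exp(z))
            e = math.exp(-z)
            w = e / (1.0 + e)
        else:
            w = 1.0 / (1.0 + math.exp(z))
        for j, aj in enumerate(a):
            grad[j] -= bi * aj * w / len(A)
    return grad


def numeric_grad(f, x, h=1e-6):
    """Central finite differences, used only to cross-check the gradient."""
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp) - f(xm)) / (2.0 * h))
    return out


A = [[0.5, 0.2], [0.3, 0.9], [0.1, 0.6]]   # made-up features in [0, 1]^2
b = [1.0, -1.0, 1.0]
lam = 1e-4
x = [0.4, -1.2]
g_exact = logistic_grad(x, A, b, lam)
g_fd = numeric_grad(lambda z: logistic_loss(z, A, b, lam), x)
loss_at_zero = logistic_loss([0.0, 0.0], A, b, 0.0)
```

At $x = 0$ every margin is zero, so the unregularized loss equals $\ln 2$, which gives a simple reference value for checking an implementation.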
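We set $\lambda = 10^{-4}$, $m = n/5$ and a fixed step size $\alpha = 0.015$, and set the initial point of both methods to $x_1 = 5 \bar{x}_1$, where $\bar{x}_1$ is drawn at random from the uniform distribution on $[0, 1]^n$. The training and test sets $(a, b)$ are generated as follows. We first generate a random vector $a$ with 5% nonzero entries drawn from the uniform distribution on $[0, 1]^n$, and then assign to each $a \in \mathbb{R}^n$ a label $b \in \{-1, 1\}$ given by $b = \mathrm{sign}(\langle \tilde{x}, a \rangle)$, with $\tilde{x}$ taken on $[-1, 1]$.
Figure 2 compares the iteration efficiency of SVRG and STCGVR, where the horizontal axis shows the total number of iterations $k$ and the vertical axis shows the value of the loss function. The iteration efficiency reflects how the loss value of the objective function changes as the number of iterations grows. As the number of iterations $k$ increases, the loss value of SVRG decreases gradually and levels off after about 9 gradient computations. Since $m = n/5$, this means that SVRG converges after about 900 iterations, and its convergence to the optimal point is slow. In contrast, STCGVR needs only about 300 iterations to reach the same accuracy. The results show that, for the logistic regression problem, restarting the three-term conjugate gradient iteration along the steepest descent direction at the beginning of each inner loop effectively improves the convergence speed.
Figure 2. Training loss of SVRG and STCGVR for the logistic regression problem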
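4.3. Nonconvex Support Vector Machine
In our last experiment, we consider a nonconvex support vector machine (SVM) problem. Given a training set of points with known classes, the goal of the SVM is to find a hyperplane that best separates the training set. Let $S = \{(a_i, b_i)\}_{i=1}^{n}$ be a training set of $n$ pairs $(a_i, b_i)$, where $a_i \in \mathbb{R}^n$ is the feature vector and $b_i \in \{-1, 1\}$ is the corresponding label. The goal is to find a separating hyperplane supported by a vector $x \in \mathbb{R}^n$ that splits the training set so that $x^T a_i > 0$ for all points with $b_i = 1$ and $x^T a_i < 0$ for all points with $b_i = -1$. If the data are not separable, such a vector may not exist; if the data are separable, there may be several separating vectors.
We compare the convergence performance of SVRG and STCGVR by solving the following nonconvex SVM problem with a sigmoid loss function, which has been considered in [35]:
$$\min_{x \in \mathbb{R}^n} f(x) = \mathbb{E}_{a,b}[1 - \tanh(b \langle x, a \rangle)] + \lambda \|x\|_2^2, \tag{72}$$
where $\lambda > 0$ is a regularization parameter. In our experiment, $\lambda$ is set to $10^{-4}$. In practice, (72) becomes
$$\min_{x \in \mathbb{R}^n} \frac{1}{n} \sum_{i=1}^{n} f_i(x) + \lambda \|x\|^2, \tag{73}$$
where $f_i(x) = 1 - \tanh(b_i \langle x, a_i \rangle)$, $i = 1, \ldots, n$.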
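The sigmoid-loss objective (73) and its gradient are short to write down. The sketch below is an illustration under stated assumptions: the gradient uses $\frac{d}{dz} \tanh(z) = 1 - \tanh^2(z)$ and is derived here rather than taken from the paper, and the function names, toy data and finite-difference helper are hypothetical.

```python
import math


def sigmoid_svm_loss(x, A, b, lam):
    """Objective (73) with f_i(x) = 1 - tanh(b_i * <x, a_i>)."""
    total = sum(1.0 - math.tanh(bi * sum(aj * xj for aj, xj in zip(a, x)))
                for a, bi in zip(A, b))
    return total / len(A) + lam * sum(xj * xj for xj in x)


def sigmoid_svm_grad(x, A, b, lam):
    """Gradient of (73): -(1/n) * sum_i b_i * (1 - tanh^2(z_i)) * a_i + 2 * lam * x."""
    grad = [2.0 * lam * xj for xj in x]
    for a, bi in zip(A, b):
        th = math.tanh(bi * sum(aj * xj for aj, xj in zip(a, x)))
        w = bi * (1.0 - th * th)
        for j, aj in enumerate(a):
            grad[j] -= w * aj / len(A)
    return grad


def numeric_grad(f, x, h=1e-6):
    """Central finite differences, used only to cross-check the gradient."""
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((f(xp) - f(xm)) / (2.0 * h))
    return out


A = [[0.8, 0.1], [0.2, 0.7], [0.5, 0.5]]   # made-up features in [0, 1]^2
b = [1.0, -1.0, 1.0]
lam = 1e-4
x = [0.6, -0.4]
g_exact = sigmoid_svm_grad(x, A, b, lam)
g_fd = numeric_grad(lambda z: sigmoid_svm_loss(z, A, b, lam), x)
loss_at_zero = sigmoid_svm_loss([0.0, 0.0], A, b, lam)
```

Each $f_i$ takes values in $(0, 2)$ and is not convex in $x$, which is why this problem serves as the nonconvex test case.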
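We test the numerical performance of SVRG and STCGVR on problem (73) with synthetic data. The initial point of both methods is set to $x_1 = 5 \bar{x}_1$, where $\bar{x}_1$ is drawn at random from the uniform distribution on $[0, 1]^n$. The training and test sets $(a, b)$ are generated as follows. We first generate a sparse vector $a$ with 80% nonzero components, drawn from the uniform distribution on $[0, 1]^n$, and then set $b = \mathrm{sign}(\langle \bar{x}, a \rangle)$.
Figure 3 compares the iteration efficiency of SVRG and STCGVR, where the horizontal axis shows the total number of iterations $k$ and the vertical axis shows the value of the loss function. The iteration efficiency reflects how the loss value of the objective function changes as the number of iterations grows. After trying different step sizes, we finally chose $\alpha = 5 \times 10^{-3}$, with which SVRG converges well on the nonconvex SVM problem. Figure 3 shows that our method needs only about 200 iterations to converge, whereas SVRG needs about 1200 iterations to reach a similar accuracy. The experimental results show that the proposed method converges faster on the nonconvex support vector machine problem.
Figure 3. Training loss of SVRG and STCGVR for nonconvex SVM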
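5. Conclusion
This paper proposes the STCGVR algorithm for unconstrained stochastic optimization problems. The new algorithm combines the stochastic variance reduction technique with the improved Dai-Liao three-term conjugate gradient method to obtain better convergence. In addition, a restart technique is applied at the beginning of each inner loop, which periodically refreshes the algorithm and discards information that may be unhelpful to it. Under appropriate conditions, we established the linear convergence of STCGVR. We obtained encouraging numerical results on several machine learning problems, which may be convex or nonconvex. In future research, as numerical gradient implementations of the algorithm develop, STCGVR can readily be applied to other practical problems, such as low-rank matrix recovery and sparse dictionary learning.
Funding
National Natural Science Foundation of China, Young Scientists Fund (11601252).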
References
[1] Kawaguchi, K. and Lu, H.H. (2020) Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 108.
[2] Shalev-Shwartz, S. and Ben-David, S. (2014) References. In: Understanding Machine Learning, Cambridge University Press, Cambridge, 385-394. https://doi.org/10.1017/CBO9781107298019.036
[3] Taheri, H., Pedarsani, R. and Thrampoulidis, C. (2021) Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 130.
[4] Shalev-Shwartz, S. and Srebro, N. (2008) SVM Optimization: Inverse Dependence on Training Set Size. Proceedings of the 25th International Conference on Machine Learning, Helsinki, 5-9 June 2008. https://doi.org/10.1145/1390156.1390273
[5] Bottou, L. (2010) Large-Scale Machine Learning with Stochastic Gradient Descent. In: Lechevallier, Y. and Saporta, G., Eds., Proceedings of COMPSTAT'2010, Physica-Verlag HD, 177-186. https://doi.org/10.1007/978-3-7908-2604-3_16
[6] Bottou, L., Curtis, F.E. and Nocedal, J. (2018) Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60, 223-311. https://doi.org/10.1137/16M1080173
[7] Mokhtari, A. and Ribeiro, A. (2013) A Dual Stochastic DFP Algorithm for Optimal Resource Allocation in Wireless Systems. 2013 IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Darmstadt, 16-19 June 2013, 21-25. https://doi.org/10.1109/SPAWC.2013.6612004
[8] Couillard, O. (2020) Fast and Flexible Optimization of Power Allocation in Wireless Communication Systems Using Neural Networks. McGill University, Montreal, Canada.
[9] Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407. https://doi.org/10.1214/aoms/1177729586
[10] Le Roux, N., Schmidt, M. and Bach, F. (2012) A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. arXiv preprint arXiv:1202.6258
[11] Schmidt, M., Le Roux, N. and Bach, F. (2017) Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming, 162, 83-112. https://doi.org/10.1007/s10107-016-1030-6
[12] Defazio, A., Bach, F. and Lacoste-Julien, S. (2014) SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. In: Advances in Neural Information Processing Systems 27 (NIPS 2014).
[13] Kulunchakov, A. (2020) Stochastic Optimization for Large-Scale Machine Learning: Variance Reduction and Acceleration. Grenoble Alpes University, France.
[14] Wang, C., et al. (2013) Variance Reduction for Stochastic Gradient Optimization. In: Advances in Neural Information Processing Systems 26 (NIPS 2013).
[15] Shen, Z., et al. (2016) Adaptive Variance Reducing for Stochastic Gradient Descent. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, July 2016, 1990-1996.
[16] Konečný, J. and Richtárik, P. (2017) Semi-Stochastic Gradient Descent Methods. Frontiers in Applied Mathematics and Statistics, 3, Article 9. https://doi.org/10.3389/fams.2017.00009
[17] Shang, F., et al. (2021) Efficient Asynchronous Semi-Stochastic Block Coordinate Descent Methods for Large-Scale SVD. IEEE Access, 9, 126159-126171. https://doi.org/10.1109/ACCESS.2021.3094282
[18] Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
[19] Tieleman, T. and Hinton, G. (2012) Lecture 6.5-rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning, 4, 26-31.
[20] Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980
[21] Mokhtari, A. and Ribeiro, A. (2013) Regularized Stochastic BFGS Algorithm. 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, 3-5 December 2013, 1109-1112. https://doi.org/10.1109/GlobalSIP.2013.6737088
[22] Byrd, R.H., et al. (2016) A Stochastic Quasi-Newton Method for Large-Scale Optimization. SIAM Journal on Optimization, 26, 1008-1031. https://doi.org/10.1137/140954362
[23] Liu, D.C. and Nocedal, J. (1989) On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming, 45, 503-528. https://doi.org/10.1007/BF01589116
[24] Moritz, P., Nishihara, R. and Jordan, M. (2016) A Linearly-Convergent Stochastic L-BFGS Algorithm. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 41.
[25] Gower, R., Goldfarb, D. and Richtárik, P. (2016) Stochastic Block BFGS: Squeezing More Curvature Out of Data. International Conference on Machine Learning, New York, June 2016.
[26] Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. The Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149
[27] Andrei, N. (2013) On Three-Term Conjugate Gradient Algorithms for Unconstrained Optimization. Applied Mathematics and Computation, 219, 6316-6327. https://doi.org/10.1016/j.amc.2012.11.097
[28] Yao, S.W., et al. (2020) A Class of Globally Convergent Three-Term Dai-Liao Conjugate Gradient Methods. Applied Numerical Mathematics, 151, 354-366. https://doi.org/10.1016/j.apnum.2019.12.026
[29] Dai, Y.-H. and Liao, L.-Z. (2001) New Conjugacy Conditions and Related Nonlinear Conjugate Gradient Methods. Applied Mathematics and Optimization, 43, 87-101. https://doi.org/10.1007/s002450010019
[30] Babaie-Kafaki, S. and Ghanbari, R. (2014) A Descent Family of Dai-Liao Conjugate Gradient Methods. Optimization Methods and Software, 29, 583-591. https://doi.org/10.1080/10556788.2013.833199
[31] Andrei, N. (2015) A New Three-Term Conjugate Gradient Algorithm for Unconstrained Optimization. Numerical Algorithms, 68, 305-321. https://doi.org/10.1007/s11075-014-9845-9
[32] Yao, S.W., et al. (2020) A Class of Globally Convergent Three-Term Dai-Liao Conjugate Gradient Methods. Applied Numerical Mathematics, 151, 354-366. https://doi.org/10.1016/j.apnum.2019.12.026
[33] Powell, M.J.D. (1977) Restart Procedures for the Conjugate Gradient Method. Mathematical Programming, 12, 241-254. https://doi.org/10.1007/BF01593790
[34] Jiang, X.Z., et al. (2021) An Improved Polak-Ribière-Polyak Conjugate Gradient Method with an Efficient Restart Direction. Computational and Applied Mathematics, 40, Article No. 174. https://doi.org/10.1007/s40314-021-01557-9
[35] Zoutendijk, G. (1966) Nonlinear Programming: A Numerical Survey. SIAM Journal on Control, 4, 194-210. https://doi.org/10.1137/0304019