设为首页
加入收藏
期刊导航
网站地图
首页
期刊
数学与物理
地球与环境
信息通讯
经济与管理
生命科学
工程技术
医药卫生
人文社科
化学与材料
会议
合作
新闻
我们
招聘
千人智库
我要投稿
办刊
期刊菜单
●领域
●编委
●投稿须知
●最新文章
●检索
●投稿
文章导航
●Abstract
●Full-Text PDF
●Full-Text HTML
●Full-Text ePUB
●Linked References
●How to Cite this Article
AdvancesinAppliedMathematics
A^
ê
Æ
?
Ð
,2022,11(7),4248-4267
PublishedOnlineJuly2022inHans.http://www.hanspub.org/journal/aam
https://doi.org/10.12677/aam.2022.117452
¦
)
Ã
å
`
z
¯
K
‘
Å
n
‘
Ý
F
Ý
{
444
ZZZ
§§§
ÅÅÅ
ûûû
∗
“
Œ
Æ
§
ê
Æ
†
Ú
O
Æ
§
ì
À
“
Â
v
F
Ï
µ
2022
c
6
4
F
¶
¹
^
F
Ï
µ
2022
c
6
29
F
¶
u
Ù
F
Ï
µ
2022
c
7
6
F
Á
‡
•
¦
)
Ã
å
‘
Å
`
z
¯
K
§
·
‚
J
Ñ
˜
«
‘
•
~
‘
Å
n
‘
Ý
F
Ý
{
(STCGVR)
§
d
•{
Œ
±
^
5
)û
š
à
‘
Å
¯
K
"
3
Ž
{
z
g
S
Ì
‚
S
“
m
©ž
§
n
‘
Ý
F
Ý
•
•
±
•
„
e
ü
•
•
-
#
m
©
S
“
§
k
/
J
p
Â
ñ
„
Ý
"
3
·
^
‡
e
§
?
Ø
T
Ž
{
5
Ÿ
Ú
Â
ñ
5
"
ê
Š
(
J
L
²
§
·
‚
•{
é
u
¦
)
Å
ì
Æ
S
¯
K
ä
k
ã
Œ
d
å
"
'
…
c
‘
Å
C
q
§
²
º
x
•
z
§
n
‘
Ý
F
Ý
§
Å
ì
Æ
S
§
•
~
AStochasticThree-TermConjugate
GradientMethodforUnconstrained
OptimizationProblems
LeiLiu,DanXue
∗
SchoolofMathematicsandStatistics,QingdaoUniversity,QingdaoShandong
Received:Jun.4
th
,2021;accepted:Jun.29
th
,2022;published:Jul.6
th
,2022
∗
Ï
Õ
Š
ö
"
©
Ù
Ú
^
:
4
Z
,
Å
û
.
¦
)
Ã
å
`
z
¯
K
‘
Å
n
‘
Ý
F
Ý
{
[J].
A^
ê
Æ
?
Ð
,2022,11(7):4248-4267.
DOI:10.12677/aam.2022.117452
4
Z
§
Å
û
Abstract
To solve unconstrained stochastic optimization problems,a stochastic three-term con-
jugategradientmethodwithvariancereduction(STCGVR)isproposed,whichcan
beusedtosolvenonconvexstochasticproblems.Atthebeginningofeachinnerloop
iteration,thethreeconjugategradientdirectionsrestarttheiterationinthesteepest
descentdirection,whicheffectivelyimprovestheconvergencespeed.Theproperties
andconvergenceofthealgorithmarediscussedunderappropriateconditions.The
numericalresultsdemonstratethatourmethodhasdramaticalpotentialformachine
learningproblems.
Keywords
StochasticApproximation,EmpiricalRiskMinimization,Three-TermConjugate
Gradient,MachineLearning,VarianceReduction
Copyright
c
2022byauthor(s)andHansPublishersInc.
This work is licensed undertheCreative Commons Attribution InternationalLicense(CCBY4.0).
http://creativecommons.org/licenses/by/4.0/
1.
Ú
ó
·
‚
•
Ä
±
e
‘
Å
`
z
¯
K
min
x
∈
R
d
f
(
x
) =
E
[
F
(
x,ξ
)]
,
(1)
ù
p
F
:
R
n
×
R
d
→
R
´
ë
Y
Œ
‡
§
¿
…
Œ
U
š
à
"
ξ
´
˜
‡
‘
Å
C
þ
"
E
[
·
]
L
«
é
ξ
Ï
"
§
f
(
x
)=
E
[
F
(
x,ξ
)]
¡
•
²
þ
¼
ê
"
du
3
N
õ
¢
S
œ
¹
e
§
©
Ù
¼
ê
P
™
•
½
¼
ê
F
(
·
,ξ
)
™
²
(
‰
Ñ
§
Ù
V
Ç
©
Ù
P
3
|
8
Θ
⊆
R
d
þ
"
•
8
I
¼
ê
Š
˜
‡
'
Ð
O
§
3
¢
S
¯
K
¥
|
^
ξ
²
©
Ù
5
“
O
¢
S
©
Ù
"
·
‚
)
¤
‘
Å
ξ
1
,ξ
2
,...,ξ
n
§
-
f
i
(
x
)=
F
(
x,ξ
i
)(
i
=
1
,...,n
)
§
²
~Ñ
y
3
Å
ì
Æ
S
¥
²
º
x
4
z
¯
K
min
x
∈
R
d
f
(
x
) =
1
n
n
X
i
=1
f
i
(
x
)
,
(2)
DOI:10.12677/aam.2022.1174524249
A^
ê
Æ
?
Ð
4
Z
§
Å
û
Ù
¥
f
i
(
x
)
L
«
†
1
i
‡
ê
â
é
A
›
”
¼
ê
§
n
L
«ê
â
ê
"
¯
K
(2)
²
~Ñ
y
3
Å
ì
Æ
S
[1–6]
±
9
Ã
‚X
Ú
¥
•
Z
]
©
[7,8]
¥
"
¦
)
¯
K
(2)
ž
§
•
3
˜
‡
]
Ô
§
=
n
Œ
U
š
~
Œ
"
du
°
(
F
Ý
&
E
Ø
N
´
¼
§
Ï
d
Ä
u
°
(
F
Ý
•{
´
Ø
ƒ
¢
S
"
•
Ž
Ñ
ù
˜
(
J
§
·
‚
|
^
Ä
u
ê
â
F
Ý
C
q
•{
§
J
Ñ
‘
Å
F
Ý
e
ü
(SGD)[9]
•{
§
d
•{
À
•
¦
)
Œ
5
Ã
å
`
z
¯
K
Ì
‡
•{
"
3
p
‘¯
K
¥
§
C
q
•
`
ë
ê
¤
I
S
“g
ê
Œ
U
š
~
Œ
§
SGD
•{
¢
S
á
Ú
å
E,
k
•
"
Ï
d
§
˜
Œ
1
SGD
\
„
•{
J
Ñ
"
~
X
§
‘
Å
²
þ
F
Ý
(SAG)[10,11]
Ú
SAGA[12]
•{
Ï
L
\
\
ƒ
c
F
Ý
Š
P
Á
5
¢
y
•
¯
Â
ñ
„
Ý
§
ù
•{
Ï
~
`u
y
k
SGD
•{
"
‘
Å
•
~
F
Ý
(SVRG)
•{
[13–15]
k
ü
‡
Ì
‚
§
3
Ì
‚
¥
O
Ž
F
Ý
(
z
‡
S
“
¡
•
˜
‡
{
)
§
3
S
Ì
‚
¥
O
Ž
•
‘
Å
F
Ý
"
S2GD[16,17]
Š
â
A
Û
½
Æ
§
3
z
‡
{
¥
$
1
‘
Å
ê
‡
‘
Å
F
Ý
"
d
§
˜
˜
•{
§
X
AdaGrad[18]
!
RMSprop[19]
Ú
Adam[20]
•
y
²
3
‘
Å‚
¸
¥
´
k
"
•
)û
8
I
¼
ê
-
Ç
¯
K
§
N
õ
‘
Å
Ž
{
J
Ñ
§
A
O
´
BFGS
Ž
{
"
é
u
r
à
¯
K
§
Mokhtari
Ú
Ribeiro[21]
J
Ñ
˜
«
K
z
‘
Å
BFGS(RES)
•{
§
¿
‰
Ñ
Ù
Â
ñ
5
©
Û
"
3
[22]
¥
§
Byrd
<
J
Ñ
˜
«
Ä
u
‘
Å
%
C
‘
Å
k
•
P
Á
BFGS(L-BFGS)[23]
•{
§
¿
y
²
Ù
é
r
à
¯
K
Â
ñ
5
"
Moritz
<
[24]
Ú
\
L-BFGS
˜
«
‘
Å
C
þ
§
§
(
Ü
•
~
g
Ž
§
Ï
d
é
u
r
à
¯
K
§
§
ä
k
‚
5
Â
ñ
„
Ý
"
3
[25]
¥
§
Gower
!
Goldfarb
Ú
Richtarik
J
Ñ
˜
«
é
à
¼
ê
‚
5
Â
ñ
•
~
¬
L-BFGS
•{
"
,
§
3
k
•
P
Á
‘
Å
[Ú
î
•{
¥
§
²
~
I
‡
m
‡
•
þ
é
5
k
/
O
Ž
¦
È
H
∇
f
(H
´
Hessian)
"
3
S
•
k
•
œ
¹
e
§
é
u
Œ
5
Å
ì
Æ
S
¯
K
Œ
U
´
š
~
(
J
"
Ý
F
Ý
(CG)
•{
(
{
ü
§
S
•
‡
¦
$
§
Ï
d
2
•
^u
)û
Œ
5
`
z
¯
K
[26–28]
"
Fletcher
Ú
Reeves(FR)[26]
Ä
k
J
Ñ
X
Û
ò
‚
5
Ý
F
Ý
{
*
Ð
š
‚
5
¼
ê
§
¡
•
FR
•{
"
3
[29]
¥
§
Dai
Ú
Liao
J
Ñ
Dai-Liao
n
‘
Ý
F
Ý
{
§
¿
ò
[Ú
î
E
â
†
Ý
5
Ÿ
ƒ
(
Ü
§
¼
•
Ð
Â
ñ
(
J
"
d
§
Ä
u
[Ú
î
^
‡
§
Babaie,Kafaki
Ú
Ghanbari[30]
§
Andrei[31]
¼
˜
X
n
‘
Ý
F
Ý
•{
§
ù
•{
é
u
r
à
¼
ê´
Û
Â
ñ
"
3
©
z
[32]
¥
§
Yao
J
Ñ
˜
«
U
?
Dai-Liao
n
‘
Ý
F
Ý
{
"
©
3
©
z
[32]
Ä
:
þ
§
J
Ñ
˜
«
‘
•
~
‘
Å
n
‘
Ý
F
Ý
{
(STCGVR)
§
§
ò
U
?
Dai-Liao
n
‘
Ý
F
Ý
†
‘
Å
•
~
ƒ
(
Ü
§
^u
¦
)
Ã
å
‘
Å
`
z
¯
K
"
·
‚
3
©
¥
z
X
e
µ
1.
Ä
u
SVRG
•
#
?
Ð
§
J
Ñ
¦
)
‘
Å
`
z
¯
K
(2)
STCGVR
•{
§
¿
y
²
Ù
é
r
à
1
w
¼
ê
‚
5
Â
ñ
5
"
2.
3
STCGVR
z
g
S
Ì
‚
m
©ž
§
-
#
é
Ä
•
„
e
ü
S
“
•
•
§
k
/
J
p
Â
ñ
„
Ý
"
3.
é
A
‡
Å
ì
Æ
S
¯
K
ê
Š
¢
L
²
§
†
SVRG
•{
ƒ
'
§
STCGVR
•{
´
š
~
k
"
©
•
{
Ü
©
|
„
X
e
"
1
2
!0
^u
)û
‘
Å
`
z
¯
K
U
?
Dai-Liao
n
‘
Ý
F
Ý
{
!
SVRG
Ž
{
Ú
‘
•
~
‘
Å
n
‘
Ý
F
Ý
{
"
3
1
3
!
¥
§
3
·
^
‡
e
y
²
#
Ž
{
Â
ñ
5
"
3
1
4
!
¥
§
w
˜
Ð
Ú
ê
Š
(
J
"
•
§
1
5
!
Ñ
˜
(
Ø
"
DOI:10.12677/aam.2022.1174524250
A^
ê
Æ
?
Ð
4
Z
§
Å
û
2.
^u
Ã
å
`
z
STCGVR
Ž
{
2.1.
n
‘
Ý
F
Ý
Ý
F
Ý
{
Ï
Ù
{
ü
…
•;
þ
$
2
•
^u
)û
Œ
5
`
z
¯
K
§
§
¬
)
¤
˜
X
S
“
µ
x
k
+1
=
x
k
+
α
k
d
k
,
(3)
ù
p
Ú
•
α
k
d
±
e
Wolfe
‚
|¢
(
½
:
f
(
x
k
+
α
k
d
k
)
≤
f
(
x
k
)+
c
1
α
k
g
T
k
d
k
,
(4)
g
T
k
+1
d
k
≥
c
2
g
T
k
d
k
,
(5)
ù
p
0
<c
1
<c
2
<
1,
|¢
•
•
d
k
d
±
e
ú
ª
(
½
µ
d
k
=
−
g
0
,k
= 0
.
−
g
k
+
β
k
d
k
−
1
,k
≥
1
.
(6)
Ù
¥
§
β
k
´
˜
‡
ë
ê
§
g
k
=
∇
f
(
x
k
)
´
8
I
¼
ê
f
(
x
)
3
x
k
?
F
Ý
"
Ý
F
Ý
{
•
;
.
A
´
Ý
5
§
=
(6)
)
¤
|¢
•
•
A
ä
k
±
e
Ý
^
‡
µ
d
k
+1
y
k
= 0
,k
≥
1
,
(7)
Ù
¥
§
y
k
=
g
k
+1
−
g
k
"
C
c
5
§
Ý
^
‡
˜
†
´
ï
Ä
ö
'
5
:
"
Dai
Ú
Liao
¼
˜
‡
w
Í
(
J
[29]
"
3
[29]
¥
§
¦
^
I
O
•
‚
•
§
µ
B
k
+1
s
k
=
y
k
,
(8)
Ù
¥
s
k
=
x
k
+1
−
x
k
§
B
k
+1
´
f
(
x
)Hessian
C
q
é
¡
Ý
"
,
ò
Ý
^
‡
(7)
í
2
Dai-Liao
Ý
^
‡
d
T
k
+1
y
k
=
−
t
1
g
T
k
+1
s
k
,
(9)
Ù
¥
t
1
•
š
K
ë
ê
"
Ä
u
Dai-Liao
Ý
^
‡
(9)
Ú
[Ú
î
E
â
§
Yao
<
J
Ñ
˜
«
é
¡
?
Dai-
Liao
Ý
Q
MP
t
+1
=
I
+
η
k
Q
k
+1
2
+
Q
k
+1
1
,
(10)
Ù
¥
Q
k
+1
2
=
−
s
k
y
T
k
−
y
k
s
T
k
s
T
k
y
k
,Q
k
+1
1
=
s
k
s
T
k
s
T
k
y
k
,
(11)
DOI:10.12677/aam.2022.1174524251
A^
ê
Æ
?
Ð
4
Z
§
Å
û
Ù
¥
η
k
´
–
(
½
ë
ê
"
|¢
•
•
d
±
e
ú
ª)
¤
d
k
+1
=
−
Q
MP
k
+1
g
k
+1
,k
≥
1
,
(12)
Š
â
Q
MP
k
+1
½
Â
§
d
(12)
)
¤
|¢
•
•
û
u
z
g
S
“
ž
ë
ê
η
k
"
du
T
•{
n
´
d
(12)
)
¤
|¢
•
•
A
÷
v
Dai-Liao
Ý
^
‡
(9)
"
Š
â
T
n
§
(
Ü
(10)
!
(11)
Ú
(12)
§
·
‚
η
k
=
g
T
k
+1
y
k
+(1
−
t
1
)
g
T
k
+1
s
k
g
T
k
+1
y
k
−
||
y
k
||
2
s
T
k
y
k
g
T
k
+1
s
k
,
(13)
Ù
¥
§
ë
ê
t
1
´
(9)
¥
Dai-Liao
ë
ê
t
1
"
,
˜
•
¡
§
l
η
k
½
Â
5
w
§
η
k
Š
Œ
U
š
~
Œ
§
$
–
ª
u
Ã
¡
Œ
"
•
¼
Ž
{
Û
Â
ñ
5
§
η
k
•
›
X
e
µ
η
k
=
min
{|
g
T
k
+1
y
k
+(1
−
t
1
)
g
T
k
+1
s
k
g
T
k
+1
y
k
−
||
y
k
||
2
s
T
k
y
k
g
T
k
+1
s
k
|
,M
1
}
,
(14)
Ù
¥
M
1
•
~
ê
"
¢
S
þ
§
d
(10)
−
(12)
)
¤
•
•
Œ
±
-
•
;
.
n
‘
Ý
F
Ý
•
•
µ
d
k
+1
=
−
g
k
+1
+
β
k
d
k
+
δ
k
y
k
,
(15)
Ù
¥
t
1
,
β
k
,
δ
k
d
t
1
=
||
y
k
||
2
s
T
k
y
k
,
(16)
β
k
=
max
{
η
k
g
T
k
+1
y
k
−
g
T
k
+1
s
k
d
T
k
y
k
,
0
}
,
(17)
δ
k
=
−
η
k
g
T
k
+1
s
k
y
T
k
s
k
,
(18)
)
¤
§
Ù
¥
ë
ê
η
k
d
(14)
(
½
"
U
?
n
‘
Ý
F
Ý
{
ò
Ý
^
‡
†
[Ú
î
E
â
ƒ
(
Ü
§
k
/
J
p
D
Ú
Ý
F
Ý
{
Ç
"
Ï
d
§
T
•{
3
¦
)
Œ
5
`
z
¯
K
¥
ä
k
é
Œ
u
Ð
c
µ
"
2.2.
‘
Å
•
~
F
Ý
(SVRG)
Ž
{
3
SGD
¥
§
ü
‡
F
Ý
´
N
²
þ
F
Ý
˜
‡
Ã
O
§
F
Ý
•
¬
‘
X
S
“
DOI:10.12677/aam.2022.1174524252
A^
ê
Æ
?
Ð
4
Z
§
Å
û
O
\
Ø
ä
\
\
§
ù
¬
¦
SGD
Â
ñ
„
Ý
C
ú
§
Ã
{
ˆ
‚
5
Â
ñ
§
¤
±
·
‚
Ú
\
•
~
ü
Ñ
"
•
~
ü
Ñ
Ï
L
E
A
Ï
‘
Å
F
Ý
O
þ
§
¦
z
g
S
“
•
k
˜
‡
Ø
ä
~
þ
.
§
l
¯
Â
ñ
„
Ý
"
·
‚
J
Ñ
`
z
(2)
SVRG
Ž
{
§
¿
3
Ž
{
1
¥
é
Ù
?
1
£
ã
"
Algorithm1.
SVRG
Ð
©
z
:
‰
½
˜
‡
Ð
©
:
˜
x
0
∈
R
n
,
½
Ú
•
α
§
À
~
ê
0
<c
1
<c
2
<
1,
M
1
>
0.
1:
for
k=0,1,2,...
do
2:
x
k
+1
0
=˜
x
k
.
3:
O
Ž
F
Ý
∇
f
(˜
x
k
) =
1
n
n
P
i
=1
∇
f
i
(˜
x
k
).
4:
for
t=0,1,...,m-1
do
5:
l
i
t
⊂{
1
,
2
,...,n
}
¥
‘
Å
Ä
˜
‡
.
6:
O
Ž‘
Å
F
Ý
g
k
+1
t
=
∇
f
i
t
(
x
k
+1
t
)
−
(
∇
f
i
t
(˜
x
k
)
−∇
f
(˜
x
k
)).
7:
O
Ž
x
k
+1
t
+1
=
x
k
+1
t
−
αg
k
+1
t
.
8:
endfor
9:
˜
x
k
+1
=
1
m
m
P
t
=1
x
k
+1
t
.
10:
endfor
Ž
{
1
¥
k
ü
‡
Ì
‚
"
3
Ì
‚
¥
§
F
Ý
∇
f
(˜
x
k
)
O
Ž
"
˜
x
k
z
…
m
g
•
#
•
˜
‡
/
¯
ì
0
§
P
•
x
k
0
"
3
S
Ì
‚
¥
§
·
‚
l
ê
â
8
X
¥
‘
Å
À
J
˜
‡
^u
)
¤
‘
Å
F
Ý
"
•
Ò
´
`
§
˜
x
k
•
1
k
‡
u
:
§
,
·
‚
I
‡
O
Ž
˜
x
k
:
?
F
Ý
µ
∇
f
(˜
x
k
) =
1
n
n
X
i
=1
∇
f
i
(˜
x
k
)
.
(19)
3
Y
S
“
¥
§
•
•
g
k
+1
t
^
Š
•
#
•
•
µ
g
k
+1
t
=
∇
f
i
t
(
x
k
+1
t
)
−
(
∇
f
i
t
(˜
x
k
)
−∇
f
(˜
x
k
))
,
(20)
Ù
¥
i
t
⊂{
1
,
2
,...,N
}
?
¿
Ä
"
5
¿
‘
Å
F
Ý
g
k
+1
t
+1
´
∇
f
(
x
k
+1
t
+1
)
˜
‡
Ã
F
Ý
O
§
=
µ
E
[
g
k
+1
t
+1
|
x
k
+1
t
+1
] =
∇
f
(
x
k
+1
t
+1
).
2.3.STCGVR
Ž
{
©
8
I
´
O
˜
«
¦
F
Ý
•
$
•{
§
Ó
ž
ä
k
$
S
•
‡
¦
"
•
d
§
·
‚
ò
SVRG
†
U
?
Dai-Liao
n
‘
Ý
F
Ý
{
ƒ
(
Ü
"
Ž
{
2
¥
o
(
STCGVR
Ž
{
"
3
Ž
{
2
¥
§
·
‚
Ï
L
±
e
S
“
O
Ž|¢
•
•
d
t
+1
:
d
t
+1
=
−
g
k
+1
t
+1
+
β
t
d
t
+
δ
t
y
t
,
(21)
DOI:10.12677/aam.2022.1174524253
A^
ê
Æ
?
Ð
4
Z
§
Å
û
where
t
1
,
β
t
,
δ
t
by
t
1
=
||
y
t
||
2
s
T
t
y
t
,
(22)
β
t
=
max
{
η
t
(
g
k
+1
t
+1
)
T
y
t
−
(
g
k
+1
t
+1
)
T
s
t
d
T
t
y
t
,
0
}
,
(23)
δ
k
=
−
η
t
(
g
k
+1
t
+1
)
T
s
t
y
T
t
s
t
,
(24)
η
t
=
min
{|
(
g
k
+1
t
+1
)
T
y
t
+(1
−
t
1
)(
g
k
+1
t
+1
)
T
s
t
(
g
k
+1
t
+1
)
T
y
t
−
||
y
t
||
2
s
T
t
y
t
(
g
k
+1
t
+1
)
T
s
t
|
,M
1
}
,
(25)
Ù
¥
M
1
´
˜
‡
~
ê
§
s
t
=
x
k
+1
t
+1
−
x
k
+1
t
,y
t
=
g
k
+1
t
+1
−
g
k
+1
t
.
Algorithm2.
STCGVR
Ð
©
z
:
‰
½
˜
‡
Ð
©
:
˜
x
0
,
Ð
©
Ú
•
α
0
,
•
#
ª
Ç
m
,
S
“
{
x
k
+1
t
:
t
= 0
,...,m
−
1;
k
=0
,
1
,
2
,...
}
,
À
~
ê
0
<c
1
<c
2
<
1,
M
1
>
0.
1:
h
0
=
∇
f
(˜
x
0
).
2:
for
k=0,1,2,...
do
3:
O
Ž
F
Ý
∇
f
(˜
x
k
) =
1
n
n
P
i
=1
∇
f
i
(˜
x
k
).
4:
-
x
k
+1
0
=˜
x
k
,
g
k
+1
0
=
h
k
,
d
0
=
−
g
k
+1
0
.
5:
for
t=0,1,...,m-1
do
6:
N
^
‚
|¢Ž
{
(4)and(5)
O
Ž
α
t
.
7:
O
Ž
x
k
+1
t
+1
=
x
k
+1
t
+
α
t
d
t
.
8:
‘
Å
Ä
i
t
⊂{
1
,
2
,...,n
}
.
9:
O
Ž‘
Å
F
Ý
g
k
+1
t
=
∇
f
i
t
(
x
k
+1
t
)
−
(
∇
f
i
t
(˜
x
k
)
−∇
f
(˜
x
k
)).
10:
Ï
L
(21)
−
(25)
O
Ž
d
t
+1
.
11:
endfor
12:
h
k
+1
=
g
k
+1
m
,
˜
x
k
+1
=
1
m
m
P
t
=1
x
k
+1
t
.
13:
endfor
†
SVRG
a
q
§
Ž
{
2
©
•
ü
‡
Ì
‚
"
3
Ü
Ì
‚
¥
§
O
Ž
Ü
S
“
˜
x
k
∈
R
n
Ú
F
Ý
∇
f
(˜
x
k
)
"
3
S
Ì
‚
¥
§
¦
^
SVRG
•
#
é
F
Ý
g
k
+1
t
O
"
d
§
·
‚
3
z
‡
S
Ü
Ì
‚
S
“
m
©
ž
±
•
„
e
ü
Ú
-
#
m
©
S
“
§
~
X
µ
[33,34]
"
-
#
é
Ä
ò
½
Ï
-
é
Ž
{
¿
ž
Ø
Œ
U
Ã
Ã
Î
&
E
"
Ï
d
§
STCGVR
Œ
^u
)û
Œ
5
Ã
å
‘
Å
`
z
¯
K
§
ä
k
û
Ð
u
Ð
c
µ
"
3.
Â
ñ
5
©
Û
3
!
¥
§
·
‚
y
²
Ž
{
2
)
¤
S
“
S
´
‚
5
Â
ñ
"
DOI:10.12677/aam.2022.1174524254
A^
ê
Æ
?
Ð
4
Z
§
Å
û
e
¡
·
‚
‰
Ñ
©
¥
‡
^
˜
b
"
b
1
b
Y
²
8
z
=
{
x
|
f
(
x
)
≤
f
(
x
0
)
}
´
k
.
"
d
§
¼
ê
f
(
x
)
3
z
¥
´
k
.
"
b
2
b
f
i
:
R
n
→
R
ë
Y
Œ
‡
§
∇
f
´
Û
Lipschitz
ë
Y
§
Ù
Lipschitz
~
ê
•
L
§
=
µ
é
u
∀
x,y
∈
R
n
,
k
±
e
¤
á
µ
||∇
f
(
x
)
−∇
f
(
y
)
||≤
L
||
x
−
y
||
.
(26)
b
3
STCGVR
Ž
{
¥
Ú
•
α
t
÷
v
α
t
∈
[
α
l
,α
r
](0
<α
l
<α
r
).
b
4
du
‘
Å
F
Ý
g
k
+1
t
´
∇
f
(
x
k
+1
t
)
˜
‡
Ã
O
§
=
µ
E
[
g
k
+1
t
|
x
k
+1
t
] =
∇
f
(
x
k
+1
t
),
•
3
˜
‡
~
ê
H,
é
u
¤
k
t
= 0
,
1
,...,m
−
1;
k
= 0
,
1
,
2
,...
§
k
||∇
f
(
x
k
+1
t
)
−
g
k
+1
t
||≤
H.
(27)
b
5
•
3
ü
‡
~
ê
κ,κ
§
k
±
e
¤
á
µ
κI
Q
MP
t
κI,
∀
t,
(28)
Ù
¥
Î
Ò
A
B
,
A,B
∈
R
n
×
n
“
L
A
−
B
´
Œ
½
"
b
6
é
u
¤
k
t
= 0
,
1
,...,m
−
1;
k
= 0
,
1
,
2
,...
§
‘
Å
F
Ý
g
k
+1
t
´
k
.
§
=
µ
||
g
k
+1
t
||≤
Λ
.
(29)
Ú
n
3.1.
b
d
t
d
(21)
−
(25)
)
¤
§
X
J
Ú
•
α
t
´
d
Wolfe
|¢
^
‡
(4)
Ú
(5)
)
¤
§
K
¿
©
e
ü
5
Ÿ
é
?
Û
t
= 0
,...,m
−
1
;
k
= 0
,
1
,
2
,...
¤
á
¶
=
•
3
˜
‡
~
ê
ρ
1
§
¦
−
(
g
k
+1
t
)
T
d
t
≥
ρ
1
||
g
k
+1
t
||
2
.
(30)
y
²
µ
Ï
•
(
g
k
+1
0
)
T
d
0
=
−||
g
k
+1
0
||
2
,
¿
©
e
ü
^
‡
é
u
t
= 0
¤
á
.
d
ª
(21)
−
(25),
·
‚
(
d
k
+1
t
+1
)
T
g
k
+1
t
+1
= (
−
g
k
+1
t
+1
+
β
t
d
t
+
δ
t
y
t
)
T
g
k
+1
t
+1
=
−||
g
k
+1
t
+1
||
2
+
β
t
(
g
k
+1
t
+1
)
T
d
t
+
δ
t
(
g
k
+1
t
+1
)
T
y
t
=
−||
g
k
+1
t
+1
||
2
+
η
t
(
g
k
+1
t
+1
)
T
y
t
−
(
g
k
+1
t
+1
)
T
s
t
s
T
t
y
t
(
g
k
+1
t
+1
)
T
s
t
−
η
t
(
g
k
+1
t
+1
)
T
s
t
y
T
t
s
t
(
g
k
+1
t
+1
)
T
y
t
=
−||
g
k
+1
t
+1
||
2
−
((
g
k
+1
t
+1
)
T
s
t
)
2
s
T
t
y
t
(31)
d
,Wolfe
‚
|¢
^
‡
(4)
Ú
(5)
Œ
±
(
s
T
t
y
t
>
0.
ª
(31)
¿
›
X
¿
©
e
ü
^
‡
é
u
ρ
1
= 1
¤
á
"
DOI:10.12677/aam.2022.1174524255
A^
ê
Æ
?
Ð
4
Z
§
Å
û
|¢
•
•
¿
©
e
ü
A
5
3
STCGVR
Â
ñ
5
©
Û
¥
´
7Ø
Œ
"
þ
ã
Ú
n
3
.
1
L
²
Ž
{
2
)
¤
|¢
•
•
3
Wolfe
‚
|¢
e
ä
k
T
5
Ÿ
"
Ú
n
3.2.
b
d
t
d
(21)-(25)
)
¤
§
X
J
Ú
•
α
t
ä
k
Wolfe
‚
|¢
^
‡
(4)
Ú
(5)
(
½
§
f
(
x
)
÷
v
b
2
Ú
b
4,
·
‚
µ
α
t
≥
(
c
2
−
1)(
g
k
+1
t
)
T
d
t
−
2
H
||
d
t
||
L
||
d
t
||
2
.
(32)
y
²
µ
d
b
2
Ú
b
4,
·
‚
||
y
t
||
=
||
g
k
+1
t
+1
−
g
k
+1
t
||
=
||
g
k
+1
t
+1
−∇
f
(
x
k
+1
t
+1
)+
∇
f
(
x
k
+1
t
+1
)
−∇
f
(
x
k
+1
t
)+
∇
f
(
x
k
+1
t
)
−
g
k
+1
t
||
≤||
g
k
+1
t
+1
−∇
f
(
x
k
+1
t
+1
)
||
+
||∇
f
(
x
k
+1
t
+1
)
−∇
f
(
x
k
+1
t
)
||
+
||∇
f
(
x
k
+1
t
)
−
g
k
+1
t
||
≤
2
H
+
L
||
s
t
||
,
(33)
(
Ü
Lipschitz
Ø
ª
(27)
Ú
Wolfe
^
‡
§
·
‚
U
í
Ñ
(
c
2
−
1)(
g
k
+1
t
)
T
d
t
≤
(
g
k
+1
t
+1
−
g
k
+1
t
)
T
d
t
=
y
T
t
d
t
≤||
y
t
||||
d
t
||
≤
(2
H
+
L
||
s
t
||
)
||
d
t
||
.
(34)
Ï
d
§
Ú
n
y
"
Ú
n
3.3.
b
d
t
d
(21)
−
(25)
)
¤
§
X
J
Ú
•
α
t
ä
k
Wolfe
‚
|¢
^
‡
(4)
Ú
(5)
(
½
§
f
(
x
)
÷
v
b
1
Ú
b
2,
@
o
±
e
Zoutendijk
^
‡
¤
á
µ
X
t
≥
1
((
g
k
+1
t
)
T
d
t
)
2
||
d
t
||
2
<
∞
.
(35)
y
²
µ
d
Wolfe
^
‡
(4)
f
(
x
k
+1
t
)
−
f
(
x
k
+1
t
+1
)
≥−
c
1
α
t
(
g
k
+1
t
)
T
d
t
,
(36)
(
Ü
(32)
§
·
‚
k
f
(
x
k
+1
t
)
−
f
(
x
k
+1
t
+1
)
≥−
c
1
(
c
2
−
1)(
g
k
+1
t
)
T
d
t
−
2
H
||
d
t
||
L
||
d
t
||
2
(
g
k
+1
t
)
T
d
t
≥
c
1
(1
−
c
2
)((
g
k
+1
t
)
T
d
t
)
2
+2
c
1
H
||
d
t
||
(
g
k
+1
t
)
T
d
t
L
||
d
t
||
2
,
(37)
DOI:10.12677/aam.2022.1174524256
A^
ê
Æ
?
Ð
4
Z
§
Å
û
Ï
L
é
(37)
ü
>
ý
é
Š
¿
¦
Ú
§
X
t
≥
1
|
f
(
x
k
+1
t
)
−
f
(
x
k
+1
t
+
α
t
d
t
)
|
≥
X
t
≥
1
|
c
1
(1
−
c
2
)((
g
k
+1
t
)
T
d
t
)
2
L
||
d
t
||
2
+
2
c
1
H
||
d
t
||
(
g
k
+1
t
)
T
d
t
L
||
d
t
||
2
|
≥
X
t
≥
1
(
|
c
1
(1
−
c
2
)((
g
k
+1
t
)
T
d
t
)
2
L
||
d
t
||
2
|−|
2
c
1
H
||
(
g
k
+1
t
)
||
L
|
)
.
(38)
Ï
L
é
ª
(38)
ü
>
¦
Ú
§
¿
(
Ü
b
1
Ú
k
g
k
+1
t
k
k
.
§
K
Zoutendijk
^
‡
3
‘
Å
œ
¹
e
¤
á
µ
X
t
≥
1
((
g
k
+1
t
)
T
d
t
)
2
||
d
t
||
2
<
∞
.
(39)
Ú
n
3.4.
b
d
t
d
(21)
−
(25)
)
¤
§
X
J
Ú
•
α
t
ä
k
Wolfe
‚
|¢
^
‡
(4)
Ú
(5)
(
½
§
@
o
é
u
r
à
¼
ê
§
S
d
t
‰
ê´
k
.
§
=
•
3
M>
0
¦
||
d
t
||≤
M,
(40)
¤
á
"
y
²
µ
d
ª
(10)
−
(12),
b
5
Ú
b
6
§
·
‚
||
d
t
||
=
||−
Q
MP
t
g
k
+1
t
||≤
κ
||
g
k
+1
t
||≤
κ
Λ =
M.
(41)
½
n
3.1.
b
d
t
d
(21)
−
(25)
)
¤
§
X
J
Ú
•
α
t
ä
k
Wolfe
‚
|¢
^
‡
(4)
Ú
(5)
(
½
§
8
I
¼
ê
f
(
x
)
´
r
à
¿
…
÷
v
b
1
§
@
o
·
‚
k
lim
t
→∞
||
g
k
+1
t
||
= 0
.
(42)
y
²
µ
Ä
u
Ú
n
3.4
§
·
‚
k
d
t
≤
M
k
"
Š
â
¿
©
e
ü
^
‡
µ
−
(
g
k
+1
t
)
T
d
t
≥
ρ
1
k
g
k
+1
t
k
2
§
2
(
Ü
Ú
n
3.3
§
·
‚
k
∞
>
X
t
≥
1
((
g
k
+1
t
)
T
d
t
)
2
||
d
t
||
2
≥
X
t
≥
1
((
g
k
+1
t
)
T
d
t
)
2
M
2
≥
ρ
2
1
M
2
X
t
≥
1
||
g
k
+1
t
||
2
,
(43)
l
í
Ñ
(42)
§
½
n
y
"
±
þ
½
n
3.1
L
²
·
‚
J
Ñ
Ž
{
é
u
r
à
¼
ê´
Û
Â
ñ
"
Ú
n
3.5.
b
½
b
2
¤
á
§
x
∗
´
f
(
x
)
•
˜
•
Š
:
"
@
o
é
u
?
¿
x
∈
R
n
§
·
‚
k
1
2
L
||∇
f
(
x
)
||
2
≤
f
(
x
)
−
f
(
x
∗
)
.
(44)
DOI:10.12677/aam.2022.1174524257
A^
ê
Æ
?
Ð
4
Z
§
Å
û
y
²
µ
du
x
∗
´
f
(
x
)
•
˜
4
Š
:
§
d
b
2
§
·
‚
k
f
(
x
∗
)
≤
f
(
y
)
≤
f
(
x
)+
∇
f
(
x
)
T
(
y
−
x
)+
L
2
||
y
−
x
||
2
,
(45)
½
x
,
Ï
•
±
þ
ú
ª
é
u
?
¿
y
Ñ
¤
á
§
e
(
.
Œ
±
3
þ
ã
Ø
ª
m
>
µ
f
(
x
∗
)
≤
f
(
x
)+
∇
f
(
x
)
T
(
y
−
x
)+
L
2
||
y
−
x
||
2
=
f
(
x
)
−
L
2
||∇
f
(
x
)
||
2
.
(46)
Ï
d
§
Ø
ª
(44)
¤
á
"
Ú
n
3.6.
b
½
x
∗
•
f
(
x
)
•
˜
4
Š
:
§
b
2
¤
á
§
g
k
+1
t
=
∇
f
i
t
(
x
k
+1
t
)
−
(
∇
f
i
t
(˜
x
k
)
−
∇
f
(˜
x
k
))
´
•
~
‘
Å
F
Ý
"
ƒ
é
u
i
t
Ï
"
,
·
‚
E
[
||
g
k
+1
t
||
2
]
≤
4
L
(
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]+
E
[
f
(˜
x
k
)
−
f
(
x
∗
)])
.
(47)
y
²
µ
d
g
k
+1
t
•
#
ú
ª
§
·
‚
E
[
||
g
k
+1
t
||
2
] =
E
[
||∇
f
i
t
(
x
k
+1
t
)
−∇
f
i
t
(˜
x
k
)+
∇
f
(˜
x
k
)
||
2
]
=
E
[
||∇
f
i
t
(
x
k
+1
t
)
−∇
f
i
t
(˜
x
k
)+
∇
f
(˜
x
k
)+
∇
f
i
t
(
x
∗
)
−∇
f
i
t
(
x
∗
)
||
2
]
≤
2
E
[
||∇
f
i
t
(
x
k
+1
t
)
−∇
f
i
t
(
x
∗
)
||
2
]+2
E
[
||∇
f
i
t
(˜
x
k
)
−∇
f
(˜
x
k
)
−∇
f
i
t
(
x
∗
)
||
2
]
.
(48)
e
5
§
·
‚
E
˜
‡9
Ï
¼
ê
µ
Φ
i
(
x
) =
f
i
(
x
)
−
f
i
(
x
∗
)
−∇
f
i
(
x
∗
)(
x
−
x
∗
)
,
(49)
5
¿
Φ
i
(
x
)
´
˜
‡
à
¼
ê
§
∇
f
´
Û
Lipschitz
ë
Y
§
Ù
Lipschitz
ë
Y
~
ê
•
L
§
(
Ü
(44)
§
·
‚
k
1
2
L
||∇
Φ
i
(
x
)
||
2
≤
Φ
i
(
x
)
−
Φ
i
(
x
∗
)
.
(50)
A^
Φ
i
(
x
)
Ú
∇
Φ
i
(
x
)
L
ˆ
ª
§
·
‚
||∇
f
i
(
x
)
−
f
i
(
x
∗
)
||
2
≤
2
L
[
f
i
(
x
)
−
f
i
(
x
∗
)
−∇
f
i
(
x
∗
)(
x
−
x
∗
)]
.
(51)
é
(51)
ü
>
l
1
n
¦
Ú
§
¿
…
5
¿
∇
f
(
x
∗
) = 0
§
·
‚
1
n
n
X
i
=1
||∇
f
i
(
x
)
−
f
i
(
x
∗
)
||
2
≤
2
L
[
f
(
x
)
−
f
(
x
∗
)]
,
∀
x.
(52)
DOI:10.12677/aam.2022.1174524258
A^
ê
Æ
?
Ð
4
Z
§
Å
û
Ï
d
§
·
‚
k
E
[
||∇
f
i
t
(
x
k
+1
t
)
−∇
f
i
t
(
x
∗
)
||
2
]
=
E
[
E
[
||∇
f
i
t
(
x
k
+1
t
)
−∇
f
i
t
(
x
∗
)
||
2
]]
=
E
[
1
n
n
X
i
=1
||∇
f
i
(
x
k
+1
t
)
−∇
f
i
(
x
∗
)
||
2
]
≤
2
LE
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]
,
(53)
Ú
E
[
||∇
f
i
t
(˜
x
k
−
1
)
−∇
f
i
t
(
x
∗
)
||
2
]
≤
2
LE
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]
.
(54)
2
(
Ü
(47),(53),(54)
§
·
‚
k
E
[
||
g
k
+1
t
||
2
]
≤
4
L
(
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]+
E
[
f
(˜
x
k
)
−
f
(
x
∗
)])
.
(55)
½
n
3.2.
b
½
b
1-
b
4
¤
á
¿
…
f(x)
´
r
à
§
Ù
r
à
ë
ê´
u,
-
x
∗
´
f
(
x
)
•
˜
4
Š
§
¿
b
m
v
Œ
§
¦
ρ
=
(
2
u
+4
Lα
2
r
m
)
2
α
l
m
(
κ
+2
α
l
Lκ
2
)
<
1
.
(56)
@
o
§
é
u
¤
k
k
≥
0
,
·
‚
k
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
≤
ρ
k
E
[
f
(˜
x
0
)
−
f
(
x
∗
)]
.
(57)
y
²
µ
½
Â
∆
t
=
k
x
k
+1
t
−
x
∗
k
§
qd
ª
(12)
§
b
2
Ú
b
5
§
·
‚
k
E
[∆
2
t
+1
] =
E
[
||
x
k
+1
t
+1
−
x
∗
||
2
]
=
E
[
||
x
k
+1
t
−
α
t
Q
MP
t
g
k
+1
t
−
x
∗
||
2
]
=
E
[∆
2
t
]
−
2
α
t
E
[
<Q
MP
t
g
k
+1
t
,x
k
+1
t
−
x
∗
>
]+
α
2
t
E
[
||
Q
MP
t
g
k
+1
t
||
2
]
=
E
[∆
2
t
]
−
2
α
t
E
[
<Q
MP
t
∇
f
(
x
k
+1
t
)
,x
k
+1
t
−
x
∗
>
]+
α
2
t
||
Q
MP
t
||
2
E
[
||
g
k
+1
t
||
2
]
≤
E
[∆
2
t
]
−
2
α
t
κ
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]+
α
2
t
κ
2
E
[
||
g
k
+1
t
||
2
]
,
(58)
(
Ü
(47)
Ú
Ú
n
3.6
§
·
‚
E
[∆
2
t
+1
]
≤
E
[∆
2
t
]
−
2
α
t
κ
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]
+4
α
2
t
κ
2
L
(
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]+
E
[
f
(˜
x
k
)
−
f
(
x
∗
)])
=
E
[∆
2
t
]
−
(2
α
t
κ
−
4
α
2
t
Lκ
2
)[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]+4
α
2
t
κ
2
LE
[
f
(˜
x
k
)
−
f
(
x
∗
)]
.
(59)
DOI:10.12677/aam.2022.1174524259
A^
ê
Æ
?
Ð
4
Z
§
Å
û
é
t
l
0
m-1
¦
Ú
§
(
Ü
x
k
+1
1
=˜
x
k
−
1
§
·
‚
k
E
[∆
2
m
+1
]+2
α
t
(
κ
+2
α
t
Lκ
2
)
m
X
t
=1
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]
≤
E
[
||
˜
x
k
−
1
−
x
∗
||
2
]+4
α
2
t
κ
2
LmE
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]
.
(60)
du
f
(
x
)
´
r
à
§
·
‚
E
[∆
2
m
+1
]+2
α
t
(
κ
+2
α
t
Lκ
2
)
m
X
t
=1
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]
≤
2
u
E
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]+4
α
2
t
κ
2
LmE
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]
.
(61)
Š
â
˜
x
k
+1
=
1
m
m
P
t
=1
x
k
+1
t
§
·
‚
k
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
≤
1
m
m
X
t
=1
E
[
f
(
x
k
+1
t
)
−
f
(
x
∗
)]
≤
1
2
α
t
(
κ
+2
α
t
Lκ
2
)
m
(
2
u
+4
Lκ
2
mα
2
t
)
E
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]
≤
1
2
α
l
(
κ
+2
α
l
Lκ
2
)
m
(
2
u
+4
Lκ
2
mα
2
r
)
E
[
f
(˜
x
k
−
1
)
−
f
(
x
∗
)]
.
(62)
Ï
L8
B
§
·
‚
k
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
≤
ρ
k
E
[
f
(˜
x
0
)
−
f
(
x
∗
)]
,
(63)
Ù
¥
§
ρ
=
(
2
u
+4
Lα
2
r
m
)
2
α
l
m
(
κ
+2
α
l
Lκ
2
)
.
(64)
X
J
-
ρ<
1
§
K
k
m
≥
1
(
α
l
κ
+2
α
2
l
Lκ
2
−
2
α
2
r
Lκ
2
)
u
.
(65)
½
n
3.2
¿
›
X
STCGVR
3
¼
ê
Š
Ï
"
¿Â
þ
é
u
ë
•
:
˜
x
k
´
‚
5
Â
ñ
"
3
U
Y
ƒ
c
§
·
‚
0
ê
Œ
Å
Ø
ª
½
Â
(
k
'
•[
&
E
§
ž
ë
[35])
"
½
Â
3.1.
(
ê
‰
Å
Ø
ª
)
e
X
•
˜
‡
š
K
‘
Å
C
þ
§
@
o
é
u
?
¿
¢ê
a>
0
§
·
‚
k
P
(
X
≥
a
)
≤
E
[
X
]
a
.
(66)
½
n
3.3.
b
†
½
n
3.2
Ú
(57)
¥
ƒ
Ó
b
¤
á
"
,
·
‚
k
f
(˜
x
k
)
−
f
(
x
∗
)
3
k
→∞
ž
•
DOI:10.12677/aam.2022.1174524260
A^
ê
Æ
?
Ð
4
Z
§
Å
û
V
Ç
Â
ñ
0
§
=
é
u
?
Û
ε
≥
0
§
·
‚
k
lim
k
→∞
P
(
f
(˜
x
k
)
−
f
(
x
∗
)
≥
) = 0
.
(67)
y
²
µ
d
½
n
3.2
§
·
‚
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
≤
ρ
k
E
[
f
(˜
x
0
)
−
f
(
x
∗
)]
,
(68)
Ï
•
ρ<
1
§
…
E
[
f
(˜
x
0
)
−
f
(
x
∗
)]
≤
M
f
,
¤
±
k
→∞
ž
§
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
→
0.
qdu
f
(˜
x
k
)
−
f
(
x
∗
)
´
˜
‡
š
K
‘
Å
C
þ
§
A^
ê
‰
Å
Ø
ª
(66),
·
‚
k
P
(
f
(˜
x
k
)
−
f
(
x
∗
)
≥
)
≤
E
[
f
(˜
x
k
)
−
f
(
x
∗
)]
→
0
.
(69)
½
n
3.3
L
²
·
‚
8
I
¼
ê
Š
3
•
V
Ç
Â
ñ
"
4.
ê
Š
¢
3
!
¥
§
·
‚
)û
A
‡
6
1
k
i
Ò
Å
ì
Æ
S
?
Ö
§
•
)
*
£
8
!
Ü
6
£
8Ú
|
±
•
þ
Å
¯
K
§
ò
Ž
{
STCGVR
†
SVRG
?
1
'
"
w
,
§
c
ü
«
´
1
w
…r
à
`
z
.
§
1
n
«
´
1
w
š
à
`
z
.
"
©
9
Ž
{
§
S
Ñ
3
MatlabR2016a
?
n
ì
þ
$
1
"
Ï
•
STCGVR
Ú
SVRG
I
‡
O
Ž
F
Ý
§
¤
±
z
‡
epoch
Ñ
I
‡
¤
k
ê
â
"
•
ü
$
ù
¤
§
·
‚
ã
L
w
«
¼
ê
›
”
Š
ƒ
é
u
Ï
L
ê
â
g
ê
'
X
"
•
Ò
´
`
§
î
¶
L
«ê
â
k
Ï
L
g
ê
§
p
¶
L
«
›
”
¼
ê
Š
"
epoch
•
Œ
Š
˜
•
20
"
3ù
ü
«
•{
¥
§
·
‚
˜
c
1 = 10
−
4
Ú
c
2
= 0
.
1
§
M
1
= 1
"
4.1.
*
£
8
¯
K
3
·
‚
1
˜
‡
¢
¥
§
·
‚
À
J
*
£
8
¯
K
5
y
·
‚
Ž
{
"
*
£
8
§
•
¡
•
Tikhonov
K
z
§
´
Å
ì
Æ
S
.
¥
Ÿ
þ
–
'
-
‡
Æ
S
.
"
*
£
8
8
I
´
•
z
¤
¼
ê
min
x
1
n
n
X
i
=1
(
b
i
−
a
T
i
x
)
2
+
λ
||
x
||
2
2
,
(70)
Ù
¥
a
i
∈
R
n
Ú
b
i
∈{−
1
,
1
}
©
O
´
1
i
‡
~
f
A
•
þ
Ú
8
I
Š
§
λ>
0
´
K
z
ë
ê
"
3
¢
¥
§
·
‚
STCGVR
Ú
SVRG
3
¦
)
¯
K
(70)
ž
ê
Š
(
J
"
·
‚
ò
ü
«
•{
Ð
©
:
˜
•
x
1
=5¯
x
1
§
Ù
¥
¯
x
1
´
õ
‘
I
O
•
þ
§
Ù
ä
k
Œ
10%
š
"
ƒ
"
·
‚
±
e
•
ª)
¤
Ô
ö
8
Ú
ÿ
Á
8
(
a,b
)
"
·
‚
Ä
k
)
¤
˜
‡
‘
Å
•
þ
a,
§
l
[
−
0
.
5
,
0
.
5]
n
þ
þ
!
©
Ù
¥
Ä
§
,
•
˜
l
þ
!
©
Ù
¥
Ä
a
∈
R
n
˜
I
\
b
∈{−
1
,
1
}
§
b
=
sign
(
h
˜
x,a
i
)
3
[
−
1
,
1]
þ
"
d
§
·
‚
˜
K
z
ë
ê
λ
= 10
−
4
§
¿
À
J
‘
Å
F
Ý
Ú
ê
m
=
n/
5
"
ã
1
'
SVRG
Ú
STCGVR
S
“
Ç
§
Ù
¥
î
‹
IL
«
Û
k
S
“g
ê
§
p‹
IL
«
›
”
¼
ê
Š
"
S
“
Ç
‡
N
8
I
¼
ê
›
”
Š
‘
S
“g
ê
O
\
¥
y
Ñ
C
z
ª
³
"
l
ã
1
Œ
±
DOI:10.12677/aam.2022.1174524261
A^
ê
Æ
?
Ð
4
Z
§
Å
û
w
Ñ
§
é
u
Ž
{
SVRG
§
·
‚
˜
Ú
•
α
=0
.
005
¶
é
u
STCGVR
§
·
‚
¦
^
Wolfe
‚
|¢
5
À
J
Ü
·
Ú
•
"
†
SVRG
ƒ
'
§
STCGVR
Ž
{
•
I
‡
Œ
4
g
Ì
‚
Ò
Œ
±
¯
„
%
C
¼
ê
•
›
”
Š
10
−
3
§
=
Œ
400
g
S
“
Â
ñ
"
(
J
L
²
§
STCGVR
Ž
{
3
•
~
Ä
:
þ
V
\
n
‘
Ý
F
Ý
§
Ï
d
U
3
4
S
“g
ê
S
%
C
•
`
)
"
Figure1.
TraininglossofSVRGandSTCGVRfortheridgeregression
problem
ã
1.
SVRG
Ú
STCGVR
é
u
*
£
8
¯
K
Ô
ö
›
”
4.2.
Ü
6
£
8
¯
K
3
1
‡
¢
¥
§
·
‚
•
Ä
`
2
Ü
6
£
8
(LR)
¯
K
µ
min
x
1
n
n
X
i
=1
ln
(1+
exp
(
−
b
i
a
T
i
x
))+
λ
||
x
||
2
,
(71)
Ù
¥
λ>
0
´
K
z
ë
ê
§
a
∈
R
n
L
«
A
•
þ
§
b
∈{−
1
,
1
}
´
•
ƒ
A
I
\
"
·
‚
-
λ
=10
−
4
§
m
=
n/
5
§
½
Ú
•
α
=0
.
015
§
ò
ü
«
•{
Ð
©
:
˜
•
x
1
=5¯
x
1
§
Ù
¥
¯
x
1
´
l
þ
!
©
Ù
[0
,
1]
n
‘
Å
Ä
"
·
‚
±
e
•
ª)
¤
Ô
ö
8
Ú
ÿ
Á
8
(
a,b
)
"
·
‚
Ä
k
)
¤
˜
‡
‘
Å
•
þ
a
§
Ù
¥
5%
š
"
ƒ
l
[0
,
1]
n
þ
þ
!
©
Ù
¥
Ä
§
,
•
˜
l
þ
!
©
Ù
¥
Ä
a
∈
R
n
˜
I
\
b
∈{−
1
,
1
}
§
b
=
sign
(
h
˜
x,a
i
)
3
[
−
1
,
1]
þ
"
ã
2
'
SVRG
Ú
STCGVR
S
“
Ç
§
Ù
¥
î
‹
IL
«
Û
k
S
“g
ê
§
p‹
IL
«
›
”
¼
ê
Š
"
S
“
Ç
‡
N
8
I
¼
ê
›
”
Š
‘
S
“g
ê
O
\
¥
y
Ñ
C
z
ª
³
"
‘
X
k
S
“g
ê
O
\
§
·
‚
Œ
±
*
SVRG
›
”
¼
ê
Š
3
Œ
9
g
F
Ý
O
Ž
Å
ì~
¿
ª
u
-
½
"
du
m
=
n/
5
§
•
Ò
´
`
§
§
3
Œ
900
g
S
“
Â
ñ
§
¿
…
Â
ñ
•
`
Š
:
„
Ý
é
ú
"
,
§
STCGVR
•
I
‡
300
g
S
“
=
Œ
ˆ
ƒ
Ó
°
Ý
"
(
J
L
²
§
3
)û
Ü
6
£
8
¯
K
ž
§
3
DOI:10.12677/aam.2022.1174524262
A^
ê
Æ
?
Ð
4
Z
§
Å
û
z
g
S
Ì
‚
S
“
m
©ž
§
n
‘
Ý
F
Ý
•
•
±
•
Í
e
ü
•
•
-
#
m
©
S
“
§
k
/
J
p
Â
ñ
„
Ý
"
Figure2.
TraininglossofSVRGandSTCGVRforlogisticregression
problem
ã
2.
SVRG
Ú
STCGVR
é
u
Ü
6
£
8
¯
K
Ô
ö
›
”
4.3.
š
à
|
±
•
þ
Å
¯
K
3
·
‚
•
˜
‡
¢
¥
§
·
‚
•
Ä
˜
‡
š
à
|
±
•
þ
Å
(SVM)
¯
K
"
‰
½
˜
‡
ä
k
®
•
a
:
Ô
ö
8
§
SVM
8
I
´
é
˜
‡
U
•
Ð
/
©
l
Ô
ö
8
‡
²
¡
"
·
‚
-
S
=
{
(
a
i
,b
i
)
}
n
i
=1
´
˜
‡
•
¹
n
é
/
ª
Ô
ö
8
(
a
i
,b
i
)
§
Ù
¥
a
i
∈
R
n
´
A
•
þ
§
b
i
∈{−
1
,
1
}
´
é
A
I
\
"
8
I
´
é
˜
‡
d
•
þ
x
∈
R
n
|
±
‡
²
¡
§
T
•
þ
ò
Ô
ö
8
©
m
§
¦
é
u
b
i
= 1
¤
k
:
x
T
a
i
>
0
§
é
u
b
i
=
−
1
¤
k
:
x
T
a
i
<
0
"
X
J
ê
â
Ø
´
Œ
©
l
§
K
T
•
þ
Œ
U
Ø
•
3
§
½
ö
§
X
J
ê
â
´
Œ
©
l
§
K
Œ
U
k
õ
‡
©
l
•
þ
"
·
‚
Ï
L
¦
^
sigmoid
›
”
¼
ê
)û
±
e
š
à
|
±
•
þ
Å
(SVM)
¯
K
5
'
SVRG
Ú
STCGVR
Â
ñ
5
U
§
ù
®
3
[35]
¥
?
1
L
•
Ä
:
min
x
∈
R
n
f
(
x
) =
E
a,b
[1
−
tanh
(
b
h
x,a
i
)]+
λ
||
x
||
2
2
,
(72)
Ù
¥
λ>
0
´
˜
‡
K
ë
ê
"
3
·
‚
¢
¥
§
λ
˜
•
10
−
4
"
3
¢
S
œ
¹
e
§
ª
(72)
¤
min
x
∈
R
n
1
n
n
X
i
=1
f
i
(
x
)+
λ
||
x
||
2
,
(73)
Ù
¥
f
i
(
x
) = 1
−
tanh
(
b
i
h
x,a
i
i
)
,i
= 1
,...,n
.
DOI:10.12677/aam.2022.1174524263
A^
ê
Æ
?
Ð
4
Z
§
Å
û
·
‚
Ï
L
¯
K
(4.73)
3
Ü
¤
ê
â
þ
L
y
SVRG
Ú
STCGVR
ê
Š
(
J
"
·
‚
ò
ü
«
•{
Ð
©
:
˜
•
x
1
= 5¯
x
1
§
Ù
¥
¯
x
1
´
l
þ
!
©
Ù
[0
,
1]
n
þ
‘
Å
Ä
"
·
‚
±
e
•
ª)
¤
Ô
ö
8
Ú
ÿ
Á
8
(
a,b
)
"
·
‚
Ä
k
)
¤
˜
‡
ä
k
80%
š
"
©
þ
D
Õ
•
þ
a
§
Ù
Ñ
l
[0
,
1]
n
þ
þ
!
©
Ù
§
,
-
b
=
sign
(
h
¯
x,a
i
)
"
ã
3
'
SVRG
Ú
STCGVR
S
“
Ç
§
Ù
¥
î
‹
IL
«
Û
k
S
“g
ê
§
p‹
IL
«
›
”
¼
ê
Š
"
S
“
Ç
‡
N
8
I
¼
ê
›
”
Š
‘
S
“g
ê
O
\
¥
y
Ñ
C
z
ª
³
"
3
Ø
äN
Ú
•
§
·
‚
•
ª
À
J
α
= 5
×
10
−
3
"
d
ž
§
SVRG
3
¦
)
š
à
SVM
¯
K
ž
Ð
Â
ñ
5
"
l
ã
3
Œ
±
w
Ñ
§
·
‚
•{
•
I
‡
Œ
200
g
=
Œ
Â
ñ
"
,
§
SVRG
I
‡
Œ
1200
g
S
“
â
U
ˆ
ƒ
q
°
Ý
"
¢
(
J
L
²
§
·
‚
J
Ñ
•{
3
¦
)
š
à
|
±
•
þ
Å
¯
K
ž
ä
k
•
¯
Â
ñ
„
Ý
"
Figure3.
TraininglossofSVRGandSTCGVRfornonconvexSVM
ã
3.
SVRG
Ú
STCGVR
é
u
š
à
SVM
Ô
ö
›
”
5.
o
(
©
J
Ñ
˜
«
^u
)û
Ã
å
‘
Å
`
z
¯
K
STCGVR
Ž
{
"
ù
«
#
L
Ž
{
(
Ü
‘
Å
•
~
E
â
Ú
U
?
Dai-Liao
n
‘
Ý
F
Ý
§
±
¼
•
Ð
Â
ñ
5
"
d
§
3
z
g
S
Ì
‚
S
“
m
©ž
Ñ
•
Ä
-
#
é
Ä
E
â
§
ù
ò
½
Ï
-
é
Ž
{
¿Þ
Ø
é
Ž
{
Œ
U
Ø
|
Î
&
E
"
3
·
^
‡
e
§
k
y
STCGVR
‚
5
Â
ñ
5
"
·
‚
Ï
L
¦
)
A
‡
Å
ì
Æ
S
¯
K
§
ù
¯
K
Œ
U
´
à
§
•
Œ
U
´
š
à
§
-
<
Í
ê
Š
(
J
"
3
™
5
ï
Ä
¥
§
‘
X
Ž
{
ê
Š
F
Ý
¢
y
§
STCGVR
Œ
±
é
N
´
/
A^u
Ø
Ó
¢
S
¯
K
§
~
X
$
•
Ý
¡
E
Ú
D
Õ
i
;
Æ
S
¯
K
"
DOI:10.12677/aam.2022.1174524264
A^
ê
Æ
?
Ð
4
Z
§
Å
û
Ä
7
‘
8
I
[
g
,
‰
Æ
Ä
7
“
c
Ä
7
‘
8
(11601252)
"
ë
•
©
z
[1]Kawaguchi, K.and Lu, H.H.(2020) OrderedSGD: ANew StochasticOptimization Framework
forEmpirical RiskMinimization.
Proceedingsof the23rdInternational Conferenceon Artificial
IntelligenceandStatistics(AISTATS)
,
108
.
[2]Shalev-Shwartz, S.and Ben-David, S.(2014) References.In:
UnderstandingMachine Learning
,
CambridgeUniversityPress,Cambridge,385-394.
https://doi.org/10.1017/CBO9781107298019.036
[3]Taheri,H.,Pedarsani,R.andThrampoulidis,C.(2021)FundamentalLimitsofRidge-
RegularizedEmpiricalRiskMinimizationinHighDimensions.
Proceedingsofthe24thIn-
ternationalConferenceonArtificialIntelligenceandStatistics
,
130
.
[4]Shalev-Shwartz,S. and Srebro, N. (2008) SVM Optimization:Inverse Dependence on Training
SetSize.
Proceedingsofthe25thInternationalConferenceonMachineLearning
,Helsinki,5-9
June2008.https://doi.org/10.1145/1390156.1390273
[5]Bottou,L.(2010)Large-ScaleMachineLearningwithStochasticGradientDescent.In:
Lechevallier,Y. andSaporta,G.,Eds.,
ProceedingsofCOMPSTAT’2010
,Physica-VerlagHD,
177-186.https://doi.org/10.1007/978-3-7908-2604-316
[6]Bottou, L., Curtis, F.E.and Nocedal, J.(2018)OptimizationMethodsfor Large-ScaleMachine
Learning.
SIAMReview
,
60
,223-311.https://doi.org/10.1137/16M1080173
[7]Mokhtari,A.and Ribeiro,A.(2013)ADual StochasticDFPAlgorithm forOptimal Resource
AllocationinWirelessSystems.
2013IEEE14thWorkshoponSignalProcessingAdvancesin
WirelessCommunications(SPAWC)
,Darmstadt,16-19June2013,21-25.
https://doi.org/10.1109/SPAWC.2013.6612004
[8]Couillard,O.(2020)Fastand FlexibleOptimizationofPowerAllocationinWirelessCommu-
nicationSystemsUsingNeuralNetworks.McGillUniversity,Montreal,Canada.
[9]Robbins,H. andMonro, S.(1951)A Stochastic Approximation Method.
TheAnnalsofMath-
ematicalStatistics
,
22
,400-407.https://doi.org/10.1214/aoms/1177729586
[10]LeRoux,N.,Schmidt,M.andBach,F.(2012)AStochasticGradientMethodwithanExpo-
nentialConvergenceRateforFiniteTrainingSets.arXivpreprintarXiv:1202.6258
[11]Schmidt,M.,LeRoux,N.andBach,F.(2017)MinimizingfiniteSumswiththeStochastic
AverageGradient.
MathematicalProgramming
,
162
,83-112.
https://doi.org/10.1007/s10107-016-1030-6
DOI:10.12677/aam.2022.1174524265
A^
ê
Æ
?
Ð
4
Z
§
Å
û
[12]Defazio,A.,Bach,F.andLacoste-Julien,S.(2014)SAGA:AFastIncrementalGradient
Method with Support forNon-Strongly Convex Composite Objectives. In:
AdvancesinNeural
InformationProcessingSystems27(NIPS2014)
.
[13]Kulunchakov,A.(2020)StochasticOptimizationforLarge-ScaleMachineLearning:Variance
ReductionandAcceleration.GrenobleAlpesUniversity,France.
[14]Wang,C.,
et al.
(2013)Variance ReductionforStochastic GradientOptimization.In:
Advances
inNeuralInformationProcessingSystems26(NIPS2013)
.
[15]Shen,Z.,
etal.
(2016)AdaptiveVarianceReducingforStochasticGradientDescent.
Proceed-
ingsoftheTwenty-FifthInternationalJointConferenceonArtificialIntelligence
,NewYork,
July2016,1990-1996.
[16]Koneˇcn´y, J.andRicht´arik,P.(2017) Semi-StochasticGradientDescent Methods.
Frontiersin
AppliedMathematicsandStatistics
,
3
,Article9.https://doi.org/10.3389/fams.2017.00009
[17]Shang,F.,
etal.
(2021)EfficientAsynchronousSemi-StochasticBlockCoordinateDescent
MethodsforLarge-ScaleSVD.
IEEEAccess
,
9
,126159-126171.
https://doi.org/10.1109/ACCESS.2021.3094282
[18]Duchi, J., Hazan, E. and Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning
andStochasticOptimization.
JournalofMachineLearningResearch
,
12
,2121-2159.
[19]Tieleman,T.andHinton,G.(2012)Lecture6.5-rmsprop:DividetheGradientbyaRunning
AverageofItsRecentMagnitude.
COURSERA:NeuralNetworksforMachineLearning
,
4
,
26-31.
[20]Kingma,D.P.and Ba,J. (2014) Adam:A Method for Stochastic Optimization. arXiv preprint
arXiv:1412.6980
[21]Mokhtari,A.andRibeiro,A.(2013)RegularizedStochasticBFGSAlgorithm.
2013IEEE
GlobalConferenceonSignalandInformationProcessing
,Austin,TX,3-5December2013,
1109-1112.https://doi.org/10.1109/GlobalSIP.2013.6737088
[22]Byrd,R.H.,
etal.
(2016)AStochasticQuasi-NewtonMethodforLarge-ScaleOptimization.
SIAMJournalonOptimization
,
26
,1008-1031.https://doi.org/10.1137/140954362
[23]Liu,D.C.andNocedal,J.(1989)OntheLimitedMemoryBFGSMethodforLargeScale
Optimization.
MathematicalProgramming
,
45
,503-528.https://doi.org/10.1007/BF01589116
[24]Moritz,P.,Nishihara,R.andJordan,M.(2016)ALinearly-ConvergentStochasticL-BFGS
Algorithm.
Proceedingsofthe19thInternationalConferenceonArtificialIntelligenceandS-
tatistics(AISTATS)
,
41
.
[25]Gower,R.,Goldfarb,D.andRicht´arik,P.(2016)StochasticBlockBFGS:SqueezingMore
Curvature Out of Data.
InternationalConferenceonMachineLearning
,New York,June 2016.
[26]Fletcher,R.andReeves,C.M.(1964)FunctionMinimizationbyConjugateGradients.
The
ComputerJournal
,
7
,149-154.https://doi.org/10.1093/comjnl/7.2.149
DOI:10.12677/aam.2022.1174524266
A^
ê
Æ
?
Ð
4
Z
§
Å
û
[27]Andrei,N.(2013)OnThree-TermConjugateGradientAlgorithmsforUnconstrainedOpti-
mization.
AppliedMathematicsandComputation
,
219
,6316-6327.
https://doi.org/10.1016/j.amc.2012.11.097
[28]Yao,S.W.,
etal.
(2020)AClassofGloballyConvergentThree-TermDai-LiaoConjugate
GradientMethods.
AppliedNumericalMathematics
,
151
,354-366.
https://doi.org/10.1016/j.apnum.2019.12.026
[29]Dai, Y.-H. andLiao, L.-Z.(2001) NewConjugacy Conditionsand RelatedNonlinear Conjugate
GradientMethods.
AppliedMathematicsandOptimization
,
43
,87-101.
https://doi.org/10.1007/s002450010019
[30]Babaie-Kafaki,S. andGhanbari,R. (2014)A Descent FamilyofDai-Liao ConjugateGradient
Methods.
OptimizationMethodsandSoftware
,
29
,583-591.
https://doi.org/10.1080/10556788.2013.833199
[31]Andrei,N.(2015)ANewThree-TermConjugateGradientAlgorithmforUnconstrainedOp-
timization.
NumericalAlgorithms
,
68
,305-321.https://doi.org/10.1007/s11075-014-9845-9
[32]Yao,S.W.,
etal.
(2020)AClassofGloballyConvergentThree-TermDai-LiaoConjugate
GradientMethods.
AppliedNumericalMathematics
,
151
,354-366.
https://doi.org/10.1016/j.apnum.2019.12.026
[33]Powell,M.J.D.(1977)RestartProceduresfortheConjugateGradientMethod.
Mathematical
Programming
,
12
,241-254.https://doi.org/10.1007/BF01593790
[34]Jiang,X.z.,
etal.
(2021)AnImprovedPolak-Ribi`ere-Polyak Conjugate Gradient Method with
anEfficientRestartDirection.
ComputationalandAppliedMathematics
,
40
,ArticleNo.174.
https://doi.org/10.1007/s40314-021-01557-9
[35]Zoutendijk,G.(1966)NonlinearProgramming:ANumericalSurvey.
SIAMJournalonCon-
trol
,
4
,194-210.https://doi.org/10.1137/0304019
DOI:10.12677/aam.2022.1174524267
A^
ê
Æ
?
Ð