1、Vol. 42, No. 5    ACTA AUTOMATICA SINICA    May, 2016
[...] activation function [...] graphics processing unit (GPU) [...] high-performance computing (HPC) [...] experience replay [...] target Q-separation [...]

2、[...] Each experience tuple (s, a, r, s') is stored in the data set D. [...] The loss function is

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(r + \gamma \max_{b} Q(s', b; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]$$

where U(D) denotes the uniform distribution over D [...] (i.i.d.) [...] Chinook [...] alpha-beta [...] Minecraft [...] DeepMind [...]
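The loss over experience tuples (s, a, r, s') drawn uniformly from D can be illustrated with a minimal sketch. It assumes the usual DQN form: a discount factor `gamma` and a separate target Q-function `q_target` inside the max. The names `ReplayBuffer`, `dqn_loss`, `q_online`, and `q_target` are hypothetical and chosen only for illustration. Terminal states and gradient updates are left out for brevity.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) tuples, sampled uniformly."""

    def __init__(self, capacity, seed=None):
        # The oldest tuples are dropped once capacity is reached.
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform draw without replacement, standing in for U(D).
        return self.rng.sample(list(self.data), batch_size)


def dqn_loss(batch, q_online, q_target, actions, gamma=0.99):
    """Mean squared TD error: (r + gamma * max_b Q_target(s', b) - Q(s, a))^2."""
    total = 0.0
    for s, a, r, s_next in batch:
        # Bootstrapped target computed with the separate target Q-function (assumption).
        target = r + gamma * max(q_target(s_next, b) for b in actions)
        total += (target - q_online(s, a)) ** 2
    return total / len(batch)


# Toy usage with tabular Q-values (illustrative numbers only).
actions = [0, 1]
online = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
target = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 4.0}

buffer = ReplayBuffer(capacity=100, seed=0)
buffer.add(0, 0, 1.0, 1)
batch = buffer.sample(1)
loss = dqn_loss(
    batch,
    lambda s, a: online[(s, a)],
    lambda s, a: target[(s, a)],
    actions,
    gamma=0.5,
)
print(loss)
```

In the toy example the target is 1.0 + 0.5 × 4.0 = 3.0 and the online estimate is 1.0, so the squared error is 4.0. Sampling from the buffer rather than using consecutive transitions is what lets the batch be treated as approximately i.i.d.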

3、[...] StarCraft [...]

References
1 Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [Ph.D. dissertation], Har

