深度学习在游戏中的应用 (Applications of Deep Learning in Games)
自动化学报 (ACTA AUTOMATICA SINICA), Vol. 42, No. 5, May 2016

Deep networks are built from many layers of simple units with nonlinear activation functions, and training them at scale has become practical on graphics processing units (GPU) and high-performance computing (HPC) platforms. The deep Q-network stabilizes reinforcement learning with two techniques: experience replay and target-Q separation. Each interaction with the game is stored as a transition T = (s, a, r, s') in a replay memory D, and the network parameters \theta are trained by minimizing

L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left( r + \gamma \max_b Q(s', b; \theta^-) - Q(s, a; \theta) \right)^2 \right]

where U(D) denotes uniform sampling from the replay memory D, which breaks the correlation between consecutive transitions so that the training samples are approximately independent and identically distributed (i.i.d.), and \theta^- denotes the parameters of the separately maintained target network.

Classical game programs such as the checkers player Chinook rely on handcrafted evaluation functions combined with alpha-beta search, whereas open-world games such as Minecraft and real-time strategy games such as StarCraft, which DeepMind has pointed to as a possible next challenge, remain far harder for learning-based agents.
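To make the loss above concrete, the following is a minimal NumPy sketch (not code from the paper) that combines uniform experience replay with a separate target network. A linear Q-function stands in for the deep network, and all names, dimensions and hyper-parameters are assumptions made only for this example:

import random
import numpy as np

STATE_DIM, N_ACTIONS = 4, 2          # illustrative sizes, not from the paper
GAMMA, LR, BATCH, TARGET_SYNC = 0.99, 0.01, 32, 100

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))   # online parameters theta
theta_target = theta.copy()                                   # target parameters theta^-
replay = []                                                   # replay memory D

def q_values(params, s):
    # Linear stand-in for the deep Q-network: Q(s, a; params) = params[a] . s
    return params @ s

def train_step():
    # One gradient step on
    # L(theta) = E_{(s,a,r,s') ~ U(D)} [(r + gamma * max_b Q(s', b; theta^-) - Q(s, a; theta))^2]
    global theta
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)          # uniform sampling U(D)
    grad = np.zeros_like(theta)
    for s, a, r, s_next in batch:                 # transition T = (s, a, r, s')
        target = r + GAMMA * q_values(theta_target, s_next).max()
        td_error = target - q_values(theta, s)[a]
        grad[a] += -2.0 * td_error * s            # gradient of the squared TD error w.r.t. theta[a]
    theta -= LR * grad / BATCH

# Usage with random transitions standing in for a real game emulator.
for step in range(1, 1001):
    s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    replay.append((s, int(rng.integers(N_ACTIONS)), float(rng.normal()), s_next))
    train_step()
    if step % TARGET_SYNC == 0:
        theta_target = theta.copy()               # target-Q separation: copy theta into theta^-

Synchronising theta_target with theta only every TARGET_SYNC steps is what target-Q separation refers to: the bootstrap term in the loss changes slowly, which keeps the regression target stable between updates.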

References
1 Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [Ph.D. dissertation], Harvard University, USA, 1974.
2 Parker D B. Learning Logic, Technical Report TR-47, MIT Press, Cambridge, 1985.
3 LeCun Y. Une procedure d'apprentissage pour reseau a seuil assymetrique (a learning scheme for asymmetric threshold networks). In: Proceedings of Cognitiva 85. Paris, France. 599–604 (in French)
4 Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533–536
5 Bengio Y. Learning Deep Architectures for AI. Hanover, MA: Now Publishers Inc, 2009.
6 Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527–1554
7 Ranzato M, Poultney C, Chopra S, LeCun Y. Efficient learning of sparse representations with an energy-based model. In: Proceedings of the 2007 Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007.
8 Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Proceedings of the 2007 Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007.
9 Erhan D, Manzagol P A, Bengio Y, Bengio S, Vincent P. The difficulty of training deep architectures and the effect of unsupervised pre-training. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. Clearwater, Florida, USA: AISTATS, 2009. 153–160
10 Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: ICAIS, 2010.
11 Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, United States: ICAIS, 2011.
12 Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 2014 International Conference on Learning Representations. Rimrock Resort Hotel, Banff, Canada: ICRR, 2014.
13 Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the 2013 International Conference on Learning Representations. Scottsdale, Arizona: ICLR, 2013.
14 Szegedy C, Toshev A, Erhan D. Deep neural networks for object detection. In: Proceedings of the 2013 Advances in Neural Information Processing Systems. Lake Tahoe, Nevada: NIPS, 2013.
15 Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li F F. Large-scale video classification with convolutional neural networks. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014.
16 Farabet C, Couprie C, Najman L, LeCun Y. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1915–1929
17 Khan S H, Bennamoun M, Sohel F, Togneri R. Automatic feature learning for robust shadow detection. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH, USA: IEEE, 2014.
18 Amodei D, Anubhai R, Battenberg E, Case C, Casper J, Catanzaro B, Chen J D, Chrzanowski M, Coates A, Diamos G, Elsen E, Engel J, Fan L X, Fougner C, Han T, Hannun A, Jun B, LeGresley P, Lin L, Narang S, Ng A, Ozair S, Prenger R, Raiman J, Satheesh S, Seetapun D, Sengupta S, Wang Y, Wang Z Q, Wang C, Xiao B, Yogatama D, Zhan J, Zhu Z Y. Deep speech 2: End-to-end speech recognition in English and Mandarin. preprint arXiv:1512.02595, 2015.
19 Fernandez R, Rendel A, Ramabhadran B, Hoory R. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In: Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore: Curran Associates, Inc., 2014.
20 Fan Y C, Qian Q, Xie F L, Soong F K. TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore: Curran Associates, Inc., 2014.
21 Sak H, Vinyals O, Heigold G, Senior A, McDermott E, Monga R, Mao M. Sequence discriminative distributed training of long short-term memory recurrent neural networks. In: Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore: Curran Associates, Inc., 2014.
22 Socher R, Bauer J, Manning C D, Ng A Y. Parsing with compositional vector grammars. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria: ACL, 2013.
23 Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the 2014 Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014.
24 Gao J F, He X D, Yih W T, Deng L. Learning continuous phrase representations for translation modeling. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore: ACL, 2014.
25 Gao J F, Deng L, Gamon M, He X D, Pantel P. Modeling Interestingness with Deep Neural Networks, US Patent 20150363688, December 17, 2015.
26 Socher R, Perelygin A, Wu J Y, Chuang J, Manning C D, Ng A Y, Potts C. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP). Seattle, Washington: EMNLP, 2013.
27 Shen Y L, He X D, Gao J F, Deng L, Mesnil G. A latent semantic model with convolutional-pooling structure for information retrieval. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2014.
28 Huang P S, He X D, Gao J F, Deng L, Acero A, Heck L. Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. New York, NY, USA: ACM, 2013.
29 Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013. 1798–1828, DOI: 10.1109/TPAMI.2013.50
30 Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85–117
31 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436–444
32 Lee H, Grosse R, Ranganath R, Ng A Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009.
33 Yao A C C. Separating the polynomial-time hierarchy by oracles. In: Proceedings of the 26th Annual Symposium on Foundations of Computer Science. Portland, OR, USA: IEEE, 1985. 1–10
34 Hastad J. Almost optimal lower bounds for small depth circuits. In: Proceedings of the 18th Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM, 1986.
35 Braverman M. Poly-logarithmic independence fools bounded-depth Boolean circuits. Communications of the ACM, 2011, 54(4): 108–115
36 Bengio Y, Delalleau O. On the expressive power of deep architectures. Algorithmic Learning Theory. Berlin Heidelberg: Springer, 2011. 18–36
37 Le Cun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D. Handwritten digit recognition with a back-propagation network. In: Proceedings of the 1990 Advances in Neural Information Processing Systems. San Francisco: Morgan Kaufmann, 1990.
38 Bengio Y, LeCun Y, DeCoste D, Weston J. Scaling learning algorithms towards AI. Large-Scale Kernel Machines. Cambridge: MIT Press, 2007.
39 Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
40 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 1996, 4: 237–285
41 Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: Proceedings of the 2015 AAAI Fall Symposium Series. The Westin Arlington Gateway, Arlington, Virginia: AIAA, 2015.
42 Bakker B, Zhumatiy V, Gruener G, Schmidhuber J. A robot that reinforcement-learns to identify and memorize important previous observations. In: Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems. Manno-Lugano, Switzerland: IEEE, 2003.
43 Wierstra D, Forster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of IGPL, 2010, 18(5): 620–634
44 Bellemare M, Naddaf Y, Veness J, Bowling M. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013, 47: 253–279
45 Watkins C J H, Dayan P. Technical note: Q-learning. Machine Learning, 1992, 8(3–4): 279–292
46 Bellemare M G, Veness J, Bowling M. Investigating contingency awareness using Atari 2600 games. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Ontario: AIAA, 2012.
47 Bellemare M G, Veness J, Bowling M. Sketch-based linear value function approximation. In: Proceedings of the 26th Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA: NIPS, 2012.
48 Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 1994, 6(2): 215–219
49 Riedmiller M. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal: Springer, 2005.
50 Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529–533
51 Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the 2016 International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: ICLR, 2016.
52 Ross S, Gordon G J, Bagnell J A. A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Ft. Lauderdale, FL, USA: AISTATS, 2011.
53 Guo X X, Singh S, Lee H, Lewis R, Wang X S. Deep learning for real-time ATARI game play using offline Monte-Carlo tree search planning. In: Proceedings of the 2014 Advances in Neural Information Processing Systems. Cambridge: The MIT Press, 2014.
54 Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.
55 van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AIAA, 2016.
56 Bellemare M G, Ostrovski G, Guez A, Thomas P S, Munos R. Increasing the action gap: new operators for reinforcement learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA: AIAA, 2016.
57 Wang Z Y, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2016.
58 Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T P, Harley T, Silver D, Kavukcuoglu K. Asynchronous methods for deep reinforcement learning. preprint arXiv:1602.01783, 2016.
59 Rusu A A, Colmenarejo S G, Gulcehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R. Policy distillation. In: Proceedings of the 2016 International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: ICLR, 2016.
60 Parisotto E, Ba J L, Salakhutdinov R. Actor-mimic: Deep multitask and transfer reinforcement learning. In: Proceedings of the 2016 International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: ICLR, 2016.
61 Clark C, Storkey A. Training deep convolutional neural networks to play Go. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.
62 Maddison C J, Huang A, Sutskever I, Silver D. Move evaluation in Go using deep convolutional neural networks. In: Proceedings of the 2014 International Conference on Learning Representations. Rimrock Resort Hotel, Banff, Canada: ICRR, 2014.
63 Tian Y D, Zhu Y. Better computer Go player with neural network and long-term prediction. In: Proceedings of the 2016 International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: ICLR, 2016.