Global Optimal Strategies of a Class of Finite-horizon Continuous-time Nonaffine Nonlinear Zero-sum Game Using a New Iteration Algorithm

Xin Zhang, School of Information Science and Engineering, Northeastern University, Shenyang, 110004 China
Huaguang Zhang, School of Information Science and Engineering, Northeastern University, Shenyang, 110004 China, Email: hgzhang@ieee.org
Lili Cui, School of Information Science and Engineering, Northeastern University, Shenyang, 110004 China
Yanhong Luo, School of Information Science and Engineering, Northeastern University, Shenyang, 110004 China

Abstract
In this paper, we aim to obtain the global optimal strategies of a class of finite-horizon continuous-time nonaffine nonlinear zero-sum games. The idea is to use an iterative algorithm to obtain the saddle point. The iteration is carried out between two sequences: a sequence of linear quadratic zero-sum games and a sequence of Riccati differential equations. The necessary conditions for the global optimal strategies are established. A simulation example is given to illustrate the performance of the proposed approach.

I. INTRODUCTION
Game theory studies decision making in situations where two or more rational opponents with conflicting interests are involved. It has been widely applied in management, military operations, power networks, and various types of contests [3]-[9], [10]. The two-player zero-sum game with a general quadratic performance index function plays an important role in game theory: both players act on the same performance index function, which one player minimizes and the other maximizes.

The optimal strategies of linear zero-sum games and affine nonlinear zero-sum games have received a great deal of attention in the literature [1], [2], [7]-[10], [19]. In [1], Al-Tamimi et al. applied heuristic dynamic programming and dual heuristic dynamic programming structures to solve a discrete-time linear quadratic zero-sum game problem in which the state and action spaces are continuous. They then designed the optimal strategies of the discrete-time linear quadratic zero-sum game without knowing the system dynamics matrices, using a model-free Q-learning approach [2]. A class of continuous-time affine nonlinear quadratic zero-sum game problems was studied by Wei et al. in [11]. Abu-Khalaf et al. studied the affine nonlinear zero-sum game problem in [7] and used neural networks to solve it in [8].

It is worth mentioning that most of the above discussions focus on linear or affine nonlinear zero-sum game problems. However, many practical zero-sum game applications have a nonlinear structure that is nonaffine in the control inputs u(t) and w(t). In [19], we proposed an iterative algorithm between a sequence of linear quadratic zero-sum games and a sequence of Riccati differential equations to deal with the nonaffine nonlinear zero-sum game problem. This approach transforms the nonaffine nonlinear game into a sequence of linear quadratic time-varying zero-sum games, so that the Hamilton-Jacobi-Isaacs (HJI) equation is replaced by a sequence of Riccati differential equations. In this paper, the global optimality of the nonaffine nonlinear zero-sum game based on this iterative algorithm is established.

The paper is organized as follows. Section II introduces the class of finite-horizon continuous-time nonaffine nonlinear zero-sum games that we want to solve in this paper and reviews the iterative algorithm presented in [19]. In Section III, the necessary conditions for global optimality of the nonaffine nonlinear zero-sum game are established. In Section IV, a numerical example is given to demonstrate the convergence and effectiveness of the proposed approach.

II. PROBLEM STATEMENT AND PRELIMINARIES
We consider a continuous-time nonaffine nonlinear two-player zero-sum game described by a state equation of the form $\dot{x}(t) = f(x(t), u(t), w(t))$, which is nonaffine in $u(t)$ and $w(t)$, together with a finite-horizon performance index function. The performance index is minimized by the control $u(t)$ of Player 1 and maximized by the control $w(t)$ of Player 2. The state $x(t)$ takes values in $\mathbb{R}^n$, the control vector $u(t)$ of Player 1 takes values in a convex and compact set $U_1$, and the control vector $w(t)$ of Player 2 takes values in a convex and compact set $U_2$. The state-dependent weighting matrices in the performance index have suitable dimensions, and each is positive semidefinite or positive definite. For brevity, these state-dependent matrices are sometimes written without their argument $x(t)$.
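The iteration summarized in the abstract and introduction (a sequence of linear quadratic zero-sum games paired with a sequence of Riccati differential equations) can be illustrated with a minimal scalar sketch. It rests on assumptions that are not taken from this section: the nonaffine dynamics are assumed to be factorable as $\dot{x} = a(x,u,w)x + b(x,u,w)u + c(x,u,w)w$, the performance index is assumed to have the quadratic form $J = \tfrac{1}{2}F x(t_f)^2 + \tfrac{1}{2}\int_0^{t_f}(q x^2 + r u^2 - s w^2)\,dt$ with constant weights, and the functions `a`, `b`, `c`, `solve_game` and all numerical values are hypothetical. At each iteration the coefficients are frozen along the previous trajectory, the resulting time-varying LQ game is solved through its Riccati differential equation (integrated backward with explicit Euler steps), and the saddle-point feedback $u = -(b/r)px$, $w = (c/s)px$ is simulated forward.

```python
# Minimal scalar sketch (hypothetical model, not the paper's example).
# Assumed factorization of the nonaffine dynamics:
#   xdot = f(x, u, w) = a(x,u,w)*x + b(x,u,w)*u + c(x,u,w)*w
# Assumed cost: J = 0.5*F*x(tf)**2 + 0.5*integral(q*x**2 + r*u**2 - s*w**2) dt


def a(x, u, w):
    """Hypothetical state coefficient (depends on x)."""
    return -1.0 + 0.1 * x


def b(x, u, w):
    """Hypothetical input coefficient; b*u = u + 0.1*u**3 is nonaffine in u."""
    return 1.0 + 0.1 * u * u


def c(x, u, w):
    """Hypothetical disturbance coefficient."""
    return 0.5


def solve_game(x0=1.0, tf=2.0, n=400, q=1.0, r=1.0, s=5.0, F=1.0,
               tol=1e-8, max_iter=100):
    dt = tf / n
    # Iteration-0 guess: constant state, zero controls.
    xs, us, ws = [x0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for it in range(1, max_iter + 1):
        # Freeze the coefficients along the previous trajectory,
        # which turns the game into a time-varying LQ zero-sum game.
        ak = [a(xs[k], us[k], ws[k]) for k in range(n + 1)]
        bk = [b(xs[k], us[k], ws[k]) for k in range(n + 1)]
        ck = [c(xs[k], us[k], ws[k]) for k in range(n + 1)]

        # Riccati differential equation
        #   -pdot = 2*a*p + q - (b**2/r - c**2/s)*p**2,  p(tf) = F,
        # integrated backward in time with explicit Euler steps.
        p = [0.0] * (n + 1)
        p[n] = F
        for k in range(n, 0, -1):
            g = bk[k] ** 2 / r - ck[k] ** 2 / s
            p[k - 1] = p[k] + dt * (2.0 * ak[k] * p[k] + q - g * p[k] ** 2)

        # Forward simulation of the frozen-coefficient system under the
        # saddle-point feedback u = -(b/r)*p*x, w = (c/s)*p*x.
        nx, nu, nw = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
        nx[0] = x0
        for k in range(n + 1):
            nu[k] = -bk[k] / r * p[k] * nx[k]
            nw[k] = ck[k] / s * p[k] * nx[k]
            if k < n:
                xdot = ak[k] * nx[k] + bk[k] * nu[k] + ck[k] * nw[k]
                nx[k + 1] = nx[k] + dt * xdot

        # Stop when the state and control trajectories stop changing.
        err = max(max(abs(v1 - v0) for v1, v0 in zip(nx, xs)),
                  max(abs(v1 - v0) for v1, v0 in zip(nu, us)))
        xs, us, ws = nx, nu, nw
        if err < tol:
            return xs, us, ws, p, it
    return xs, us, ws, p, max_iter
```

Calling `xs, us, ws, p, iters = solve_game()` returns the state, control and disturbance trajectories, the last Riccati solution and the number of iterations used. At a converged fixed point the frozen coefficients coincide with those evaluated on the current trajectory, so the trajectory also satisfies the original nonaffine dynamics; whether that limit is a global saddle point of the original game is a separate question, and that is what the optimality conditions in this paper concern.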