DOI 10.1007/s00170-004-2069-8
ORIGINAL ARTICLE
Int J Adv Manuf Technol (2005) 26: 1184–1192

C.O. Kim · J. Jun · J.K. Baek · R.L. Smith · Y.D. Kim

Adaptive inventory control models for supply chain management

Received: 20 October 2003 / Accepted: 11 November 2003 / Published online: 7 July 2004
© Springer-Verlag London Limited 2004

C.O. Kim (✉)
School of Computer and Industrial Engineering, Yonsei University, Seoul, 120-749, Republic of Korea
E-mail: kimco@yonsei.ac.kr
Tel: (+822) 2123-2711
Fax: (+822) 364-7807

J. Jun
Research Institute for Information and Communication Technologies, Korea University, Seoul, 136-701, Republic of Korea

J.K. Baek
Department of Industrial System Management, Seoul College, Seoul, 131-702, Republic of Korea

R.L. Smith
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109-2117, USA

Y.D. Kim
Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, Daejon, 305-701, Republic of Korea

Abstract Uncertainties inherent in customer demands make it difficult for supply chains to achieve just-in-time inventory replenishment, resulting in lost sales opportunities or excessive chain-wide inventories. In this paper, we propose two adaptive inventory-control models for a supply chain consisting of one supplier and multiple retailers. One is a centralized model and the other is a decentralized model. The objective of the two models is to satisfy a target service level predefined for each retailer. The inventory-control parameters of the supplier and retailers are safety lead time and safety stocks, respectively. Unlike most extant inventory-control approaches, modelling the uncertainty of customer demand as a statistical distribution is not a prerequisite in the two models. Instead, using a reinforcement learning technique called the action-value method, the control parameters are designed to change adaptively as customer-demand patterns change. A simulation-based experiment was performed to compare the performance of the two inventory-control models.

Keywords Adaptive inventory control · reinforcement learning ·
simulation · supply chain

1 Introduction

In supply-chain management, the effort to minimize total costs by reducing chain-wide inventory has been increasingly addressed and attempted in industry. During the last two decades, however, achieving this objective has become more difficult, as customer demands have become more diverse and product life cycles shorter. In most cases, due to unpredictable customer needs and economic situations, customer demands fluctuate with time, showing nonstationary patterns. Uncertainties inherent in customer-demand patterns make it difficult to
satisfy customer demands in just-in-time (JIT) mode, resulting in lost sales opportunities or excessive chain-wide inventories.

In modelling inventory-control problems, it is not practical to assume that customer demands during a period are known a priori in the form of a constant or a statistical distribution. In this respect, adaptive inventory control in supply-chain management should be addressed. By adaptive, we mean that the control parameters of inventory-control models are dynamically adjusted toward satisfying a target service level, taking into account the nonstationarity of customer demand. The target service level is the percentage of customer demand that has to be satisfied during the time interval between order placement time and inventory replenishment time. This time interval is commonly called lead time.

In this paper, we deal with a two-echelon supply-chain
system consisting of one supplier and multiple retailers. The customer demand process is assumed to be nonstationary and unknown. By a nonstationary demand process, we mean that the mean and variance of the demand distribution change with time. It is assumed that the supplier's orders are always satisfied after a constant lead time from a perfectly reliable single outside source. It is also assumed that, for each retailer, transportation lead time from the supplier to the retailer is given as a constant. However, the retailers' actual lead times are not constants unless the supplier has enough inventory to meet the retailers' orders. Finally, if customer demands are not satisfied at the time of sale, the demands are treated as lost sales.

In this environment, we propose two adaptive inventory-control models for the supply chain: a centralized model and a decentralized model. In both models, the supplier makes use of on-line information about the retailers' inventory status in deciding his order placement time. The goal is to make the average of service levels during lead times as close as possible to a predefined target service level. We assume that the order size of each retailer is predetermined based on the capacity of the delivery system. Therefore, the associated decisions concern when the retailers' inventories are replenished. The control parameters of the two models determine the inventory replenishment times.

The centralized inventory-control model is similar to the vendor-managed inventory model [1], in the sense that the retailers no longer control their inventory replenishment times. Instead, the supplier is responsible for maintaining appropriate inventory levels of the retailers. In more detail, at each discrete inspection time, the supplier collects data on each retailer's inventory position (on-hand inventory plus ordered quantity in transit) and sales history. With the data, the supplier makes use of a linear time-series model to predict the time point at which the inventory position of the retailer is anticipated to first drop below zero. If the
time interval between the inspection time and the predicted time is approximately equal to the total lead time (supplier's lead time + retailer's transportation lead time + safety lead time), then the supplier places an order for the retailer. In this paper, we call this inventory-control rule a JIT delivery policy. As soon as the supplier receives the ordered quantity from the outside source, he sends the quantity directly to the retailer without keeping it in his warehouse. As a consequence, the supplier holds no inventory. The safety lead time is a time buffer and is traditionally used for coping with demand uncertainty during lead time. In the centralized model, it is a control parameter to adjust the service level of the retailer in a nonstationary demand situation. A safety lead time exists for each retailer.

In the decentralized inventory-control model, each retailer is allowed to adaptively set its safety stock to reflect the nonstationarity of demand. Just as for the safety lead time, the safety stock is an inventory buffer to cover demand uncertainty during lead time. Once the safety stock is decided, the retailer forecasts demand during its transportation lead time at each inspection time. If, at a certain inspection time, the forecast demand during the transportation lead time plus safety stock is very close to its inventory position observed at the inspection time, the retailer places an order with the supplier.

The supplier also makes use of the safety stock set by the retailer. The operational mechanism of the supplier in the decentralized model is almost the same as the JIT delivery policy in the centralized model. That is, at each inspection time, the supplier predicts the time point at which the inventory position of the retailer is anticipated to drop below
the safety stock for the first time. If the time interval between the inspection time and the predicted time is approximately equal to the total lead time, then the supplier places an order for the retailer. However, unlike the centralized model, after the supplier receives the ordered quantity from the outside source, he keeps the quantity in his warehouse until the retailer actually places an order. In the decentralized model, safety stock and safety lead time are control parameters.

Using a reinforcement learning technique, the control parameters of the two models are designed to change adaptively. The reinforcement learning technique employed in this research is called the action-value method [2], which is suitable for heuristically solving sequential optimization problems in uncertain environments. A representative domain appropriate for applying the action-value method is the stochastic optimization problem,
where the value of each action is not known but should be learned through repeated application of the action in a real or simulated domain. The main advantage of reinforcement learning is that it is possible to make good decisions while the learning is progressing. In this respect, reinforcement learning would be appropriate for real-time control problems.

Specifically, at each decision point of time, one of the possible actions is selected based on a probabilistic function of their value estimates. In minimization problems, it is desirable to give more opportunity of being selected to the actions with low value estimates. This idea can be incorporated into the following probabilistic action selection rule:

$$P\{\text{new action} = a\} = \frac{e^{-\mathrm{ValueEstimate}(a)}}{\sum_{a_i \in AS} e^{-\mathrm{ValueEstimate}(a_i)}} \tag{1}$$

where AS is the set of possible actions. Because the numerator, $e^{-\mathrm{ValueEstimate}(a)}$, in Eq. 1 increases
as the value estimate of the action, ValueEstimate(a), decreases, the action with the lowest estimated value is selected with the highest probability. The denominator is a normalization term that makes the action selection rule a probability function.

The result of the selected action (current value) is then used for learning its objective value. The learning formula we employ is called the exponential recency-weighted average (p. 37, Sutton and Barto [2]) and can be defined as

$$\mathrm{NewValueEstimate} \leftarrow \mathrm{OldValueEstimate} + \mathrm{StepSize}\,\bigl[\mathrm{CurrentValue} - \mathrm{OldValueEstimate}\bigr] \tag{2}$$

Each time a specific action is performed, its value estimate is updated by adding an error term (the weighted difference between the current value and the old estimate) to the old estimate. The error indicates the direction in which the value estimate should move. StepSize is a learning parameter that determines the learning speed. It is normally set to a constant, such as 0.1, which has been experimentally verified to be desirable, especially in nonstationary environments (p. 39, Sutton and Barto [2]). At the next decision point of time, a new action is chosen according to the probabilistic rule with the updated value estimate, and this procedure is repeated until the end of the decision horizon is reached.

Fig. 1. Inventory replenishment time and order cycle in periodic inspection system

In the context of the problem discussed in this paper, the action corresponds to a control parameter (safety stock or safety lead time). As shown in Fig. 1, for
each retailer, the decision point of time corresponds to an inventory replenishment time. The time interval between two consecutive inventory replenishment times is called an order cycle. When a new safety stock is selected at an inventory replenishment time, the service level during the lead time is measured at the end of the order cycle. The result of the safety stock is then defined as the absolute deviation of the service level from the target service level. After that, the value estimate of the safety stock is updated according to Eq. 2. The NewValueEstimate in Eq. 2 is the weighted average of the absolute deviations of service levels during lead times from the target service level. Therefore, as learning progresses, safety stocks with low service-level deviations will be given high selection probabilities.

The remainder of this paper is organized as follows. In Sect. 2, we review extant inventory-control methods relevant to our models. In Sect. 3, we present the two inventory-control models. In Sect. 4, we present the results of a simulation-based performance evaluation. Finally, in Sect. 5, we conclude this research and remark on some future research areas.

2 Literature survey

Related to adaptive
inventory control in supply chains, most previous research efforts have occurred in the mathematical production control area [3–6]. With the objective of minimizing the total sum of inventory costs, they formulate the inventory-control problem as a dynamic programming model and adaptively estimate the uncertain parameters of the demand distribution using demand history. While the rigorous optimization models show some mathematical convergence results in stationary demand cases, such convergence is unsupported in many applied contexts in which customer demand processes are nonstationary.

The idea of our approach is similar to that of Packer [7]. He considered the (Q, R) inventory policy in single-site inventory-control problems. He suggested a way of using demand history to decrease inventory-related costs. Specifically, order quantity Q is calculated with the economic order quantity (EOQ) model, for which the average demand rate is estimated by the exponential smoothing formula. Then the average demand during lead time and a predetermined safety-stock factor are used for setting the reorder point, R.

Moinzadeh [8] proposed a supplier replenishment policy in which an order is placed with the outside source immediately after a retailer's inventory position reaches R + s. Thus, s, in a sense, gauges the proactivity of the supplier from information availability. He computationally derived the optimal s under the assumption that customer demand at the retailers is Poisson.

Recently, a few distributed inventory-control models have been proposed. Axsater [9] applied the Stackelberg game model to the decentralized control of a multiechelon inventory system consisting of a central warehouse and multiple retailers. Also, by employing penalty cost concepts, Andersson et al. [10], Lee and Whang [11] and Cachon and Zipkin [12] attempted to distribute decision-making rights to the participants in a supply chain while requiring each participant to respect the others' costs. However, all of them analytically solved the distributed problems under the assumption that the customer demand distribution is known.

Zhao et al. [13] proposed a retailer's early-order commitment rule in a decentralized supply chain for enabling suppliers to smooth production, better utilize resources and ultimately reduce costs in the whole supply chain. They investigated the impact of the early-order commitment rule and forecasting models on the supply-chain
performance under different scenarios of demand patterns and suppliers' capacity tightness. Zhao and Xie [14] also examined the impact of forecasting errors on the value of information sharing between a supplier and retailers. They experimentally showed that, although information sharing gives benefits to the supplier, in most cases, it increases the retailers' costs. This phenomenon becomes more pronounced as the magnitude of the forecasting error increases.

Finally, a reinforcement learning approach was recently applied to a coor