MLP
Week 03

Outline
- Introduction
- Architecture
- Learning Process
- Learning Issues
- Summary and Further Discussion

Limit of Perceptron

XOR Problem

Nonlinear Separable Problem
[Figure: positive class (+) and negative class]

Linear Classifier
- Linear decision boundary in the perceptron
  - 2-dimensional space: line
  - 3-dimensional space: plane
  - N-dimensional space: hyperplane
- Problems the perceptron cannot solve
  - XOR problem
  - Nonlinearly separable problems
- Dark age of neural network research: 1960 to 1980 (20 years)
- Motivation for inventing the MLP
  - Addition of an intermediate layer, called the hidden layer
  - Can solve even nonlinearly separable problems

Solution to XOR Problem (1)
[Figure: perceptron with weights 1, 1 and bias -0.5]
  (0, 0) -> 0
  (0, 1) -> 1
  (1, 0) -> 1
  (1, 1) -> 1

Solution to XOR Problem (2)
[Figure: perceptron with weights 1, 1 and bias -1.5]
  (0, 0) -> 0
  (0, 1) -> 0
  (1, 0) -> 0
  (1, 1) -> 1

Solution to XOR Problem (3)
[Figure: network combining the two perceptrons above; edge labels 1, 1, 1, 1, 1, -1.5; bias labels -1.5, -0.5, -0.5]
  (0, 0) -> 0
  (0, 1) -> 1
  (1, 0) -> 1
  (1, 1) -> 0

Overview of MLP
- Addition of one more layer between the input and output layers
- The added layer is called the hidden layer
- Boundary: linear -> quadratic
- Can solve even nonlinearly separable classification
- Approximation of any nonlinear function, by the universal approximation theorem

Architecture
[Figure: input layer, hidden layer, output layer]

Input Layer
- Receives the input vector
- #Nodes = dimension of the input vector
- Net input: one element of the input vector
- Output = net input
- Linear function as activation function

Input Nodes
[Figure]

Hidden Layer
- Encodes the input vector into another form
- Intermediate layer
- Receives its net input as the sum of the products of input values and weights
- Computes its own output and transfers it to the output layer
- #Nodes: arbitrary
  - Too many nodes: overfitting and high complexity
  - Too few nodes: underfitting and poor learning
- Linear boundary -> quadratic boundary
- Activation function: sigmoid function

Hidden Nodes
[Figure]

Output Layer
- Classification
  - Output value = CSV (Categorical Score Value)
  - #Output nodes = #classes (or categories)
  - The output node with the maximum value gives the classified class or category
- Regression
  - Univariate regression: #output nodes = 1
  - Multivariate regression: #output nodes = #variables
  - Output value: estimated output value

Hidden Nodes
[Figure]

Learning Rule: Back Propagation

Feed Forward
- Output computation
[Figure: input layer -> hidden layer -> output layer]

Weight Update
- Backward
[Figure: output layer -> hidden layer -> input layer]

Notations

Gradient Descent for Weights Optimization
- Error function to minimize
[Figure: error E plotted against weight w]

Update Weight between Output and Hidden

Update Weight between Hidden and Input
[Figure: jth input, ith hidden, first output, ..., cth output]

Batch Learning
  Input: training examples
  Initialize weights at random
  Iterate T times
    for each training example
      compute values of hidden nodes
      compute values of output nodes
    compute average error
    update weights between output and hidden
    update weights between hidden and input
  Output: optimized weights

Interactive Learning
  Input: training examples
  Initialize weights at random
  Iterate T times
    for each training example
      compute values of hidden nodes
      compute values of output nodes
    compute average error
    update weights between output and hidden
    update weights between hidden and input
  Output: optimized weights

Learning Issues

Optimization
- Architecture
  - #Input nodes: dimension of the input
  - #Output nodes
    - Binary classification: one node
    - Multiple classification: #classes
    - Univariate regression: one node
    - Multivariate regression: #output variables
  - #Hidden nodes: ?
- Validation set
  - Set of some training examples separated from the given training examples
  - Reduces the number of training examples available for training
- Parameter optimization (#learning epochs)
- Falling into local minima
  - Error is reduced by following the descent
  - Once a minimum is reached, the weights stop moving

Parameter Settings
- Learning rate: arbitrary value between 0 and 1
  - Close to 1: fast learning but fluctuation
  - Close to 0: slow learning but stability
- #Hidden nodes
  - Many nodes: much time for learning, overfitting
  - Few nodes: less time for learning, underfitting
- Training iterations
  - Too many: overfitting
  - Too few: underfitting

Validation Set
[Figure: the given examples are split into a training set (for training the MLP) and a test set (for evaluating performance; target labels hidden during training); the training set is split again into a training set and a validation set]

Falling into Local Minima
[Figure: error E plotted against weight w]

Other Issues of MLP
- Obtaining training examples
- No evidence for a given answer
- Slow learning
- Large dimension in its application to real problems

Summary and Further Discussions

Summary
- Multiple perceptrons as the solution to the limit of the perceptron
- Architecture of MLP
- Learning process of MLP
- Learning issues of MLP

Virtual Training Examples
- Solution to an insufficient number of training examples
- Other training examples derived from the given training examples
- Original training examples
  - Actual ones
  - Labeled with their target outputs initially
- Derived training examples
  - Virtual ones
  - Without their target outputs
  - Target output assigned by generalization of the MLP
- Training on both actual and virtual ones

Co Learning
- Two MLPs: MLP 1 and MLP 2
- Training examples: labeled + unlabeled
- MLP 1 is trained on the labeled ones
- MLP 2 is trained on the labeled ones
- MLP 1 labels the unlabeled training examples by its own generalization: Set 1
- MLP 2 labels the unlabeled training examples by its own generalization: Set 2
- MLP 1 is trained on labeled + Set 2
- MLP 2 is trained on labeled + Set 1

Evolutionary Neural Networks
- Optimize weights by evolutionary computation instead of gradient descent
- Evolutionary computation
  - Genetic Algorithm
  - Genetic Programming
  - Evolution Strategy
  - Evolutionary Programming
- Avoids falling into local minima

Application of MLP to Time Series Prediction
- Time series: X(1), X(2), X(3), ..., X(T)
- Training examples (d: temporal window size)
  - X(1), ..., X(d) -> X(d+1)
  - X(2), ..., X(d+1) -> X(d+2)
  - ...
  - X(T-d), ..., X(T-1) -> X(T)
- Test example
  - X(T-d+1), ..., X(T) -> X(T+1)
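
Sketch: checking the XOR solution. The three "Solution to XOR Problem" slides list weights of 1, 1 with biases -0.5 and -1.5 for the two single perceptrons, and one extra weight of -1.5 plus a bias of -0.5 in the combined network. The Python sketch below assumes a threshold (step) activation and assumes the -1.5 weight sits on the edge from the second hidden node to the output; the slides' figures are not available, so that assignment and the names `step`, `perceptron`, and `xor_network` are illustrative only.

```python
def step(net):
    """Threshold activation: output 1 when the net input is positive."""
    return 1 if net > 0 else 0


def perceptron(inputs, weights, bias):
    """Single perceptron: weighted sum plus bias, then threshold."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net)


def xor_network(x1, x2):
    # Hidden node 1: weights 1, 1 and bias -0.5 (fires if either input is 1).
    h1 = perceptron((x1, x2), (1, 1), -0.5)
    # Hidden node 2: weights 1, 1 and bias -1.5 (fires only if both are 1).
    h2 = perceptron((x1, x2), (1, 1), -1.5)
    # Output node: assumed weights 1 and -1.5 with bias -0.5.
    return perceptron((h1, h2), (1, -1.5), -0.5)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_network(x1, x2))
```

Under these assumptions the network reproduces the truth table on the third slide: the second hidden node cancels the first one exactly when both inputs are 1, which a single linear boundary cannot do.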
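
Sketch: batch learning with back propagation. The Batch Learning steps (feed forward, average error, update output-hidden weights, update hidden-input weights) can be sketched in plain Python. The slides' notation and update equations did not survive extraction, so this sketch assumes sigmoid activations in both the hidden and output layers, a squared-error function, and a bias weight on every node; `train_batch`, `forward`, and all parameter values are hypothetical choices rather than the course's definitions.

```python
import math
import random


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_hidden, w_output):
    """Feed forward: hidden node values first, then output node values.
    The last weight in each row is a bias weight (its input is fixed at 1)."""
    xb = list(x) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_output]
    return hidden, output


def train_batch(examples, n_hidden, epochs, rate, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(examples[0][0]), len(examples[0][1])
    # Initialize weights at random.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    history = []  # average error measured at the start of each iteration
    n = len(examples)
    for _ in range(epochs):
        g_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_output = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        total_error = 0.0
        for x, target in examples:
            hidden, output = forward(x, w_hidden, w_output)
            total_error += 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))
            # Error signal at each output node (derivative w.r.t. its net input).
            d_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
            # Error signal at each hidden node, propagated backward.
            d_hid = [h * (1 - h) * sum(d_out[c] * w_output[c][i] for c in range(n_out))
                     for i, h in enumerate(hidden)]
            for c in range(n_out):
                for i, v in enumerate(hidden + [1.0]):
                    g_output[c][i] += d_out[c] * v
            for i in range(n_hidden):
                for j, v in enumerate(list(x) + [1.0]):
                    g_hidden[i][j] += d_hid[i] * v
        history.append(total_error / n)
        # Update weights between output and hidden, then hidden and input.
        for c in range(n_out):
            for i in range(n_hidden + 1):
                w_output[c][i] -= rate * g_output[c][i] / n
        for i in range(n_hidden):
            for j in range(n_in + 1):
                w_hidden[i][j] -= rate * g_hidden[i][j] / n
    return w_hidden, w_output, history


# XOR as a toy training set; the settings below are arbitrary.
xor_examples = [((0.0, 0.0), (0.0,)), ((0.0, 1.0), (1.0,)),
                ((1.0, 0.0), (1.0,)), ((1.0, 1.0), (0.0,))]
w_hidden, w_output, history = train_batch(xor_examples, n_hidden=2, epochs=2000, rate=0.5)
print("average error, first iteration:", history[0])
print("average error, last iteration:", history[-1])
```

The sketch makes no promise that XOR is learned: as the Learning Issues slides note, gradient descent can settle in a local minimum, and the outcome depends on the learning rate, the number of hidden nodes, and the random initial weights.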