2014 Mathematical Contest in Modeling (MCM) Summary Sheet
Team Control Number: 201401
Problem Chosen: B

Summary
Splicing paper scraps is a complex problem that plays an important role in judicial evidence recovery,
restoration of historical documents, and access to military intelligence. This paper focuses on the problem of splicing paper scraps and establishes a shredding distance model and a restoration TSP model. We also design one-dimensional and multi-dimensional shred restoration algorithms and solve them using MATLAB.

For question one, we extract information from Appendix 1 and 2 and design a one-dimensional shredding restoration algorithm based on text features such as character size and line spacing. We then transform the shredding problem into a TSP recovery problem and obtain the correct restored images and sequences.

For question two, we first standardize the shred pictures from Appendix 3 and 4 and then extract the level features of the normalized images. For pictures that cannot be classified by machine, we use a GUI program that we developed to make the manual work more efficient.

For question three, we design a three-dimensional shredding restoration algorithm. We combine the side a and side b pictures of Annex 5 to get 416 shred pictures, apply the same standardization, level feature extraction, and classification, reduce the problem to one-dimensional problems, and solve them to obtain the correct restored images and sequences for the front and back of Annex 5.

To evaluate the algorithm quantitatively, this paper presents a minimal intervention model: the quality of an algorithm is characterized by the minimum number of manual interventions needed to get from the sequence recognized by the computer to the restored sequence.

Keywords: Reconstruct documents; TSP; Shredding Distance Model; Shredding Restoration Algorithm

Team # 201401 Page 1 of 25

Contents
I Introduction
II Symbol Definitions
III Assumptions and Notations
IV For question one
4.1 Image Preprocessing
4.2 Shredding Feature Extraction
4.3 Recognition Sequence Based on Text Features
4.4 The Definition of Shredding Distance
4.5 Recovery of TSP
4.6 Simulated Annealing (SA) Algorithm
4.7 One-Dimensional Shredding Restoration Algorithm
4.8 The Solution of Model
V For question two
5.1 Shredding Standardization and Level Feature Extraction
5.2 The Classification of Level Feature
5.3 Two-Dimensional Shredding Restoration Algorithm
5.4 The Solution of Model
VI For question three
6.1 Dimensionality Reduction
6.2 Three-Dimensional Shredding Restoration Algorithm
6.3 The Solution of Model
VII Strengths and Weaknesses
7.1 Strengths
7.2 Weaknesses
VIII The Refinement of our Model
8.1 Improved Application to Color Images
8.2 Minimal Intervention Degree Algorithm
References
Appendix I
Appendix II
Appendix III

I Introduction
Traditionally, shredded documents are reconstructed by hand, which is accurate but inefficient, especially when a large amount of complicated work must be completed in a short time. With the development of computer technology, people are trying to develop automatic splicing techniques for reconstructing documents in order to improve the efficiency of recovery. In addition, this is a kind of task that is related to our daily life. The factors to be considered in reality are far more than those in the problem itself, so a major challenge in this article is to make the model more realistic and provide effective splicing information. Given the information provided and reasonable assumptions about shredding recovery, we are able to carry out our research.

II Symbol Definitions
Symbol      Definition
$p_{ij}$    Pixel value after binarization
$q_{ij}$    Pixel value before binarization
$d_{ab}$    Distance between shred A and shred B
$L_i$       Left recognition sequence
$R_i$       Right recognition sequence
$C_w$       Width of characters
$D_{tsp}$   Total distance of the TSP
$L_{rs}$    Length of the recognition sequence

III Assumptions and Notations
For convenience in the following discussion, we first assume that:
(1) The text direction is horizontal.
(2) The front and back print margins have the same format.
(3) The efficiency of manual labor is ignored.

IV For question one
4.1 Image preprocessing
We first need to process the picture pixels. Generally, image pixel values lie in [0, 255], and blank positions are distinguished from characters by setting a threshold. For non-color pictures, we only need to distinguish blank from non-blank. So that the picture clearly describes the blank space and the character positions, we load the image into MATLAB to obtain the corresponding pixel matrix. Finally, we binarize the pixel matrix:
$$p_{ij} = \begin{cases} 1, & q_{ij} < 255 \\ 255, & \text{others} \end{cases}$$

4.2 Shredding feature extraction
Generally speaking, shredding feature extraction falls into two categories. One extracts features from the shape of the shred edges, and the other extracts features from the text on the shreds. For this problem, our approach belongs to the second category.
Figure 1. One-dimensional shredding
Figure 2. Character features
In summary, the text feature extraction is as follows:
Step 1: Binarize the pictures, so that text is white and blank space is black.
Step 2: Find all line spacing and blank areas in the pictures and mark them as gray.
Step 3: Find
out all the kerning and mark it as gray.
Step 4: Calculate the character width from the line spacing, blank areas, kerning, and other features.
For this problem, we extract the text features by importing the image pixels into a MATLAB program.

4.3 Recognition sequence based on text features
From an analysis of Chinese characters and English letters, we denote the character width by C. When a character is cut, its width is divided into two parts, C − R and R; a character that is not cut retains the width C.
Figure 3. Character segmentation
According to this definition of character segmentation, we construct recognition sequences based on the text features.
Figure 4. Recognition sequences (left and right)
In the recognition sequences of Figure 4, a position with no character is 0, and every other entry is the corresponding character length (C for a full character, and C − R or R for an incomplete one).

4.4 The definition of shredding distance
Based on the recognition sequences, we define the distance between shreds A and B as
$$d_{ab} = \sum_i (L_i + R_i - C X)^2, \quad X = 0 \text{ or } 1$$
From this definition, the greater the agreement between the two recognition
sequences, the smaller the distance between them. In particular, when the two recognition sequences are fully consistent, the distance is 0.

4.5 Recovery of TSP
The TSP is one of the most famous problems in graph theory. If we treat each shred as a point, there is a distance between every pair of points. In essence, we need to find the path with the smallest total distance, that is, an optimal TSP path. So the recovery of the shreds can be abstracted as a TSP. Therefore, we have the following formulation:
$$\min D = \sum_i d_{i,i+1}$$
$$\text{s.t.}\quad d_{ab} = \sum_i (L_i + R_i - C X)^2, \quad X = 0 \text{ or } 1$$
where $D$ is the total distance of the TSP and $d_{i,i+1}$ is the distance from point i to point i+1.
By solving the TSP, we obtain the order in which the points are visited, and finally use MATLAB to reassemble the original paper.

4.6 Simulated annealing (SA) algorithm
The simulated annealing (SA) algorithm is an iterative solution strategy built on random search. It is based on the physical annealing process of solid material and its general similarity to combinatorial optimization problems. The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals
and reduce their defects. The heat causes the atoms to become unstuck from their initial positions and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one. SA can be described as follows:
Step 1. Initialization. Given the range of each model parameter, randomly select an initial solution $x_0$ and calculate the corresponding target value $E(x_0)$; set the initial temperature $T$ and the final temperature $T_f$; take a random number $K \in (0,1)$ as a probability threshold; and set the cooling function $T(t+1) = \alpha T(t)$, in which $\alpha$ is the annealing coefficient and $t$ is the number of iterations.
Step 2. At a given temperature $T$, apply a perturbation $\Delta x$ to produce a new solution $x' = x + \Delta x$, and calculate the difference $\Delta E(x') = E(x') - E(x)$.
Step 3. If $\Delta E(x') > 0$, $x'$ is accepted according to the probability $p = \exp(-\Delta E / (\kappa T))$, where $\kappa$ is a constant usually taken to be 1: if $p > K$, $x'$ is accepted. If $\Delta E(x') \le 0$, $x'$ is accepted directly. When $x'$ is accepted, set $x = x'$.
Step 4. At a given temperature, repeat Step 3.
Step 5. Reduce the temperature $T$ using the slow cooling function.
Step 6. Repeat Steps 2 to 5 until the termination condition is met.
When using SA to solve the TSP, we regard each sequence as a solution in order to find the optimal ordering sequence.

4.7 One-dimensional shredding restoration algorithm
In summary, the one-dimensional shredding restoration algorithm automatically recovers one-dimensional shreds. The steps of the algorithm are as follows:
Step 1: Extract the image pixel
matrix, followed by binarization.
Step 2: Extract the image features after binarization.
Step 3: Use the text features from Step 2 to construct the recognition sequences and transform the problem into a TSP.
Step 4: Solve the TSP using SA.

4.8 The Solution of Model
Using MATLAB for the computation, we obtain the following results:

Position     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Appendix 1   9 15 13 16  4 11  3 17  2  5  6 10 14 19 12  8 18  1  7
Appendix 2   4  7  3  8 16 19 12  1  6  2 10 14 11  9 13 15 18 17  5
Table 1. The splice sequence of Appendix 1 and 2

Using the splice sequences, we can complete the splicing and obtain the restored images without human intervention. The results are in the Appendix.

V For question two
5.1 Shredding standardization and level feature extraction
Standardizing a scrap of paper means separating the character layers from the blank layers:
$$f_i = \begin{cases} \text{gray}, & i \text{ is a blank layer} \\ \text{black}, & i \text{ is a character layer} \end{cases}$$
where $f_i$ is the i-th row of the image pixel matrix. The details are shown in Figure 5.
Figure 5. Shredding standardization
In the normalized image, the layer structure is obvious. We extract the level features of the image in the following steps:
Step 1: Standardize the picture.
Step 2: Correct the normalized image.
Step 3: Scan the normalized pixel matrix and record the starting and ending positions for each image.
The level feature extraction is shown in Figure 6.
Figure 6. The level feature extraction

5.2 The classification of level feature
Each image yields corresponding level features, which are used to classify the two-dimensional shreds and so reduce the dimension of the problem. The steps are:
Step 1: Compare the pictures pairwise. If two pictures share 3-4 identical feature values, put them into one category.
Step 2: Create a set for the pictures that fail to
be classified.
Step 3: Compare the pictures in this set manually with the known classes and put each one into the corresponding category using the GUI program.
To classify the shreds that cannot be classified by machine, we use MATLAB GUI programming to implement Step 3.
Figure 6. GUI

5.3 Two-dimensional shredding restoration algorithm
According to the classification sets, the two-dimensional shreds can be transformed into several one-dimensional shredding problems. However, because the text features occupy only a small part of each scrap and blank areas account for most of it, the recovery rate of the one-dimensional model is not high. We need some manual intervention to achieve the goal, as shown in Figure 7.
Figure 7. One-dimensional case after conversion
Red circles represent the successfully restored shreds. Most shreds can be
restored this way, but manual adjustments are still needed.
In summary, the two-dimensional shredding recovery steps are:
Step 1: Standardize the two-dimensional shreds;
Step 2: Extract the level features of the two-dimensional shreds;
Step 3: Classify the shreds using the level features, by machine with a small amount of human intervention;
Step 4: Construct several one-dimensional shredding problems and solve them;
Step 5: Apply manual corrections to the results of Step 4 where needed.

5.4 The Solution of Model
We use the shredding restoration algorithm to recover the two-dimensional shreds in Appendix 3 and Appendix 4, and get the results shown in Appendix II.
Table 2. The results sequence of Appendix 3
Table 3. The results sequence of Appendix 4
Because of the characters on the shreds in Appendix 3, manual correction is needed less often and the accuracy is high; this is not the case for Appendix 4.

VI For question three
6.1 Dimensionality Reduction
We assume that the front and back sides are printed with the same page margins and format, so both sides have the same level features. Once they are classified, question three can be converted into question two. Similarly, using the classification from question two, the problem can be transformed into one-dimensional shredding recovery problems and solved with the one-dimensional shredding restoration algorithm. We also need to modify the GUI program appropriately so that it can compare multiple images simultaneously.
Figure 8. Multi-image GUI program

6.2 Three-Dimensional Shredding Restoration Algorithm
According to the classification, the three-dimensional shredding recovery problem can be reduced to one-dimensional problems. However, because the text features are only a small portion of each scrap and blank areas make up most of it, the results need to be corrected manually.
Figure 9. One-dimensional case after conversion
Red circles represent th