1 Solutions

Solution 1.1
1.1.1 Computer used to run large problems and usually accessed via a network: 5 supercomputers
1.1.2 10^15 or 2^50 bytes: 7 petabyte
1.1.3 Computer composed of hundreds to thousands of processors and terabytes of memory: 3 servers
1.1.4 Today's science fiction application that probably will be available in the near future: 1 virtual worlds
1.1.5 A kind of memory called random access memory: 12 RAM
1.1.6 Part of a computer called central processor unit: 13 CPU
1.1.7 Thousands of processors forming a large cluster: 8 datacenters
1.1.8 A microprocessor containing several processors in the same chip: 10 multi-core processors
1.1.9 Desktop computer without screen or keyboard, usually accessed via a network: 4 low-end servers
1.1.10 Currently the largest class of computer that runs one application or one set of related applications: 9 embedded computers
1.1.11 Special language used to describe hardware components: 11 VHDL
1.1.12 Personal computer delivering good performance to single users at low cost: 2 desktop computers
1.1.13 Program that translates statements in a high-level language to assembly language: 15 compiler
1.1.14 Program that translates symbolic instructions to binary instructions: 21 assembler
1.1.15 High-level language for business data processing: 25 COBOL
1.1.16 Binary language that the processor can understand: 19 machine language
1.1.17 Commands that the processors understand: 17 instruction
1.1.18 High-level language for scientific computation: 26 Fortran
1.1.19 Symbolic representation of machine instructions: 18 assembly language
1.1.20 Interface between the user's program and the hardware, providing a variety of services and supervision functions: 14 operating system
1.1.21 Software/programs developed by the users: 24 application software
1.1.22 Binary digit (value 0 or 1): 16 bit
1.1.23 Software layer between the application software and the hardware that includes the operating system and the compilers: 23 system software
1.1.24 High-level language used to write application and system software: 20 C
1.1.25 Portable language composed of words and algebraic expressions that must be translated into assembly language before being run on a computer: 22 high-level language
1.1.26 10^12 or 2^40 bytes: 6 terabyte

Solution 1.2
1.2.1 8 bits × 3 colors = 24 bits/pixel (3 bytes); the calculation uses 4 bytes/pixel. 1280 × 800 pixels = 1,024,000 pixels. 1,024,000 pixels × 4 bytes/pixel = 4,096,000 bytes (approx. 4 Mbytes).
1.2.2 2 GB = 2000 Mbytes. No. frames = 2000 Mbytes/4 Mbytes = 500 frames.
1.2.3 Network speed: 1 gigabit network = 1 gigabit per second = 125 Mbytes/second. File size: 256 Kbytes = 0.256 Mbytes. Time for 0.256 Mbytes = 0.256/125 s = 2.048 ms.
1.2.4 2 microseconds from cache => 20 microseconds from DRAM. 20 microseconds from DRAM => 2 seconds from magnetic disk. 20 microseconds from DRAM => 2 ms from flash memory.

Solution 1.3
1.3.1 P2 has the highest performance.
performance of P1 (instructions/sec) = 2 × 10^9/1.5 = 1.33 × 10^9
performance of P2 (instructions/sec) = 1.5 × 10^9/1.0 = 1.5 × 10^9
performance of P3 (instructions/sec) = 3 × 10^9/2.5 = 1.2 × 10^9
1.3.2 No. cycles = time × clock rate
cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
cycles(P2) = 10 × 1.5 × 10^9 = 15 × 10^9
cycles(P3) = 10 × 3 × 10^9 = 30 × 10^9
time = (No. instr. × CPI)/clock rate, then No. instructions = No. cycles/CPI
instructions(P1) = 20 × 10^9/1.5 = 13.33 × 10^9
instructions(P2) = 15 × 10^9/1 = 15 × 10^9
instructions(P3) = 30 × 10^9/2.5 = 12 × 10^9
1.3.3 time_new = time_old × 0.7 = 7 s
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 3
f = No. instr. × CPI/time, then
f(P1) = 13.33 × 10^9 × 1.8/7 = 3.43 GHz
f(P2) = 15 × 10^9 × 1.2/7 = 2.57 GHz
f(P3) = 12 × 10^9 × 3/7 = 5.14 GHz
1.3.4 IPC = 1/CPI = No. instr./(time × clock rate)
IPC(P1) = 1.42
IPC(P2) = 2
IPC(P3) = 3.33
1.3.5 Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 1.5 GHz/0.7 = 2.14 GHz.
1.3.6 Time_new/Time_old = 9/10 = 0.9. So Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.
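
The arithmetic in Solution 1.3 (parts 1.3.1 to 1.3.3) can be checked with a short script. This is a minimal sketch rather than part of the original solutions: the function names are hypothetical, and the clock rates, CPIs, 10 s run time, and scaling factors are the values used in the solution text.

```python
# Clock rate (Hz) and CPI for each processor, as used in Solution 1.3.
PROCESSORS = {
    "P1": (2.0e9, 1.5),
    "P2": (1.5e9, 1.0),
    "P3": (3.0e9, 2.5),
}


def instructions_per_second(clock_hz, cpi):
    """Performance in instructions/s = clock rate / CPI (1.3.1)."""
    return clock_hz / cpi


def instruction_count(clock_hz, cpi, seconds):
    """No. instructions = (time x clock rate) / CPI (1.3.2)."""
    return seconds * clock_hz / cpi


def required_clock_hz(n_instr, cpi, seconds):
    """Clock rate = No. instr. x CPI / time (1.3.3)."""
    return n_instr * cpi / seconds


if __name__ == "__main__":
    for name, (clock_hz, cpi) in PROCESSORS.items():
        perf = instructions_per_second(clock_hz, cpi)
        n_instr = instruction_count(clock_hz, cpi, seconds=10.0)
        # 1.3.3: 30% less time (7 s) with a 20% higher CPI.
        new_clock = required_clock_hz(n_instr, cpi * 1.2, seconds=7.0)
        print(f"{name}: {perf / 1e9:.2f} G instr/s, "
              f"{n_instr / 1e9:.2f} G instr, "
              f"new clock {new_clock / 1e9:.2f} GHz")
```

Keeping the three relations as separate functions mirrors the way the solution reuses the same performance equation, solved for a different unknown in each part.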
Solution 1.4
1.4.1 P2
Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
P1: Time class A = 0.67 × 10^-4 s
    Time class B = 2.67 × 10^-4 s
    Time class C = 10 × 10^-4 s
    Time class D = 5.33 × 10^-4 s
    Total time P1 = 18.67 × 10^-4 s
P2: Time class A = 10^-4 s
    Time class B = 2 × 10^-4 s
    Time class C = 5 × 10^-4 s
    Time class D = 3 × 10^-4 s
    Total time P2 = 11 × 10^-4 s
1.4.2 CPI = time × clock rate/No. instr.
CPI(P1) = 18.67 × 10^-4 × 1.5 × 10^9/10^6 = 2.80
CPI(P2) = 11 × 10^-4 × 2 × 10^9/10^6 = 2.2
1.4.3 clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 4 = 28 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 3 = 22 × 10^5
1.4.4 (500 × 1 + 50 × 5 + 100 × 5 + 50 × 2) × 0.5 × 10^-9 = 675 ns
1.4.5 CPI = time × clock rate/No. instr.
CPI = 675 × 10^-9 × 2 × 10^9/700 = 1.93
1.4.6 Time = (500 × 1 + 50 × 5 + 50 × 5 + 50 × 2) × 0.5 × 10^-9 = 550 ns
Speed-up = 675 ns/550 ns = 1.23
CPI = 550 × 10^-9 × 2 × 10^9/700 = 1.57

Solution 1.5
1.5.1 a. 1G, 0.75G inst/s
      b. 1G, 1.5G inst/s
1.5.2 a. P2 is 1.33 times faster than P1
      b. P1 is 1.03 times faster than P2
1.5.3 a. P2 is 1.31 times faster than P1
      b. P1 is 1.00 times faster than P2
1.5.4 a. 2.05 s
      b. 1.93 s
1.5.5 a. 0.71 s
      b. 0.86 s
1.5.6 a. 1.30 times faster
      b. 1.40 times faster

Solution 1.6
1.6.1
      Compiler A CPI   Compiler B CPI
a.    1.00             1.17
b.    0.80             0.58
1.6.2 a. 0.86
      b. 1.37
1.6.3
      Compiler A speed-up   Compiler B speed-up
a.    1.52                  1.77
b.    1.21                  0.88
1.6.4
      P1 peak      P2 peak
a.    4G Inst/s    3G Inst/s
b.    4G Inst/s    3G Inst/s
1.6.5 Speed-up, P1 versus P2:
      a. 0.967105263
      b. 0.730263158
1.6.6 a. 6.204081633
      b. 8.216216216

Solution 1.7
1.7.1 Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 × 0.74)^(1/7) = 2.15
Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) = 1.62
1.7.2 Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4 Willamette)
Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)
1.7.3 Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.4
Power: 95 W/3.3 W = 28.79
1.7.4 C = P/(V^2 × clock rate)
80286: C = 0.0105 × 10^-6
80386: C = 0.01025 × 10^-6
80486: C = 0.00784 × 10^-6
Pentium: C = 0.00612 × 10^-6
Pentium Pro: C = 0.0133 × 10^-6
Pentium 4 Willamette: C = 0.0122 × 10^-6
Pentium 4 Prescott: C = 0.0183 × 10^-6
Core 2: C = 0.0294 × 10^-6
1.7.5 3.3/1.75 = 1.89 (Pentium Pro to Pentium 4 Willamette)
1.7.6 Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68

Solution 1.8
1.8.1 Power_1 = V^2 × clock rate × C. Power_2 = 0.9 × Power_1
C_2/C_1 = 0.9 × 5^2 × 0.5 × 10^9/(3.3^2 × 1 × 10^9) = 1.03
1.8.2 Power_2/Power_1 = (V_2^2 × clock rate_2)/(V_1^2 × clock rate_1)
Power_2/Power_1 = 0.87 => reduction of 13%
1.8.3 Power_2 = V_2^2 × 1 × 10^9 × 0.8 × C_1 = 0.6 × Power_1
Power_1 = 5^2 × 0.5 × 10^9 × C_1
V_2^2 × 1 × 10^9 × 0.8 × C_1 = 0.6 × 5^2 × 0.5 × 10^9 × C_1
V_2 = ((0.6 × 5^2 × 0.5 × 10^9)/(1 × 10^9 × 0.8))^(1/2) = 3.06 V
1.8.4 Power_new = 1 × C_old × V_old^2/(2^(1/4))^2 × clock rate × 2^(1/2) = Power_old.
Thus, power scales by 1.
1.8.5 1/2^(-1/2) = 2^(1/2)
1.8.6 Voltage = 1.1 × 1/2^(1/4) = 0.92 V. Clock rate = 2.667 × 2^(1/2) = 3.771 GHz

Solution 1.9
1.9.1 a. 1/49 × 100 = 2%
      b. 45/120 × 100 = 37.5%
1.9.2 a. I_leak = 1/3.3 = 0.3 A
      b. I_leak = 45/1.1 = 40.9 A
1.9.3 a. Power_st/Power_dyn = 1/49 = 0.02
      b. Power_st/Power_dyn = 45/75 = 0.6
1.9.4 Power_st/Power_dyn = 0.6 => Power_st = 0.6 × Power_dyn
      a. Power_st = 0.6 × 40 W = 24 W
      b. Power_st = 0.6 × 30 W = 18 W
1.9.5 a. I_lk = 24/0.8 = 30 A
      b. I_lk = 18/0.8 = 22.5 A
1.9.6
      Power_st at 1.0 V   I_lk at 1.0 V   Power_st at 1.2 V   I_lk at 1.2 V   Larger
a.    119 W               119 A           136 W               113.3 A         I_lk at 1.0 V
b.    93.5 W              93.5 A          110.5 W             92.1 A          I_lk at 1.0 V

Solution 1.10
1.10.1 a.
Processors   Instructions per processor   Total instructions
1            4096                         4096
2            2048                         4096
4            1024                         4096
8            512                          4096
b.
Processors   Instructions per processor   Total instructions
1            4096                         4096
2            2278                         4556
4            1464                         5856
8            1132                         9056
1.10.2 a.
Processors   Execution time (s)
1            4.096
2            2.048
4            1.024
8            0.512
b.
Processors   Execution time (s)
1            4.096
2            3.203
4            3.164
8            3.582
1.10.3 a.
Processors   Execution time (s)
1            5.376
2            2.688
4            1.344
8            0.672
b.
Processors   Execution time (s)
1            5.376
2            3.878
4            3.564
8            3.882
1.10.4 a.
Cores   Execution time (s) at 3 GHz
1       4.00
2       2.17
4       1.25
8       0.75
b.
Cores   Execution time (s) at 3 GHz
1       4.00
2       2.00
4       1.00
8       0.50
1.10.5 a.
Cores   Power (W) per core, 3 GHz   Power (W) per core, 500 MHz   Power (W), 3 GHz   Power (W), 500 MHz
1       15                          0.625                         15                 0.625
2       15                          0.625                         30                 1.25
4       15                          0.625                         60                 2.5
8       15                          0.625                         120                5
b.
Cores   Power (W) per core, 3 GHz   Power (W) per core, 500 MHz   Power (W), 3 GHz   Power (W), 500 MHz
1       15                          0.625                         15                 0.625
2       15                          0.625                         30                 1.25
4       15                          0.625                         60                 2.5
8       15                          0.625                         120                5
1.10.6 a.
Processors   Energy (J), 3 GHz   Energy (J), 500 MHz
1            60                  15
2            65                  16.25
4            75                  18.75
8            90                  22.5
b.
Processors   Energy (J), 3 GHz   Energy (J), 500 MHz
1            60                  15
2            60                  15
4            60                  15
8            60                  15

Solution 1.11
1.11.1 Wafer area = π × (d/2)^2
a. Wafer area = π × 7.5^2 = 176.7 cm^2
b. Wafer area = π × 12.5^2 = 490.9 cm^2
Die area = wafer area/dies per wafer
a. Die area = 176.7/90 = 1.96 cm^2
b. Die area = 490.9/140 = 3.51 cm^2
Yield = 1/(1 + (defects per area × die area)/2)^2
a. Yield = 0.97
b. Yield = 0.92
1.11.2 Cost per die = cost per wafer/(dies per wafer × yield)
a. Cost per die = 0.12
b. Cost per die = 0.16
1.11.3 a. Dies per wafer = 1.1 × 90 = 99
Defects per area = 1.15 × 0.018 = 0.021 defects/cm^2
Die area = wafer area/dies per wafer = 176.7/99 = 1.78 cm^2
Yield = 0.96
b. Dies per wafer = 1.1 × 140 = 154
Defects per area = 1.15 × 0.024 = 0.028 defects/cm^2
Die area = wafer area/dies per wafer = 490.9/154 = 3.19 cm^2
Yield = 0.92
1.11.4 Yield = 1/(1 + (defects per area × die area)/2)^2
Then defects per area = (2/die area) × (y^(-1/2) − 1)
Substituting the values for T1 to T4 gives
T1: defects per area = 0.00085 defects/mm^2 = 0.085 defects/cm^2
T2: defects per area = 0.00060 defects/mm^2 = 0.060 defects/cm^2
T3: defects per area = 0.00043 defects/mm^2 = 0.043 defects/cm^2
T4: defects per area = 0.00026 defects/mm^2 = 0.026 defects/cm^2
1.11.5 No solution provided.
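
The wafer, die, and yield formulas in Solution 1.11 can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the wafer diameters (15 cm and 25 cm) are inferred from the radii 7.5 and 12.5 used in 1.11.1, the defect densities 0.018 and 0.024 defects/cm^2 are the base values that appear in 1.11.3, and the cost per wafer is left as a parameter because it is not stated in the solution text.

```python
import math


def wafer_area_cm2(diameter_cm):
    """Wafer area = pi * (d/2)^2."""
    return math.pi * (diameter_cm / 2) ** 2


def die_area_cm2(diameter_cm, dies_per_wafer):
    """Die area = wafer area / dies per wafer."""
    return wafer_area_cm2(diameter_cm) / dies_per_wafer


def die_yield(defects_per_cm2, die_area):
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area / 2.0) ** 2


def cost_per_die(cost_per_wafer, dies_per_wafer, yield_fraction):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    return cost_per_wafer / (dies_per_wafer * yield_fraction)


def defects_from_yield(yield_fraction, die_area):
    """Inverse of die_yield: (2 / die area) * (y^(-1/2) - 1), as in 1.11.4."""
    return (2.0 / die_area) * (yield_fraction ** -0.5 - 1.0)


if __name__ == "__main__":
    # (diameter in cm, dies per wafer, defects per cm^2) for cases a and b.
    for label, d, dies, defects in (("a", 15.0, 90, 0.018),
                                    ("b", 25.0, 140, 0.024)):
        area = die_area_cm2(d, dies)
        y = die_yield(defects, area)
        print(f"{label}: wafer {wafer_area_cm2(d):.1f} cm^2, "
              f"die {area:.2f} cm^2, yield {y:.2f}")
```

`defects_from_yield` is the algebraic inverse of `die_yield`, so feeding a computed yield back in should recover the original defect density.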
Solution 1.12
1.12.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
a. CPI(perl) = 3 × 10^9 × 500/(2118 × 10^9) = 0.71
b. CPI(mcf) = 3 × 10^9 × 1200/(336 × 10^9) = 10.71
1.12.2 SPECratio = ref. time/execution time
a. SPECratio(perl) = 9770/500 = 19.54
b. SPECratio(mcf) = 9120/1200 = 7.6
1.12.3 (19.54 × 7.6)^(1/2) = 12.19
1.12.4 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%.
1.12.5 CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time is increased by 15.5%.
1.12.6 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.87. That is, the SPECratio is decreased by about 13%.

Solution 1.13
1.13.1 CPI = (CPU time × clock rate)/No. instr.
a. CPI = 450 × 4 × 10^9/(0.85 × 2118 × 10^9) = 1.00
b. CPI = 1150 × 4 × 10^9/(0.85 × 336 × 10^9) = 16.11
1.13.2 Clock rate ratio = 4 GHz/3 GHz = 1.33.
a. CPI at 4 GHz = 1.00, CPI at 3 GHz = 0.71, ratio = 1.41
b. CPI at 4 GHz = 16.11, CPI at 3 GHz = 10.71, ratio = 1.50
The CPI ratios differ from the clock rate ratio because, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
1.13.3 a. 450/500 = 0.90. CPU time reduction: 10%.
b. 1150/1200 = 0.958. CPU time reduction: 4.2%.
1.13.4 No. instr. = CPU time × clock rate/CPI.
a. No. instr. = 820 × 0.9 × 4 × 10^9/0.96 = 3075 × 10^9
b. No. instr. = 580 × 0.9 × 4 × 10^9/2.94 = 710 × 10^9
1.13.5 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz.
1.13.6 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate_old = 3.19 GHz.
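
The SPECratio and CPU-time scaling steps in Solutions 1.12 and 1.13 reduce to a few one-line relations. The sketch below is an assumption-labeled illustration, not the manual's own method: the function names are hypothetical, and the reference times, execution times, and scaling factors are those quoted in 1.12.2, 1.12.5, and 1.13.6.

```python
import math


def spec_ratio(reference_time_s, execution_time_s):
    """SPECratio = reference time / execution time."""
    return reference_time_s / execution_time_s


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cpu_time_scale(instr_scale=1.0, cpi_scale=1.0, clock_scale=1.0):
    """Factor by which CPU time = No. instr. * CPI / clock rate changes."""
    return instr_scale * cpi_scale / clock_scale


def clock_scale_for(instr_scale, cpi_scale, time_scale):
    """Clock rate factor needed to hit a target CPU-time factor."""
    return instr_scale * cpi_scale / time_scale


if __name__ == "__main__":
    ratios = [spec_ratio(9770, 500), spec_ratio(9120, 1200)]
    print("SPECratios:", ratios)
    print("geometric mean:", round(geometric_mean(ratios), 2))
    # 1.12.5 and 1.12.6: 10% more instructions and 5% higher CPI.
    scale = cpu_time_scale(instr_scale=1.1, cpi_scale=1.05)
    print("CPU time factor:", round(scale, 3),
          "SPECratio factor:", round(1 / scale, 2))
    # 1.13.6: CPI reduced by 15%, CPU time reduced by 20%, 3 GHz baseline.
    print("new clock (GHz):", round(3.0 * clock_scale_for(1.0, 0.85, 0.80), 2))
```

Because the SPECratio is inversely proportional to CPU time, its scaling factor is simply the reciprocal of the CPU-time factor.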
Solution 1.14
1.14.1 No. instr. = 10^6
T_cpu(P1) = 10^6 × 1.25/(4 × 10^9) = 0.3125 × 10^-3 s
T_cpu(P2) = 10^6 × 0.75/(3 × 10^9) = 0.25 × 10^-3 s
clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
[...] not possible
b. 8 processors: CPI_improved_fp = (512 − 944)/80 < 0 => not possible
1.15.5 Using the clock cycle data from 1.15.4:
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved_l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
a. 1 processor: CPI_improved_l/s = (4096 − 3072)/1280 = 0.8
b. 8 processors: CPI_improved_l/s = (512 − 384)/160 = 0.8
1.15.6 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
a. 1 processor: T_cpu(before improv.) = 4.096 s; T_cpu(after improv.) = 2.739 s
b. 8 processors: T_cpu(before improv.) = 0.512 s; T_cpu(after improv.) = 0.342 s

Solution 1.16
1.16.1 Without reduction in any routine:
a. total time 2 proc = 185 ns
b. total time 16 proc = 34 ns
Reducing time in routines A, C and E:
a. 2 proc: T(A) = 17 ns, T(C) = 8.5 ns, T(E) = 4.1 ns, total time = 179.6 ns => reduction = 2.9%
b. 16 proc: T(A) = 3.4 ns, T(C) = 1.7 ns, T(E) = 1.7 ns, total time = 32.8 ns => reduction = 3.5%
1.16.2 a. 2 proc: T(B) = 72 ns, total time = 177 ns => reduction = 4.3%
b. 16 proc: T(B) = 12.6 ns, total time = 32.6 ns => reduction = 4.1%
1.16.3 a. 2 proc: T(D) = 63 ns, total time = 178 ns => reduction = 3.8%
b. 16 proc: T(D) = 10.8 ns, total time = 32.8 ns => reduction = 3.5%
1.16.4
# Processors   Computing time   Computing time ratio   Routing time ratio
2              176
4              96               0.55                   1.18
8              49               0.51                   1.31
16             30               0.61                   1.29
32             14               0.47                   1.05
64             6.5              0.46                   1.13
1.16.5 Geometric mean of computing time ratios = 0.52. Multiplying this by the computing time for a 64-processor system gives a computing time for a 128-processor system of 3.4 ms.
Geometric mean of routing time ratios = 1.19. Multiplying this by the routing time for a 64-processor system gives a routing time for a 128-processor system of 30.9 ms.
1.16.6 Computing time = 176/0.52 = 338 ms. Routing time = 0, since no communication is required.
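
The extrapolation in 1.16.4 to 1.16.6 can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the names `successive_ratios`, `geometric_mean`, and `COMPUTING_TIME_MS` are hypothetical, the computing times are the ones listed in the 1.16.4 table, and they are taken to be in milliseconds as in 1.16.5.

```python
import math

# Computing time per processor count, from the 1.16.4 table (assumed ms).
COMPUTING_TIME_MS = {2: 176, 4: 96, 8: 49, 16: 30, 32: 14, 64: 6.5}


def successive_ratios(times_by_procs):
    """Ratio of each time to the time at the previous processor count."""
    procs = sorted(times_by_procs)
    return [times_by_procs[b] / times_by_procs[a]
            for a, b in zip(procs, procs[1:])]


def geometric_mean(values):
    """n-th root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))


ratios = successive_ratios(COMPUTING_TIME_MS)
mean_ratio = geometric_mean(ratios)

# Doubling from 64 to 128 processors: multiply by the mean ratio (1.16.5).
time_128_ms = COMPUTING_TIME_MS[64] * mean_ratio
# Halving from 2 processors to 1: divide by the mean ratio (1.16.6).
time_1_ms = COMPUTING_TIME_MS[2] / mean_ratio

if __name__ == "__main__":
    print([round(r, 2) for r in ratios])
    print(round(mean_ratio, 2), round(time_128_ms, 1), round(time_1_ms, 1))
```

With the unrounded mean ratio, the single-processor estimate comes out slightly above 340 ms; the 338 ms quoted in 1.16.6 follows from using the rounded value 0.52.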