Keil Compiler Optimization Options for ARM

Posted: 2013-01-11 14:12

Recently I ran into some inexplicable problems when building with Keil: it looked as though parts of my code were being optimized away. After reading up on the subject I think I understand it a little better. I had been using the default optimization setting, which, according to what I found online, is level 2 (-O2). After switching to level 0 (-O0) the problem went away. Below is the material I found online describing the optimization features.

Getting the Best Optimized Code for your Embedded Application

ARM Compilation Tools

The ARM Compilation Tools are the only compilation tools co-developed with the ARM processors, and specifically designed to optimally support the ARM architecture. They are the result of 20 years of development, and are recognized as the industry-leading C and C++ compilation tools for the ARM, Thumb, and Thumb-2 instruction sets.

The ARM Compilation Tools consist of:
o The ARM Compiler, which enables you to compile C and C++ code. It is an optimizing compiler, and features command-line options that let you control the level of optimization
o Linker and Utilities, which assign addresses and lay out sections of code to form a final image
o A selection of libraries, including the ISO standard C libraries, and the MicroLIB C library, which is optimized for embedded applications
o Assembler, which generates machine code instructions from ARM, Thumb or Thumb-2 assembly-level source code

Compiler Options for Embedded Applications

The ARM Compilation Tools include a number of compiler optimizations to help you best target your code for your chosen microcontroller device and application area. They can be accessed from within µVision by clicking on Project - Options for Target. The options described in this document can be found on the Target and C/C++ tabs of the Options for Target dialog.

MDK Compiler Optimizations

Cross-Module Optimization takes information from a prior build and uses it to place unused functions into their own ELF section in the corresponding object file. This option is also known as Linker Feedback, and requires you to build your application twice to take advantage of it for reduced code size. Cross-Module Optimization has been shown to reduce code size by removing unused functions from your application. It can also improve the performance of your application by allowing modules to share inline code.

The MicroLIB C library has been optimized to reduce the size of embedded applications. It is a subset of the ISO standard C runtime library, and offers a tradeoff between functionality and code size. Some of the standard C library functions, such as memcpy(), are slower, while some features of the default library are not supported. Unsupported features include:
o Operating system functions, e.g. abort(), exit(), time(), system(), getenv()
o Wide character and multi-byte support, e.g. mbtowc(), wctomb()
o The stdio file I/O functions, with the exception of stdin, stdout and stderr
o Position-independent and thread-safe code
Use the MicroLIB C library for applications where overall performance can be traded off against the need to reduce code size and memory cost.

Link-Time Code Generation instructs the compiler to create objects in an intermediate format so that the linker can perform further code optimizations. This gives the code generator visibility into the cross-file dependencies of all objects simultaneously, allowing it to apply a higher level of optimization. Link-time code generation can reduce code size, and allow your application to run faster.

Optimization Levels can also be adjusted. The different levels of optimization allow you to trade off between the level of debug information available in the compiled code and the performance of the code. The following optimization levels are available:
o -O0 applies minimum optimizations. Most optimizations are switched off, and the code generated has the best debug view.
o -O1 applies restricted optimization. For example, unused inline functions and unused static functions are removed. At this level of optimization, the compiler also applies automatic optimizations such as removing redundant code and re-ordering instructions so as to avoid an interlock situation. The code generated is reasonably optimized, with a good debug view.
o -O2 applies high optimization (this is the default setting). Optimizations applied at this level take advantage of ARM's in-depth knowledge of the processor architecture to exploit processor-specific behavior of the given target. It generates well optimized code, but with limited debug view.
o -O3 applies the most aggressive optimization. The optimization is in accordance with the user's -Ospace/-Otime choice. By default, multi-file compilation is enabled, which leads to a longer compile time, but gives the highest levels of optimization.

The Optimize for Time checkbox causes the compiler to optimize with a greater focus on achieving the best performance when checked (-Otime) or the smallest code size when unchecked (-Ospace).

Unchecking Optimize for Time selects the -Ospace option, which instructs the compiler to perform optimizations to reduce the image size at the expense of a possible increase in execution time, for example by using out-of-line function calls instead of inline code for large structure copies. This is the default option. When running the compiler from the command line, this option is invoked using -Ospace.

Checking Optimize for Time selects the -Otime option, which instructs the compiler to optimize the code for the fastest execution time, at the risk of an increase in the image size. It is recommended that you compile the time-critical parts of your code with -Otime, and the rest using -Ospace.

Split Load and Store Multiples instructs the compiler to split LDM and STM instructions involving a large number of registers into a series of loads/stores of fewer registers. This means that an LDM of 16 registers can be split into 4 separate LDMs of 4 registers each. This option helps to reduce the interrupt latency on ARM systems which do not have a cache or write buffer, and on systems which use zero-wait-state 32-bit memory. For example, the ARM7 and ARM9 processors can only take an exception on an instruction boundary. If an exception occurs at the start of an LDM of 16 registers in a cacheless ARM7/ARM9 system, the system will finish making 16 accesses to memory before taking the exception. Depending on the memory arbitration system, this can result in a very high interrupt latency. Breaking the LDM into 4 individual LDMs of 4 registers means that the processor will take the exception after loading a maximum of 4 registers, thereby greatly reducing the interrupt latency. Selecting this option improves the overall performance of the system.

The One ELF Section per Function option tells the compiler to put all functions into their own individual ELF sections. This allows the linker to remove unused functions. An ELF code section typically contains the code for a number of functions. The linker is normally only able to remove unused ELF sections, not unused functions, and an ELF section can only be removed if all its contents are unused. Splitting each function into its own ELF section therefore allows the linker to easily identify which ones are unused, and remove them. Selecting this option increases the time required to compile your code, but results in improved performance.

The combination of options applied will depend on your optimization goal: whether you are optimizing for smallest code size, or for best performance. The next sections illustrate the best optimization options for each of these goals.

Optimizing for Smallest Code Size

To optimize your code for the smallest size, the best options to apply are:
o The MicroLIB C library
o Cross-module optimization
o Optimization level 2 (-O2)

Compile the Measure example without any optimizations

The Measure example uses analog and digital inputs to simulate a data logger.
o File - Open Project: C:\Keil\ARM\Boards\Keil\MCBSTM32\Measure\Measure.uv2
o Click the Options for Target button
o In the Target tab: uncheck Cross-Module Optimization, uncheck Use MicroLIB, uncheck Use Link-Time Code Generation
o In the C/C++ tab: set Optimization Level to Zero, then click OK to save your changes
o Project - Build target

Without any compiler optimizations applied, the initial code size is 13,656 Bytes.

Optimize the Measure example for Size

Apply the compiler optimizations in turn, and re-compile each time to see their effect in reducing the code size for the example.
o Options for Target - Target tab: use the MicroLIB C library
o Options for Target - Target tab: use cross-module optimization (remember to compile twice)
o Options for Target - C/C++ tab: enable Optimization level 2 (-O2)

| Optimization Applied | Compile Size | Size Reduction | Improvement |
| MicroLIB C library | 8,960 Bytes | 4,696 Bytes | 34% smaller |
| Cross-Module Optimization | 13,500 Bytes | 156 Bytes | 1.1% smaller |
| Optimization level -O2 | 12,936 Bytes | 720 Bytes | 5.3% smaller |
| All 3 optimization options | 8,116 Bytes | 5,540 Bytes | 40.6% smaller |

Applying all the optimizations reduces the code size to 8,116 Bytes. The fully optimized code is 5,540 Bytes smaller, a total code size reduction of 40.6%.

Optimizing for Best Performance

To optimize your code for performance, the best options to apply are:
o Cross-module optimization
o Optimization level 3 (-O3)
o Optimize for time

Run the Dhrystone benchmark without any optimizations

The Dhrystone benchmark is used to measure and compare the performance of different computers, or the efficiency of the code generated for the same computer by different compilers.
o File - Open Project: C:\Keil\ARM\Examples\DHRY\DHRY.uv2
o Click the Options for Target button
o Turn off the optimization settings in the Target and C/C++ tabs, then click OK
o Project - Build target
o Enter Debug mode
o View - Serial Windows - UART #1: open the UART #1 window
o View - Analysis Windows - Performance Analyzer: open the Performance Analyzer
o Debug - Run: start running the application
o When prompted, enter 50000 in the UART #1 window and press Enter

In the Performance Analyzer window, note that:
o dhry_1 took 2.829 s
o dhry_2 took 2.014 s

In the UART #1 window, note that:
o It took 138.0 microseconds for 1 run through Dhrystone
o The application is executing 7246.4 Dhrystones per second

Optimize the Dhrystone example for Performance

Re-compile the example with all three of the following optimizations applied:
o Options for Target - Target tab: cross-module optimization (remember to compile twice)
o Options for Target - C/C++ tab: Optimization level 3 (-O3)
o Options for Target - C/C++ tab: Optimize for Time
Re-run the application, and examine the performance.

| Measurement | Without optimizations | With optimizations | Improvement |
| dhry_1 | 2.829 s | 1.695 s | 40.1% faster |
| dhry_2 | 2.014 s | 1.011 s | 49.8% faster |
| Microseconds for 1 run through Dhrystone | 138.0 | 70 | 49.3% faster |
| Dhrystones per second | 7246.4 | 14,285.7 | 97.1% more |

The fully optimized code achieves approximately 2x the performance of the un-optimized code.

Summary

The ARM Compilation Tools offer a range of options to apply when compiling your code. These options can be combined to optimize your code for best performance, for smallest code size, or for any performance point between these two extremes, to best suit your targeted microcontroller device and market.

When optimizing your code, MDK-ARM makes it easy and convenient to measure the effect of the different optimization settings on your application. The code size is clearly displayed after compilation, and a range of analysis tools such as the Performance Analyzer enable you to measure performance.

The optimization options in the ARM Compilation Tools, together with the easy-to-use analysis tools in MDK-ARM, help you to easily optimize your application to meet your specific requirements.
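The recommendation to build time-critical code with -Otime and the rest with -Ospace can, as far as I understand the armcc documentation, also be applied per function inside one source file. The sketch below assumes the armcc pragmas `#pragma Otime`, `#pragma push` and `#pragma pop`; please check them against your toolchain version. `checksum` is a hypothetical routine that is not part of the source material, and the `__CC_ARM` guard is there so that other compilers simply skip the pragmas.

```c
#include <stddef.h>
#include <stdint.h>

/* __CC_ARM is predefined by armcc; other compilers skip the pragmas. */
#if defined(__CC_ARM)
#pragma push   /* assumption: saves the current pragma state */
#pragma Otime  /* assumption: functions below are optimized for speed */
#endif

/* Hypothetical time-critical routine: sums the bytes of a buffer. */
uint32_t checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}

#if defined(__CC_ARM)
#pragma pop    /* assumption: restores the saved state (e.g. -Ospace) */
#endif
```

The rest of the file then keeps whatever setting was chosen in the C/C++ tab, so only the hot routine pays the possible size cost of -Otime.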

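The opening note describes code that seemed to disappear at the default -O2 level and behaved again at -O0, but it does not say what the cause was. One common cause, offered here only as an assumption, is a variable shared with an interrupt handler that is not declared `volatile`: an optimizing compiler may then read it once and never re-check it. The names below (`data_ready`, `on_data_irq`, `wait_for_data`) are hypothetical and only illustrate the pattern.

```c
/* Flag shared between an interrupt handler and the main loop.
 * 'volatile' forces the compiler to re-read it on every access,
 * so the polling loop below is not optimized away at -O2. */
static volatile int data_ready = 0;

/* Hypothetical interrupt handler: signals that new data has arrived. */
void on_data_irq(void)
{
    data_ready = 1;
}

/* Polls the flag for at most max_spins iterations.
 * Returns 1 and clears the flag if data arrived, otherwise 0. */
int wait_for_data(unsigned long max_spins)
{
    while (max_spins-- > 0) {
        if (data_ready) {
            data_ready = 0;
            return 1;
        }
    }
    return 0;
}
```

If adding `volatile` to such shared variables and memory-mapped registers fixes the behavior, the project can stay at -O2 instead of dropping to -O0.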

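The improvement figures quoted for the Measure and Dhrystone examples follow from simple ratios. The helper functions below are a small sketch for re-checking them; the function names are my own and do not come from the source.

```c
/* Percentage by which 'after' is smaller than 'before'
 * (used for code size in bytes and for run times). */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Percentage by which 'after' is larger than 'before'
 * (used for Dhrystones per second). */
double percent_increase(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Converts the time for one Dhrystone run, in microseconds,
 * into Dhrystones per second. */
double dhrystones_per_second(double microseconds_per_run)
{
    return 1.0e6 / microseconds_per_run;
}
```

For instance, `percent_reduction(13656, 8116)` gives the overall code size saving of about 40.6%, and `dhrystones_per_second(138.0)` gives roughly 7246.4, which is why the per-run time is taken to be in microseconds.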