An Analysis of the Tcl binary format and binary scan Commands


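In Tcl, binary format and binary scan are used a great deal when handling binary files, but the two commands are hard to understand. I spent a day on them and finally have a basic grasp, which I would like to share here.

I. What the binary command does

The help for binary format and binary scan gives this explanation:

This command provides facilities for manipulating binary data. The first form, binary format, creates a binary string from normal Tcl values. For example, given the values 16 and 22, on a 32-bit architecture, it might produce an 8-byte binary string consisting of two 4-byte integers, one for each of the numbers. The second form of the command, binary scan, does the opposite: it extracts data from a binary string and returns it as ordinary Tcl string values.

In short, the command operates on binary data. binary format converts ordinary Tcl values into a binary string; for example, on a 32-bit machine the values 16 and 22 can be converted into an 8-byte binary string made up of two 4-byte integers. (How a binary character is displayed depends on the character encoding; Notepad, for instance, offers ANSI, Unicode, Unicode big endian and UTF-8. For background on character encodings, see "字符编码笔记：ASCII，Unicode 和 UTF-8".) binary scan does the opposite of binary format: it converts a binary string back into ordinary Tcl values.

II. Syntax of the binary command

1. binary format formatString ?arg arg ...?

The binary format command generates a binary string whose layout is specified by the formatString and whose contents come from the additional arguments. The resulting binary value is returned.

binary format takes the data (?arg arg ...?), packs it according to the template (formatString), and returns the converted value. Different kinds of data call for different templates: if the data is a string of binary digits (e.g. 1001010), use b or B; if it is a string of hexadecimal digits (e.g. FF), use h or H. count is set from the length of the data: the binary string 1001010 has 7 digits, so count = 7. (In other templates, count can also mean a repeat count, among other things.)

2. binary scan string formatString ?varName varName ...?

The binary scan command parses fields from a binary string, returning the number of conversions performed. String gives the input bytes to be parsed (one byte per character, and characters not representable as a byte have their high bits chopped) and formatString indicates how to parse it. Each varName gives the name of a variable; when a field is scanned from string the result is assigned to the corresponding variable.

binary scan parses values out of a binary string according to the template (formatString) and assigns each value to the corresponding variable (?varName varName ...?). The command returns the number of conversions performed, not the number of characters parsed. A character read from a channel may be built from several bytes; how many is decided by the channel configuration. With fconfigure channel -encoding binary, one byte makes one character, which you can think of as the high 8 bits having been chopped off.

III. Setting binary encoding mode

    fconfigure stdout -translation binary -encoding binary

When handling binary data in a file, first turn off newline translation and set the character encoding of the channel, as in the command above. In addition, pass -nonewline to puts so that no trailing newline is appended to the output. Once binary encoding is set, on output (puts) Tcl discards the high 8 bits of each Unicode character and writes only the low 8 bits to the file; on input (gets or read) Tcl reads each 8-bit byte, stores it in the low half of a 16-bit Unicode character, and sets the high half to 0.

Example 1:

    set fileID [open test.hex w+]
    fconfigure $fileID -translation binary -encoding binary
    puts -nonewline $fileID "\u30ac"
    close $fileID

After running this code in tclsh, open test.hex in UltraEdit: the high byte 30 has been discarded.

Example 2: set the character encoding of the I/O channel to unicode, as in the code below, and run it again.

    set fileID [open test.hex w+]
    fconfigure $fileID -translation binary -encoding unicode
    puts -nonewline $fileID "\u30ac"
    close $fileID

This time the high byte 30 is still there.

Note: the difference between puts stdout and puts $fileID

In the tclsh environment the default system encoding is usually cp936 (check with encoding system). One might assume it behaves much like unicode; it does not. The default encoding of stdout is also cp936, and puts "\u00ca" displays the character correctly. But when the same character is written to a file with puts -nonewline $fileID "\u00ca", Notepad shows a "?" when the file is opened. UltraEdit shows that the byte actually written is 3F, and \u003F is exactly the code of the "?" character. The reason is that puts $fileID uses the default cp936 encoding, and a character that cp936 cannot represent is replaced by "?" on output, so the information is lost (which is why the character shows correctly in tclsh but not in the file). So before writing data to a file, always set the encoding of the I/O channel first, using fconfigure $fileID -encoding binary (or unicode, or utf-8). Reading data back from a file needs the same channel setting, otherwise information is lost as well.

V. Miscellaneous

1. On the format of data to be written to a file. Processing in Tcl produces data that we want to save, and that data is essentially either binary digits or hexadecimal digits. The two formats can in fact be converted into each other. For example, 11001010B = CAH:

    binary scan [binary format B8 11001010] H2 tmp

After the conversion tmp holds ca (binary scan returns hexadecimal digits in lower case). The conversion is optional: binary format B8 11001010 and binary format H2 CA produce the same byte. Why convert before storing to a file? Because the result is smaller: the eight text characters 11001010 become a single byte.

2. On big-endian and little-endian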
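As a small illustration of the byte-order distinction this heading names, the sketch below packs the same value with the little-endian and big-endian integer specifiers of binary format (s/S for 16 bits, i/I for 32 bits) and dumps the resulting bytes. The helper name hexdump and the sample values 0x1234 and 0x12345678 are assumptions chosen for the example, not something taken from the original text.

```tcl
# Hypothetical helper: return the bytes of a binary string as hex digits.
proc hexdump {bytes} {
    # H* converts every byte to two hex digits, high nibble first.
    binary scan $bytes H* hex
    return $hex
}

# Pack the same (arbitrary) values in both byte orders.
set le16 [hexdump [binary format s 0x1234]]      ;# 16-bit, little-endian
set be16 [hexdump [binary format S 0x1234]]      ;# 16-bit, big-endian
set le32 [hexdump [binary format i 0x12345678]]  ;# 32-bit, little-endian
set be32 [hexdump [binary format I 0x12345678]]  ;# 32-bit, big-endian

puts "s: $le16  S: $be16"
puts "i: $le32  I: $be32"

# Scanning with the matching specifier recovers the value;
# scanning with the other byte order yields the byte-swapped value.
set packed [binary format I 0x12345678]
binary scan $packed I sameOrder
binary scan $packed i swapped
puts [format "I->I: 0x%08x  I->i: 0x%08x" $sameOrder $swapped]
```

The point of the sketch is that the specifier used in binary scan has to match the byte order the data was written in; the bytes themselves carry no indication of which order was used.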
