Computer Communications 23 (2000) 1594-1605

On object initialization in the Java bytecode☆

S. Doyon*, M. Debbabi

LSFM Research Group, Department of Computer Science, Laval University, Sainte Foy, Que., Canada G1K 7P4

Abstract

Java is an ideal platform for implementing mobile code systems, not only because of its portability but also because it is designed with security in mind. Untrusted Java programs can be statically analyzed and validated. The program's behavior is then monitored to prevent potentially malicious operations. Static analysis of untrusted classes is carried out by a component of the Java virtual machine called the verifier. The most complex part of the verification process is the dataflow analysis, which is performed on each method in order to ensure type-safety. This paper clarifies in detail one of the tricky aspects of the dataflow analysis: the verification of object initialization. We present and explain the rules that need to be enforced and we then show how verifier implementations can enforce them. Rules for object creation require, among other things, that uninitialized objects never be used before they are initialized. Constructors must properly initialize their this argument before they are allowed to return. This paper also deals with initialization failures (indicated by exceptions): the object being initialized must be discarded, and constructors must propagate initialization failures. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Java bytecode; Object initialization; Dataflow analysis; Static analysis; Java security

1. Introduction

The Java architecture is particularly well-suited for implementing mobile code systems. A mobile code architecture allows a computer to fetch a program (or parts of a program) from a network source and execute it locally. However, security is a critical aspect of mobile code architectures. The very essence of mobile code is to execute a program that originates from a remote source. This is inherently dangerous because it is not known what actions that program will take. By executing the mobile code,
we are allowing it to perform operations on our machine and we are giving it access to our local resources.

Java is especially well-suited for implementing mobile code systems for three reasons:

- Java source is compiled into a platform-independent intermediate form called Java bytecode. Java bytecode is then interpreted by the JVM (Java virtual machine). This makes Java bytecode completely portable, which means a piece of Java code in compiled form should run on any receiving machine.
- It is dynamically linked: the JVM will load classes from different network sources as they are needed and will link them into the program while it runs.
- The Java architecture is built with security in mind: its design makes it possible to enforce sufficient security to make mobile code safe and practical.

☆ The research reported in this paper has been supported by the National Science and Engineering Research Council (NSERC), the Fonds pour la formation de chercheurs et l'aide à la recherche (FCAR), and the Defense Research Establishment Valcartier (DREV), Department of National Defense.
* Corresponding author. Tel.: +1-418-656-7035; fax: +1-418-656-2324. E-mail address: doyon@ift.ulaval.ca (S. Doyon).

Currently, the most popular manifestation of Java mobile code is applets. A JVM (bytecode interpreter) is incorporated in web browsers. Web pages can then include links that point to the compiled (bytecode) form of programs, which are called applets. The applet can then be loaded by the browser and executed locally with no special effort on the user's part.

The verifier is a key component of the Java security architecture. Its role is to examine compiled classes as they are loaded into the JVM in order to ensure that they are well-formed and valid. It checks that the code respects the syntax of the bytecode language and that it respects the language rules. Another component of the Java security architecture, called the security manager, monitors access to system resources and services. The security manager is a security layer that sits on top of the verifier and relies on its effectiveness.
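As a rough illustration of the kind of well-formedness check applied to a class as it is loaded, the sketch below tests whether a byte array begins with the class file magic number 0xCAFEBABE, which the class file format requires. The class name ClassHeaderCheck and the method hasValidMagic are hypothetical names chosen for this example. A real verifier performs this test and many deeper ones inside the JVM, and the sketch does not reproduce any actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of any JVM: it looks only at the first
// four bytes of a class file, the most basic structural test.
public class ClassHeaderCheck {
    // Every class file must start with this 32-bit magic number.
    static final int CLASS_MAGIC = 0xCAFEBABE;

    // Returns true if the bytes start with the class file magic number.
    public static boolean hasValidMagic(byte[] classBytes) {
        if (classBytes == null || classBytes.length < 4) {
            return false; // too short to be a class file
        }
        // ByteBuffer reads big-endian by default, matching the class file format.
        return ByteBuffer.wrap(classBytes).getInt() == CLASS_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own compiled form, if it can be found as a resource.
        // readAllBytes requires Java 9 or later.
        try (InputStream in =
                 ClassHeaderCheck.class.getResourceAsStream("ClassHeaderCheck.class")) {
            byte[] own = (in == null) ? null : in.readAllBytes();
            System.out.println("own class file accepted: " + hasValidMagic(own));
        }
        // An arbitrary byte sequence that is not a class file.
        byte[] bogus = {0x00, 0x01, 0x02, 0x03};
        System.out.println("bogus bytes accepted: " + hasValidMagic(bogus));
    }
}
```

Passing this test says nothing about type-safety; it only shows that the input has the outward shape of a class file, which is why the later verification steps are needed.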
0140-3664/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0140-3664(00)00245-0

The most complex step of the verification process performed by the verifier requires running a dataflow analysis on the body of each method. There are a few particularly tricky issues regarding the dataflow analysis. In this paper, we focus on the issues relating to the initialization of new objects:

- Issues relating to object creation: A new object is created in two steps: space is allocated for the new object, and then it is initialized. When performing the dataflow analysis, the verifier must ensure that certain rules are respected: the constructor used to initialize an object must be appropriate, an object must not be used before it is initialized, an object must not be initialized more than once and initialization failures (indicated by exceptions) must be handled properly.
- Issues relating to constructors: The constructor is responsible for initializing a new object. The first part of the constructor's work performs initialization from a typing point of view, which implies directly or indirectly calling a constructor from the superclass. The rest of the constructor performs application-specific initialization. The verifier must ensure that a constructor properly initializes the current object before it returns, that it does not use the current object in any way before calling the superclass constructor and that it propagates any initialization failure occurring in the superclass constructor.

The official documentation on the verifier, provided in (Ref. [1], Sections 4.8 and 4.9) and in Ref. [2], is relatively sparse; the portions discussing object initialization are very brief and vague, and
leave out some important issues. Independent work presented in Ref. [3] has clarified many aspects. Freund and Mitchell have extended the formalization of a subset of the Java bytecode language introduced in Ref. [4]. They used a type system to describe the verifier's handling of object initialization. Our paper reviews and explains the rules related to object initialization and discusses how a verifier implementation can enforce them. We also touch on a few issues not discussed in Ref. [3]. Exceptions thrown during object initialization indicate initialization failures and must be handled properly, both inside and outside of a constructor. We also provide a comprehensive, intuitive explanation of how the rules for object creation can be enforced with minimal effort.

We assume that the reader has some knowledge of the Java bytecode language, as well as a basic understanding either of dataflow analysis in general or of the particular analysis technique used by the Java bytecode verifier. The unfamiliar reader may consult the following references for more complete information: for the Java language, the reader may refer to the official specification of the language [5]. The best way to learn Java or to find a more understandable explanation of its concepts is to read Ref. [6]. For details on the Java standard library, see Ref. [7]. The workings of the JVM and the bytecode instruction set are described in the official JVM specification [1]. For a lighter approach, see Ref. [8]. To gain a good understanding of the
Java bytecode language, it is necessary to experiment with it. Two tools are essential:

- A class file disassembler, which prints out a class file (and in particular the bytecode) in a readable format. Sun's javap tool, which comes with the JDK, can be used for this, although other alternatives are available.
- A bytecode assembler, which produces class files from some source with a manageable syntax. Without one, constructing binary class files by hand would be difficult and time consuming. A great solution is the excellent jasmin [9].

This paper is organized as follows. Section 2 provides a brief overview of the dataflow analysis in order to show the context in which verification of object initialization occurs. Section 3 deals with the creation of new objects, while Section 4 explains the special requirements imposed on constructors. Each of these sections first presents the necessary rules that the verifier must somehow enforce, and then discusses how an implementation could achieve the desired result. Section 5 shows that constructors may leak or save a copy of their this reference, which means that it is possible for incompletely initialized objects to be actually used. Section 6 lists some of the
related work. Some concluding remarks are given in Section 7.

2. Dataflow analysis

The Java bytecode verifier ensures that the classes loaded by the JVM do not compromise the security of the system, either by violating the language rules or by compromising the integrity of the virtual machine. The verifier validates many syntactical aspects of the class file. It validates field and method declarations. It makes some checks relating to the superclass. It verifies references to other classes, other methods and fields, and it enforces access restriction mechanisms (like protected, private and final). The body of each method is examined in turn: each bytecode instruction and its operands are validated.

The most complex yet most interesting part of the verification process is the dataflow analysis. It is performed independently on each method. The dataflow analysis checks that each bytecode instruction gets arguments of the proper type (from the stack or from the registers), detects and prevents overflows and underflows of the expression evaluation stack and ensures that subroutines are used consistently. The dataflow analysis must also check that object initialization is performed correctly. This paper attempts to clarify the properties that need to be enforced on object creation and constructors. We also propose ways in which a verifier implementation can enforce those rules.

In order to perform the dataflow analysis, it is necessary to keep track of the type of each value on the stack and in the registers at each program point. We will assume that each instruction of a method constitutes a program point, although it is possible to use basic blocks of instructions as program points. The type recorded by the dataflow analysis for a
given location at a given program point must be consistent, irrespective of the execution path used to reach that program point. When there is a conflict because two or more paths would yield different types of values for the same location, we record for that location a common supertype of all the types that could actually occur. For instance, if at a given program point a certain location could contain either an instance of FileInputStream or an instance of ByteArrayInputStream, the dataflow analysis merges the two types and records the type InputStream instead. If there are no common supertypes for the possible types in a certain location, then the type unusable is used, indicating that the value cannot be used by the following instructions. This generalization of types does imply a loss of information and precision. This is what makes the analysis conservative, in the sense that it is pessimistic.

Types used in the dataflow analysis are primitive types (single-word int or float, or double-word long or double) and reference types (the types associated with references to objects or arrays). A reference type may be
a class, interface or array type (which specifies a base type and a number of dimensions). The type returnAddress will be used to describe the return address to a subroutine, as created by the jsr instruction. The special type named unusable is used to mark uninitialized registers. The special reference type null is used to represent the type of null references produced by the aconst_null instruction. Also note that implementations will generally use other special types to represent allocated but not yet initialized objects.

3. Object creation

Creating a new object is done in two steps. First, space for the object is allocated through the use of the new instruction, which returns a reference that points to the newly allocated memory space. Then, the object is initialized by invoking one of its constructors (a method named <init>). For example, the Java statement new String() is translated
to the following bytecode instructions:

    ; allocate space for String and push
    ; reference to it onto the stack
    new java/lang/String
    ; duplicate top stack item (reference to
    ; newly allocated space)
    dup
    ; call String.String() constructor, uses
    ; up one of the references to newly allocated
    ; space as this argument.
    invokespecial java/lang/String/<init>()V
    ; This leaves a reference to the new
    ; String object on the stack.

The constructor is responsible for putting the object in a valid state. Until initialization of the new object completes, its state remains undefined and may be inconsistent. The language semantics therefore disallows using a newly allocated object before it is initialized. Enforcing this is one of the verifier's responsibilities. The verifier must keep track of which objects are initialized and which are not, ensure that proper constructors are used to initialize new objects and make sure that uninitialized objects are not used before they are initialized. This is one of the tricky points of the dataflow analysis. Ref. [1] covers this aspect briefly. Ref. [3] presents a detailed analysis and formal specification of the language rules related to object initialization. Unfortunately, neither Ref. [3] nor Ref. [1] discusses the interaction between object initialization and exception handlers.

We will first discuss the rules that the verifier should enforce, and we will then consider how a verifier implementation can enforce them.

3.1. Rules

The verifier must enforce the following properties:

- An object
must not be used before it is initialized.
- An uninitialized object must be initialized by one of the constructors declared in its class. A constructor from another class cannot be used. Notice that methods named <init> are not inherited.
- An object must not be initialized more than once.
- If an exception is thrown by the call to the instance initialization method, then the new object must not be used because its initialization is incomplete.

We first discuss what it means for an uninitialized object (or rather a reference to it) to be used. The reference pushed onto the stack by the new instruction should be considered to have a special type, indicating that the object it points to is not initialized. The verifier must al