Object-Oriented Techniques

Enumerated Types in Java

Definitions

Typical Usage

Constant Variable Implementation

Ordering and Iteration

Selection

Serialization and Parsing

Combinations of Enumerated Values

Quality Considerations

Class Implementation

Simple Static Types

Extensible Static Types

Comparable Types

Iteratable Types

Dynamic Types

Types with Data and Operations

Serialization and Parsing

Combinations of Enumerated Values

Quality Considerations

Further Reading

Note: this essay was written before Java gained explicit enumerated types. It has a few things that are still useful, and many more that have been made obsolete by the built-in enumerated types. I've left it available for those interested in the less-discussed options (e.g., dynamic types), and for those interested in the history of these ideas.

Definitions

An enumerated type is a type whose instances describe values, where the set of possible values is finite. Typically, an enumerated type is used when the most important information is the existence of the value. If the values have extensive data or nontrivial operations associated with them, then they are not usually considered enumerated types. From a purely analytical standpoint, however, there is little difference between an enumerated type and any other type which has a finite set of possible values.

In many languages other than Java, enumerated types are a language construct (e.g., C++ supports them with the keyword enum, while Pascal's scalar types use the TYPE keyword). A type that defines the unit vectors in a color space (e.g., enum color_value {cyan, magenta, yellow}; or TYPE colorvalue = ( hue, saturation, value );) would fit the concept of an enumerated type. Since Java has no explicit language support, other techniques must be used. These techniques may also be of use where the language construct does not express the desired semantics; e.g., when implementing a dynamic enumerated type in C++.

A static enumerated type is an enumerated type whose set of possible values is fixed, and does not vary at run time. For example, a concept named Sex might be used in a biology context, with the possible values of asexual, male, female, or neuter. If the analysis is correct, these four values are the only possible values for the concept. There will never be another possible value, and none of these values will ever become obsolete.

A dynamic enumerated type is an enumerated type that can gain or lose values at run time. For example, an enumerated type of car models would need to be updated every year, as car manufacturers introduce new models. A program tracking car models could use a dynamic enumerated type to represent them. The enumerated type representing "current models" would be able to gain or lose values, while the enumerated type representing "all models throughout history" would only be able to gain values.

Enumerated types may be ordered or unordered, and ordered types may be singly or multiply ordered. Unordered types have no logical order. For example, the standard boolean type may be considered an unordered enumerated type: there is no logical reason to list the value true before or after the value false. For an ordered type, the possible values can be assigned a logical order. For example, an enumerated type covering the stages of a frog's life (egg, tadpole, adult) has a logical order. A multiply ordered type has more than one natural order. For example, a library catalogue may be able to present the collection of books in various catalogue systems (Library of Congress, Dewey Decimal, etc.) as well as alphabetical by author or title (although it is a stretch to imagine the books as an enumerated type).

Typical Usage

Programmers use enumerated types in a few typical ways.

Assignment: Variables and method/procedure/function arguments may be declared to be of this type, and their values convey the state or intention of the code. For example, the Java Swing library's AbstractButton uses a logical enumerated type (implemented as a set of int values) to define the horizontal position of the text on the button: left, center, or right. (The actual values are found in SwingConstants.) Programmers can assign the value SwingConstants.RIGHT to an appropriately typed variable or use the value as an argument to AbstractButton's setHorizontalTextPosition() method.

Comparison: A programmer may need to compare one value to another for identity or order (order only applies to ordered types, of course). For example, if the days of the week are suitably ordered, the programmer may implement a check to see if a day falls in the standard U.S. five-day work week by coding something like

if((day >= MONDAY) && (day <= FRIDAY)) {
    ...
}

Iteration: A programmer may need to invoke an operation for each possible value. For example, a program generating a histogram of events by the day of the week they occurred on might have some code like this:

FOR day := sunday TO saturday DO BEGIN
    countEventsOn(day);
END;

Selection: A programmer may need to choose an operation based on the value. For example, if employees are classified as having a pay method of salaried, commissioned, or hourly, the programmer may wish to invoke operations that are different based on the employee's pay method. The code might look like this:

switch(pay_method) {
    case HOURLY:
        pay = calculate_hourly_pay();
        break;
    case COMMISSIONED:
        pay = calculate_salaried_pay();
        pay += calculate_commissions();
        break;
    case SALARIED:
        pay = calculate_salaried_pay();
        break;
    default:
        report_program_logic_error();
        break;
}

Synonyms and Name Spaces: Programmers may find it convenient if some values have more than one name; for example, "middle" and "center" might be synonyms. Programmers may also wish to define values in different types that have the same name. For example, both the Color and Fruit types might contain a value "orange", in which the color orange is not the same as the fruit orange. In this case, color and fruit would be separate name spaces.

In most cases, it is simple to provide two or more names for the same value. For example, the SwingConstants interface provides both the SwingConstants.TOP and SwingConstants.NORTH values, which may have the same underlying implementation value. Aliasing is typically implemented by defining the names to reference the same value, and can be done for all Java implementations the author is aware of.

In Java, a class is a name space. If there is a Color class and also a Fruit class, then each one may declare a constant named ORANGE. These two separate constants are referenced using the class name followed by the constant name: Color.ORANGE and Fruit.ORANGE. Under some implementations, these two variables could be compared, and (depending on the programmer's choices) would be either equal to each other, or not equal. Under other implementations, these two variables would be of different types, and would never be equal to each other.

Combinations: Programmers will sometimes wish to use an enumerated type to implement a set of "flags", of which all, some, or none of the values in the type may be combined to produce a legal value. In some languages, this is done by assigning various powers of two as the numeric implementations of the enumerated values. The programmer uses bitwise operators (and, or, exclusive or, not, and so on) to combine these values into a single value representing the desired set of values. The code might look something like:

#define READ 0x01
#define WRITE 0x02
#define EXECUTE 0x04
...
int permission = ...
if((permission & (READ | EXECUTE)) == (READ | EXECUTE)) {
    ...
} else if((permission & READ) != 0) {
    ...
}

Constant Variable Implementation

Enumerated types may be implemented in Java as constant variable declarations. In this style, an enumerated type is informally assigned to a relevant class, and public static final variables of a suitable Java type are declared for the values. Commonly, the type is a primitive numeric type (int, byte, long, etc.), although variables of java.lang.String are also used. This implementation style is used extensively in the Swing package.

Constant Variable Implementation

/**
* A paragraph class, using constants to mimic
* enumerated types for style and alignment values.
*/
public class Para {
    public static final String STYLE_NORMAL = "p";
    public static final String STYLE_HEADER_1 = "h1";
    public static final String STYLE_HEADER_2 = "h2";
    public static final String STYLE_HEADER_3 = "h3";

    public static final int ALIGN_CENTER = 0;
    public static final int ALIGN_LEFT = 1;
    public static final int ALIGN_RIGHT = 2;

    ...
}

Naturally, the choice of actual implementation type is up to the programmer. Strings give a little more accessible information when debugging and may be chosen to assist with parsing, at the cost of more memory and slower comparisons at run time. Integers are the most common choice, although bytes might be more space-efficient.

The class in which the constants are defined becomes the name space. Programmers are able to distinguish between two constants having the same name by using the class (e.g., the paragraph and plane "normal style" values would be distinguished by using Para.STYLE_NORMAL or Plane.STYLE_NORMAL). If the constants should exist in the same class, the programmer must choose unique names.

The code using such a class might look like this:

Assignment for Constant Variable Implementation

Para p = ...;
p.setStyle(Para.STYLE_NORMAL);
p.setAlignment(Para.ALIGN_CENTER);

Comparison for Constant Variable Implementation

if(p.getStyle() == Para.STYLE_HEADER_1) {
    ...
}

Ordering and Iteration

In the constant variable implementations, the type has an ordering that is identical to that of the underlying implementation. If numeric types are used, then the numeric order is the order imposed on the enumerated type. If objects are used, then the enumerated type may be unordered, or will have the order imposed by the underlying objects.

Iteration is handled using any of the looping statements supported by the language. If the underlying implementation is a numeric type, then iteration is easily handled like this:

Iteration for Constant Variable Implementation

for(int i = Para.ALIGN_CENTER; i <= Para.ALIGN_RIGHT; i++) {
    int alignment = i;
    ...
}

Note that iteration is not easy to implement if the underlying constants are implemented as strings or other objects, and that selection using a switch statement can only be done with certain types as well. There seems to be no simple way to provide for multiple orderings or arbitrary subset iterations. The programmer could follow the iteration strategies outlined in the class implementation section, but there would be little advantage to the constant variable implementation in that case.

Selection

Selection for constant variable implementations is implemented as a switch/select statement, or as a sequence of if statements, or as a table lookup. Code can look something like this:

Selection for Constant Variable Implementation

Para p = ...;
Graphics g = ...;
Font font = null;  // assigned below according to the paragraph style

if(p.getStyle() == Para.STYLE_NORMAL) {
    font = FONT_NORMAL;
} else if (p.getStyle() == Para.STYLE_HEADER_1) {
    font = FONT_HEADER_1;
} else if (p.getStyle() == Para.STYLE_HEADER_2) {
    font = FONT_HEADER_2;
} else if (p.getStyle() == Para.STYLE_HEADER_3) {
    font = FONT_HEADER_3;
}
g.setFont(font);

int left = 0;
switch(p.getAlignment()) {
    case Para.ALIGN_CENTER:
        left = (width - p.getWidth())/2;
        break;
    case Para.ALIGN_RIGHT:
        left = width - p.getWidth();
        break;
    case Para.ALIGN_LEFT:
        left = 0;
        break;
}

Note that there is an extensibility problem with the selection code. This kind of selection (choosing program behavior based on a value) is a rather common occurrence in most programs. If the code contains many of these selection constructs (whether done as a switch/select statement, or as a series of if/else-if/else constructs, or even as a table), then adding or removing a value for the enumerated type becomes a difficult operation. Object-oriented languages such as Java, C++, or Eiffel provide a more easily extensible method for handling this task: polymorphism. Unfortunately, the choice of a constant variable implementation for a Java enumerated type makes polymorphism impossible. The author strongly recommends that a constant variable implementation be used only if there are no selection constructs based on values of the type (and if the programmer is convinced that there will never be any such constructs).

Used correctly, polymorphism also makes the code simpler. The details of the type are contained within the type definition, instead of being handled where the type is used. Thus, the programmer thinks about the type's internal details when implementing the type, instead of when implementing other code. Compare the example above with the same task implemented polymorphically in the section on Types with Data and Operations.

Serialization and Parsing

Storing a record of an object, and recovering the object from the record, is a common task. It is covered quite well in [GOF 1994] under the Memento pattern. Java provides a serialization capability that handles storing object state and restoring it. The capability correctly handles arbitrary object graphs, with a few assumptions.

If the underlying implementation of the enumerated type is a primitive type (not an object), then serialization will work correctly without extra effort. For example, if the values are declared as bytes, the serialization code will store byte values and will later reconstitute the object with the appropriate byte values. The comparison operators still work correctly; in particular, the == operator will return the expected result.

However, if the underlying implementation of the enumerated type is an object (perhaps a String, as in the paragraph style example above), then serialization may work incorrectly. The serialization code works on an object graph, and deserializing produces a deep clone of the graph used to serialize. The serialization code assumes that it is correct to create a new object when it encounters a reference that it has not already created an object for, as is appropriate when doing a deep clone operation. However, an enumerated type by definition should have only a specific set of values. Cloning those values is inappropriate, as it means the == operator ceases to give the expected results.

Parsing in other contexts may raise similar issues. An XML write capability will require that the enumerated values be stored in XML format (however defined). The corresponding XML read capability will need to read the stored value, find the appropriate value in the enumerated type (not a clone, but the actual object, in the case of an underlying implementation that uses an object), and re-establish the reference to the value. If a string representation is used, the write capability becomes very simple, and the read capability is implemented with a simple lookup.
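The identity problem and the lookup remedy described above can be sketched as follows. This is a minimal illustration written against a current JDK, not code from the original essay: the class ParaStyles and its canonical() and roundTrip() helpers are hypothetical names standing in for the Para style constants shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical stand-in for the string-valued Para style constants.
class ParaStyles {
    static final String STYLE_NORMAL = "p";
    static final String STYLE_HEADER_1 = "h1";

    private static final String[] ALL = { STYLE_NORMAL, STYLE_HEADER_1 };

    // Maps any equal string back onto the one canonical constant,
    // so that == comparisons against the constants work again.
    static String canonical(String token) {
        for (String style : ALL) {
            if (style.equals(token)) {
                return style;
            }
        }
        throw new IllegalArgumentException("Unknown style: " + token);
    }

    // Serializes and deserializes a value in memory, as a save/load cycle would.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (String) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A string returned by roundTrip() is equal to the constant it was created from, but it should not be assumed to be the same object; passing it through canonical() restores the reference to the original constant.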

Combinations of Enumerated Values

When using the constant variable implementation, the programmer may choose an appropriate numeric representation in order to allow combinations (flags). The values are chosen as distinct powers of two and combined with the bitwise operators, in the same manner as the traditional flag values described under Typical Usage.
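As a rough sketch of how such flags might look in Java, the following mirrors the C-style permission example given earlier. The class name FilePermissionFlags and the helper methods hasAll() and hasAny() are assumptions made for illustration only.

```java
// Hypothetical permission flags; each value is a distinct power of two.
class FilePermissionFlags {
    static final int READ = 0x01;
    static final int WRITE = 0x02;
    static final int EXECUTE = 0x04;

    // True only if every flag in 'required' is set in 'granted'.
    static boolean hasAll(int granted, int required) {
        return (granted & required) == required;
    }

    // True if at least one flag in 'candidates' is set in 'granted'.
    static boolean hasAny(int granted, int candidates) {
        return (granted & candidates) != 0;
    }
}
```

Separating the "all of" test from the "any of" test makes the intent explicit; confusing the two is an easy mistake when the bitwise expressions are written inline.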

Quality Considerations

The constant variable implementation can provide a very small and fast implementation. The storage space used can be minimized by using bytes as the underlying implementation for any enumerated type with few enough values. Iteration using an integer index is often faster than iteration using an Iterator object or similar construct. Also, programmers may find it convenient to be able to declare an enumerated type without thinking much about the actual semantics of the type.

When a class uses this kind of implementation for enumerated types, much of the compiler's type checking is bypassed. Thus, it's not easy to ensure that valid values are used. Each method with an enumerated type argument must check that the argument is in range, for example. For string implementations, the equals() method may produce different results from the == operator, which is appropriate for strings but inappropriate for enumerated types.
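The range check mentioned above might look something like the following. The class AlignedPara is a hypothetical, cut-down version of the Para example, and the check relies on the assumption that the alignment constants are contiguous integers.

```java
// Hypothetical paragraph class with int-valued alignment constants.
class AlignedPara {
    static final int ALIGN_CENTER = 0;
    static final int ALIGN_LEFT = 1;
    static final int ALIGN_RIGHT = 2;

    private int alignment = ALIGN_LEFT;

    // The compiler accepts any int here, so the method must validate it.
    void setAlignment(int alignment) {
        if (alignment < ALIGN_CENTER || alignment > ALIGN_RIGHT) {
            throw new IllegalArgumentException(
                    "Not an alignment value: " + alignment);
        }
        this.alignment = alignment;
    }

    int getAlignment() {
        return alignment;
    }
}
```

Every method that accepts an alignment needs the same check, and the check must be revisited whenever a value is added or removed.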

A problem with a numeric constant variable implementation is that it is language-legal to use the comparison operators (>, <, >=, <=) on enumerated values that have no logical ordering. It is also language-legal to compare enumerated values from different enumerated types, so long as their underlying types are comparable. Even more bizarre, it is language-legal to use the arithmetic operators on enumerated types with numeric implementations. For example, the expression

if((SwingConstants.LEFT + 2) < ZipFile.LOCNAM)
    ...

does not cause a compiler error, although the programmer's intent is obscure at best. (Very few programmers will produce code like this deliberately, but a careless or interrupted cut-and-paste session might produce something similar.)

The constant variable implementation makes multiply ordered types more error-prone. Which ordering do the comparison operators apply to, and how is the other ordering handled? The most apparent solution is to provide a static method in the surrounding class that compares two of these types in a particular ordering. Programmers will sometimes use the comparison operators for the wrong ordering, though. A bug like that could be difficult to find.
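A minimal sketch of the static-method approach follows. It assumes a hypothetical MonthConstants class with zero-based month values and a fiscal year that starts in July; neither assumption comes from the original text.

```java
// Hypothetical int-valued month constants (only a few are shown).
class MonthConstants {
    static final int JANUARY = 0;
    static final int JUNE = 5;
    static final int JULY = 6;
    static final int DECEMBER = 11;

    // Calendar order is simply the numeric order of the constants.
    static int compareCalendar(int a, int b) {
        return Integer.compare(a, b);
    }

    // Fiscal order, assuming a fiscal year that starts in July.
    static int compareFiscal(int a, int b) {
        return Integer.compare(fiscalIndex(a), fiscalIndex(b));
    }

    // Position of a month within the assumed fiscal year (July is 0).
    private static int fiscalIndex(int month) {
        return (month - JULY + 12) % 12;
    }
}
```

Nothing stops a caller from writing a < b where compareFiscal(a, b) was intended, which is exactly the kind of bug described above.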

Class Implementation

When using a class implementation for an enumerated type, a class is defined as the enumerated type, and instances of the class (objects) are provided as the allowed values. The author first encountered the technique in [Geary 1996].

Simple Static Types

A static enumerated type is a type in which all possible values are known at compile time. Static enumerated types are implemented as classes with only private constructors. To allow access to the desired objects, the class provides a set of constants (public static final fields of the class's own type) that are pre-initialized to the appropriate values, or some other method of obtaining the values. Because the constructor is private, these are the only objects that can be created.

Simple Static Enumerated Type

/**
* An enumerated type listing sexes of organisms.
*/
public class Sex {
    public static final Sex ASEXUAL = new Sex();
    public static final Sex MALE = new Sex();
    public static final Sex FEMALE = new Sex();
    public static final Sex NEUTER = new Sex();

    private Sex() {
    }
}

Note that it is possible to declare constant unique objects using an existing Java type, for example:

public static final Object ASEXUAL = new Object();

The most significant drawback to this is that the compiler can no longer perform type checking on the enumerated values; in general, this has all the drawbacks of both implementation methods. The author strongly recommends that a new class be defined; the few extra keystrokes are worth the compiler support. Only in cases where an extra class causes problems (perhaps in an embedded system where space is extremely tight) should the programmer consider using this implementation technique--and in that case, constant variable implementation using byte or some other primitive type as the underlying representation makes more sense.

This simple static enumerated type supports assignment and equality comparison only. Code using such a type might look like this:

Assignment for Class Implementation

Animal a = ...;
a.setSex(Sex.ASEXUAL);

Comparison for Class Implementation

if(a.getSex() == Sex.MALE) {
    ...
}

These code fragments apply equally to the following variants of class implementations, as well.

The enumerated type class becomes the name space for the values. Because the values are defined within the type's class, the class name becomes a part of every use of the value. The name space is enforced by the compiler.

In order to reduce the number of visible classes in a package, an enumerated type may be made into an inner class, with the outer class exposing the constants.

Extensible Static Types

If the static enumerated type should be extensible (that is, if a programmer should be able to define new values of the type), the constructor can be declared protected. The programmer then creates a subclass of the enumerated type class which defines new constants, using the original constructor or the new class' constructor. Since this requires editing code to create the new values, the type is still static.

Extensible Static Enumerated Type

/**
* An enumerated type listing known color schemes.
*/
public class ColorScheme {
    public static final ColorScheme RGB = new ColorScheme();
    public static final ColorScheme CMYK = new ColorScheme();

    protected ColorScheme() {
    }
}

/**
* An enumerated type listing more color schemes.
*/
public class ExtendedColorScheme extends ColorScheme {
    public static final ColorScheme YES = new ColorScheme();

    private ExtendedColorScheme() {
        // This c'tor should never be used; use ColorScheme()
        throw new IllegalStateException();
    }
}

Note that the extending class declares a private (or protected) constructor, as well. If there is some extra functionality available in the extending class, then the extending class' constructor(s) may be useful. Otherwise, it makes sense to disable them, as in the above example.

Comparable Types

The basic enumerated value has only an identity. That is, the value either is, or is not, the same as another value. There is no concept of "before," "after," "greater than," or "less than". A comparable enumerated type does include these "ordering" concepts. For example, there is probably no natural order for the countries in South America, while there is a natural order for the months of the Gregorian calendar.

Java provides an interface, java.lang.Comparable, which can be used to implement the functionality. This interface is used in tree implementations of the collection classes, for example, to sort the stored objects. Values from an enumerated type that implements Comparable can be compared to each other in a standardized way.

Comparable Enumerated Type

public class Month implements Comparable {
    public static final Month JANUARY = new Month(1);
    public static final Month FEBRUARY = new Month(2);
    ...

    private int _ordinal;

    private Month(int ordinal) {
        _ordinal = ordinal;
    }

    public int compareTo(Object other) {
        if(!(other instanceof Month)) {
            throw new ClassCastException();
        } else if (_ordinal < ((Month)other)._ordinal) {
            return -1;
        } else if (_ordinal > ((Month)other)._ordinal) {
            return 1;
        } else {
            return 0;
        }
    }
}

Sometimes, there is more than one natural order. For example, the enumeration of months in the business year probably has at least two orders (the fiscal year and the calendar year). In the case of multiple natural orderings, the programmer must decide whether one of the orderings is clearly dominant. If so, the Comparable interface may be used for the dominant ordering, and the subordinate orderings can be implemented using similar methods suitably named. If none of the orderings are dominant, then the programmer should not use the Comparable interface.
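One way to arrange a dominant and a subordinate ordering is sketched below. It uses generics and java.util.Comparator features that postdate this essay, and the names BusinessMonth and FISCAL_ORDER, along with the July start of the fiscal year, are assumptions made for illustration.

```java
import java.util.Comparator;

// Hypothetical month type: calendar order is dominant, fiscal order is named.
final class BusinessMonth implements Comparable<BusinessMonth> {
    static final BusinessMonth JANUARY = new BusinessMonth("January", 1, 7);
    static final BusinessMonth JUNE = new BusinessMonth("June", 6, 12);
    static final BusinessMonth JULY = new BusinessMonth("July", 7, 1);

    // Subordinate ordering, exposed under an explicit name.
    static final Comparator<BusinessMonth> FISCAL_ORDER =
            Comparator.comparingInt(m -> m.fiscalOrdinal);

    private final String name;
    private final int calendarOrdinal;
    private final int fiscalOrdinal;

    private BusinessMonth(String name, int calendarOrdinal, int fiscalOrdinal) {
        this.name = name;
        this.calendarOrdinal = calendarOrdinal;
        this.fiscalOrdinal = fiscalOrdinal;
    }

    // Dominant ordering: the calendar year.
    @Override
    public int compareTo(BusinessMonth other) {
        return Integer.compare(calendarOrdinal, other.calendarOrdinal);
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Code that sorts by the fiscal year must name FISCAL_ORDER explicitly, so the choice of ordering is visible at the point of use.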

Iteratable Types

In many (but not all) cases, it is useful to be able to iterate through the values of an enumerated type. The programmer may want to initialize a counter, or categorize some data by sorting it into value-labelled bins, or whatever. In this case, it is important that the programmer be able to obtain the defined values for the type.

This points out the need to be able to treat the values of an enumerated type as a collection. The programmer should be able to iterate through the values, find the number of values that have been defined, and so on. This collection is logically part of the type, however, and is not associated with any particular instance of that type or with any object making use of the type. Thus, in Java, we provide static methods for collection access (the static collection itself remains private, for the usual reasons).

Iteratable Enumerated Type

/**
* An enumerated type listing all car models.
*/
public class CarModel {
    private static Collection _allModels = new HashSet();

    public static final CarModel Prowler =
                new CarModel(CarMaker.Ford);
    public static final CarModel WillysJeep =
                new CarModel(CarMaker.Willys);
    ...

    protected CarModel(CarMaker maker) {
        ...
        _allModels.add(this);
    }

    public static Iterator getModels() {
        return _allModels.iterator();
    }

    public static int getModelCount() {
        return _allModels.size();
    }
}

If the type has a natural order, the class should provide an iterator that presents the elements in that order. If the type has more than one natural order, the class should present an iterator for each of the natural orders. If the type has no natural order, the iterator may use any convenient order.
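A HashSet, as used in the listing above, makes no promise about iteration order. For a type with a natural order, one possible sketch keeps the values in a list in declaration order. The FrogStage class below is hypothetical (borrowing the frog life-cycle example from the definitions) and uses generics, which postdate this essay.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical ordered type: the stages of a frog's life.
final class FrogStage {
    // Declared before the constants so it exists when the constructor runs.
    private static final List<FrogStage> ALL = new ArrayList<>();

    static final FrogStage EGG = new FrogStage("egg");
    static final FrogStage TADPOLE = new FrogStage("tadpole");
    static final FrogStage ADULT = new FrogStage("adult");

    private final String name;

    private FrogStage(String name) {
        this.name = name;
        ALL.add(this);  // declaration order doubles as the natural order
    }

    // Iterates in natural order; the unmodifiable view rejects remove().
    static Iterator<FrogStage> getStages() {
        return Collections.unmodifiableList(ALL).iterator();
    }

    static int getStageCount() {
        return ALL.size();
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Handing out an iterator over an unmodifiable view also keeps callers from removing values through Iterator.remove(), which matters for a static type.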

Code that iterates through values of an enumerated type can look something like this:

Iteration for Class Implementation

Iterator it = CarModel.getModels();
while(it.hasNext()) {
    CarModel model = (CarModel) it.next();
    ...
}

Sometimes, a type will have subset collections, as well. In the car model example, the programmer may find that she needs to iterate over the models produced by a single manufacturer. If this iteration is encountered in enough places, it would be worthwhile to add another collection to the class' static data:

Multiply Iteratable Enumerated Type

/**
* An enumerated type listing all car models.
*/
public class CarModel {
    private static Collection _allModels = new HashSet();
    private static Map _modelsByMaker = new HashMap();

    public static final CarModel Prowler =
                new CarModel(CarMaker.Ford);
    public static final CarModel WillysJeep =
                new CarModel(CarMaker.Willys);
    ...

    protected CarModel(CarMaker maker) {
        ...
        _allModels.add(this);
        if(!_modelsByMaker.containsKey(maker)) {
            HashSet set = new HashSet();
            set.add(this);
            _modelsByMaker.put(maker, set);
        } else {
            HashSet set = (HashSet) _modelsByMaker.get(maker);
            set.add(this);
        }
    }

    public static Iterator getModels() {
        return _allModels.iterator();
    }

    public static int getModelCount() {
        return _allModels.size();
    }

    public static Iterator getModelsByMaker(CarMaker maker) {
        if(_modelsByMaker.containsKey(maker)) {
            HashSet set = (HashSet) _modelsByMaker.get(maker);
            return set.iterator();
        }
        return Collections.EMPTY_SET.iterator();  // no models for this maker
    }

    public static int getModelCountByMaker(CarMaker maker) {
        if(_modelsByMaker.containsKey(maker)) {
            HashSet set = (HashSet) _modelsByMaker.get(maker);
            return set.size();
        }
        return 0;
    }
}

Iteratable types may also be implemented by providing a public array of the values, instead of access to a collection. This approach works fairly well for static types; it's difficult to use with dynamic types. Multiply iteratable types provide multiple arrays, as might be expected. The benefit is that it is faster to iterate directly through an array than to iterate through a collection using an Iterator (though an optimizing compiler may be able to reduce or eliminate the difference). The array is slightly harder to maintain--the programmer must remember to add any new values to the array, as well as creating them. And if the system changes so that the type becomes dynamic, it's harder to refactor the array implementation than the collection implementation.

Iteratable Enumerated Type (Array Implementation)

/**
* An enumerated type listing all car models.
*/
public class CarModel {
    public static final CarModel Prowler =
                new CarModel(CarMaker.Ford);
    public static final CarModel WillysJeep =
                new CarModel(CarMaker.Willys);
    ...

    public static final CarModel[] MODELS = {
        Prowler, WillysJeep, ...
    };

    protected CarModel(CarMaker maker) {
        ...
    }
}

Iteration for Array Implementation

for(int i = 0; i < CarModel.MODELS.length; i++) {
    CarModel model = CarModel.MODELS[i];
    ...
}

The array implementation is discussed in more detail in [Armstrong 1997].

Dynamic Types

The values of an enumerated type may change over time. In the above example for car models, manufacturers introduce new models on a yearly basis. To avoid a constant maintenance effort, the car model type should support the adding of new values without recompiling. In other cases, the type should also support removal of values, although the programmer must take care that the values are not in use. Multiply iteratable types may support removal of the value from one logical collection but not another.

A dynamic type is implemented with a public constructor; the constructor adds a new value to the type. If values may be removed, the class also contains a method to remove a value.

Dynamic Iteratable Enumerated Type

/**
* An enumerated type listing all car models.
*/
public class CarModel {
    private static Collection _allModels = new HashSet();

    public static final CarModel Prowler =
                new CarModel(CarMaker.Ford);
    public static final CarModel WillysJeep =
                new CarModel(CarMaker.Willys);
    ...

    public CarModel(CarMaker maker) {
        ...
        _allModels.add(this);
    }

    public static Iterator getModels() {
        return _allModels.iterator();
    }

    public static void deleteModel(CarModel model) {
        _allModels.remove(model);
    }
}
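With a public constructor, two callers may each create their own object for what is logically the same value, and == will then treat them as different. One possible variation, sketched here under assumptions that are not part of the original design, replaces the public constructor with a factory method keyed by name. The class DynamicCarModel and its define(), retire(), and count() methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical dynamic type: values are created and retired at run time.
final class DynamicCarModel {
    private static final Map<String, DynamicCarModel> CURRENT =
            new LinkedHashMap<>();

    private final String name;

    private DynamicCarModel(String name) {
        this.name = name;
    }

    // Returns the existing value for this name, or registers a new one,
    // so there is never more than one object per logical value.
    static synchronized DynamicCarModel define(String name) {
        return CURRENT.computeIfAbsent(name, DynamicCarModel::new);
    }

    // Removes a value from the current set. Objects that still hold a
    // reference keep a valid object; it is just no longer listed.
    static synchronized boolean retire(DynamicCarModel model) {
        return CURRENT.remove(model.name, model);
    }

    static synchronized int count() {
        return CURRENT.size();
    }

    String getName() {
        return name;
    }
}
```

The caution about values still in use applies here too: retiring a value does not clear references held elsewhere, so the program must decide what a retired value means to the objects that still hold it.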

Types with Data and Operations

In the above discussion of class implementation, the enumerated type has been used solely for the existence of the various values. Programmers also use enumerated types to select associated values. For example, the programmer may want to find the font to use for each style of paragraph, where paragraph style is an enumerated type. In languages without polymorphism, this is typically implemented as a chain of if/else-if/else statements, a switch/select statement, or a table, found in each place the value is used. If the language supports polymorphism, as does Java, then the programmer should make use of this in the implementation.

To support this, enumerated type values may be associated with other information. The class is implemented with the appropriate attributes, and each instance is queried for the information.

An analogous implementation works well for operations that depend on the value of the type. The class declares a method signature for the operation, and the programmer provides the appropriate implementation for each value in the type. This may be done by subclassing the type and using instances of the various subclasses as the type values, or it may be done using a more complex implementation choice (e.g., the Strategy pattern [GOF 1994]). As the operations and attributes become more important to the type, this approach becomes difficult to distinguish from general (non-enumerated) object-oriented type design.

Class Implementation with Data

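import java.awt.Font;

/**
* A paragraph class, using an enumerated type for
* style values, which are associated with a font.
*/
public class Para {
    public static final Style NORMAL = Style.NORMAL;
    public static final Style HEADER_1 = Style.HEADER_1;
    public static final Style HEADER_2 = Style.HEADER_2;
    public static final Style HEADER_3 = Style.HEADER_3;

    // The style of this paragraph; queried by the selection code.
    private Style _style = NORMAL;

    public Style getStyle() {
        return _style;
    }

    public void setStyle(Style style) {
        _style = style;
    }

    public static class Style {
        // Fn, F1, F2, and F3 stand for Font instances defined elsewhere.
        public static final Style NORMAL = new Style(Fn);
        public static final Style HEADER_1 = new Style(F1);
        public static final Style HEADER_2 = new Style(F2);
        public static final Style HEADER_3 = new Style(F3);

        private final Font _font;

        // Private: only the constants above can be created.
        private Style(Font font) {
            _font = font;
        }

        public Font getFont() {
            return _font;
        }
    }
}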

Selection for Class Implementation with Data

Para p = ...;
Graphics g = ...;
g.setFont(p.getStyle().getFont());
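
As a minimal sketch of operations that depend on the value, the following type gives each value its own implementation of one method by way of anonymous subclasses. The names HeadingStyle and decorate(), and the formatting rules themselves, are assumptions made for illustration; they are not part of the Para example above.

```java
import java.util.Locale;

/**
 * Hypothetical enumerated type in which each value supplies
 * its own implementation of an operation.
 */
abstract class HeadingStyle {
    /** Body text is passed through unchanged. */
    static final HeadingStyle NORMAL = new HeadingStyle("normal") {
        String decorate(String text) {
            return text;
        }
    };

    /** Top-level headers are rendered in upper case. */
    static final HeadingStyle HEADER_1 = new HeadingStyle("header-1") {
        String decorate(String text) {
            return text.toUpperCase(Locale.ROOT);
        }
    };

    /** Second-level headers are underlined with dashes. */
    static final HeadingStyle HEADER_2 = new HeadingStyle("header-2") {
        String decorate(String text) {
            StringBuilder result = new StringBuilder(text).append('\n');
            for (int i = 0; i < text.length(); i++) {
                result.append('-');
            }
            return result.toString();
        }
    };

    private final String _name;

    // Private: only the constants declared above can be created.
    private HeadingStyle(String name) {
        _name = name;
    }

    /** The operation whose behavior varies by value. */
    abstract String decorate(String text);

    public String toString() {
        return _name;
    }
}
```

Client code calls style.decorate(text) and needs no switch or if/else chain. Adding a value means adding one constant with its own method body.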

Serialization and Parsing

As mentioned above, Java provides a serialization capability that handles storing object state and restoring it. Unfortunately, the default implementation of Java serialization assumes that it is correct to create any objects that it needs. This assumption is incorrect when an enumerated class is present. Specifically, if an object references a particular value from an enumerated type when it is saved, the restored object must reference the same value, not a copy of the value.

If used in a parsing context, the enumerated type should provide a static method that takes a token (a String, or whatever is to be parsed) and returns the associated constant, returning null or throwing an exception if the token does not represent a valid value. For example, in a class that implements an XML parsing scheme based on the org.w3c.dom package, the read method will parse the node for whatever information is needed to find the appropriate enumerated type (e.g., the tag, or the appropriate attribute value), and will then look up and use the corresponding value in the enumerated type.

Enumerated Type supporting XML Parsing as an Attribute

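import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Attr;

/**
* An enumerated type for horizontal alignment,
* supporting XML parsing.
*/
public class Align {
    // Must be declared before the constants, whose constructor registers them here.
    private static Map _all = new HashMap();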

    public static final Align LEFT = new Align("left");
    public static final Align CENTER = new Align("center");
    public static final Align RIGHT = new Align("right");

    private String _tag;

    private Align(String tag) {
        _tag = tag;
        _all.put(tag, this);
    }

    public void write(PrintWriter writer) {
        writer.print("\"" + _tag + "\"");
    }

    public static Align read(Attr attribute) {
        if(_all.containsKey(attribute.getValue())) {
            return (Align)_all.get(attribute.getValue());
        }
        return null;
    }

    public String getTag() {
        return _tag;
    }
}

XML Element Class using an Enumerated Attribute

public class ImageNode implements MyXMLNodeClass {
    private Align _align;
    ...

    public void write(PrintWriter writer) {
        writer.print("<image");
        ...
        if (_align != null) {
            writer.print(" align=");
            _align.write(writer);
        }
        writer.println("/>");
    }

    public ImageNode read(Element element, Node parent) {
        ...
        Attr attr = element.getAttributeNode("align");
        if (attr != null) {
            _align = Align.read(attr);
        }
        return this;
    }
}

For a serializable enumerated type, the class must explicitly implement the readResolve() method. Otherwise, the object created by a read will not be one of the valid values. The class does not need a public no-argument constructor: when reading a class that implements java.io.Serializable, the serialization code does not run any of that class's constructors (it runs only the no-argument constructor of the nearest non-serializable superclass, here Object), so the private constructor can remain the only one. Because that constructor is bypassed, the temporary object created during a read is never placed into any collection, and it is immediately replaced by the one returned in readResolve(). A public no-argument constructor is required only for classes that implement java.io.Externalizable, and it should be avoided for an enumerated type, since any caller could then create an invalid object.

Enumerated Type supporting Serialization

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/**
* An enumerated type for horizontal alignment,
* supporting serialization.
*/
public class Align implements Serializable {
    // Must be declared before the constants, whose constructor registers them here.
    private static Map _all = new HashMap();

    public static final Align LEFT = new Align("left");
    public static final Align CENTER = new Align("center");
    public static final Align RIGHT = new Align("right");

    private String _tag;

    // The only constructor. Serialization does not call it, so
    // no public no-argument constructor is needed.
    private Align(String tag) {
        _tag = tag;
        _all.put(tag, this);
    }

    public String getTag() {
        return _tag;
    }

    public static Align find(String tag) {
        if(_all.containsKey(tag)) {
            return (Align)_all.get(tag);
        }
        return null;
    }

    public Object readResolve() {
        Object result = find(_tag);
        if(result != null) {
            return result;
        } else {
            return CENTER;    // default value
        }
    }
}

The above implementation of readResolve() returns a default value if the enumerated type cannot be resolved to one of the legal values. This case indicates a failure of some sort; perhaps the serialized file has been corrupted, or perhaps the enumerated type class has been changed to remove a value. Returning a default value is probably appropriate for released code, in which graceful recovery from a runtime failure is important. In code under development, it would be more appropriate to throw an exception or error, cause an assertion failure, or to do something else that makes the failure obvious.
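
For code under development, one possible variant of the failure handling is sketched below: readResolve() throws java.io.InvalidObjectException instead of returning a default. The class name StrictAlign and the roundTrip() helper are hypothetical and exist only to make the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical alignment type that fails loudly on unknown values. */
class StrictAlign implements Serializable {
    private static final long serialVersionUID = 1L;

    // Declared before the constants so the constructor can register them.
    private static final Map<String, StrictAlign> _all =
            new HashMap<String, StrictAlign>();

    static final StrictAlign LEFT = new StrictAlign("left");
    static final StrictAlign CENTER = new StrictAlign("center");
    static final StrictAlign RIGHT = new StrictAlign("right");

    private final String _tag;

    // Serialization does not invoke this constructor when reading.
    private StrictAlign(String tag) {
        _tag = tag;
        _all.put(tag, this);
    }

    static StrictAlign find(String tag) {
        return _all.get(tag);
    }

    // Replace the freshly read copy with the canonical value, or fail.
    private Object readResolve() throws ObjectStreamException {
        StrictAlign result = find(_tag);
        if (result == null) {
            throw new InvalidObjectException("Unknown alignment tag: " + _tag);
        }
        return result;
    }

    /** Demo helper: serialize a value to memory and read it back. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this variant, StrictAlign.roundTrip(StrictAlign.LEFT) yields the LEFT constant itself, so comparisons with == continue to work after a read.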

Programmers may wish to use serialization with static enumerated type classes that have arbitrary attributes. The programmer can take advantage of one of the properties of enumerated types: the identity of a value is the important information. The code works if it stores only enough information to identify the value--that is, only enough to implement the find() method. All other variables are not written (declared transient, if using java.io.Serializable). If serialized data size is important, the programmer can assign consecutive bytes or integers to the values as part of the constructor, and use these numbers as the serialized representation.
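
The approach just described might look like the following sketch, in which only a one-byte identifier is written and every other attribute is transient. The class FontStyle, its attributes and values, and the roundTrip() helper are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical style type that serializes only its identifier. */
class FontStyle implements Serializable {
    private static final long serialVersionUID = 1L;

    // Declared before the constants so the constructor can register them.
    private static final List<FontStyle> _all = new ArrayList<FontStyle>();

    static final FontStyle NORMAL = new FontStyle("Serif", 12);
    static final FontStyle HEADER_1 = new FontStyle("SansSerif", 24);
    static final FontStyle HEADER_2 = new FontStyle("SansSerif", 18);

    private final byte _id;                    // the only field written
    private final transient String _fontName;  // not written
    private final transient int _pointSize;    // not written

    private FontStyle(String fontName, int pointSize) {
        _id = (byte) _all.size();  // consecutive ids, in declaration order
        _fontName = fontName;
        _pointSize = pointSize;
        _all.add(this);
    }

    String getFontName() {
        return _fontName;
    }

    int getPointSize() {
        return _pointSize;
    }

    static FontStyle find(int id) {
        return (id >= 0 && id < _all.size()) ? _all.get(id) : null;
    }

    // The copy read from the stream carries only _id; swap in the real value.
    private Object readResolve() throws ObjectStreamException {
        FontStyle result = find(_id);
        if (result == null) {
            throw new InvalidObjectException("Unknown FontStyle id: " + _id);
        }
        return result;
    }

    /** Demo helper: serialize a value to memory and read it back. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One consequence of this design is that the identifiers depend on declaration order, so reordering or removing constants changes the meaning of previously serialized data.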

Since the Java serialization code includes class information when serializing an object, a class with only one non-transient member variable is not the most efficient possible representation. If serialized data size is much more important than speed of access to the value, consider declaring the enumerated type without the Serializable interface. The class using the enumerated type can then store the value as a numeric primitive type (in an int variable, say). Accessor methods would use the enumerated type's find() method to look up the numeric value and return the enumerated value.
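
A sketch of this arrangement follows, assuming a hypothetical HAlign type that is not Serializable and a hypothetical ImageBox class that uses it. Only the int code is part of ImageBox's serialized state; the accessors translate between the code and the enumerated value.

```java
import java.io.Serializable;

/** Hypothetical alignment type; deliberately not Serializable. */
final class HAlign {
    static final HAlign LEFT = new HAlign(0);
    static final HAlign CENTER = new HAlign(1);
    static final HAlign RIGHT = new HAlign(2);

    // Indexed by code; declared after the constants it refers to.
    private static final HAlign[] ALL = { LEFT, CENTER, RIGHT };

    private final int _code;

    private HAlign(int code) {
        _code = code;
    }

    int getCode() {
        return _code;
    }

    /** Returns the value for a code, or null if the code is unknown. */
    static HAlign find(int code) {
        return (code >= 0 && code < ALL.length) ? ALL[code] : null;
    }
}

/** Hypothetical client class that stores the value as a primitive. */
class ImageBox implements Serializable {
    private static final long serialVersionUID = 1L;

    // The serialized form holds this int, not an HAlign object.
    private int _alignCode = HAlign.CENTER.getCode();

    HAlign getAlign() {
        return HAlign.find(_alignCode);
    }

    void setAlign(HAlign align) {
        _alignCode = align.getCode();
    }
}
```

Callers of ImageBox still see only HAlign values, so the type checking of the class implementation is kept at the public signature while each access pays for a lookup.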

Combinations of Enumerated Values

At times, programmers use an enumerated type to describe a set of values that may be combined. The typical numeric implementation, often called "flags", involves assigning consecutive powers of two to the values, and then using the bitwise operators to combine the values.
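
For comparison, the numeric "flags" implementation can be sketched as follows. The class name PermissionFlags and the allows() helper are hypothetical.

```java
/** Hypothetical numeric flags for file permissions. */
final class PermissionFlags {
    static final int READ = 1;          // binary 001
    static final int WRITE = 1 << 1;    // binary 010
    static final int EXECUTE = 1 << 2;  // binary 100

    private PermissionFlags() {
        // No instances; this class only holds constants.
    }

    /** True if the combined flags include the given flag. */
    static boolean allows(int flags, int flag) {
        return (flags & flag) != 0;
    }
}
```

A combination is built with the bitwise OR operator, e.g. PermissionFlags.READ | PermissionFlags.WRITE. Because the flags are plain ints, the compiler accepts any integer where a permission is expected.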

The class implementation of enumerated types allows a more type-safe method. Conceptually, the programmer wishes to use a collection of the values for something, perhaps as an argument to a method or as a value for a variable. This collection may have special semantics--for example, the programmer may decide that the presence of three values implies the presence of a fourth, even if it is not explicitly part of the collection.

To implement this, the programmer creates an interface and a specialized collection class. The enumerated type class and the collection class both implement the interface. The programmer declares any variables or arguments as the interface if the semantics permit a collection or a single value, as the collection if the semantics demand only a collection, and as the enumerated type if the semantics demand only a single value. The following code sample shows a possible implementation for a set of file permissions:

Combinations of Enumerated Types (Flags)

/** An interface for file permissions. */
public interface Permission {
    public boolean allows(PType type);
}

/** An enumerated type for file permissions. */
public class PType implements Permission {
    public static final PType READ = new PType();
    public static final PType WRITE = new PType();
    public static final PType EXECUTE = new PType();

    private PType() {
    }

    public boolean allows(PType type) {
        return (type == this);
    }
}

/** A collection for file permissions. */
public class PSet implements Permission {
    private Set _set;

    public PSet() {
        _set = new HashSet();
    }

    public void add(PType type) {
        _set.add(type);
    }

    public boolean allows(PType type) {
        return (_set.contains(type));
    }
}

While this implementation is certainly more cumbersome than the minimalist numeric implementation, it does provide full type checking and more flexible semantics. Programmers concerned with code speed can gain most of the benefits of both implementation styles by using the class implementation's public signature while implementing the methods using bitwise numeric values for the underlying representation.
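
One way to combine the two styles is sketched below: the public signature follows the class implementation, while the set is stored as bits in an int. The names BitPermission, BitPType, and BitPSet are hypothetical, chosen to avoid clashing with the classes above, and the int mask limits this sketch to 32 values.

```java
/** Hypothetical interface, mirroring Permission above. */
interface BitPermission {
    boolean allows(BitPType type);
}

/** Enumerated type whose values each own one bit. */
final class BitPType implements BitPermission {
    // Declared before the constants; counts the values created so far.
    private static int _count = 0;

    static final BitPType READ = new BitPType();
    static final BitPType WRITE = new BitPType();
    static final BitPType EXECUTE = new BitPType();

    private final int _mask;

    private BitPType() {
        _mask = 1 << _count++;  // consecutive powers of two
    }

    int mask() {
        return _mask;
    }

    public boolean allows(BitPType type) {
        return type == this;
    }
}

/** Collection of permissions, stored as bits rather than in a Set. */
final class BitPSet implements BitPermission {
    private int _bits;

    void add(BitPType type) {
        _bits |= type.mask();
    }

    public boolean allows(BitPType type) {
        return (_bits & type.mask()) != 0;
    }
}
```

Callers can pass only BitPType values, so the type checking is preserved, and the bit arithmetic stays hidden inside the two classes.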

This section was inspired by comments in [Thimbleby 1999].

Quality Considerations

The class implementation provides a flexible, robust, and potentially full-featured enumerated type. The data size is the same as or larger than that of a constant variable implementation. For code size, there is the overhead of defining a new class. However, this can often be compensated for by the removal of selection statements scattered through the code. Whether the total code size increases or decreases, this provides a clear benefit to the maintainability and correctness of the overall program.

When a programmer uses a class implementation for enumerated types, the full power of the compiler's type checking can be brought to bear. Method arguments and variables can be assumed to contain appropriate values (something that must be checked when using a constant variable implementation).

Code speed is also unlikely to change dramatically. The values are allocated at program start, a time penalty that is avoided by using numeric constants. The comparison operators appear to be slightly faster for integer types than object types (e.g., the == operator for objects appears slower than for ints but faster than for longs). This should be balanced against the need to ensure that values are in the correct range when using numeric constants.

The language can be used to enforce comparison and iteration operations. Adding a new value to the type does not change the use of these constructs; the type itself provides for the appropriate ordering. Thus, the programmer can safely add new values, and be confident that comparisons and iteration will continue to work as expected (of course, the programmer will need to add any new functionality).

The class implementation allows for functionality that is difficult to provide using the constant variable implementation. Unlike C++'s enum or Pascal's scalar type, the values in an enumerated type can have data and operations associated with them, allowing the programmer to convert many selection constructs to simple queries. The programmer may choose to allow the enumerated type's values to vary at runtime, or to allow the enumerated type to present multiple orderings and iterations.

The class implementation is clearly more trouble to program, if one considers only the type itself. If the concept behind the type is numeric (e.g., the enumerated type represents "the decimal digits"), then a numeric implementation may be the best choice. However, if the concept is not numeric, the apparent simplicity of the constant variable implementation (and of its cousin, the C-style enum type) is often illusory. As the system evolves, enumerated types tend to add data or operations, change their order or their endpoints, and so on. With the constant variable implementation, the programmer must change the (scattered) code referencing the type's values; with the class implementation, the programmer changes only the type itself.

As a final consideration, it is easier to convert a class implementation to a constant variable implementation than the other way around. Thus, if the benefits of a constant variable implementation become crucial, the programmer will have a relatively easy time of converting. If the type is first implemented as constant variables, the programmer will typically have a more difficult time converting, should the flexibility of the class implementation become necessary.

Further Reading

Joshua Bloch [Bloch 2001] offers another view on this topic in an online excerpt from his book Effective Java Programming Language Guide. He has some ideas I'd missed.

Thomas E. Davis [Davis 1999] offers an implementation for a base class from which enumerated types can be derived. It handles some of the serialization duties, and implements a general-purpose find() method.