Go to the first, previous, next, last section, table of contents.

Overview

The central data structure used by the internal representation is the tree. These nodes, while all of the C type tree, are of many varieties. A tree is a pointer type, but the object to which it points may be of a variety of types. From this point forward, we will refer to trees in ordinary type, rather than in this font, except when talking about the actual C type tree.

You can tell what kind of node a particular tree is by using the TREE_CODE macro. Many, many macros take a trees as input and return trees as output. However, most macros require a certain kinds of tree node as input. In other words, there is a type-system for trees, but it is not reflected in the C type-system.

For safety, it is useful to configure G++ with --enable-checking. Although this results in a significant performance penalty (since all tree types are checked at run-time), and is therefore inappropriate in a release version, it is extremely helpful during the development process.

Many macros behave as predicates. Many, although not all, of these predicates end in `_P'. Do not rely on the result type of these macros being of any particular type. You may, however, rely on the fact that the type can be compared to 0, so that statements like

if (TEST_P (t) && !TEST_P (y))
  x = 1;

and

int i = (TEST_P (t) != 0);

are legal. Macros that return int values now may be changed to return tree values, or other pointers in the future. Even those that continue to return int may return multiple non-zero codes where previously they returned only zero and one. Therefore, you should not write code like

if (TEST_P (t) == 1)

as this code is not guaranteed to work correctly in the future.

You should not take the address of values returned by the macros or functions described here. In particular, no guarantee is given that the values are lvalues.

In general, the names of macros are all in uppercase, while the names of functions are entirely in lower case. There are rare exceptions to this rule. You should assume that any macro or function whose name is made up entirely of uppercase letters may evaluate its arguments more than once. You may assume that a macro or function whose name is made up entirely of lowercase letters will evaluate its arguments only once.

The error_mark_node is a special tree. Its tree code is ERROR_MARK, but since there is only ever one node with that code, the usual practice is to compare the tree against error_mark_node. (This test is just a test for pointer equality.) If an error has occurred during front-end processing the flag errorcount will be set. If the front-end has encountered code it cannot handle, it will issue a message to the user and set sorrycount. When these flags are set, any macro or function which normally returns a tree of a particular kind may instead return the error_mark_node. Thus, if you intend to do any processing of erroneous code, you must be prepared to deal with the error_mark_node.

Occasionally, a particular tree slot (like an operand to an expression, or a particular field in a declaration) will be referred to as "reserved for the back-end." These slots are used to store RTL when the tree is converted to RTL for use by the GCC back-end. However, if that process is not taking place (e.g., if the front-end is being hooked up to an intelligent editor), then those slots may be used by the back-end presently in use.

If you encounter situations that do not match this documentation, such as tree nodes of types not mentioned here, or macros documented to return entities of a particular kind that instead return entities of some different kind, you have found a bug, either in the front-end or in the documentation. Please report these bugs as you would any other bug.

Trees: Macros and functions that can be used with all trees.
Identifiers: The names of things.
Containers: Lists and vectors.

Trees

This section is not here yet.

Identifiers

An IDENTIFIER_NODE represents a slightly more general concept that the standard C or C++ concept of identifier. In particular, an IDENTIFIER_NODE may contain a `$', or other extraordinary characters.

There are never two distinct IDENTIFIER_NODEs representing the same identifier. Therefore, you may use pointer equality to compare IDENTIFIER_NODEs, rather than using a routine like strcmp.

You can use the following macros to access identifiers:

IDENTIFIER_POINTER: The string represented by the identifier, represented as a char*. This string is always NUL-terminated, and contains no embedded NUL characters.
IDENTIFIER_LENGTH: The length of the string returned by IDENTIFIER_POINTER, not including the trailing NUL. This value of IDENTIFIER_POINTER (x) is always the same as strlen (IDENTIFIER_POINTER (x)).
IDENTIFIER_OPNAME_P: This predicate holds if the identifier represents the name of an overloaded operator. In this case, you should not depend on the contents of either the IDENTIFIER_POINTER or the IDENTIFIER_LENGTH.
IDENTIFIER_TYPENAME_P: This predicate holds if the identifier represents the name of a user-defined conversion operator. In this case, the TREE_TYPE of the IDENTIFIER_NODE holds the type to which the conversion operator converts.

Containers

Two common container data structures can be represented directly with tree nodes. A TREE_LIST is a singly linked list containing two trees per node. These are the TREE_PURPOSE and TREE_VALUE of each node. (Often, the TREE_PURPOSE contains some kind of tag, or additional information, while the TREE_VALUE contains the majority of the payload. In other cases, the TREE_PURPOSE is simply NULL_TREE, while in still others both the TREE_PURPOSE and TREE_VALUE are of equal stature.) Given one TREE_LIST node, the next node is found by following the TREE_CHAIN. If the TREE_CHAIN is NULL_TREE, then you have reached the end of the list.

A TREE_VEC is a simple vector. The TREE_VEC_LENGTH is an integer (not a tree) giving the number of nodes in the vector. The nodes themselves are accessed using the TREE_VEC_ELT macro, which takes two arguments. The first is the TREE_VEC in question; the second is an integer indicating which element in the vector is desired. The elements are indexed from zero.

Go to the first, previous, next, last section, table of contents.