This section describes some of the routines used in the C++ front-end.
build_vtable and prepare_fresh_vtable are used only within the `cp-class.c' file, and only in finish_struct and modify_vtable_entries.
build_vtable, prepare_fresh_vtable, and finish_struct are the only routines that set DECL_VPARENT.
finish_struct can steal the virtual function table from parents; this prohibits related_vslot from working. When finish_struct steals, we know that
    get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
will get the related binfo.
layout_basetypes
does something with the VIRTUALS.
Supposedly (according to Tiemann) most of the breadth first searching done, like in get_base_distance and in get_binfo, was not because of any design decision. I have since found out that at least one part of the compiler needs the notion of depth first binfo searching. I am going to try and convert the whole thing; it should just work. The term left-most refers to the depth first left-most node. It uses MAIN_VARIANT == type as the condition to get left-most, because the things that have BINFO_OFFSETs of zero are shared and will have themselves as their own MAIN_VARIANTs. The non-shared right ones are copies of the left-most one; hence if it is its own MAIN_VARIANT, we know it IS a left-most one, and if it is not, it is a non-left-most one.
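As an illustration of what "depth first left-most" means, here is a minimal sketch over a toy base-class lattice. The `BaseNode' structure and `find_leftmost' function are hypothetical stand-ins invented for this example; they are not the compiler's binfo representation or any routine in the front-end.

```cpp
#include <vector>

// Hypothetical stand-in for a binfo: a type identifier plus the list of
// direct bases in declaration order (left to right).
struct BaseNode {
    int type_id;
    std::vector<const BaseNode*> bases;
};

// Depth-first search: fully explore the first (left-most) base before
// looking at the next one, and return the first node of the wanted type.
const BaseNode* find_leftmost(const BaseNode* node, int wanted) {
    if (node->type_id == wanted) return node;
    for (const BaseNode* base : node->bases) {
        if (const BaseNode* hit = find_leftmost(base, wanted)) return hit;
    }
    return nullptr;  // the type does not occur in this lattice
}
```

With a lattice in which the same type appears under two different bases, the search returns the copy reached through the first base, which is the left-most one in the sense used above.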
get_base_distance's path and distance matter in its use in:
prepare_fresh_vtable (the code is probably wrong).
init_vfields depends upon distance, probably in a safe way; build_offset_ref might use partial paths to do further lookups; hack_identifier is probably not properly checking access.
get_first_matching_virtual probably should check for get_base_distance returning -2.
resolve_offset_ref should be called in a more deterministic manner. Right now, it is called in some random contexts, like for arguments at build_method_call time, default_conversion time, convert_arguments time, build_unary_op time, build_c_cast time, build_modify_expr time, convert_for_assignment time, and convert_for_initialization time.
But there are still more contexts it needs to be called in; one was the ever simple:
    if (obj.*pmi != 7) ...
It seems that the problems were due to the fact that the TREE_TYPE of the OFFSET_REF was not an OFFSET_TYPE, but rather the type of the referent (like INTEGER_TYPE). This problem was fixed by changing default_conversion to check TREE_CODE (x), instead of only checking TREE_CODE (TREE_TYPE (x)), to see if it was OFFSET_TYPE.
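For reference, the source-level construct in question is an ordinary pointer-to-data-member dereference. The sketch below only shows the user code; the names `Obj', `value' and `is_not_seven' are made up for illustration and do not come from the compiler sources.

```cpp
// Illustrative only: Obj, value and is_not_seven are made-up names.
struct Obj {
    int value;
};

// `obj.*pmi' is the kind of expression discussed above. Its value has
// the type of the referent (int here), which is why it can be compared
// against 7 directly.
bool is_not_seven(const Obj& obj, int Obj::*pmi) {
    return obj.*pmi != 7;
}
```

A caller would pass `&Obj::value' as the second argument.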
current_member_init_list contains the list of mem-initializers specified in a constructor declaration. For example:
    foo::foo() : a(1), b(2) {}
will initialize `a' with 1 and `b' with 2. expand_member_init places each initialization (a with 1) on the global list. Then, when the fndecl is being processed, emit_base_init runs down the list, initializing them. It used to be the case that g++ first ran down current_member_init_list, then ran down the list of members initializing the ones that weren't explicitly initialized. Things were rewritten to perform the initializations in order of declaration in the class. So, for the above example, `a' and `b' will be initialized in the order that they were declared:
    class foo { public: int b; int a; foo (); };
Thus, `b' will be initialized with 2 first, then `a' will be initialized with 1, regardless of how they're listed in the mem-initializer.
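The declaration-order rule can be observed from user code. The following is a small sketch, not taken from the compiler or its test suite; the `note' helper and the `init_order' function are hypothetical names introduced only to record the order in which the initializers run.

```cpp
#include <string>

// Hypothetical helper: appends a tag to the log and returns a value, so
// the order in which members are initialized becomes observable.
static int note(std::string& log, char tag, int value) {
    log += tag;
    return value;
}

class foo {
public:
    int b;  // declared first, so initialized first
    int a;  // declared second, so initialized second
    explicit foo(std::string& log)
        : a(note(log, 'a', 1)), b(note(log, 'b', 2)) {}  // listed a, then b
};

// Returns the tags in the order the member initializers actually ran.
std::string init_order() {
    std::string log;
    foo f(log);
    return log;
}
```

Here `init_order ()' yields "ba": `b' is initialized before `a' even though the mem-initializer list names `a' first. Modern compilers typically warn about the mismatched order.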
explicit on a constructor is used by grokdeclarator to set the field DECL_NONCONVERTING_P. That value is used by build_method_call and build_user_type_conversion_1 to decide if a particular constructor should be used as a candidate for conversions.
is_normal
depends upon this.
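The user-visible effect of `explicit' can be checked with a short sketch. It relies on the C++11 `<type_traits>' header, which postdates these notes, and the class names `Implicit' and `Explicit' are made up for the example.

```cpp
#include <type_traits>

struct Implicit {
    Implicit(int) {}           // converting constructor
};

struct Explicit {
    explicit Explicit(int) {}  // not a candidate for implicit conversions
};

// An int converts implicitly to Implicit but not to Explicit...
static_assert(std::is_convertible<int, Implicit>::value,
              "int converts implicitly to Implicit");
static_assert(!std::is_convertible<int, Explicit>::value,
              "int does not convert implicitly to Explicit");
// ...while direct construction still works for both.
static_assert(std::is_constructible<Explicit, int>::value,
              "Explicit can still be constructed directly from int");
```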
FIELD_DECL
s that are pointer types that point to
vtables. See also vtable and vfield.
This section describes some of the macros used on trees. The list should be alphabetical. Eventually all macros should be documented here.
BINFO_BASETYPES
BINFO_INHERITANCE_CHAIN
     Z ZbY   least derived
     |
     Y YbX
     |
     X Xb    most derived
    TYPE_BINFO (X) == Xb
    BINFO_INHERITANCE_CHAIN (Xb) == YbX
    BINFO_INHERITANCE_CHAIN (Yb) == ZbY
    BINFO_INHERITANCE_CHAIN (Zb) == 0
Not sure if the above is really true; get_base_distance has it point towards the most derived type, opposite from above.
Set by build_vbase_path, recursive_bounded_basetype_p, get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
What things can this be used on: TREE_VECs that are binfos
BINFO_OFFSET
BINFO_VIRTUALS
BINFO_VTABLE
BLOCK_SUPERCONTEXT
CLASSTYPE_TAGS
CLASSTYPE_METHOD_VEC
CLASSTYPE_VFIELD
DECL_CLASS_CONTEXT
    struct A { virtual int f (); };
    struct B : A { int f (); };
    DECL_CONTEXT (A::f) == A
    DECL_CLASS_CONTEXT (A::f) == A
    DECL_CONTEXT (B::f) == A
    DECL_CLASS_CONTEXT (B::f) == B
Has values of: RECORD_TYPEs, or UNION_TYPEs
What things can this be used on: TYPE_DECLs, _DECLs
DECL_CONTEXT
VAR_DECLs that are virtual function tables _DECLs
DECL_FIELD_CONTEXT
FIELD_DECLs that are virtual function pointers FIELD_DECLs
DECL_NAME
0 for things that don't have names IDENTIFIER_NODEs for TYPE_DECLs
DECL_IGNORED_P
DECL_VIRTUAL_P
DECL_VPARENT
DECL_FCONTEXT
DECL_REFERENCE_SLOT
DECL_VINDEX
DECL_SOURCE_FILE
DECL_SOURCE_LINE
0 for an undefined label
0 for TYPE_DECLs that are internally generated
0 for FUNCTION_DECLs for functions generated by the compiler (not yet, but should be)
0 for "magic" arguments to functions, that the user has no control over
TREE_USED
TREE_ADDRESSABLE
TREE_COMPLEXITY
TREE_HAS_CONSTRUCTOR
TREE_PRIVATE
TREE_PROTECTED
TYPE_BINFO
TYPE_BINFO_BASETYPES
TYPE_BINFO_VIRTUALS
TYPE_BINFO_VTABLE
TYPE_NAME
0 for things that don't have names.
should be IDENTIFIER_NODE for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs.
TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but shouldn't be.
TYPE_DECL for typedefs, unsure why.
What things can one use this on: TYPE_DECLs, RECORD_TYPEs, UNION_TYPEs, ENUM_TYPEs
History: It currently points to the TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
TYPE_METHODS
CLASSTYPE_METHOD_VEC
. Chained together with
TREE_CHAIN
. `dbxout.c' uses this to get at the methods of a
class.
TYPE_DECL
typedef int foo;
is
seen.
DECL_SOURCE_LINE identifies what source line number in the
source file the declaration was found at. A value of 0
indicates that this TYPE_DECL is just an internal binding layer
marker, and does not correspond to a user supplied typedef.
DECL_SOURCE_FILE
TYPE_FIELDS
TREE_CHAIN
) of member types of a class. The
list can contain TYPE_DECL
s, but there can also be other things
in the list apparently. See also CLASSTYPE_TAGS
.
TYPE_VIRTUAL_P
FIELD_DECL
or a VAR_DECL
, indicates it is
a virtual function table or a pointer to one. When used on a
FUNCTION_DECL
, indicates that it is a virtual function. When
used on an IDENTIFIER_NODE
, indicates that a function with this
same name exists and has been declared virtual.
When used on types, it indicates that the type has virtual functions, or
is derived from one that does.
Not sure if the above about virtual function tables is still true. See
also info on DECL_VIRTUAL_P
.
What things can this be used on:
FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
VF_BASETYPE_VALUE
finish_base_struct
time.
What things can this be used on:
TREE_LISTs that are vfields
History:
This field was used to determine if a virtual function table's
slot should be filled in with a certain virtual function, by
checking to see if the type returned by VF_BASETYPE_VALUE was a
parent of the context in which the old virtual function existed.
This incorrectly assumes that a given type _could_ not appear as
a parent twice in a given inheritance lattice. For single
inheritance, this would in fact work, because a type could not
possibly appear more than once in an inheritance lattice, but
with multiple inheritance, a type can appear more than once.
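The situation described in this history note, a type appearing more than once in an inheritance lattice, is easy to reproduce with non-virtual multiple inheritance. This is a minimal sketch with made-up class names, unrelated to the compiler's own data structures.

```cpp
struct A { int tag; };
struct B : A {};
struct C : A {};
struct D : B, C {};  // non-virtual inheritance: D contains two A subobjects

// Returns true when the A reached through B and the A reached through C
// are different subobjects of the same D.
bool has_two_distinct_A_bases() {
    D d;
    A* via_b = static_cast<B*>(&d);
    A* via_c = static_cast<C*>(&d);
    return via_b != via_c;
}
```

Because `A' occurs twice under `D', asking only "is A a parent of D?" cannot tell the two occurrences apart.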
VF_BINFO_VALUE
TREE_VIA_VIRTUAL
on result to find out if it is a virtual base class. Related to the
binfo found by
    get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
where `t' is the type that has the given vfield.
    get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
will return the binfo for the given vfield. May or may not be set at
modify_vtable_entries
time. Set at
finish_base_struct
time.
What things can this be used on:
TREE_LISTs that are vfields
VF_DERIVED_VALUE
finish_base_struct
time.
What things can this be used on:
TREE_LISTs that are vfields
VF_NORMAL_VALUE
finish_base_struct
time.
What things can this be used on:
TREE_LISTs that are vfields
WRITABLE_VTABLES
Whenever seemingly normal code fails with errors like
syntax error at `\{'
, it's highly likely that grokdeclarator is
returning a NULL_TREE for whatever reason.
It should never be the case that trees are modified in-place by the back-end, unless it is guaranteed that the semantics are the same no matter how shared the tree structure is. `fold-const.c' still has some cases where this is not true, but rms hypothesizes that this will never be a problem.
A template is represented by a TEMPLATE_DECL
. The specific
fields used are:
DECL_TEMPLATE_RESULT
DECL_TEMPLATE_PARMS
The generic decl is parsed as much like any other decl as possible, given the parameterization. The template decl is not built up until the generic decl has been completed. For template classes, a template decl is generated for each member function and static data member, as well.
Template members of template classes are represented by a TEMPLATE_DECL for the class' parameters around another TEMPLATE_DECL for the member's parameters.
All declarations that are instantiations or specializations of templates refer to their template and parameters through DECL_TEMPLATE_INFO.
How should I handle parsing member functions with the proper param decls? Set them up again or try to use the same ones? Currently we do the former. We can probably do this without any extra machinery in store_pending_inline, by deducing the parameters from the decl in do_pending_inlines. PRE_PARSED_TEMPLATE_DECL?
If a base is a parm, we can't check anything about it. If a base is not a parm, we need to check it for name binding. Do finish_base_struct if no bases are parameterized (only if none, including indirect, are parms). Nah, don't bother trying to do any of this until instantiation -- we only need to do name binding in advance.
Always set up method vec and fields, inc. synthesized methods. Really? We can't know the types of the copy folks, or whether we need a destructor, or can have a default ctor, until we know our bases and fields. Otherwise, we can assume and fix ourselves later. Hopefully.
The function compute_access returns one of three values:
access_public
access_protected
access_private
DECL_ACCESS is used for access declarations; alter_access creates a list of types and accesses for a given decl.
Formerly, DECL_{PUBLIC,PROTECTED,PRIVATE} corresponded to the return codes of compute_access and were used as a cache for compute_access. Now they are not used at all.
TREE_PROTECTED and TREE_PRIVATE are used to record the access levels granted by the containing class. BEWARE: TREE_PUBLIC means something completely unrelated to access control!
The C++ front-end uses a call-back mechanism to allow functions to print
out reasonable strings for types and functions without putting extra
logic in the functions where errors are found. The interface is through
the cp_error
function (or cp_warning
, etc.). The
syntax is exactly like that of error
, except that a few more
conversions are supported:
There is some overlap between these; for instance, any of the node
options can be used for printing an identifier (though only %D
tries to decipher function names).
For a more verbose message (class foo
as opposed to just foo
,
including the return type for functions), use %#c
.
To have the line number on the error message indicate the line of the
DECL, use cp_error_at
and its ilk; to indicate which argument you want,
use %+D
, or it will default to the first.
Some comments on the parser:
The after_type_declarator
/ notype_declarator
hack is
necessary in order to allow redeclarations of TYPENAME
s, for
instance
typedef int foo; class A { char *foo; };
In the above, the first foo
is parsed as a notype_declarator
,
and the second as an after_type_declarator
.
Ambiguities:
There are currently four reduce/reduce ambiguities in the parser. They are:
1) Between template_parm
and
named_class_head_sans_basetype
, for the tokens aggr
identifier
. This situation occurs in code looking like
template <class T> class A { };
It is ambiguous whether class T
should be parsed as the
declaration of a template type parameter named T
or an unnamed
constant parameter of type class T
. Section 14.6, paragraph 3 of
the January '94 working paper states that the first interpretation is
the correct one. This ambiguity results in two reduce/reduce conflicts.
2) Between primary
and type_id
for code like `int()'
in places where both can be accepted, such as the argument to
sizeof
. Section 8.1 of the pre-San Diego working paper specifies
that these ambiguous constructs will be interpreted as typename
s.
This ambiguity results in six reduce/reduce conflicts between
`absdcl' and `functional_cast'.
3) Between functional_cast
and
complex_direct_notype_declarator
, for various token strings.
This situation occurs in code looking like
int (*a);
This code is ambiguous; it could be a declaration of the variable `a' as a pointer to `int', or it could be a functional cast of `*a' to `int'. Section 6.8 specifies that the former interpretation is correct. This ambiguity results in 7 reduce/reduce conflicts. Another aspect of this ambiguity is code like 'int (x[2]);', which is resolved at the '[' and accounts for 6 reduce/reduce conflicts between `direct_notype_declarator' and `primary'/`overqualified_id'. Finally, there are 4 r/r conflicts between `expr_or_declarator' and `primary' over code like 'int (a);', which could probably be resolved but would also probably be more trouble than it's worth. In all, this situation accounts for 17 conflicts. Ack!
The second case above is responsible for the failure to parse 'LinppFile ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave Math.h++) as an object declaration, and must be fixed so that it does not resolve until later.
4) Indirectly between after_type_declarator
and parm
, for
type names. This occurs in (as one example) code like
typedef int foo, bar; class A { foo (bar); };
What is bar
inside the class definition? We currently interpret
it as a parm
, as does Cfront, but IBM xlC interprets it as an
after_type_declarator
. I believe that xlC is correct, in light
of 7.1p2, which says "The longest sequence of decl-specifiers that
could possibly be a type name is taken as the decl-specifier-seq of
a declaration." However, it seems clear that this rule must be
violated in the case of constructors. This ambiguity accounts for 8
conflicts.
Unlike the others, this ambiguity is not recognized by the Working Paper.
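The third ambiguity above can be seen from the user's side with a short sketch. The function name `read_through_a' is made up for this example, and the check uses C++11 facilities that postdate these notes.

```cpp
#include <type_traits>

// Illustrative function; the name is made up for this sketch.
int read_through_a() {
    int target = 42;
    int (*a);  // a declaration: `a' is a pointer to int, not a cast of *a
    static_assert(std::is_same<decltype(a), int*>::value,
                  "a is declared as int*");
    a = &target;
    return *a;
}
```

The statement is accepted as a declaration, so `a' can be assigned and dereferenced afterwards.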
Note, exception handling in g++ is still under development.
This section describes the mapping of C++ exceptions in the C++ front-end, into the back-end exception handling framework.
The basic mechanism of exception handling in the back-end is unwind-protect a la elisp. This is a general, robust, and language independent representation for exceptions.
The C++ front-end exceptions are mapped into the unwind-protect semantics by the C++ front-end. The mapping is described below.
When -frtti is used, rtti is used to do exception object type checking; when it isn't used, the encoded name for the type of the object being thrown is used instead. All code that originates exceptions, even code that throws exceptions as a side effect, like dynamic casting, and all code that catches exceptions must be compiled with either -frtti, or -fno-rtti. It is not possible to mix rtti-based exception handling objects with code that doesn't use rtti. The exceptions to this are code that doesn't catch or throw exceptions, catch (...), and code that just rethrows an exception.
Currently we use the normal mangling used in building functions names (int's are "i", const char * is PCc) to build the non-rtti base type descriptors for exception handling. These descriptors are just plain NULL terminated strings, and internally they are passed around as char *.
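Since the non-rtti descriptors are plain NUL-terminated strings, an exact type match amounts to a string comparison. The sketch below illustrates only that idea; `descriptors_match' is a hypothetical name, and the actual run-time routine and its signature are not shown in this document.

```cpp
#include <cstring>

// Hypothetical stand-in for the run-time check, not g++ code: two types
// match exactly when their encoded names are the same string.
bool descriptors_match(const char* thrown, const char* handler) {
    return std::strcmp(thrown, handler) == 0;
}
```

With the encodings quoted above, "i" matches "i" but not "PCc".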
In C++, all cleanups should be protected by exception regions. The region starts just after the reason why the cleanup is created has ended. For example, with an automatic variable, that has a constructor, it would be right after the constructor is run. The region ends just before the finalization is expanded. Since the backend may expand the cleanup multiple times along different paths, once for normal end of the region, once for non-local gotos, once for returns, etc, the backend must take special care to protect the finalization expansion, if the expansion is for any other reason than normal region end, and it is `inline' (it is inside the exception region). The backend can either choose to move them out of line, or it can create an exception region over the finalization to protect it, and in the handler associated with it, it would not run the finalization as it otherwise would have, but rather just rethrow to the outer handler, careful to skip the normal handler for the original region.
In Ada, they will use the more runtime intensive approach of having fewer regions, but at the cost of additional work at run time, to keep a list of things that need cleanups. When a variable has finished construction, they add the cleanup to the list; when they come to the end of the lifetime of the variable, they run the list down. If they take a hit before the section finishes normally, they examine the list for actions to perform. I hope they add this logic into the back-end, as it would be nice to get that alternative approach in C++.
On an rs6000, xlC stores exception objects on the stack, under the try block. When it unwinds down into a handler, the frame pointer is adjusted back to the normal value for the frame in which the handler resides, and the stack pointer is left unchanged from the time at which the object was thrown. This is so that there is always someplace for the exception object, and nothing can overwrite it, once we start throwing. The only bad part is that the stack remains large.
The below points out some things that work in g++'s exception handling.
All completely constructed temps and local variables are cleaned up in all unwound scopes.
Completely constructed parts of partially constructed objects are cleaned up. This includes partially built arrays.
Exception specifications are now handled.
Thrown objects are now cleaned up all the time.
We can now tell if we have an active exception being thrown or not (__eh_type != 0). We use this to call terminate if someone does a throw; without there being an active exception object.
uncaught_exception () works.
Exception handling should work right if you optimize.
Exception handling should work with -fpic or -fPIC.
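The cleanup of partially built arrays can be checked from user code. The following sketch is not from the g++ test suite; `Elem', the counters and `partial_array_is_cleaned_up' are names invented for this example.

```cpp
#include <stdexcept>

static int constructed = 0;  // elements whose constructor completed
static int destroyed = 0;    // elements whose destructor ran

struct Elem {
    Elem() {
        // The third element fails before it counts as constructed.
        if (constructed == 2) throw std::runtime_error("third element fails");
        ++constructed;
    }
    ~Elem() { ++destroyed; }
};

// Tries to build a three-element array whose third constructor throws.
// Returns true when exactly the two completed elements were destroyed.
bool partial_array_is_cleaned_up() {
    constructed = destroyed = 0;
    try {
        Elem arr[3];
        (void)arr;
    } catch (const std::runtime_error&) {
        return constructed == 2 && destroyed == 2;
    }
    return false;  // not reached: the third constructor always throws
}
```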
The below points out some flaws in g++'s exception handling, as it now stands.
Only exact type matching or reference matching of throw types works when -fno-rtti is used.
Only works on a SPARC (like Suns) (both -mflat and -mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 may not work. HPPA is mostly done, but throwing between a shared library and user code doesn't yet work. Some targets have support for data-driven unwinding. Partial support is in for all other machines, but a stack unwinder called __unwind_function has to be written, and added to libgcc2 for them. The new EH code doesn't rely upon the __unwind_function for C++ code; instead it creates per function unwinders right inside the function. Unfortunately, on many platforms the definition of RETURN_ADDR_RTX in the tm.h file for the machine port is wrong. See below for details on __unwind_function.
RTL_EXPRs for EH cond variables for && and || exprs should probably be wrapped in UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
We only do pointer conversions on exception matching a la 15.3 p2 case 3: `A handler with type T, const T, T&, or const T& is a match for a throw-expression with an object of type E if [3]T is a pointer type and E is a pointer type that can be converted to T by a standard pointer conversion (_conv.ptr_) not involving conversions to pointers to private or protected base classes.' when -frtti is given.
We don't call delete on new expressions that die because the ctor threw an exception. See except/18 for a test case.
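The behavior the C++ standard asks for here is that the storage obtained by a new expression is released when the constructor exits by an exception. A minimal sketch of a check for this follows; it is not the except/18 test case, and `Fragile', the counters and `storage_released_when_ctor_throws' are names made up for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

static int allocations = 0;
static int deallocations = 0;

struct Fragile {
    Fragile() { throw 1; }  // the constructor always fails

    // Class-specific allocation functions that count their calls.
    static void* operator new(std::size_t size) {
        ++allocations;
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) {
        ++deallocations;
        std::free(p);
    }
};

// Returns true when the memory obtained for the failed new expression
// was handed back to operator delete.
bool storage_released_when_ctor_throws() {
    allocations = deallocations = 0;
    try {
        Fragile* f = new Fragile;
        delete f;  // not reached
    } catch (int) {
        // the exception from the constructor arrives here
    }
    return allocations == 1 && deallocations == 1;
}
```

On a conforming compiler the function returns true; with the flaw described above, `deallocations' would stay at 0.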
15.2 para 13: The exception being handled should be rethrown if control reaches the end of a handler of the function-try-block of a constructor or destructor, right now, it is not.
15.2 para 12: If a return statement appears in a handler of function-try-block of a constructor, the program is ill-formed, but this isn't diagnosed.
15.2 para 11: If the handlers of a function-try-block contain a jump into the body of a constructor or destructor, the program is ill-formed, but this isn't diagnosed.
15.2 para 9: Check that the fully constructed base classes and members of an object are destroyed before entering the handler of a function-try-block of a constructor or destructor for that object.
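The implicit rethrow at the end of a constructor's function-try-block handler can be exercised with a sketch like the one below. The class names and `exception_escapes_constructor' are invented for this example.

```cpp
#include <stdexcept>

struct Member {
    Member() { throw std::runtime_error("member failed"); }
};

struct Holder {
    Member m;
    static int handler_runs;

    // Function-try-block: the handler also covers the member initializers.
    Holder() try : m() {
    } catch (const std::runtime_error&) {
        ++handler_runs;
        // Control reaches the end of the handler here, so the exception
        // being handled is rethrown to the caller.
    }
};
int Holder::handler_runs = 0;

// Returns true when the handler ran once and the exception still
// escaped from the constructor.
bool exception_escapes_constructor() {
    Holder::handler_runs = 0;
    try {
        Holder h;
        (void)h;
    } catch (const std::runtime_error&) {
        return Holder::handler_runs == 1;
    }
    return false;  // would mean the exception was swallowed
}
```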
build_exception_variant should sort the incoming list, so that it implements set compares, not exact list equality. Type smashing should smash exception specifications using set union.
Thrown objects are usually allocated on the heap, in the usual way. If one runs out of heap space, throwing an object will probably never work. This could be relaxed some by passing an __in_chrg parameter to track who has control over the exception object. Thrown objects are not allocated on the heap when they are pointer to object types. We should extend it so that all small (<4*sizeof(void*)) objects are stored directly, instead of allocated on the heap.
When the backend returns a value, it can create new exception regions that need protecting. The new region should rethrow the object in context of the last associated cleanup that ran to completion.
The structure of the code that is generated for C++ exception handling code is shown below:
Ln:                                     throw value;
        copy value onto heap
        jump throw (Ln, id, address of copy of value on heap)
                                        try {
+Lstart:        the start of the main EH region
|...                                            ...
+Lend:          the end of the main EH region
                                        } catch (T o) {
                                                ...1
                                        }
Lresume:
        nop     used to make sure there is something before
                the next region ends, if there is one
...                                     ...
        jump Ldone
[
Lmainhandler:   handler for the region Lstart-Lend
        cleanup
] zero or more, depending upon automatic vars with dtors
+Lpartial:
|       jump Lover
+Lhere:
        rethrow (Lhere, same id, same obj);
Lterm:          handler for the region Lpartial-Lhere
        call terminate
Lover:
[
 [
        call throw_type_match
        if (eq) {
 ] these lines disappear when there is no catch condition
+Lsregion2:
|       ...1
|       jump Lresume
|Lhandler:      handler for the region Lsregion2-Leregion2
|       rethrow (Lresume, same id, same obj);
+Leregion2
        }
] there are zero or more of these sections, depending upon how many
  catch clauses there are
----------------------------- expand_end_all_catch --------------------------
                here we have fallen off the end of all catch
                clauses, so we rethrow to outer
        rethrow (Lresume, same id, same obj);
----------------------------- expand_end_all_catch --------------------------
[
L1:     maybe throw routine
] depending upon if we have expanded it or not
Ldone:
        ret
start_all_catch emits labels: Lresume,
The __unwind_function takes a pointer to the throw handler, and is expected to pop the stack frame that was built to call it, as well as the frame underneath and then jump to the throw handler. It must restore all registers to their proper values as well as all other machine state as determined by the context in which we are unwinding into. The way I normally start is to compile:
void *g; foo(void* a) { g = a; }
with -S, and change the thing that alters the PC (return, or ret usually) to not alter the PC, making sure to leave all other semantics (like adjusting the stack pointer, or frame pointers) in. After that, replicate the prologue once more at the end, again, changing the PC altering instructions, and finally, at the very end, jump to `g'.
It takes about a week to write this routine. If someone wants to volunteer to write this routine for any architecture, exception support for that architecture will be added to g++. Please send in those code donations. One other thing that needs to be done is to double check that __builtin_return_address (0) works.
For the alpha, the __unwind_function will be something resembling:
    void
    __unwind_function(void *ptr)
    {
      /* First frame */
      asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
      asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
      /* Second frame */
      asm ("ldq $15, 8($30)"); /* fp */
      asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
      /* Return */
      asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
    }
However, there are a few problems preventing it from working. First of
all, the gcc-internal function __builtin_return_address
needs to
work given an argument of 0 for the alpha. As it stands as of August
30th, 1995, the code for BUILT_IN_RETURN_ADDRESS
in `expr.c'
will definitely not work on the alpha. Instead, we need to define
the macros DYNAMIC_CHAIN_ADDRESS
(maybe),
RETURN_ADDR_IN_PREVIOUS_FRAME
, and definitely need a new
definition for RETURN_ADDR_RTX
.
In addition (and more importantly), we need a way to reliably find the
frame pointer on the alpha. The use of the value 8 above to restore the
frame pointer (register 15) is incorrect. On many systems, the frame
pointer is consistently offset to a specific point on the stack. On the
alpha, however, the frame pointer is pushed last. First the return
address is stored, then any other registers are saved (e.g., s0
),
and finally the frame pointer is put in place. So fp
could have
an offset of 8, but if the calling function saved any registers at all,
they add to the offset.
The only places the frame size is noted are with the `.frame'
directive, for use by the debugger and the OSF exception handling model
(useless to us), and in the initial computation of the new value for
sp
, the stack pointer. For example, the function may start with:
    lda $30,-32($30)
    .frame $15,32,$26,0
The 32 above is exactly the value we need. With this, we can be sure
that the frame pointer is stored 8 bytes less (in this case, at 24(sp)).
The drawback is that there is no way that I (Brendan) have found to let
us discover the size of a previous frame inside the definition
of __unwind_function
.
So to accomplish exception handling support on the alpha, we need two
things: first, a way to figure out where the frame pointer was stored,
and second, a functional __builtin_return_address
implementation
for except.c to be able to use it.
Or just support DWARF 2 unwind info.
This subsection discusses various aspects of the design of the data-driven model being implemented for the exception handling backend.
The goal is to generate enough data during the compilation of user code,
such that we can dynamically unwind through functions at run time with a
single routine (__throw
) that lives in libgcc.a, built by the
compiler, and dispatch into associated exception handlers.
This information is generated by the DWARF 2 debugging backend, and includes all of the information __throw needs to unwind an arbitrary frame. It specifies where all of the saved registers and the return address can be found at any point in the function.
Major disadvantages when enabling exceptions are:
The backend must be extended to fully support exceptions. Right now there are a few hooks into the alpha exception handling backend that resides in the C++ frontend from that backend that allows exception handling to work in g++. An exception region is a segment of generated code that has a handler associated with it. The exception regions are denoted in the generated code as address ranges denoted by a starting PC value and an ending PC value of the region. Some of the limitations with this scheme are:
The above is not meant to be exhaustive, but does include all things I have thought of so far. I am sure other limitations exist.
Below are some notes on the migration of the exception handling code backend from the C++ frontend to the backend.
NOTEs are to be used to denote the start of an exception region, and the end of the region. I presume that the interface used to generate these notes in the backend would be two functions, start_exception_region and end_exception_region (or something like that). The frontends are required to call them in pairs. When marking the end of a region, an argument can be passed to indicate the handler for the marked region. This can be passed in many ways, currently a tree is used. Another possibility would be insns for the handler, or a label that denotes a handler. I have a feeling insns might be the best way to pass it. Semantics are, if an exception is thrown inside the region, control is transferred unconditionally to the handler. If control passes through the handler, then the backend is to rethrow the exception, in the context of the end of the original region. The handler is protected by the conventional mechanisms; it is the frontend's responsibility to protect the handler, if special semantics are required.
This is a very low level view, and it would be nice if the backend supported a somewhat higher level view in addition to this view. This higher level could include source line number, name of the source file, name of the language that threw the exception and possibly the name of the exception. Kenner may want to rope you into doing more than just the basics required by C++. You will have to resolve this. He may want you to do support for non-local gotos, first scan for exception handler, if none is found, allow the debugger to be entered, without any cleanups being done. To do this, the backend would have to know the difference between a cleanup-rethrower, and a real handler, it would also have to have a way to know if a handler `matches' a thrown exception, and this is frontend specific.
The stack unwinder is one of the hardest parts to do. It is highly machine dependent. The form that kenner seems to like was a couple of macros, that would do the machine dependent grunt work. One preexisting function that might be of some use is __builtin_return_address (). One macro he seemed to want was __builtin_return_address, and the other would do the hard work of fixing up the registers, adjusting the stack pointer, frame pointer, arg pointer and so on.
operator new []
adds a magic cookie to the beginning of arrays
for which the number of elements will be needed by operator delete
[]
. These are arrays of objects with destructors and arrays of objects
that define operator delete []
with the optional size_t argument.
This cookie can be examined from a program as follows:
    typedef unsigned long size_t;
    extern "C" int printf (const char *, ...);
    size_t nelts (void *p)
    {
      struct cookie {
        size_t nelts __attribute__ ((aligned (sizeof (double))));
      };
      cookie *cp = (cookie *)p;
      --cp;                     /* the cookie sits just before the array */
      return cp->nelts;
    }
    struct A {
      ~A() { }
    };
    int main()
    {
      A *ap = new A[3];
      printf ("%lu\n", nelts (ap));
    }
The linkage code in g++ is horribly twisted in order to meet two design goals:
1) Avoid unnecessary emission of inlines and vtables.
2) Support pedantic assemblers like the one in AIX.
To meet the first goal, we defer emission of inlines and vtables until the end of the translation unit, where we can decide whether or not they are needed, and how to emit them if they are.
Both C++ and Java provide overloaded functions and methods, which are methods with the same name but different parameter lists. Selecting the correct version is done at compile time. Though the overloaded functions have the same name in the source code, they need to be translated into different assembler-level names, since typical assemblers and linkers cannot handle overloading. This process of encoding the parameter types with the method name into a unique name is called name mangling. The inverse process is called demangling.
It is convenient that C++ and Java use compatible mangling schemes, since this makes life easier for tools such as gdb, and it eases integration between C++ and Java.
Note there is also a standard "Java Native Interface" (JNI) which implements a different calling convention, and uses a different mangling scheme. The JNI is a rather abstract ABI so Java can call methods written in C or C++; we are concerned here about a lower-level interface primarily intended for methods written in Java, but that can also be used for C++ (and less easily C).
Note that on systems that follow BSD tradition, a C identifier var
would get "mangled" into the assembler name `_var'. On such
systems, all other mangled names are also prefixed by a `_'
which is not shown in the following examples.
C++ mangles a method by emitting the function name, followed by __, followed by encodings of any method qualifiers (such as const), followed by the mangling of the method's class, followed by the mangling of the parameters, in order.
For example Foo::bar(int, long) const is mangled as `bar__C3Fooil'.
For a constructor, the method name is left out. That is Foo::Foo(int, long) const is mangled as `__C3Fooil'.
GNU Java does the same.
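The rule above can be sketched as a small string-building function. This is only an illustration of the description in this section, not the code g++ uses; the names `mangle_method` and `encode_class_name` are hypothetical, and the parameter list is assumed to be passed in already mangled (for example "il" for (int, long)).

```cpp
#include <string>

// Hypothetical helper: a simple class name is encoded as its length
// followed by its characters (e.g. "Foo" -> "3Foo").
std::string encode_class_name(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: assemble a mangled method name from its parts.
// `method` is empty for a constructor; `param_codes` is the already
// mangled parameter list.
std::string mangle_method(const std::string& method, bool is_const,
                          const std::string& class_name,
                          const std::string& param_codes) {
    std::string out = method;               // function name (may be empty)
    out += "__";                            // separator
    if (is_const) out += 'C';               // method qualifier
    out += encode_class_name(class_name);   // mangling of the class
    out += param_codes;                     // mangling of the parameters
    return out;
}
```

Under these assumptions, calling mangle_method with "bar", true, "Foo" and "il" reproduces the `bar__C3Fooil' example, and passing an empty method name reproduces the constructor form.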
The C++ types int, long, short, char, and long long are mangled as `i', `l', `s', `c', and `x', respectively.
The corresponding unsigned types have `U' prefixed to the mangling. The type signed char is mangled `Sc'.
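As a rough sketch of the integer codes just listed, the lookup below maps a C++ integer type name to its code. The function name `mangle_integer_type` and the convention of returning an empty string for unknown types are assumptions made for this illustration only.

```cpp
#include <map>
#include <string>

// Hypothetical helper: map a C++ integer type name to its mangling code.
// Returns an empty string for types this sketch does not know about.
std::string mangle_integer_type(const std::string& type) {
    static const std::map<std::string, std::string> base = {
        {"int", "i"}, {"long", "l"}, {"short", "s"},
        {"char", "c"}, {"long long", "x"}};

    if (type == "signed char") return "Sc";   // the only use of `S' here

    const std::string unsigned_prefix = "unsigned ";
    if (type.compare(0, unsigned_prefix.size(), unsigned_prefix) == 0) {
        // Unsigned types: `U' followed by the code of the signed type.
        auto it = base.find(type.substr(unsigned_prefix.size()));
        if (it == base.end()) return std::string();
        return "U" + it->second;
    }

    auto it = base.find(type);
    if (it == base.end()) return std::string();
    return it->second;
}
```

For example, "unsigned long" yields `Ul' and "long long" yields `x'.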
The C++ and Java floating-point types float
and double
are mangled as `f' and `d' respectively.
The C++ bool
type and the Java boolean
type are
mangled as `b'.
The C++ wchar_t
and the Java char
types are
mangled as `w'.
The Java integral types byte
, short
, int
and long
are mangled as `c', `s', `i',
and `x', respectively.
C++ code that has included javatypes.h
will mangle
the typedefs jbyte
, jshort
, jint
and jlong
as respectively `c', `s', `i',
and `x'. (This has not been implemented yet.)
A simple class, package, template, or namespace name is
encoded as the number of characters in the name, followed by
the actual characters. Thus the class Foo
is encoded as `3Foo'.
If any of the characters in the name are not alphanumeric (i.e., not one of the standard ASCII letters, digits, or '_'), or the initial character is a digit, then the name is mangled as a sequence of encoded Unicode letters. A Unicode encoding starts with a `U' to indicate that Unicode escapes are used, followed by the number of bytes used by the Unicode encoding, followed by the bytes representing the encoding. ASCII letters and non-initial digits are encoded without change. However, all other characters (including underscore and initial digits) are translated into a sequence starting with an underscore, followed by the big-endian 4-hex-digit lower-case encoding of the character.
If a method name contains Unicode-escaped characters, the entire mangled method name is followed by a `U'.
For example, the method X\u0319::M\u002B(int)
is encoded as
`M_002b__U6X_0319iU'.
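The name-encoding rules above (plain length-prefixed names, and the `U' form with escapes) might be sketched as follows. The function names are hypothetical, names are assumed to arrive as UTF-16 code units (`std::u16string`), and the trailing `U' that follows a whole mangled method name is deliberately left out of this sketch.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

static bool is_ascii_letter(char16_t c) {
    return (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
}
static bool is_ascii_digit(char16_t c) { return c >= u'0' && c <= u'9'; }

// A name needs escapes if it starts with a digit or contains anything
// other than ASCII letters, digits, or '_'.
bool needs_unicode_escapes(const std::u16string& name) {
    if (!name.empty() && is_ascii_digit(name[0])) return true;
    for (char16_t c : name)
        if (!is_ascii_letter(c) && !is_ascii_digit(c) && c != u'_') return true;
    return false;
}

// Letters and non-initial digits pass through; everything else
// (including '_') becomes '_' plus four lower-case hex digits.
std::string escape_chars(const std::u16string& name) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char16_t c = name[i];
        if (is_ascii_letter(c) || (is_ascii_digit(c) && i > 0)) {
            out += static_cast<char>(c);
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Encode a simple class, package, template, or namespace name.
std::string encode_simple_name(const std::u16string& name) {
    if (!needs_unicode_escapes(name)) {
        std::string plain;
        for (char16_t c : name) plain += static_cast<char>(c);
        return std::to_string(plain.size()) + plain;
    }
    const std::string escaped = escape_chars(name);
    return "U" + std::to_string(escaped.size()) + escaped;
}
```

With this sketch, the class name X\u0319 encodes as `U6X_0319' and the method name M\u002B escapes to `M_002b', matching the pieces of the example above.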
A C++ pointer type is mangled as `P' followed by the mangling of the type pointed to.
A C++ reference type is mangled as `R' followed by the mangling of the type referenced.
A Java object reference type is equivalent to a C++ pointer parameter, so we mangle such a parameter type as `P' followed by the mangling of the class name.
Squangling (enabled with the `-fsquangle' option) uses the `B' code to indicate reuse of a previously seen type within an identifier. Types are recognized left to right and given increasing values, which are appended to the code in the standard manner, i.e., multiple-digit numbers are delimited by `_' characters. A type is considered to be any non-primitive type, regardless of whether it is a parameter, template parameter, or entire template. Certain codes are considered modifiers of a type, and are not included as part of the type. These are the `C', `V', `P', `A', `R', `U' and `u' codes, denoting constant, volatile, pointer, array, reference, unsigned, and restrict. These codes may precede a `B' type in order to make the required modifications to the type.
For example:
template <class T> class class1 { };
template <class T> class class2 { };
class class3 { };

int f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) { }

B0 -> class2<class1<class3> >
B1 -> class1<class3>
B2 -> class3
Produces the mangled name `f__FGt6class21Zt6class11Z6class3iRCB1PB2'. The int parameter is a basic type, and does not receive a B encoding...
Both C++ and Java allow a class to be lexically nested inside another class. C++ also supports namespaces. Java also supports packages.
These are all mangled the same way: First the letter `Q' indicates that we are emitting a qualified name. That is followed by the number of parts in the qualified name. If that number is 9 or less, it is emitted with no delimiters. Otherwise, an underscore is written before and after the count. Then follows each part of the qualified name, as described above.
For example Foo::\u0319::Bar
is encoded as
`Q33FooU5_03193Bar'.
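The qualified-name rule can be sketched as below. The function name `encode_qualified_name` is hypothetical, each part is assumed to be already encoded as a simple name (for example "3Foo" or "U5_0319"), and callers are assumed to pass at least two parts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join already-encoded name parts into a `Q' name.
std::string encode_qualified_name(const std::vector<std::string>& encoded_parts) {
    const std::size_t count = encoded_parts.size();
    std::string out = "Q";
    if (count <= 9) {
        out += std::to_string(count);              // single digit, no delimiters
    } else {
        out += "_" + std::to_string(count) + "_";  // underscore before and after
    }
    for (const std::string& part : encoded_parts) out += part;
    return out;
}
```

Passing the parts "3Foo", "U5_0319" and "3Bar" yields `Q33FooU5_03193Bar', the example given above.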
Squangling uses the letter `K' to indicate a remembered portion of a qualified name. As qualified names are processed for an identifier, the names are numbered and remembered in a manner similar to the `B' type compression code. Names are recognized left to right, and given increasing values, which are appended to the code in the standard manner, i.e., multiple-digit numbers are delimited by `_' characters.
For example
class Andrew
{
  class WasHere
  {
    class AndHereToo
    {
    };
  };
};

f(Andrew& r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) { }

K0 -> Andrew
K1 -> Andrew::WasHere
K2 -> Andrew::WasHere::AndHereToo
Function `f()' would be mangled as `f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo'.
There are some occasions when either a `B' or `K' code could be chosen; preference is always given to the `B' code. For instance, the example in the section on `B' mangling could have used a `K' code instead of `B2'.
A class template instantiation is encoded as the letter `t', followed by the encoding of the template name, followed by the number of template parameters, followed by the encoding of the template parameters. If a template parameter is a type, it is written as a `Z' followed by the encoding of the type. If it is a template, it is encoded as `z' followed by the parameter of the template template parameter and the template name.
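For the common case where every template parameter is a type, the class template rule might be sketched like this. The function name `encode_class_template` is hypothetical, the template name is assumed to be plain ASCII, the type arguments are assumed to be already mangled, and the `z' form for template template parameters is not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encode an instantiation whose parameters are all types.
std::string encode_class_template(const std::string& template_name,
                                  const std::vector<std::string>& type_args) {
    std::string out = "t";                                            // template marker
    out += std::to_string(template_name.size()) + template_name;      // template name
    out += std::to_string(type_args.size());                          // parameter count
    for (const std::string& arg : type_args) out += "Z" + arg;        // each type parameter
    return out;
}
```

Encoding class1<class3> this way gives `t6class11Z6class3', and nesting that result as the argument of class2 gives `t6class21Zt6class11Z6class3', the fragment that appears in the squangling example.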
A function template specialization (either an instantiation or an explicit specialization) is encoded by an `H' followed by the encoding of the template parameters, as described above, followed by an `_', the encoding of the argument types to the template function (not the specialization), another `_', and the return type. (Like the argument types, the return type is the return type of the function template, not the specialization.) Template parameters in the argument and return types are encoded by an `X' for type parameters, `zX' for template parameters, or a `Y' for constant parameters, an index indicating their position in the template parameter list declaration, and their template depth.
C++ array types are mangled by emitting `A', followed by the length of the array, followed by an `_', followed by the mangling of the element type. Of course, normally array parameter types decay into pointer types, so you don't see this.
Java arrays are objects. A Java type T[] is mangled as if it were the C++ type JArray<T>.
For example java.lang.String[] is encoded as `Pt6JArray1ZPQ34java4lang6String'.
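To show how the example decomposes, the sketch below builds the mangling of a Java array from the mangling of its element type. The function name `mangle_java_array` is hypothetical, and the element type is assumed to be passed in already mangled (an object reference such as java.lang.String is `PQ34java4lang6String').

```cpp
#include <string>

// Hypothetical helper: mangle T[] as a pointer to JArray<T>.
std::string mangle_java_array(const std::string& element_mangling) {
    std::string out;
    out += 'P';                 // arrays are objects, so a pointer to...
    out += 't';                 // ...a class template instantiation
    out += "6JArray";           // template name, length-prefixed
    out += '1';                 // one template parameter
    out += 'Z';                 // which is a type
    out += element_mangling;    // the element type itself
    return out;
}
```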
Both C++ and Java classes can have static fields. These are allocated statically, and are shared among all instances.
The mangling starts with a prefix (`_' in most systems), which is followed by the mangling of the class name, followed by the "joiner" and finally the field name.
The joiner (see JOINER in cp-tree.h) is a special separator character. For historical reasons (and idiosyncrasies of assembler syntax) it can be `$' or `.' (or even `_' on a few systems). If the joiner is `_' then the prefix is `__static_' instead of just `_'.
For example Foo::Bar::var
(or Foo.Bar.var
in Java syntax)
would be encoded as `_Q23Foo3Bar$var' or `_Q23Foo3Bar.var'
(or rarely `__static_Q23Foo3Bar_var').
If the name of a static variable needs Unicode escapes, the Unicode indicator `U' comes before the "joiner".
Thus \u1234Foo::var\u3445 becomes _U8_1234FooU.var_3445.
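The static field rules can be sketched as a single function. The name `mangle_static_field` and its parameters are hypothetical; the class mangling and the (possibly escaped) field name are assumed to be computed elsewhere, and the caller states whether the field name contains Unicode escapes.

```cpp
#include <string>

// Hypothetical helper: assemble the assembler name of a static field.
std::string mangle_static_field(const std::string& class_mangling,
                                const std::string& field_name,
                                char joiner,
                                bool field_has_unicode_escapes = false) {
    // The prefix depends on the joiner in use on the target system.
    std::string out = (joiner == '_') ? "__static_" : "_";
    out += class_mangling;
    if (field_has_unicode_escapes) out += 'U';   // indicator precedes the joiner
    out += joiner;
    out += field_name;
    return out;
}
```

With the class mangling `Q23Foo3Bar' and the field name var, the joiners `$', `.' and `_' give the three forms shown above.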
The following special characters are used in mangling:
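`b'  Encodes the C++ bool type, and the Java boolean type.
`c'  Encodes the C++ char type, and the Java byte type.
`C'  A modifier to indicate a const type. Also used to indicate a const member function (in which case it precedes the encoding of the method's class).
`d'  Encodes the C++ and Java double types.
`e'  Indicates extra unknown arguments (...).
`f'  Encodes the C++ and Java float types.
`i'  Encodes the C++ and Java int types.
`I'  Encodes typedef names of the form intn_t, where n is a positive decimal number. The `I' is followed by either two hexadecimal digits, which encode the value of n, or by an arbitrary number of hexadecimal digits between underscores. For example, `I40' encodes the type int64_t, and `I_200_' encodes the type int512_t.
`l'  Encodes the C++ long type.
`r'  Encodes the GNU C++ long double type.
`s'  Encodes the C++ and Java short types.
`S'  A modifier that indicates that the following integer type is signed. Only used with char. Also used as a modifier to indicate a static member function.
`u'  The restrict type qualifier.
`v'  Encodes the C++ and Java void types.
`V'  A modifier for a volatile type or method.
`w'  Encodes the C++ wchar_t type, and the Java char types.
`x'  Encodes the GNU C++ long long type, and the Java long type.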
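The `I' encoding of intn_t typedef names described in the list above can be sketched as follows. The function name `mangle_intn` is hypothetical, and the use of lower-case hexadecimal digits is an assumption (the two examples given contain only decimal digits, so they do not settle the case).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: encode intN_t for a given bit width N.
std::string mangle_intn(unsigned bits) {
    char buf[32];
    if (bits <= 0xff) {
        // Fits in two hexadecimal digits: `I' followed by exactly two digits.
        std::snprintf(buf, sizeof buf, "I%02x", bits);
    } else {
        // Otherwise: any number of hexadecimal digits between underscores.
        std::snprintf(buf, sizeof buf, "I_%x_", bits);
    }
    return buf;
}
```

Here 64 (hexadecimal 40) gives `I40' and 512 (hexadecimal 200) gives `I_200_', matching the examples for int64_t and int512_t.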
The letters `G', `M', `O', and `p' also seem to be used for obscure purposes ...