These are some potential future projects for GCC. Most of them have to do with the optimizer. Some are old ideas which might not help very much anymore, but who knows?
There is a separate page for Bounds Checking with Bounded Pointers.
There is a separate project list for the C preprocessor.
We also have a page detailing optimizer inadequacies, if you'd prefer to think about it in terms of problems instead of features.
The new version of the C standard (ISO/IEC 9899:1999) requires a number of library changes; these have to be provided by the C library, and not by gcc. In addition, there are also changes to the language proper, and some compiler support is needed for the new library features. An overview of the C99 implementation status is available.
-f options, and haifa doesn't help that situation.
The current implementation of global cse uses partial redundancy elimination via lazy code motion (lcm).
lcm also provides the underlying framework for several additional optimizations such as shrink wrapping, spill code motion, dead store elimination, and generic load/store motion (all the other examples are subcases of load/store motion).
It can probably also be used to improve the reg-stack pass of the compiler.
Contact law@cygnus.com if you're interested in working on lazy code motion.
GNU libc includes some macros to optimize calls to some string
functions with constant arguments. These macros tend to cause huge
blowup in the size of preprocessed source if nested; for example, each
nested call to strcpy
expands the source 20-fold, with
four nested calls having an expansion ten megabytes in size. GCC then
consumes a huge amount of memory compiling such expressions. Many of
the optimizations to ISO C string functions could be implemented in
GCC and then disabled in glibc, with benefits to other systems as
well, and the potential to use information GCC has about
alignment.
All the string functions act as if they access individual
characters, so care may need to be taken that no
-fstrict-aliasing
problems occur when internal uses of
other types are generated. Also, the arguments to the string function
must be evaluated exactly once each (if they have any side effects),
even though the call to the string function might be optimized away.
Care must be taken that any optimizations in GCC are
standards-conforming in terms of not possibly accessing beyond the
arrays involved (possibly within a function call created in the
optimization); whereas the glibc macros know the glibc implementation
and how much memory it might access, GCC optimizations can't. When
-fcheck-memory-usage
is used, calls to the checking
functions from Checker may need to be emitted; see the existing
builtin string functions in GCC for examples.
There are some further optimizations in glibc not covered here, that either optimize calls to non-ISO C functions or call glibc internal functions in the expansion of the macro.
glibc also has inline assembler versions of various string functions; GCC has some, but not necessarily the same ones on the same architectures.
Many of these optimizations should not be applied if
-Os
is specified.
glibc optimizes memset only when the value and length
involved have no side effects, and the value is a constant zero. On
architectures allowing unaligned accesses, glibc also optimizes if the
length is constant and does not exceed 16; the value to which memory
is set need not be constant. In such a case, GCC should convert the
value (constant or not) to unsigned char, maybe using save_expr to
protect against multiple evaluation, and compute and store the values
c * 0x01, c * 0x0101, and so on (whichever are needed for the given length),
to store as single words then as part of a word. Where the
architecture does not support unaligned accesses, GCC may still be
able to optimize if the destination is known to be suitably aligned.
For example, no alignment is needed for a set of one byte; a set of
zero bytes should be eliminated in all cases; if two bytes are to be
set, two-byte alignment may suffice; and so on.

GCC converts strcpy from a string constant into a memcpy with the
known length of that string constant. In turn, emit_block_move may
expand that memcpy inline. glibc includes optimizations to load the
constant contents of the string directly rather than via a memory
copy, when no more than 8 bytes need copying. emit_block_move could be
taught to find the contents of the string constant and include the
appropriate integers directly in the assembler output for the copy.

glibc optimizes strncpy from a string
constant. Where the maximum length to be copied is not more than the
length of the string constant including the terminating null
character, GCC could (but does not) optimize as a
memcpy
. Where the maximum length to be copied is
greater, trailing null characters must be added to make up the length;
here glibc can use a micro-optimization not available to GCC, its
__mempcpy function to copy and return the address after the data
copied, to pass into memset. This is only used when an inline
assembler __mempcpy is available.

glibc optimizes strncat with constant source and maximum length. If
the maximum length exceeds the length of the source string, it is
optimized to strcat; GCC could do
this (including when the maximum length equals the length of the
source string). Otherwise, on architectures where glibc has an
inline assembler strchr
and it is used, glibc converts
the strncat to a memcpy, but fails to add the terminating null
character.

glibc has several optimizations for strcmp. The following in GCC would
completely cover them: comparisons of constant strings should be done
at compile time;
comparisons where one string is constant and of length less than 4
should be inlined to compare successive bytes to the known constant
ones (further optimization to compare more than one byte at once is in
general unsafe as it might access too much memory). Certain of these
cases could also be inlined for memcmp.

glibc optimizes strncmp where one string is constant and strictly
shorter than the maximum number of characters to be compared, by
replacing it with strcmp
(which may then be
further optimized). This may not be safe for GCC except in the cases
where it handles strcmp internally as above, since strcmp may require
both strings to be null-terminated whereas strncmp doesn't.

glibc optimizes strcspn where the string of excluded characters is
constant and of length not greater than three. GCC could readily
optimize the specific case where the length is zero, converting it to
strlen.

glibc optimizes strspn
where not more than
three characters are involved. Where there are zero characters
acceptable in the initial segment, GCC could optimize to the integer
constant 0.

glibc optimizes strpbrk. The cases of finding the first occurrence of
zero characters (return a null pointer) or one character (convert to
strchr) could readily be optimized by GCC.

Format (printf, scanf and strftime) checking:

Contact jsm28@cam.ac.uk before doing any substantial format checking work.
- $ format checking.
- strftime: warnings for use of 0, - or _ flags without width on formats where inappropriate.
- printf formats: warn for an integer constant argument out of range of the unpromoted type (including signed/unsigned where it doesn't fit in range).
- %q: anything more accurate than long long? See messages 1 and 2 about this to gcc-bugs. (Note that %q is now largely obsolete given the new C99 formats and macros.)
- format-va-1.c (see PR c/479).
- Get the __attribute__((__nonnull__)) patch in (he may be working on a cleaner version that applies to current sources). See messages 1, 2 and 3 to gcc-patches.
- Security warnings (perhaps under a -Wsecurity option): printf and scanf functions with a non-constant format if there are no arguments to the format (for example, printf (foo)); sprintf into a fixed-length buffer if the output can't be proved not to overrun; similarly for scanf %s and %[...] without width to a fixed-length buffer (or possibly to any buffer); or %s, %[...] and %c with width to too short a buffer, including %lc, %ls and %l[...].

Stuff I know has been done has been deleted. Stuff in progress has a contact name associated with it.
Better optimization.
If a function has been placed in a special section via attributes, we may want to put its static data and string constants in a special section too. But which one? (Being able to specify a section for string constants would be useful for the Linux kernel.)
It is possible to optimize

if (x == 1) ...; if (x == 2) ...; if (x == 3) ...;

into

if (x == 1) ...; else if (x == 2) ...; else if (x == 3) ...;

provided that x is not altered by the contents of the if statements.
It's not certain whether this is worth doing. Perhaps programmers nearly always write the else's themselves, leaving few opportunities to improve anything.
Perhaps we should have an un-cse step right after cse, which tries to replace a reg with its value if the value can be substituted for the reg everywhere, if that looks like an improvement; typically it is when the reg is used only a few times. Use rtx_cost to determine whether the change is really an improvement.
The scheme is that each value has just one hash entry. The first_same_value and next_same_value chains are no longer needed.
For arithmetic, each hash table elt has the following slots:
So, if we want to enter (plus:SI (reg:SI 30) (const_int 104)), we
first enter (const_int 104) and find the entry that (reg:SI 30) now
points to. Then we put these elts into operands 0 and 1 of a new elt.
We put PLUS and SI into the new elt.
Registers and mem refs would never be entered into the table as such. However, the values they contain would be entered. There would be a table indexed by regno which points at the hash entry for the value in that reg.
The hash entry index now plays the role of a qty number. We still need qty_first_reg, reg_next_eqv, etc. to record which regs share a particular qty.
When a reg is used whose contents are unknown, we need to create a hash table entry whose contents say "unknown", as a place holder for whatever the reg contains. If that reg is added to something, then the hash entry for the sum will refer to the "unknown" entry. Use UNKNOWN for the rtx code in this entry. This replaces make_new_qty.
For a constant, a unique hash entry would be made based on the value of the constant.
What about MEM? Each time a memory address is referenced, we need a qty (a hash table elt) to represent what is in it. (Just as for a register.) If this isn't known, create one, just as for a reg whose contents are unknown.
We need a way to find all mem refs that still contain a certain value. Do this with a chain of hash elts (for memory addresses) that point to locations that hold the value. The hash elt for the value itself should point to the start of the chain. It would be good for the hash elt for an address to point to the hash elt for the contents of that address (but this ptr can be null if the contents have never been entered).
With this data structure, nothing need ever be invalidated except the lists of which regs or mems hold a particular value. It is easy to see if there is a reg or mem that is equiv to a particular value. If the value is constant, it is always explicitly constant.
This might be possible under certain circumstances, such as when the argument lists of the functions have the same lengths. Perhaps it could be done with a special declaration.
You would need to verify in the calling function that it does not use the addresses of any local variables (?) and does not use setjmp.
-foptimize-sibling-calls
does at least some of this.
Useful on the 68000/68020 and perhaps on the 32000 series, provided one has a linker that works with the feature. This is said to make a 15% speedup on the 68000.
Here is a scheme for doing this. A global variable, or a local variable whose address is taken, can be kept in a register for an entire function if it does not use non-constant memory addresses and (for globals only) does not call other functions. If the entire function does not meet this criterion, a loop may.
The VAR_DECL for such a variable would have to have two RTL expressions: the true home in memory, and the pseudo-register used temporarily. It is necessary to emit insns to copy the memory location into the pseudo-register at the beginning of the function or loop, and perhaps back out at the end. These insns should have REG_EQUIV notes so that, if the pseudo-register does not get a hard register, it is spilled into the memory location which exists in any case.
The easiest way to set up these insns is to modify the routine put_var_into_stack so that it does not apply to the entire function (sparing any loops which contain nothing dangerous) and to call it at the end of the function regardless of where in the function the address of a local variable is taken. It would be called unconditionally at the end of the function for all relevant global variables.
For debugger output, the thing to do is to invent a new binding level around the appropriate loop and define the variable name as a register variable with that scope.
Currently a variable is allocated a hard register either for the full extent of its use or not at all. Sometimes it would be good to allocate a variable a hard register for just part of a function; for example, through a particular loop where the variable is mostly used, or outside of a particular loop where the variable is not used. (The latter is nice because it might let the variable be in a register most of the time even though the loop needs all the registers.) Contact meissner@cygnus.com before starting any work on live range splitting.
A store into memory is dead if it is followed by another store into the same location; and, in between, there is no reference to anything that might be that location (including no reference to a variable address).
This can be modeled as a partial redundancy elimination/lazy code motion problem. Contact law@cygnus.com before working on dead store elimination optimizations.
Strength reduction and iteration variable elimination could be smarter. They should know how to decide which iteration variables are not worth making explicit because they can be computed as part of an address calculation. Based on this information, they should decide when it is desirable to eliminate one iteration variable and create another in its place.
It should be possible to compute what the value of an iteration variable will be at the end of the loop, and eliminate the variable within the loop by computing that value at the loop end.
When a loop has a simple increment that adds 1, instead of jumping in after the increment, decrement the loop count and jump to the increment. This allows aob insns to be used.
Many operations could be simplified based on knowledge of the
minimum and maximum possible values of a register at any particular
time. These limits could come from the data types in the tree, via
rtl generation, or they can be deduced from operations that are
performed. For example, the result of an and
operation
one of whose operands is 7 must be in the range 0 to 7. Compare
instructions also tell something about the possible values of the
operand, in the code beyond the test.
Value constraints can be used to determine the results of a further
comparison. They can also indicate that certain and operations are
redundant. Constraints might permit a decrement-and-branch instruction
that checks for zero to be used when the user has specified to exit if
the value goes negative.
John Wehle (john@feith.com) implemented a value range propagation pass which isn't yet in GCC.
Sometimes a variable is declared as int, assigned only once from a
value of type char, and then used only in comparisons against
constants. On many machines, better code would result if the variable
had type char.
If the compiler could detect this case, it could change the
declaration of the variable and change all the places that use it.
There may be cases where it would be better to compile a switch statement to use a fixed hash table rather than the current combination of jump tables and binary search.
It might be possible to make better code by paying attention to the order in which to generate code for subexpressions of an expression.
Consider hoisting common code up past conditional branches or tablejumps.
Contact law@cygnus.com before working on code hoisting.
This technique is said to be able to figure out which way a jump will usually go, and rearrange the code to make that path the faster one.
The C expression *(X + 4 * (Y + C))
compiles better on
certain machines if rewritten as *(X + 4*C + 4*Y)
because
of known addressing modes. It may be tricky to determine when, and
for which machines, to use each alternative.
Some work has been done on this, in combine.c.
Transform

if (x) y; else z;

into

z; if (x) y;

if z and x do not interfere and z has no effects not undone by y. This
is desirable if z is faster than jumping.
In a loop such as

foo: movb a2@+,a3@+
     jne foo

it is better to insert dbeq d0,foo before the jne. d0 can be a junk
register. The challenge is to fit this
into a portable framework: when can you detect this situation and
still be able to allocate a junk register?
Right now, describing the target machine's instructions is done cleanly, but describing its addressing mode is done with several ad-hoc macro definitions. Porting would be much easier if there were an RTL description for addressing modes like that for instructions. Tools analogous to genflags and genrecog would generate macros from this description.
There would be one pattern in the address-description file for each kind of addressing, and this pattern would have:
We currently have front ends for C, C++, Objective C, CHILL, Fortran, and Java. Pascal and Ada front ends exist but have not yet been integrated.
Cobol and Modula-2 front ends might be useful, and are being worked on.
Pascal, Modula-2 and Ada require the implementation of functions within functions. Some of the mechanisms for this already exist.
struct foo {
  enum { INT, DOUBLE } code;
  auto union { case INT: int i; case DOUBLE: double d; } value : code;
};
(struct foo) {a, b, c} = foo();

This would call foo, which returns a structure, and then store the several components of the structure into the variables a, b, and c.
Some new compiler features may be needed to do a good job on machines where static data needs to be addressed using base registers.
Some machines have two stacks in different areas of memory, one used for scalars and another for large objects. The compiler does not now have a way to understand this.
The scheduler does not do very well on recent RISC machines. Haifa helps but not enough.
Warn about statements that are undefined because the order of evaluation of increment operators makes a big difference. Here is an example:
*foo++ = hack (*foo);
-Wsequence-point
does some of this, but not that
particular case.
Here is an outline proposed by Allan Adler.
The following used to be in a file PROBLEMS
in the GCC
distribution. Probably much of it is no longer relevant as of GCC 3.0
(the file hadn't changed since GCC 2.0), but some might be. Someone
should go through it, identifying what is and isn't relevant, adding
anything applicable to current GCC (and describing a bug) to GNATS and
sending patches to gcc-patches to remove from the list entries that no
longer apply or have been entered in GNATS.
When find_reloads is used to count the number of spills needed, it
does not take into account the fact that a reload may turn out to be a
dummy.
I'm not sure this really happens any more. Doesn't it find all the dummies on both passes?
movl a3@,a0
movl a3@(16),a1
clrb a0@(a1:l)

is generated and may be worse than

movl a3@,a0
addl a3@(16),a0
clrb a0@
If ordering of operands is improved, many more such cases will be generated from typical array accesses.
expand_mult could be changed so that if there is no same-modes
multiply it will use a widening multiply and then truncate rather than
calling the library.

Note how reload_contents
got set up. If it was copied from a register, just reload from that
register. Otherwise, perhaps can change the previous insn to move the
data via the reload reg, thus avoiding one memory ref.

cc_status.value2 may be wrong if it
ever activates itself after a two-address subtraction (which currently
cannot happen). It is supposed to compare the current value of the
destination but eliminating it would use the results of the
subtraction, equivalent to comparing the previous value of the
destination.

-fstrict-aliasing.

scan_loop doesn't even try to deal with them.

When insn-output.c
turns a bit-test into a
sign-test, it should see whether the cc is already set up with that
sign.

expand_expr.

Poor code results for foo (bar
())
when bar
returns a double, because the pseudo
used fails to get preferenced into an fp reg because of the
distinction between regs 8 and 9.