Quo vadis, GPC?
As you have surely noticed, GPC development has stalled recently
and questions about GPC's future have been raised. I've thought about
the issues for a while, and here are my thoughts.
I started as GPC's main developer and maintainer some 10 years ago,
following Peter Gerwinski and followed by Waldek Hebisch, though
with some overlap each time. Recently, GPC development has almost
come to a standstill as all three of us have focused on other
projects, and have had neither the time nor the need to do
significant GPC development.
Back then, Peter Gerwinski, and occasionally I, professionally
worked on some larger Pascal projects. These projects are now
finished (as far as we are concerned), so for our professional work,
there is currently no more interest in GPC. In recent years, Peter
and I have done our professional work in other languages, mainly
C++.
Of course, I still have quite a few Pascal programs of my own,
including several that I use regularly and therefore need to
maintain (as well as other private, semi-professional and also
professional, but not currently maintained, programs). However, when
I compare the effort of rewriting them in another language versus
maintaining GPC just for this purpose, the latter is clearly more
work, especially in the long run.
So I've been looking for other advantages to make it worthwhile for
me to stick with GPC. Of course, I generally like the Pascal
language, and many of its features – otherwise I wouldn't have
gotten involved with it in the first place. However, with the latest
(and quite possibly last) official Pascal standard (ISO 10206)
celebrating its 20th birthday this year, and dialects developed
since then, especially Delphi, focused mostly on Windows integration
and other non-portable features which are uninteresting to me,
instead of genuine language advances, it is a bit outdated.
Meanwhile other languages haven't stood still – I'll be using C++
for my comparison since that's what I've used recently, and also
because it may be the only natively compiled language among today's
popular languages, between the various virtual machines and script
languages – besides C, of course, but that's too low-level for a
meaningful comparison. In particular since template support in C++
has stabilized in the last few years, it has become actually useful
for writing high-level code – higher-level than in Pascal in some
cases, IMHO. I'll expand on this below (see C++ Features); I'll
also compare its object model with BP's (see C++ Object Model) to
explore how they overlap and differ.
The result of the comparison is, not surprisingly, that there are
advantages and disadvantages to both languages which in total more
or less cancel out, as far as I'm concerned. This is, unfortunately,
not good enough reason for me to justify the greater effort of
maintaining GPC vs. rewriting my code. In other words, maintaining
GPC would be worthwhile to me only if it was to support a "best of
both worlds" approach, i.e., to combine its existing good features
with some good features from C++.
I suppose that some of GPC's current users might not welcome such an
approach, especially the standard-purists who have moaned about
feature creep before (though this approach wouldn't mean worsening
standards-compliance in GPC's standard Pascal modes). Others, whose
reason for using GPC is mainly to maintain legacy code originally
written for other Pascal compilers, might be indifferent to the new
features, i.e., not mind them, but also not use them. So before I go
on, I'd like to know who would actually be interested in major new
features (working out the details would be the next step, if there's
interest at all). If I'm basically the only one, it would not be
worth it.
Another serious problem is that GPC development in the past has
often not been very efficient in my opinion. I'll expand on this
below, see Problems with GPC Development. The
problems described there make me skeptical that the current approach
of developing GPC is still viable.
Therefore, I've thought about alternatives. One idea that came to my
mind is to basically rewrite GPC and turn it from a compiler into a
converter to another language. This other language can, in my
opinion, only be C++, since it should be natively compiled, widely
available and powerful enough to match most of GPC's features more
or less directly, which rules out C. I know it would be a radical
step, but I think the advantages would outweigh the disadvantages as
I'll explain below, see GPC as a C++
Converter and New GPC Design. I have discussed this
text with Waldek and Peter before, and incorporated some of their
suggestions.
But even if GPC development becomes easier with the new design, as I
seriously hope, it will not be a task for a single person. So again,
I need to know if some of you are interested, not only in using, but
also helping to develop, a new and improved GPC – especially when
the new GPC eventually (see below) will be written in readable
Pascal, thus removing two obstacles that the current GPC, being
written in almost-unreadable C, has always posed to participation.
Problems with GPC Development
As I see it, GPC development has suffered from some principal
problems most of the time:
- We GPC developers spent most of our work catching up with changes
in the GCC backend that GPC currently uses, such as new internal
memory management systems (again and again), other changes in the
infrastructure, in the expected output data structures, etc. I
know this is true for my work on GPC, and I'm quite sure it also
holds for Peter's and Waldek's. The backend is not really like a
library which the frontends use (though in theory it could have
been designed as one), where some more or less minor changes are
expected when switching to a new version. It's much more
intimately tied to the frontends, so version changes have much
bigger implications, and supporting several backend versions at
once, as we did in GPC, is frowned upon by GCC developers and has
always required more or less ugly hacks (whereas with libraries
it's not uncommon to support several versions). This approach
certainly has its advantages for GCC, but for GPC development it
turned out to be a serious impediment.
After I wrote the previous paragraph, I tried compiling several of
my programs with a gcc-4 based GPC (I've always used gcc-3 based
ones before), and the results, unfortunately, confirmed my
suspicions: several compile time errors, compiler crashes, runtime
errors and wrong runtime behaviour, all with code that compiles
and works well with the gcc-3 backend. I figure it would take me a
few weeks just to get my code running as well as it does with the
older backend, so if I were to continue working on GPC like
before, that's time I'd need to spend before I could even start
trying to improve it.
- Since the backend, and therefore also GPC, is written in C, there
has always been little participation by GPC users, i.e. Pascal
programmers.
- The basic data structures of the backend, the various kinds of
TREE_NODE, are a kind of "hand-made object-orientation in
C". Among other things, this implies manual type conversions
which, even though hidden in clever macros, relegate much
type-checking to runtime, so bugs are harder to find, while the
excessive use of those macros makes the code hard to read. Cf.
this fairly typical line of GPC source code whose
purpose (pun intended) is not obvious at all:
TREE_PURPOSE (tail) = TREE_VALUE (TREE_PURPOSE (tail));
- Like most C code, it's a bit low-level, e.g., using hand-made
lists rather than container classes or such, which also makes it
harder to work with.
- The backend data structures impose various spurious restrictions on
GPC. E.g., each TREE_NODE kind allows for a fixed number of
language-specific flags. Unfortunately, the number is often not
sufficient for GPC, so we had to overload them heavily, making the
code even more difficult to follow and error-prone, by having to
make sure the overloaded variants don't conflict.
- Most of the backend patches we made to support GPC were never
integrated into the official GCC version, even those that were
obvious fixes to backend bugs (typically, if those bugs happened
to affect only GPC, not C/C++), so we've had to carry them with
GPC forever.
- Finally, even the genuine GPC code, i.e. not the backend or those
parts of the frontend that were originally derived from the C
frontend and adjusted for Pascal, has aged quite a bit, while
requirements (languages and dialects to support) and circumstances
(such as computer speed or available memory) have changed a lot,
and we've all learned many things in the process. This alone might
justify a major rewrite (see: code rot).
C++ Features
C++ has many features distinct from Pascal or from C, but from my
experience using it, it seems to me that two of them, automatic
destructors and templates, were most important to make it easier to
write high-level code.
- As detailed in the objects section, C++ objects have
exactly one destructor, and the compiler ensures that it's always
called when an object ceases to exist (unless the programmer
specifically works around that). This includes the case of explicit
disposal of dynamically allocated objects, as well as local object
variables going out of scope, but also temporary values. So if,
e.g., f is a function that returns an object and g
is a procedure that takes an object parameter, given the statement
g (f);, the compiler will call the destructor of f's
result afterwards. (Though it can be optimized away in certain
cases where it's safe.)
This case is important because it makes it possible to use
"value-like" objects that require destructors. In contrast, while
Extended Pascal allows returning structured types from functions,
and all Pascal dialects allow assigning them and passing them as
parameters, the lack of automatic destructors prevents their use
when any kind of cleanup is necessary. And though BP, Delphi and
OOE objects can have destructors, they must be called explicitly,
which is not possible with temporary values as in the example
above (but would require the use of a local variable to hold the
result and later call the destructor, which would expand this
simple example above to 4 lines – worse for more complex cases).
As a concrete and common example, a string type of unlimited
length requires dynamic memory allocation, therefore it needs to
dispose of the memory at the end of its lifetime to avoid memory
leaks. (Unless one uses garbage collection, which is suitable in
some, but not all, environments, so it's a separate topic.)
Therefore it's inherently impossible to implement such a type in
plain Pascal code (any dialect supported by GPC currently), and
such a type would have to be built into the compiler – which GPC
hasn't done yet (or only partially) because it's much more
difficult to do than writing it in Pascal code.
That's why I consider this feature, though it looks harmless at
first glance, one of the most important ones.
- Templates are types or functions that depend on parameters which
can be values or types. As far as value parameters are concerned,
they are comparable to EP schema types (though the parameters must
be constant at compile time, so schema types are more powerful).
However, templates that depend on types have no correspondence in
any Pascal dialect I know, which I consider a serious shortcoming.
In concrete terms, there's no way in Pascal to define, e.g., a
generic list type. Sure, there are kludges (and I used some
myself), but they always involve unsafe type-casts, preprocessor
(ab)use or other dirty tricks. Of course, the compiler could have
a built-in list type, but then you may want a double-linked list,
a hash table, various trees etc. Building them all into the
compiler would be very heavy and not flexible. With templates,
such types can all be implemented in normal C++ code.
In fact, the C++ Standard Template Library (STL) provides all
those (and more) templates, so it's often not necessary for
programmers to write any templates themselves, but when one needs
a structure not implemented by the STL, one can always write them
as a new template rather than having to ask for special compiler
support for a new built-in type, as one would have to do with GPC
now.
The STL also provides a string type which is a template because
its element type can be not only "char", but also other types, e.g.
"wide char" for extended character sets (Unicode). Whether one
wants to use them, or instead UTF-8 strings, is another question,
but using a template, this possibility comes essentially for free,
and all the usual string operations (concatenation, substrings,
searching, ...) are automatically available for these types,
whereas the current GPC's EP string type is closely tied to the
Char type. While STL strings are always dynamic (unbounded
size) and their internal layout is different from GPC's EP
strings, it would be easy to implement a GPC/EP binary-compatible
non-dynamic string type as a template, and even a BP
binary-compatible short string. Conversions between these types
could be written as plain template functions, so the whole short
string support (which is one of GPC's long-standing deficiencies
with respect to BP compatibility) could be done without much (or
any) special compiler support.
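The two features above can be illustrated with a minimal sketch. All names here (DynString, List, live_buffers, f, g) are hypothetical and chosen only for this example; DynString is not the STL string and not a proposed GPC type. The first part shows a value-like string whose heap memory is released automatically, including for the temporary in g (f ()); the second part shows a type-safe generic list written as ordinary template code rather than built into the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static int live_buffers = 0;  // demo only: number of buffers currently allocated

// Hypothetical value-like string of unlimited length that owns heap memory.
class DynString {
public:
  explicit DynString(const char *text) : data_(copy(text)) { ++live_buffers; }
  DynString(const DynString &other) : data_(copy(other.data_)) { ++live_buffers; }
  DynString &operator=(const DynString &other) {
    if (this != &other) {
      char *fresh = copy(other.data_);
      delete[] data_;
      data_ = fresh;
    }
    return *this;
  }
  ~DynString() { delete[] data_; --live_buffers; }  // runs automatically
  std::size_t length() const { return std::strlen(data_); }
private:
  static char *copy(const char *text) {
    char *p = new char[std::strlen(text) + 1];
    std::strcpy(p, text);
    return p;
  }
  char *data_;
};

DynString f() { return DynString("temporary"); }      // returns an object
std::size_t g(DynString s) { return s.length(); }     // takes an object

// The statement g (f ()) from the text: no local variable, no explicit cleanup.
std::size_t call_with_temporary() {
  std::size_t n = g(f());
  assert(live_buffers == 0);  // the temporary has already been destroyed
  return n;
}

// Hypothetical generic singly linked list; works for any copyable T.
template <typename T>
class List {
public:
  List() : head_(0), size_(0) {}
  ~List() {
    while (head_) { Node *n = head_; head_ = n->next; delete n; }
  }
  void push_front(const T &value) { head_ = new Node(value, head_); ++size_; }
  const T &front() const { return head_->value; }
  std::size_t size() const { return size_; }
private:
  struct Node {
    T value;
    Node *next;
    Node(const T &v, Node *n) : value(v), next(n) {}
  };
  List(const List &);             // copying disabled to keep the sketch short
  List &operator=(const List &);
  Node *head_;
  std::size_t size_;
};
```

A List<DynString> combines both points: destroying the list destroys each node, which in turn runs each string's destructor, so no explicit cleanup call appears anywhere in user code.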
Disadvantages
Of course, C++ also has quite a few disadvantages in my opinion, so
it's not my absolutely ideal language, otherwise I wouldn't be
writing all this in the first place. As the point of this writing is
not to bash C/C++, I'll keep the list short (and incomplete). It
just serves to show why I'd see some value in a Pascal compiler that
uses the advantages, but avoids the disadvantages of C++.
- Like C, it has no module concept, still using include files etc.,
which is a major drawback, IMHO.
- It still mixes up char and integer types. Ironically, this
is worse in C++ than in C, from which it was inherited, as it
mostly affects text input/output: C's printf etc. need an
explicit type specification ("%i" vs. "%c") anyway, whereas C++
streams ("cout << foo") do not, and will therefore happily print
an integer as a character if it happens to be of the same size
as char, usually 8 bits. I regularly get bitten by this in C++.
- It inherits all the low-level features C has (though usually
provides higher-level alternatives), such as manual memory
management and type-casts. Though we have to admit that sometimes
such features are needed, and on the Pascal side, standard Pascal
doesn't provide them at all, and what other dialects (especially
BP) provide, is often just as low-level and less well thought-out,
to put it politely.
- It has syntax that is often not easy to read (many symbols rather
than keywords compared to Pascal, various different meanings of
"static", just to give some examples) or even dangerous
(fallthrough in "switch" (case) statements without explicit
"break", like in C, etc.).
C++ Object Model
Fortunately, the C++ object model is for the most part
backward-compatible with the BP object model. (I'm less familiar
with the Delphi, Mac and OOE models, but I suppose they also mostly
map to a, slightly different, subset of the C++ model.)
- Objects can have data fields and methods (also static, i.e.
class-wide, data, which is a useful extension not present in the
BP model).
- Fields and methods can be public, protected or private.
- Methods can be virtual or non-virtual.
- Objects can be variables (i.e., live in global memory or on the
stack), fields of other objects, or allocated dynamically (on the
heap), and accessed through pointers or references. (Of course,
object models where objects are always references, such as
Delphi's, can be supported under this model as well.)
- Classes can inherit from other classes. Also multiple inheritance
is supported – though not needed for BP (and I've rarely needed
it myself in C++), it can't hurt to have it available when needed
(either in full, or in a more restricted form, such as Java's
"interfaces" that were once discussed for GPC).
- Objects can have (one or several) constructors, though they're a
bit different from BP:
- All constructors have the same name as the class (which
is made possible by function overloading); they don't have
individual names. This is mostly a syntactic difference and
wouldn't matter for compatibility – except in the case of two
Pascal constructors with the same signature, e.g.:
constructor Init1 (a: Integer);
constructor Init2 (a: Integer);
These couldn't coexist by overloading. This could be resolved by
adding a hidden parameter internally (which might later be
optimized away), or by turning them into regular methods, see
the next point.
- In C++, when a new object is created, exactly one of its
constructors is called, either an explicitly declared one or an
auto-generated default constructor. At first glance, the rules
can seem confusing, but they guarantee that a constructor is
always called (unless explicitly circumvented, e.g. by low-level
memory allocation and type-cast to an object pointer), which can
prevent problems such as the current GPC's problem with strings
with uninitialized Capacity fields. Likewise, class
designers can enforce any constraints they need by putting them
in the constructor(s).
In contrast, in BP you can create an object without calling a
constructor – in fact very easily (far too easy IMHO) by just
omitting the constructor in the New call, or not calling
a constructor for an object variable, and you can call a
constructor explicitly basically at any time like a normal
method. To me, these are all misfeatures, but for an exact
translation, it seems to me, BP constructors should just be
mapped to regular C++ methods which can be called in any way
(though a new "GPC object model" could support "real"
constructors as in C++). The BP style New call with
constructor would then be translated to a plain new
(using the empty default constructor) plus a regular call of the
BP constructor as a plain method.
- C++ constructors don't have anything like Fail. The only
way to signal constructor failure is to throw an exception. By
translating a BP constructor to a C++ method, Fail could
be handled by simply letting it return a Boolean and checking it
implicitly. (In fact, in BP an explicit constructor call returns
exactly such a Boolean, so we'd get this for free.)
- There are semantic differences when calling virtual methods from
inherited constructors (as well as destructors). Without going
into details, these would also be circumvented by turning
constructors into plain methods.
- Objects can have only one destructor, and it can have no
parameters. In my experience with BP and GPC, that's a reasonable
restriction, and I've almost always used only one destructor
without parameters there as well. After all, the destructor's
purpose is to destroy the object, and how this is done should
depend only on the object's state. (In other words, if you need
alternative ways of destruction of objects of the same class, you
might rather add object fields to keep track of them, rather than
relying on the user to call the right destructor.) Of course, the
restriction to a single destructor without parameters is necessary
to allow for automatic destruction which I
consider a very important feature.
In BP's object model, again, there can be several destructors with
any number of parameters. Then again, like constructors, they are
called mostly like normal methods, i.e., Dispose can be
used with or without a destructor, and a destructor can be called
from anywhere (though that's considered bad style, unless it's
from other destructors). So again, BP destructors could be mapped
to plain C++ methods. This would, of course, eliminate the
advantages of automatic destruction, so it would only be useful as
a compatibility mode for existing Pascal code, where new code
(e.g., a String type with automatic memory-management) would make
use of real C++ destructors.
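The compatibility mapping described above might look roughly like the following sketch. It is only an illustration under stated assumptions: the type TFoo, the helper names New_TFoo_Init1 and Dispose_TFoo_Done, and the rule that Init1 fails for negative arguments are all invented for this example, and a real converter would generate such code inline rather than as named helpers.

```cpp
#include <cassert>

// Pascal source being translated (hypothetical):
//   type
//     PFoo = ^TFoo;
//     TFoo = object
//       Value: Integer;
//       constructor Init1 (a: Integer);
//       constructor Init2 (a: Integer);
//       destructor Done;
//     end;

struct TFoo {
  int Value;
  bool Initialized;

  // The "empty" C++ default constructor; it does no BP-visible work.
  TFoo() : Value(0), Initialized(false) {}

  // BP constructors become plain methods, so two of them with the same
  // signature can coexist. The Boolean result is false where BP calls Fail.
  bool Init1(int a) {
    if (a < 0) return false;  // corresponds to Fail
    Value = a;
    Initialized = true;
    return true;
  }
  bool Init2(int a) {
    Value = 2 * a;
    Initialized = true;
    return true;
  }

  // The BP destructor also becomes a plain method, called explicitly.
  void Done() { Initialized = false; }
};

// Translation of "New (p, Init1 (a))": plain new, then the BP constructor
// as a regular call; on Fail the memory is released and nil is the result.
TFoo *New_TFoo_Init1(int a) {
  TFoo *p = new TFoo;
  if (!p->Init1(a)) {
    delete p;
    return 0;
  }
  return p;
}

// Translation of "Dispose (p, Done)".
void Dispose_TFoo_Done(TFoo *p) {
  p->Done();
  delete p;
}
```

Because Init1 and Init2 are ordinary methods here, calling one of them again on an existing object, as BP permits, needs no special treatment.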
GPC as a C++ Converter
The suggestion of writing a new GPC as a converter that outputs C++
code might seem strange, after we've long preached that the current
GPC is not this way – the very first paragraph of the manual says:
"Unlike utilities such as p2c, this is a true compiler, not just a
converter."
But that was the 1990s, this is the 2010s. Since then, computers
have become much faster, C++ has become useable (whereas in the
1990s, the natural target language C would have been too low-level),
and we've learned a bit – in particular that developer time is the
really scarce resource. This is not meant to justify a slow
compiler, but in my experience, most time in the compiler is spent
in the optimization stages rather than parsing, so this would be
unaffected. (In some cases, especially with huge interfaces, most
time might be spent currently with GPI file reading/writing, i.e.
imports/exports, and using higher-level structures in the compiler
might make it easier to improve performance here, so in these cases,
the new GPC might actually run much faster than the existing one.)
BTW, the new GPC still wouldn't be quite like p2c, as the main goal
would not be to produce readable C++ code, but an exact translation
of the Pascal input – whereas those p2c versions I've seen, apart
from being hopelessly outdated, produced rather a rough
approximation of what their authors seemed to believe Pascal
semantics were like.
The new approach would have some important advantages:
- GPC would be mostly independent of GCC versions, so most users
would not be required to install a different GCC version, but can
use the one they have; GPC developers would be independent of GCC
release schedules. It might even work with other C++ compilers
(but might not, if some g++-specific features will be needed to
implement some GPC features).
- Debugger (gdb) support has always been problematic for GPC. Though
it's not perfect for g++ either, it's quite a bit better, and
actively supported by g++ and gdb developers.
It won't be perfect; e.g., data structures will be shown and have
to be entered in C++ notation (at least initially – further gdb
patches might improve the situation), but even then, I think it
would be vastly better than the current situation where debugging
GPC code with gdb is mostly impossible.
- C++ gives us an object model that already has
everything we need for backward-compatibility plus additional
desirable features, so we don't have to reinvent the wheel.
Additionally, GPC objects would then be binary-compatible with C++
objects, so multi-language OOP programming would be possible
(whereas now, to interface GPC and C++ code, basically the only
way is via a "C" interface on both sides).
- Other C++ features such as templates or
exceptions would also be readily available, so extending GPC with
them would be easier.
- Name mangling is also already done in C++. Though the current GPC
has name mangling where it needs it, more cases might be required
for new features. In C++ we have them already, and they're
understood by gdb.
- We won't need to maintain GCC backend patches. If problems are
found, they can be shown on the C++ level, thus g++ developers
should be more open to our reports.
- Interfacing to C/C++ headers would become much easier. For a
compiler like the current GPC, this is a very difficult problem,
and no satisfying solution (short of writing basically a complete
C-to-Pascal converter) has ever been devised. Many GPC contributors
have written interfaces to various C libraries, but this has been
time-consuming, tedious and error-prone work (since the
correctness could not be checked by the compiler), and requires
updates with every new library version.
With GPC as a converter, this would become much easier; though not
quite as easy as, e.g., importing C code into C++, which only
requires saying extern "C" { ... } and only works because
C++ is almost a strict superset of C, so it wouldn't work for
Pascal (unless GPC could parse C code, but we don't really want
that). But a way similar to inline assembler code could work,
i.e., in the same way that GCC/GPC currently can simply pass
through inline assembler code to the next stage, the assembler,
while substituting variable names etc. in order to connect it to
the C/Pascal code, the new GPC could pass C/C++ code to the C++
compiler, while connecting Pascal to C++ variables. (Though there
are some issues left, e.g., getting C++ types as non-opaque Pascal
types automatically doesn't seem so easy.)
A Pascal interface for a C/C++ library then would still use
wrapper functions, but these could call the library functions from
"inline C/C++ code" using their declarations from the header, and
would therefore be type-safe (checked by the C++ compiler), and
updates to new library versions would be less important
(incompatible changes would be detected by the compiler, rather
than resulting in wrong code) and less work.
The biggest problems with this approach will be GPC features that
don't directly map to C++ – I stress "directly", as anything
can be mapped somehow (both languages are Turing complete), it's
just a question of effort required.
E.g., type-checking as well as function overloading might better be
handled in GPC, instead of deferring it to the C++ type system,
since e.g. "Boolean", "Char" and "ByteCard" or similar might map to
the same C++ type, but are quite different in Pascal.
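A tiny sketch of why the C++ type system alone cannot tell such types apart. The mapping of both Pascal types to unsigned char is an assumption made for this example, as are the names Pascal_Char, Pascal_ByteCard, type_name and name_of_byte_value.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical mapping a converter might choose for two Pascal types.
typedef unsigned char Pascal_Char;
typedef unsigned char Pascal_ByteCard;

// C++ sees only one parameter type here, so a second overload for
// Pascal_ByteCard would be rejected as a redefinition of the first.
const char *type_name(Pascal_Char) { return "Char"; }
// const char *type_name(Pascal_ByteCard) { return "ByteCard"; }  // would not compile

// A ByteCard value is silently accepted where a Char is expected.
const char *name_of_byte_value() {
  Pascal_ByteCard b = 65;
  return type_name(b);
}
```

Since the typedefs are mere aliases, any check that distinguishes Char from ByteCard has to happen before the C++ code is emitted.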
Another example is local routines, especially if they access
variables/parameters from their enclosing routines (because if they
don't, they can actually be implemented in C++, by wrapping them in
a local class – looks ugly, which is irrelevant for generated code,
but works). If they do, the upcoming next standard version of C++
(C++0x) will allow implementing them (via so-called lambda
functions), but it will probably be some time until this standard is
published and then supported by g++.
In the meantime, we could emulate them by passing (as hidden
parameters) either references to the relevant variables or a
reference to the whole "stack frame" which would then be wrapped in
a record internally. A bit cumbersome perhaps, but feasible.
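Both variants can be sketched without any C++0x features. The Pascal routines, the generated names (Sum_Frame, Sum_Add, Local) and the layout of the frame record are hypothetical; they only illustrate the kind of output a converter could produce.

```cpp
#include <cassert>

// Pascal source (hypothetical):
//   function Sum (n: Integer): Integer;
//   var Total, i: Integer;
//     procedure Add (k: Integer);
//     begin
//       Total := Total + k   { accesses a variable of the enclosing routine }
//     end;
//   begin
//     Total := 0;
//     for i := 1 to n do Add (i);
//     Sum := Total
//   end;

// The variables of Sum that local routines may access, wrapped in a record.
struct Sum_Frame {
  int n;
  int Total;
};

// The local routine, moved to file scope, with the frame as hidden parameter.
static void Sum_Add(Sum_Frame &frame, int k) {
  frame.Total = frame.Total + k;
}

int Sum(int n) {
  Sum_Frame frame;
  frame.n = n;
  frame.Total = 0;
  for (int i = 1; i <= n; ++i)
    Sum_Add(frame, i);
  return frame.Total;
}

// A local routine that does not touch enclosing variables can instead be
// wrapped in a local class, which keeps its name local to the function.
int Twice(int x) {
  struct Local {
    static int Double(int v) { return 2 * v; }
  };
  return Local::Double(x);
}
```

Passing one reference to the whole frame keeps the hidden parameter list short even when a local routine uses many outer variables, at the cost of placing those variables in a record.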
Things become more difficult when local routines (that access
variables/parameters from their enclosing routines) are used as
procedural parameters, something which BP doesn't allow, but
standard Pascal does. Long-time GPC users will know that this area
has always been problematic anyway, that GPC has had many bugs (and
still has some) there, and that many of the dreaded backend patches
deal with this issue. It's where the backend on most platforms uses
so-called "trampolines", on-the-fly generated code stubs on the
stack, which have become even more problematic with the recent
security practice of disallowing execute permission for stack
memory. (The merits of this practice are not the issue here, the
fact is that many systems employ it.)
A possible solution (before C++0x) will be to create trampolines in
"code space", i.e., as generated C++ code. These would work just
like on the stack, with the only problem that their number is
limited (per function), because new code cannot be generated at
runtime. However, this problem is not as serious as it might seem:
- It would in fact be standard Pascal compliant. The standard allows
for system limits ("1.2 This International Standard does not
specify a) the size or complexity of a program and its data that
will exceed the capacity of any specific data processing system or
the capacity of a particular processor [...]"), and of course, all
existing systems have some limits. In fact, recursive calls (which
are always involved in this case) usually consume stack space, and
stack limits are often set rather small, limiting the depth of any
sequence of recursive calls (regardless of trampolines).
- Almost all practical uses require at most one trampoline per local
function. In fact, you need some rather unusual code to need more
than one, see knuth1.pas in GPC's test suite, which is, of
course, a constructed test case.
- Using an option/directive similar to the existing setlimit
for sizes of sets, we can let the user specify how many
trampolines they want (if they want a million to match their stack
size for recursion depth, fine, the code will just take a few more
MB and take a bit longer to compile ;-).
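To make the idea of trampolines in "code space" concrete, here is a minimal sketch. It simplifies in one respect that should be stated loudly: it uses a single global pool of four slots for one function signature, whereas the text envisions a limit per local function. All names (PlainFn, FramedFn, Slot, acquire_trampoline, release_trampoline, kTrampolines) are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

typedef int (*PlainFn)(int);                    // what the callee expects
typedef int (*FramedFn)(void *frame, int arg);  // local routine with hidden frame

const std::size_t kTrampolines = 4;  // hypothetical compile-time limit

struct Slot {
  FramedFn target;
  void *frame;
  bool in_use;
};
static Slot slots[kTrampolines];  // zero-initialized, so all slots start free

// Each instantiation is a distinct, ordinary function that forwards to
// whatever routine and frame are currently stored in its slot.
template <std::size_t I>
int trampoline(int arg) {
  return slots[I].target(slots[I].frame, arg);
}

static const PlainFn entries[kTrampolines] = {
  &trampoline<0>, &trampoline<1>, &trampoline<2>, &trampoline<3>
};

// Binds a routine and its frame to a free slot; returns 0 if the limit is hit.
PlainFn acquire_trampoline(FramedFn target, void *frame) {
  for (std::size_t i = 0; i < kTrampolines; ++i) {
    if (!slots[i].in_use) {
      slots[i].target = target;
      slots[i].frame = frame;
      slots[i].in_use = true;
      return entries[i];
    }
  }
  return 0;
}

void release_trampoline(PlainFn fn) {
  for (std::size_t i = 0; i < kTrampolines; ++i)
    if (entries[i] == fn) slots[i].in_use = false;
}

// Example: a local routine that uses a variable of its enclosing routine
// is passed as a plain procedural parameter.
struct Frame { int offset; };

static int add_offset(void *frame, int x) {
  return x + static_cast<Frame *>(frame)->offset;
}

int apply_twice(PlainFn fn, int x) { return fn(fn(x)); }

int shifted_twice(int offset, int x) {
  Frame frame;
  frame.offset = offset;
  PlainFn fn = acquire_trampoline(&add_offset, &frame);
  assert(fn != 0);  // a real implementation would report the exceeded limit
  int result = apply_twice(fn, x);
  release_trampoline(fn);  // the slot becomes reusable when the routine exits
  return result;
}
```

No code is generated at runtime, so this works with non-executable stacks; the price is exactly the fixed limit discussed in the list above.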
New GPC Design
For the design of a new GPC, I would suggest the following goals:
- Be compatible with the current GPC as much as feasible, though not
including all current misfeatures.
- Continue the current GPC's goals of supporting both Pascal
standards completely, and other dialects as much as feasible.
- Internally use high-level data structures (as opposed to
TREE_NODEs), such as polymorphic object classes (e.g., a
"declaration" base class with various sub-classes such as variable
declarations, routine declarations, etc.), as well as strongly
typed lists and other containers (e.g., using STL
templates).
- Move as much work as possible to the runtime library, e.g., much
of the string or set handling, as described above. Note that this
doesn't imply slower code, as the runtime routines can be inlined
where appropriate.
- Ultimately be written in Pascal and compilable by itself.
This is a typical goal for most compilers, and the current GPC
(like other non-C frontends of GCC) is an exception, not being
written in its source language. Two problems typically associated
with this approach are:
- How to get to the point where it can compile itself? There are
basically two approaches:
- Develop the compiler so it compiles with the current GPC,
until it's self-hosting, then switch to compiling it with
itself, and adding higher-level features (such as templates)
later. Disadvantage: During the first step, these features are
not available, so all their uses will have to be written in
lower-level ways and later thrown away and rewritten.
- Develop the compiler in C++, using high-level features from
the start and later convert it to Pascal. Though it seems more
work, that conversion could be mostly automatic if during the
C++ development phase care is taken to use only those features
and constructs that will have a one-to-one translation. So I
tend to prefer this approach.
- For completeness, I mention another possibility, to extend the
current GPC with high-level features such as templates, and
write the new GPC using it, but because of the same
problems that motivate me to write this, I
would exclude this option as far as I'm concerned.
- Bootstrapping: How to build the compiler on a new system when
it's required to compile itself? Cross-compilation is one
answer. If the new GPC is implemented as a
converter to C++, another option is
to just move a created C++ file to the new system and compile it
there with a native C++ compiler. (And possibly use this newly
built GPC to build itself again, much like GCC's
make bootstrap process.)
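As an illustration of the goal of high-level internal data structures, the following sketch shows a polymorphic "declaration" hierarchy stored in a strongly typed container. It is not a design proposal; every class and member name (Declaration, VariableDeclaration, RoutineDeclaration, Scope) is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Base class for everything that can be declared.
class Declaration {
public:
  explicit Declaration(const std::string &name) : name_(name) {}
  virtual ~Declaration() {}
  const std::string &name() const { return name_; }
  virtual std::string kind() const = 0;
private:
  std::string name_;
};

class VariableDeclaration : public Declaration {
public:
  VariableDeclaration(const std::string &name, const std::string &type_name)
    : Declaration(name), type_name_(type_name) {}
  std::string kind() const { return "variable"; }
  const std::string &type_name() const { return type_name_; }
private:
  std::string type_name_;
};

class RoutineDeclaration : public Declaration {
public:
  RoutineDeclaration(const std::string &name, int parameter_count)
    : Declaration(name), parameter_count_(parameter_count) {}
  std::string kind() const { return "routine"; }
  int parameter_count() const { return parameter_count_; }
private:
  int parameter_count_;
};

// A scope owns its declarations; the container can only hold Declaration
// objects, which the C++ compiler checks at compile time.
class Scope {
public:
  Scope() {}
  ~Scope() {
    for (std::size_t i = 0; i < declarations_.size(); ++i)
      delete declarations_[i];
  }
  void add(Declaration *declaration) { declarations_.push_back(declaration); }
  std::size_t size() const { return declarations_.size(); }
  const Declaration *find(const std::string &name) const {
    for (std::size_t i = 0; i < declarations_.size(); ++i)
      if (declarations_[i]->name() == name) return declarations_[i];
    return 0;
  }
private:
  Scope(const Scope &);             // copying disabled to keep the sketch short
  Scope &operator=(const Scope &);
  std::vector<Declaration *> declarations_;
};
```

Compared with TREE_NODE macros, asking a variable declaration for a parameter count is a compile-time error here, and a checked downcast (dynamic_cast) is needed only where the concrete kind genuinely matters.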
Reusable parts
Some parts of the current GPC can be reused, at least partially or
initially:
- The lexer – should not require many changes, since it doesn't
deal with backend data structures.
- The preprocessor – at least initially. The current gpcpp is ugly
spaghetti code, and I had planned to rewrite it anyway (by
integrating it into the lexer, thereby reducing its size by 90% or
more by getting rid of much redundant parsing), but that can be
done later just as well. For a first step, it can remain an extra
pass as it is now.
- The parser – only the grammar part, not the (TREE_NODE
based) actions. But the (bison) grammar is the hard part where
much effort has gone to handle syntactic near-ambiguities and
other problems. This wouldn't need to change much.
- Handling of options/directives and Pascal dialects. Not so much
the code (the respective functions and macros are rather simple),
but things like the list of options/directives and the current
code as a reference of which dialect checks to apply where, so we
don't have to gather all this information from scratch.
- GPI file handling – just the general concept (how to store
heavily cross-referential internal structures, preserve identities
across direct and indirect imports/exports, etc.), whereas the
structures themselves (now TREE_NODEs) would change
completely – but that affects only the boring part of the code
(store_node_fields, load_node).
- The run-time system – initially perhaps unchanged though, of
course, it will improve later. Also, at least conceptually, the
GPC code to create and check RTS calls (predef.c).
- The test suite, the units/modules that come with GPC, and
contributions – if the new compiler is backward-compatible as
planned, they should all keep working.
- The documentation – except for the first paragraph ;-).
One thing that will require almost complete changes is the
type-checking and converting (including parameter
checking/converting) code, but it should be rewritten anyway, since
now it's basically C types with a lot of kludges – quite a few of
GPC's known bugs stem from there, and the implementation is much
worse than it seems from the outside ...