Quo vadis, GPC?

As you have surely noticed, GPC development has stalled recently and questions about GPC's future have been raised. I've thought about the issues for a while, and here are my thoughts.

I started as GPC's main developer and maintainer some 10 years ago, following Peter Gerwinski and followed by Waldek Hebisch, though with some overlap each time. Recently, GPC development has almost come to a standstill, as all three of us have focused on other projects and have had neither the time nor the need to do significant GPC development.

Back then, Peter Gerwinski, and occasionally I, professionally worked on some larger Pascal projects. These projects are now finished (as far as we are concerned), so for our professional work, there is currently no more interest in GPC. In recent years, Peter and I have done our professional work in other languages, mainly C++.

Of course, I still have quite a few Pascal programs of my own, including several that I use regularly and therefore need to maintain (as well as other private, semi-professional and also professional, but not currently maintained, programs). However, when I compare the effort of rewriting them in another language versus maintaining GPC just for this purpose, the latter is clearly more work, especially in the long run.

So I've been looking for other advantages that would make it worthwhile for me to stick with GPC. Of course, I generally like the Pascal language and many of its features; otherwise I wouldn't have gotten involved with it in the first place. However, the latest (and quite possibly last) official Pascal standard, ISO 10206, celebrates its 20th birthday this year, and the dialects developed since then, especially Delphi, have focused mostly on Windows integration and other non-portable features that are uninteresting to me, rather than on genuine language advances. So the language is a bit outdated.

Meanwhile, other languages haven't stood still. I'll use C++ for my comparison, since that's what I've used recently, and also because, among today's popular languages (most of which run on virtual machines or are scripting languages), it may be the only natively compiled one besides C, which is too low-level for a meaningful comparison. In particular, since template support in C++ has stabilized in the last few years, it has become genuinely useful for writing high-level code, in some cases higher-level than in Pascal, IMHO. I'll expand on this below (see C++ Features); I'll also compare its object model with BP's (see C++ Object Model) to explore how they overlap and differ.

The result of the comparison is, not surprisingly, that both languages have advantages and disadvantages which, as far as I'm concerned, more or less cancel out in total. Unfortunately, this is not a good enough reason for me to justify the greater effort of maintaining GPC vs. rewriting my code. In other words, maintaining GPC would be worthwhile to me only if it were to support a "best of both worlds" approach, i.e., to combine its existing good features with some good features from C++.

I suppose that some of GPC's current users might not welcome such an approach, especially the standard-purists who have moaned about feature creep before (though this approach wouldn't mean worsening standards-compliance in GPC's standard Pascal modes). Others, whose reason for using GPC is mainly to maintain legacy code originally written for other Pascal compilers, might be indifferent to the new features, i.e., not mind them, but also not use them. So before I go on, I'd like to know who would actually be interested in major new features (working out the details would be the next step, if there's interest at all). If I'm basically the only one, it would not be worth it.

Another serious problem is that, in my opinion, GPC development in the past has often not been very efficient. I'll expand on this below (see Problems with GPC Development). The problems described there make me skeptical that the current approach to developing GPC is still viable.

Therefore, I've thought about alternatives. One idea that came to my mind is to basically rewrite GPC and turn it from a compiler into a converter to another language. This other language can, in my opinion, only be C++, since it should be natively compiled, widely available and powerful enough to match most of GPC's features more or less directly, which rules out C. I know it would be a radical step, but I think the advantages would outweigh the disadvantages as I'll explain below, see GPC as a C++ Converter and New GPC Design. I have discussed this text with Waldek and Peter before, and incorporated some of their suggestions.

But even if GPC development becomes easier with the new design, as I sincerely hope, it will not be a task for a single person. So again, I need to know if some of you are interested not only in using, but also in helping to develop, a new and improved GPC. This is especially relevant since the new GPC will eventually (see below) be written in readable Pascal, thus removing two obstacles to participation that the current GPC has always posed: it is written in C rather than Pascal, and its C code is almost unreadable.

Problems with GPC Development

As I see it, GPC development has suffered from some principal problems most of the time:

C++ Features

C++ has many features that distinguish it from Pascal or from C, but in my experience using it, two of them, automatic destructors and templates, do the most to make it easier to write high-level code.
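A minimal C++ sketch of both features; `sumAll`, `greeting` and `Tracked` are hypothetical names made up for this illustration. In classic Pascal, memory obtained with `New` must be released with an explicit `Dispose`, and a generic routine like `sumAll` would have to be written once per element type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Template: one generic definition that the compiler instantiates and
// type-checks separately for each element type it is used with.
template <typename T>
T sumAll(const std::vector<T> &items, T start) {
    for (const T &item : items)
        start = start + item;
    return start;
}

// Automatic destructors: `parts` and the strings in it own heap memory,
// which is released when the function returns (or an exception passes
// through), without any explicit Dispose/FreeMem call.
std::string greeting(const std::string &name) {
    std::vector<std::string> parts;
    parts.push_back("Hello, ");
    parts.push_back(name);
    parts.push_back("!");
    return sumAll(parts, std::string());  // T = std::string
}  // parts and its strings are destroyed here

// Hypothetical class that counts its live instances, so the effect of
// the destructor can be observed.
class Tracked {
public:
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked &) { ++alive; }
    ~Tracked() {
        assert(alive > 0);  // never more destructions than constructions
        --alive;
    }
};
int Tracked::alive = 0;
```

For example, inside a block that declares a `Tracked` and a `std::vector<Tracked>` of three elements, `Tracked::alive` is 4; once the block is left, it is back to 0 without any cleanup code.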

Disadvantages

Of course, C++ also has quite a few disadvantages in my opinion, so it's not my ideal language; otherwise I wouldn't be writing all this in the first place. As the point of this writing is not to bash C/C++, I'll keep the list short (and incomplete). It just serves to show why I'd see some value in a Pascal compiler that uses the advantages of C++ but avoids its disadvantages.

C++ Object Model

Fortunately, the C++ object model is for the most part backward-compatible with the BP object model. (I'm less familiar with the Delphi, Mac and OOE models, but I suppose they also mostly map to a, slightly different, subset of the C++ model.)
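To make the correspondence concrete, here is a sketch assuming a hypothetical BP object type `TShape` with one static and one virtual method, and a descendant `TSquare` (the BP declarations are shown in the comment). The C++ code is just the obvious mapping, not the output of any existing tool: BP static methods become non-virtual member functions, virtual methods become virtual member functions, and BP's single inheritance becomes public single inheritance.

```cpp
#include <cassert>
#include <string>

// BP original (hypothetical):
//   type
//     TShape = object
//       Name: String;
//       constructor Init (const AName: String);
//       function Describe: String;      { static (non-virtual) method }
//       function Area: Real; virtual;
//       destructor Done; virtual;
//     end;
//     TSquare = object (TShape)
//       Side: Real;
//       constructor Init (ASide: Real);
//       function Area: Real; virtual;
//     end;

class TShape {
public:
    std::string Name;
    explicit TShape(const std::string &AName) : Name(AName) {}  // constructor Init
    std::string Describe() const { return Name; }               // static method
    virtual double Area() const { return 0.0; }                 // virtual method
    virtual ~TShape() {}                                        // destructor Done; virtual
};

class TSquare : public TShape {
public:
    double Side;
    explicit TSquare(double ASide) : TShape("square"), Side(ASide) {
        assert(ASide >= 0.0);  // hypothetical precondition for this sketch
    }
    double Area() const override { return Side * Side; }  // overrides TShape.Area
};

// Dynamic dispatch through a base-class reference, as when a virtual
// method is called through a var parameter of type TShape in BP.
double AreaOf(const TShape &s) { return s.Area(); }
```

One visible difference: BP constructors and destructors have names (`Init`, `Done`) and a type may declare several of them, while C++ constructors are anonymous and distinguished only by their parameter lists, so a converter would need some mapping for them, especially when two BP constructors have identical parameter lists.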

GPC as a C++ Converter

The suggestion of writing a new GPC as a converter that outputs C++ code might seem strange, after we've long preached that the current GPC is not one. The very first paragraph of the manual says: "Unlike utilities such as p2c, this is a true compiler, not just a converter."

But that was the 1990s; this is the 2010s. Since then, computers have become much faster, C++ has become usable (whereas in the 1990s, the natural target language would have been C, which is too low-level), and we've learned a bit, in particular that developer time is the really scarce resource. This is not meant to justify a slow compiler, but in my experience, most of the compiler's time is spent in the optimization stages rather than in parsing, so this would be unaffected. (In some cases, especially with huge interfaces, most of the time might currently be spent on GPI file reading/writing, i.e., imports/exports. Using higher-level structures in the compiler might make it easier to improve performance here, so in these cases, the new GPC might actually run much faster than the existing one.)

BTW, the new GPC still wouldn't be quite like p2c, as its main goal would not be to produce readable C++ code, but an exact translation of the Pascal input. By contrast, the p2c versions I've seen, apart from being hopelessly outdated, produced only a rough approximation of what their authors seemed to believe Pascal semantics were like.
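As a small illustration of what "exact translation" means: in ISO 7185 Pascal, `i mod j` is an error if `j <= 0`, and otherwise its result always lies in the range `0 .. j - 1`, whereas C++'s `%` gives the remainder the sign of the dividend. Translating `mod` literally to `%` would therefore silently give different results for negative operands. (Borland Pascal's `mod`, by contrast, also takes the sign of the dividend, so the right translation depends on the dialect.) The helper names `pascalMod` and `pascalDiv` are hypothetical; a converter might just as well emit the code inline.

```cpp
#include <cassert>

// Hypothetical helper a converter could emit for ISO Pascal "i mod j".
// ISO 7185: an error if j <= 0; otherwise 0 <= i mod j < j.
long pascalMod(long i, long j) {
    assert(j > 0);             // a real runtime would report a Pascal runtime error
    long r = i % j;            // C++: remainder has the sign of i
    return r < 0 ? r + j : r;  // shift negative remainders into 0 .. j - 1
}

// ISO Pascal "div" truncates toward zero, as does C++ "/" (guaranteed
// since C++11), so it can be translated directly.
long pascalDiv(long i, long j) {
    assert(j != 0);  // an error in Pascal, undefined behavior in C++
    return i / j;
}
```

E.g., `-7 mod 3` is 2 in ISO Pascal, but `-7 % 3` is -1 in C++.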

The new approach would have some important advantages:

The biggest problems with this approach will be GPC features that don't map directly to C++. I stress "directly", since anything can be mapped somehow (both languages are Turing complete); it's just a question of the effort required.

For example, type checking and function overloading might better be handled in GPC itself instead of being deferred to the C++ type system, since, e.g., "Boolean", "Char" and "ByteCard" might all map to the same C++ type, but are quite different in Pascal.
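To illustrate, assume a naive mapping in which `Boolean`, `Char` and `ByteCard` all become `unsigned char` (the typedef names and the functions `Show_Char`, `Show_ByteCard` and `Pas_Chr` are hypothetical, and an 8-bit `Char` is assumed). The sketch shows why overloads and checks that depend on these distinctions have to be resolved by GPC before the C++ code is emitted.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical naive mapping: three distinct Pascal types, one C++ type.
typedef unsigned char PasBoolean;
typedef unsigned char PasChar;
typedef unsigned char PasByteCard;

// The C++ compiler cannot tell them apart at all:
static_assert(std::is_same<PasBoolean, PasChar>::value &&
              std::is_same<PasChar, PasByteCard>::value,
              "all three map to unsigned char");

// Pascal (hypothetical overloads, distinct in Pascal):
//   function Show (c: Char): String;
//   function Show (b: ByteCard): String;
// Translated literally, both would become Show(unsigned char), a C++
// redefinition error, and C++ would also accept a Boolean argument,
// which Pascal rejects. If GPC resolves the overload itself, it can emit
// distinctly named functions, so C++ overload resolution is never involved:
std::string Show_Char(PasChar c) {
    return std::string(1, static_cast<char>(c));
}

std::string Show_ByteCard(PasByteCard b) {
    return std::to_string(static_cast<unsigned>(b));
}

// Likewise, range checks such as for Pascal's Chr must be emitted by GPC,
// since C++ converts int to unsigned char silently (modulo 256).
PasChar Pas_Chr(int code) {
    assert(code >= 0 && code <= 255);  // a range-check error in Pascal
    return static_cast<PasChar>(code);
}
```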

Another example is local routines, especially if they access variables/parameters of their enclosing routines. (If they don't, they can actually be implemented in C++ by wrapping them in a local class; this looks ugly, which is irrelevant for generated code, but it works.) If they do, the upcoming C++ standard (C++0x) will allow implementing them via so-called lambda functions, but it will probably be some time until this standard is published and then supported by g++.
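A sketch of both cases, based on a hypothetical Pascal function `SumSquares` (shown in the comment) whose local routine does not touch the enclosing routine's variables, and a second hypothetical function `CountAbove` whose local routine does. The first uses a local class, which works with any C++ compiler; the second uses a C++0x lambda that captures the enclosing variables by reference, so it needs a compiler with C++0x support.

```cpp
#include <cassert>

// Pascal (hypothetical):
//   function SumSquares (n: Integer): Integer;
//   var i, s: Integer;
//     function Square (x: Integer): Integer;  { uses no outer variables }
//     begin
//       Square := x * x
//     end;
//   begin
//     s := 0;
//     for i := 1 to n do s := s + Square (i);
//     SumSquares := s
//   end;
int SumSquares(int n) {
    // C++ has no nested functions, but a local class may contain member
    // functions, so the local routine becomes a static member function.
    struct Local {
        static int Square(int x) { return x * x; }
    };
    int s = 0;
    for (int i = 1; i <= n; ++i)
        s = s + Local::Square(i);
    return s;
}

// A local routine that does use the enclosing routine's parameter `limit`
// and variable `count`: a C++0x lambda captures them by reference.
int CountAbove(const int *a, int len, int limit) {
    assert(a != nullptr || len == 0);
    int count = 0;
    auto check = [&](int x) { if (x > limit) ++count; };
    for (int i = 0; i < len; ++i)
        check(a[i]);
    return count;
}
```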

In the meantime, we could emulate them by passing (as hidden parameters) either references to the relevant variables or a reference to the whole "stack frame" which would then be wrapped in a record internally. A bit cumbersome perhaps, but feasible.
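A minimal sketch of the "whole stack frame" variant, based on a hypothetical Pascal function `Total` with a local procedure `Add` (shown in the comment). The naming scheme (`Total_Frame`, `Total_Add`) is made up for illustration; the technique needs no C++0x features.

```cpp
#include <cassert>

// Pascal (hypothetical):
//   function Total (factor: Integer): Integer;
//   var sum: Integer;
//     procedure Add (x: Integer);
//     begin
//       sum := sum + factor * x   { uses outer parameter and variable }
//     end;
//   begin
//     sum := 0;
//     Add (3);
//     Add (4);
//     Total := sum
//   end;

// The parameters and variables of the enclosing routine that its local
// routines use are collected in a record (the "stack frame") ...
struct Total_Frame {
    int factor;  // parameter of Total
    int sum;     // local variable of Total
};

// ... and the local routine becomes an ordinary function that receives a
// reference to that record as a hidden extra parameter.
static void Total_Add(Total_Frame &frame, int x) {
    frame.sum = frame.sum + frame.factor * x;
}

int Total(int factor) {
    Total_Frame frame;
    frame.factor = factor;
    frame.sum = 0;
    Total_Add(frame, 3);
    Total_Add(frame, 4);
    return frame.sum;
}
```

Passing references to the individual variables instead would avoid the record, but adds one hidden parameter per accessed variable. With frame records, a more deeply nested routine could presumably reach outer frames through a pointer to its parent's frame stored in its own record.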

Things become more difficult when local routines that access variables/parameters of their enclosing routines are used as procedural parameters, something which BP doesn't allow, but standard Pascal does. Long-time GPC users will know that this area has always been problematic anyway, that GPC has had many bugs (and still has some) there, and that many of the dreaded backend patches deal with this issue. This is where the backend on most platforms uses so-called "trampolines", code stubs generated on the fly on the stack, which have become even more problematic with the recent security practice of disallowing execute permission for stack memory. (The merits of this practice are not the issue here; the fact is that many systems employ it.)

A possible solution (before C++0x) will be to create trampolines in "code space", i.e., as generated C++ code. These would work just like those on the stack, the only problem being that their number is limited (per function), because new code cannot be generated at runtime. However, this problem is not as serious as it might seem:
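A minimal sketch of such code-space trampolines, written by hand here although a converter would generate them. Everything in it is hypothetical: the local routine `Outer_AddOffset`, the routine `Apply` that takes a procedural parameter, the limit `kTrampolines = 2`, and the slot-allocation scheme are illustrative assumptions, not a worked-out GPC design. Each stub is an ordinary function with the plain signature a procedural parameter expects; it finds the enclosing routine's frame through its own static slot.

```cpp
#include <cassert>
#include <stdexcept>

// Type of a procedural parameter: a plain function pointer, with no room
// for a hidden frame argument.
typedef int (*IntFunc)(int);

// Hypothetical routine with a procedural parameter, like ISO Pascal's
// "function Apply (function f (x: Integer): Integer; x: Integer): Integer".
int Apply(IntFunc f, int x) { return f(x); }

// Frame record of the enclosing routine and the local routine that uses it.
struct Outer_Frame { int offset; };
static int Outer_AddOffset(Outer_Frame &frame, int x) { return x + frame.offset; }

// Trampolines in code space: a fixed number of stubs, generated at compile
// time. Each stub reads the frame pointer from its own static slot.
const int kTrampolines = 2;  // the per-function limit
static Outer_Frame *slots[kTrampolines] = {nullptr, nullptr};

static int Tramp0(int x) { return Outer_AddOffset(*slots[0], x); }
static int Tramp1(int x) { return Outer_AddOffset(*slots[1], x); }
static const IntFunc stubs[kTrampolines] = {Tramp0, Tramp1};

// Binds a frame to a free stub while this object lives; the slot is
// released by the destructor when the enclosing routine exits.
class TrampolineBinding {
public:
    explicit TrampolineBinding(Outer_Frame &frame) : index_(-1) {
        for (int i = 0; i < kTrampolines; ++i)
            if (!slots[i]) {
                slots[i] = &frame;
                index_ = i;
                return;
            }
        throw std::runtime_error("out of trampolines");  // limit reached
    }
    ~TrampolineBinding() { slots[index_] = nullptr; }
    TrampolineBinding(const TrampolineBinding &) = delete;
    TrampolineBinding &operator=(const TrampolineBinding &) = delete;
    IntFunc func() const {
        assert(index_ >= 0);
        return stubs[index_];
    }
private:
    int index_;
};

// Translation of an enclosing routine that passes its local routine
// as a procedural parameter.
int Outer(int offset, int x) {
    Outer_Frame frame;
    frame.offset = offset;
    TrampolineBinding bind(frame);
    return Apply(bind.func(), x);
}
```

Binding more frames at once than there are stubs (e.g., through deep recursion of the enclosing routine) fails, which is exactly the per-function limit described above. Releasing the slot in a destructor is a choice made for this sketch; also, the static slots make it unsafe for multithreaded use as written.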

New GPC Design

For the design of a new GPC, I would suggest the following goals:

Reusable parts

Some parts of the current GPC can be reused, at least partially or initially:

One thing that will require almost complete changes is the type-checking and converting code (including parameter checking/converting), but it should be rewritten anyway: currently it's basically C types with a lot of kludges. Quite a few of GPC's known bugs stem from there, and the implementation is much worse than it seems from the outside ...