Why is Logtalk not written in Logtalk?

Discussion on porting Prolog applications to Logtalk

Moderator: Paulo Moura

Parker
Posts: 33
Joined: Wed Feb 27, 2008 2:51 pm

Why is Logtalk not written in Logtalk?

Post by Parker » Wed Oct 14, 2009 3:35 pm

Please don't take it badly Paulo, it's only intended as constructive criticism. :-)


Why is Logtalk not written in Logtalk?

The Logtalk compiler is written in Prolog and is about 16,000 lines of source. It is not "programming in the small", so why isn't Logtalk written in Logtalk?
There's a principle called "eating your own dogfood". By using a tool, you demonstrate the usefulness of the tool.
It is often asked, "if the compiler for language X isn't written in language X, why should I risk using it?" [1]
Bootstrapping is used for most modern languages because it lets the compiler itself use the features that the new language introduces.
Logtalk has introduced good support for "programming in the large". Since the Logtalk compiler is 16,000 lines of source code and is a large monolithic system, it is clear that this is no longer "programming in the small". It would clearly benefit from decomposition, separation of concerns, re-use, object orientation, portable libraries, etc.

Further rationale:
  • Monolithic code. At present there is a single developer and there are few (any?) contributions from others. This is partly because the code is monolithic and hence understanding the compiler is hard. There is no separation of concerns, no modular programming. Predicate interdependencies are not delineated. Navigating a 16,000-line file is cumbersome.
  • Predicate naming. Predicates have ugly names. The following is a typical predicate name in the Logtalk compiler:

    Code:

    '$lgt_runtime_error_handler'(error(Variable, Sender))
    This is due to two reasons:
    1) Compiler predicates are prefixed with a $ to denote a system predicate, and the single quotes also exist for the same reason.
    2) Part of the predicate name is a namespace that compensates for the lack of modular programming.

    A more elegant name would be

    Code:

    runtime_error_handler(error(Variable, Sender))
    
    or better still, with object orientation

    Code:

    runtime::error_handler(error(Variable, Sender))
    The compiler back-end should be able to take runtime::error_handler(...) and compile it to '$lgt_runtime_error_handler'(...)

    This change in predicate naming would significantly improve readability of the code. Good predicate naming, together with decomposition into objects, would make the source far more accessible.
  • Documentation. There is little documentation for the Logtalk compiler. Although Paulo's PhD thesis[2] describes the internals, it has not been updated since 2003, while the compiler has continued to evolve. It is not clear how up to date the thesis is with respect to the current Logtalk implementation. Logtalk provides support for automatic documentation generation, which would allow the compiler's documentation to evolve and stay up to date.
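
The name mapping suggested under "Predicate naming" above can be sketched in plain Prolog. This is only an illustration under stated assumptions, not how the Logtalk compiler works: mangled_name/3 and mangle_goal/2 are hypothetical helpers, the '$lgt_' prefix is taken from the example in this post, and the operator declaration for ::/2 is there only so that the snippet loads in a Prolog system without Logtalk.

```prolog
% Declare ::/2 so that runtime::error_handler(...) parses in plain Prolog.
% (Assumption: priority 200, right-associative; adjust to your system.)
:- op(200, xfy, (::)).

% mangled_name(+Object, +Name, -Mangled)
% Hypothetical helper: builds the atom '$lgt_<Object>_<Name>'.
mangled_name(Object, Name, Mangled) :-
    atom_concat('$lgt_', Object, Prefix0),
    atom_concat(Prefix0, '_', Prefix),
    atom_concat(Prefix, Name, Mangled).

% mangle_goal(+ObjectGoal, -PrologGoal)
% Rewrites Object::Goal into a flat Prolog goal with the same arguments.
mangle_goal(Object::Goal, PrologGoal) :-
    Goal =.. [Name| Args],
    mangled_name(Object, Name, Mangled),
    PrologGoal =.. [Mangled| Args].
```

With these definitions, the query mangle_goal(runtime::error_handler(error(V, S)), G) binds G to '$lgt_runtime_error_handler'(error(V, S)). A real back-end would also have to deal with arity clashes, meta-calls and dynamic predicates, which this sketch ignores.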

Other advantages of bootstrapping[3]:
  • The compiler is a non-trivial test of the language being compiled.
  • Improvements to the language will also improve the compiler.

Of course, Logtalk didn't exist when Logtalk was created so another implementation language had to be used to solve the "chicken and egg" problem. But now that Logtalk is firmly established, this legacy can hopefully be abandoned.

Summary
Logtalk provides a host of great new features for developing large applications. The compiler is an ideal candidate for exploiting these features. They would help improve the Logtalk language by making it more readable and providing improved documentation. Furthermore, the code would be more maintainable; it would be easier to spot bugs and the barrier to entry would be significantly lower, thus allowing others to contribute.

Writing the Logtalk language in Logtalk is a sign of maturity and is the natural next step in the evolution of this language.

References
1. http://stackoverflow.com/questions/1493 ... mpiler-why
2. Paulo Moura (2003). Logtalk: Design of an Object-Oriented Logic Programming Language. PhD thesis. Universidade da Beira Interior
3. Compilers and Compiler Generators: An Introduction With C++. Patrick D. Terry 1997. International Thomson Computer Press. ISBN 1850322988

Paulo Moura
Logtalk developer
Posts: 474
Joined: Sat May 05, 2007 8:35 am
Location: Portugal

Re: Why is Logtalk not written in Logtalk?

Post by Paulo Moura » Thu Oct 15, 2009 12:39 am

Parker wrote:Please don't take it badly Paulo, it's only intended as constructive criticism. :-)


Why is Logtalk not written in Logtalk?

The Logtalk compiler is written in Prolog and is about 16,000 lines of source. It is not "programming in the small", so why isn't Logtalk written in Logtalk?
There's a principle called "eating your own dogfood". By using a tool, you demonstrate the usefulness of the tool.
It is often asked, "if the compiler for language X isn't written in language X, why should I risk using it?" [1]
Well, I'm certainly "eating my own dogfood". I'm using Logtalk to develop non-trivial applications and helping others do the same.
Parker wrote: Bootstrapping is used for most modern languages because it lets the compiler itself use the features that the new language introduces.
Logtalk has introduced good support for "programming in the large". Since the Logtalk compiler is 16,000 lines of source code and is a large monolithic system, it is clear that this is no longer "programming in the small". It would clearly benefit from decomposition, separation of concerns, re-use, object orientation, portable libraries, etc.

Further rationale:
  • Monolithic code. At present there is a single developer and there are few (any?) contributions from others. This is partly because the code is monolithic and hence understanding the compiler is hard. There is no separation of concerns, no modular programming. Predicate interdependencies are not delineated. Navigating a 16,000-line file is cumbersome.
Actually, there are a sizable number of contributions: bug reports, feature requests, general feedback and discussion, and also some code contributions (for example, in a folder named "contributions" in the current Logtalk distribution). Not to mention the applications, theses, and papers where Logtalk plays a significant role. These are important contributions to one of the most important assets of a programming language: its user community.

Writing a compiler is neither "programming in the small" nor "programming in the large". Writing a compiler is, at least in the specific case of Logtalk, programming at a low level. This is a key difference. Some people, usually outsiders not familiar with Logtalk, think 16000 lines of code (well, not exactly, but I will talk about that in a moment) is a large number of lines for adding objects to Prolog (I know that's not your case). Most of the time they are oblivious to the Logtalk feature set. Personally, I'm always amazed at how few lines of code I had to write in order to implement Logtalk. Prolog is a wonderful, compact, and expressive language.

But let's look at the current Logtalk development version (r5116) and do some counting on the "compiler/logtalk.pl" file that implements both the Logtalk compiler and the Logtalk runtime:

Total number of lines: 16019
Empty (layout) lines: 3799
Lines containing only comments: 1414

Therefore, the total number of lines of *code* is 10806. Well, it is actually a bit lower, as not all lines that are essentially comments are accounted for above. How complex are these 10806 lines of code? Most of them are trivial code. Just to give you some examples:

Operator and predicate directives: 77
Tables (facts) of e.g. built-in predicates and functions: 453
Number of lines for printing Logtalk banner and flag values: 123

So, 653 lines of trivial code. We are down at 10153. Still a lot you might say. Let's proceed. Not trivial but quite simple code:

Writing out XML documentation files: 798
Error-checking predicates: 223

Still 9132 lines of code. Things that you don't need to care about in order to understand the essential bits about either the compiler or the runtime:

Core high-level multi-threading implementation: 436
Definite Clause Grammars (DCGs) implementation: 416

Now at 8280 lines of code. This is roughly half the number of lines (16000) that you talk about repeatedly in your post. Are these remaining lines of code easy to grasp? Yes, provided that you take the time to read my PhD thesis. Some things, of course, are not there. An example is the dynamic binding caching mechanisms. Moreover, the difficult bits are getting better documented with each release.
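
For anyone who wants to re-check the tally, the subtraction above can be reproduced with a few lines of Prolog. This is just a sketch that restates the figures quoted in this post as facts; the category names and the predicates total_lines/1, excluded/2, remaining_lines/1 and sum_lines/3 are made up for the illustration.

```prolog
% Figures quoted in this post for compiler/logtalk.pl at r5116.
total_lines(16019).

% excluded(Category, Lines): lines set aside at each step of the tally.
excluded(layout,            3799).
excluded(comment_only,      1414).
excluded(directives,          77).
excluded(tables,             453).
excluded(banner_and_flags,   123).
excluded(xml_documentation,  798).
excluded(error_checking,     223).
excluded(multi_threading,    436).
excluded(dcgs,               416).

% remaining_lines(-Remaining): total minus everything set aside.
remaining_lines(Remaining) :-
    total_lines(Total),
    findall(N, excluded(_, N), Ns),
    sum_lines(Ns, 0, Excluded),
    Remaining is Total - Excluded.

% sum_lines(+Numbers, +Sum0, -Sum): accumulator-based sum.
sum_lines([], Sum, Sum).
sum_lines([N| Ns], Sum0, Sum) :-
    Sum1 is Sum0 + N,
    sum_lines(Ns, Sum1, Sum).
```

The query remaining_lines(R) gives R = 8280, which is about 52% of the 16019 lines in the file.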
Parker wrote: Predicate naming. Predicates have ugly names. The following is a typical predicate name in the Logtalk compiler:

Code:

'$lgt_runtime_error_handler'(error(Variable, Sender))
This is due to two reasons:
1) Compiler predicates are prefixed with a $ to denote a system predicate, and the single quotes also exist for the same reason.
2) Part of the predicate name is a namespace that compensates for the lack of modular programming.
Both (1) and (2) are correct. Allow me to add two bits of additional information. Some Prolog compilers, not all but a fair number, automatically hide (e.g. from the debugger) predicates whose names start with the "$" character. Regarding "the lack of modular programming", in a dynamic language such as Logtalk, where you can do essentially everything at runtime, you cannot separate the compiler from the runtime.
Parker wrote: A more elegant name would be

Code:

runtime_error_handler(error(Variable, Sender))
or better still, with object orientation

Code:

runtime::error_handler(error(Variable, Sender))
The compiler back-end should be able to take runtime::error_handler(...) and compile it to '$lgt_runtime_error_handler'(...)
Why? The requirements to generate an extensible compiler and an efficient runtime are not the same as the requirements to compile user applications. For example, the runtime tables and entity tables necessary to support essential features such as inheritance are basically irrelevant to the implementation of either the compiler or the runtime.
Parker wrote: This change in predicate naming would significantly improve readability of the code. Good predicate naming, together with decomposition into objects, would make the source far more accessible.
Predicate (and variable) naming in the implementation of the Logtalk compiler and runtime is top notch if you forget for a moment the necessary quotes and the $lgt_ prefix. In case of doubt, I invite you to browse the source code of most open source Prolog compilers.
Parker wrote: Documentation. There is little documentation for the Logtalk compiler. Although Paulo's PhD thesis[2] describes the internals, it has not been updated since 2003, while the compiler has continued to evolve. It is not clear how up to date the thesis is with respect to the current Logtalk implementation. Logtalk provides support for automatic documentation generation, which would allow the compiler's documentation to evolve and stay up to date.
Logtalk automatic documentation generation cannot fulfill the requirements of describing the design decisions and algorithms used in the Logtalk implementation. It is a tool for documenting entities and predicates from an interface perspective.
Parker wrote: Summary
Logtalk provides a host of great new features for developing large applications. The compiler is an ideal candidate for exploiting these features.
No, it's not. As I wrote at the beginning, the Logtalk compiler/runtime is not a large application; it is a low-level application. Quite different beasts.
Parker wrote: They would help improve the Logtalk language by making it more readable and providing improved documentation. Furthermore, the code would be more maintainable; it would be easier to spot bugs and the barrier to entry would be significantly lower, thus allowing others to contribute.
You fail to mention that bootstrapping is not trivial to implement (and I'm talking from experience). It would set back Logtalk development for a long period of time. It would give rise to new classes of bugs and it would force the Logtalk compiler to target two separate sets of requirements, those of the Logtalk compiler/runtime itself and those of the user applications, adding complexity to a code base that I strive to refine and simplify.
Parker wrote: Writing the Logtalk language in Logtalk is a sign of maturity and is the natural next step in the evolution of this language.
It would be a nice academic exercise, sure. From a pragmatic perspective, it would be a big waste of the currently limited resources available for developing Logtalk. Bootstrapping sounds nice on paper but, in the case of Logtalk and, I suspect, of most programming languages, it would be a diversion. Good for bragging but of little impact on users' lives. I would rather spend my limited time e.g. implementing a good cross-referencer. Nevertheless, thanks for opening an interesting discussion.

Best regards,

Paulo
Paulo Moura
Logtalk developer


Re: Why is Logtalk not written in Logtalk?

Post by Parker » Fri Oct 16, 2009 3:43 am

Don't get me wrong, I'm not trying to persuade you to bump re-writing Logtalk up your priority list, I'm sure you have plenty of other more important things to do. I want to discuss the pros/cons of writing Logtalk in Prolog or Logtalk. So just see this as food for thought.

Writing the Logtalk compiler in Logtalk would be an academic exercise.
I disagree. The following languages are bootstrapped: Ada, BASIC, C, Pascal, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Clojure. I would not label them academic exercises.

The Logtalk compiler does not warrant decomposition into modules/objects.
As an academic prototype, perhaps this is true. If it is more of a production-quality compiler, then I think most software engineers would disagree with you. Whether Logtalk is 16,000 or 8,000 lines of code is irrelevant in the light of the following article:
[The] structure of a rule-based Prolog program becomes complex and error-prone when the number of unique predicate names (UPN) exceeds a threshold of around 35±5 unique predicate names per Prolog program. [...] it is possible to observe a threshold point of around 35 ± 5 UPN, above which Prolog programs contain significantly more errors (ρ=0.000). doi:10.1016/S0164-1212(98)10042-0 http://dx.doi.org/10.1016%2FS0164-1212%2898%2910042-0
I take that to mean: if Logtalk were decomposed into modules/objects you'd get more reliable code.
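
To make the UPN metric concrete, here is a minimal sketch of how one might count unique predicate names for a list of clauses. It assumes that "unique predicate name" means the functor name regardless of arity, which may not be exactly how the paper defines it. clause_name/2, clause_names/2 and unique_predicate_names/2 are hypothetical helpers, and directives are not treated specially.

```prolog
% clause_name(+Clause, -Name): name of the predicate a clause belongs to.
clause_name((Head :- _), Name) :-
    !,
    functor(Head, Name, _).
clause_name(Fact, Name) :-
    functor(Fact, Name, _).

% clause_names(+Clauses, -Names): one name per clause, duplicates kept.
clause_names([], []).
clause_names([Clause| Clauses], [Name| Names]) :-
    clause_name(Clause, Name),
    clause_names(Clauses, Names).

% unique_predicate_names(+Clauses, -Count)
unique_predicate_names(Clauses, Count) :-
    clause_names(Clauses, Names),
    sort(Names, Unique),        % sort/2 also removes duplicates
    length(Unique, Count).
```

For example, unique_predicate_names([(a :- b), a, c(1), c(1, 2)], N) gives N = 2, since only the names a and c are defined. Applying the same count to the clauses read from a source file would show how far above the 35 ± 5 threshold a given program sits.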

Logtalk receives many contributions
Yes of course it does, I don't question that. What I'm saying is: how many contributions/patches *to the compiler* have there been?

bootstrapping is not trivial to implement.
Given that Logtalk is described as an extension of Prolog, I'd be interested to know why you think it would be harder to write in Logtalk than in Prolog.

Anyway, I shall look forward to trying out the new cross-referencer.

Cheers,
Parker


Re: Why is Logtalk not written in Logtalk?

Post by Paulo Moura » Fri Oct 16, 2009 12:53 pm

Parker wrote: Writing the Logtalk compiler in Logtalk would be an academic exercise.
I disagree. The following languages are bootstrapped: Ada, BASIC, C, Pascal, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Clojure. I would not label them academic exercises.
The languages themselves are, of course, not academic exercises. But I'm talking not about the languages but about their implementations. Are you saying that all production versions of these language compilers are bootstrapped? Note that there are a significant number of implementations for Basic, C, Pascal, and Common Lisp. Not sure about the others.
Parker wrote: The Logtalk compiler does not warrant decomposition into modules/objects.
As an academic prototype, perhaps this is true. If it is more of a production-quality compiler, then I think most software engineers would disagree with you. Whether Logtalk is 16,000 or 8,000 lines of code is irrelevant in the light of the following article:
[The] structure of a rule-based Prolog program becomes complex and error-prone when the number of unique predicate names (UPN) exceeds a threshold of around 35±5 unique predicate names per Prolog program. [...] it is possible to observe a threshold point of around 35 ± 5 UPN, above which Prolog programs contain significantly more errors (ρ=0.000). doi:10.1016/S0164-1212(98)10042-0 http://dx.doi.org/10.1016%2FS0164-1212%2898%2910042-0
I take that to mean: if Logtalk were decomposed into modules/objects you'd get more reliable code.
I never wrote that the Logtalk compiler does not warrant a modular decomposition. In fact, in my previous reply I mentioned that some aspects (e.g. DCGs implementation or MT implementation) are reasonably self-contained. But other aspects, such as the compiler and the runtime, are inter-dependent.

As far as the above paper goes, I cannot access it but I'm curious. Prolog code can range from trivial to quite complex for the same number of predicates and the same number of lines of code (e.g. as exemplified by the Logtalk compiler itself).

I'm sure the quality of the Logtalk implementation could be improved. Each monthly release fixes a couple of bugs. Nevertheless, a significant number of these bugs are corner or artificial cases, never reported by any user, that I found myself while inspecting the Logtalk code base. In fact, according to its users, Logtalk is quite reliable.
Parker wrote: Logtalk receives many contributions
Yes of course it does, I don't question that. What I'm saying is: how many contributions/patches *to the compiler* have there been?
Users use Logtalk to solve problems, to write applications, not to learn about compilers. Thus, it is not surprising that there are so few contributions to the compiler itself (from memory, I can only remember a couple of them related to MT). Anyway, these kinds of contributions are also welcome but they require users to take the time to learn about the internals of the Logtalk code base, starting by reading my PhD thesis.
Parker wrote: bootstrapping is not trivial to implement.
Given that Logtalk is described as an extension of Prolog, I'd be interested to know why you think it would be harder to write in Logtalk than in Prolog.
The most obvious reason is, of course, that I don't have to implement a Prolog compiler to compile the Prolog code generated by the Logtalk compiler. And in that you find the second reason: Logtalk generates Prolog code, which is quite easy to do from Prolog itself. Logtalk extends Prolog, its implementation language, which is quite a different scenario from the languages you mention above that have bootstrapped implementations. Thus, what makes you think that the same bootstrapping expertise and the same bootstrapping benefits would apply equally well in the case of Logtalk? In addition, as I wrote in my previous reply:
The requirements to generate an extensible compiler and an efficient runtime are not the same as the requirements to compile user applications. (...) [Bootstrapping] would give rise to new classes of bugs and it would force the Logtalk compiler to target two separate sets of requirements, those of the Logtalk compiler/runtime itself and those of the user applications, adding complexity to a code base that I strive to refine and simplify.
You wrote that bootstrapping a language is a sign of language maturity. I say that learning to say no to user requests is also a sign of language maturity. Bootstrapping is a definitive no as far as Logtalk 2.x is concerned. But please don't interpret this decision as closing this discussion thread.
Parker wrote: Anyway, I shall look forward to trying out the new cross-referencer.
The cross-referencer, if and when it happens, will most likely be a Logtalk application as Logtalk should already provide all the necessary reflection support. That should make you happy ;-)

Cheers,

Paulo
Paulo Moura
Logtalk developer


Re: Why is Logtalk not written in Logtalk?

Post by Parker » Thu Oct 22, 2009 8:37 am

If Logtalk's reflection capabilities are sufficient, then that is already a good start. But bear in mind there may be a number of potential uses that could benefit from a re-usable codebase.
  • the dependency checker (as already mentioned)
  • the loader generator (automatically find dependencies so that files are loaded in the correct order)
  • static analysis tools (determinacy checker, termination checker, call-pattern checker, mode checker, static-type checker)
  • dynamic analysis tools (execution profiler, code coverage analyser)
These are the kinds of tools that would substantially strengthen Logtalk and appeal to developers of large applications. Making re-use easy would help that become near-distant future rather than blue-sky future.
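
As an illustration of the loader generator idea, the core of it is a topological ordering of files by their dependencies. The following is a minimal sketch in portable Prolog, not an existing Logtalk tool: the depends_on/2 facts are made-up example data (a real tool would extract them from the source files), all predicate names are hypothetical, and the dependency graph is assumed to have no cycles.

```prolog
% depends_on(File, Dependency): hypothetical example data.
depends_on(app,    parser).
depends_on(app,    utils).
depends_on(parser, utils).

% load_order(+Files, -Order): dependencies come before their dependents.
load_order(Files, Order) :-
    visit_all(Files, [], Reversed),
    reverse_list(Reversed, [], Order).

visit_all([], Visited, Visited).
visit_all([File| Files], Visited0, Visited) :-
    visit(File, Visited0, Visited1),
    visit_all(Files, Visited1, Visited).

% Depth-first: visit a file's dependencies first, then push the file itself.
visit(File, Visited, Visited) :-
    in_list(File, Visited),
    !.
visit(File, Visited0, [File| Visited]) :-
    findall(Dep, depends_on(File, Dep), Deps),
    visit_all(Deps, Visited0, Visited).

% in_list(+Element, +List): deterministic membership test.
in_list(X, [Y| Ys]) :-
    (   X == Y -> true
    ;   in_list(X, Ys)
    ).

% reverse_list(+List, +Acc, -Reversed): accumulator-based reverse.
reverse_list([], Reversed, Reversed).
reverse_list([X| Xs], Acc, Reversed) :-
    reverse_list(Xs, [X| Acc], Reversed).
```

The query load_order([app, parser, utils], Order) gives Order = [utils, parser, app]. With a cyclic dependency this sketch would not terminate, so a real tool would need to detect and report cycles.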

Also, I'm not clear as to how one can be in favour of modular decomposition and yet not get entangled with Prolog modules. How would modular decomposition work in a way that is portable? That's why writing Logtalk in Logtalk seems a viable route.

Cheers,
Parker


Re: Why is Logtalk not written in Logtalk?

Post by Paulo Moura » Thu Oct 22, 2009 2:10 pm

Parker wrote: If Logtalk's reflection capabilities are sufficient, then that is already a good start. But bear in mind there may be a number of potential uses that could benefit from a re-usable codebase.
  • the dependency checker (as already mentioned)
  • the loader generator (automatically find dependencies so that files are loaded in the correct order)
  • static analysis tools (determinacy checker, termination checker, call-pattern checker, mode checker, static-type checker)
  • dynamic analysis tools (execution profiler, code coverage analyser)
These are the kinds of tools that would substantially strengthen Logtalk and appeal to developers of large applications.
Most of these developer tools can be written in Logtalk using the reflection support. For example, both the dependency checker and a loader generator would piggy-back on the cross-referencer. Simple examples of profilers are already available in the "examples/profiling" folder in the current Logtalk distribution. Of course, and as always, contributions are welcome.
Parker wrote: Making re-use easy would help that become near-distant future rather than blue-sky future.
What would get these kinds of tools implemented sooner rather than later is development resources, i.e. developers and funding. Without leaving the logic programming community, it should be easy for you to locate older, free, open-source Prolog systems with bigger user communities, significant funding, significant code contributions, a full-time developer and several part-time developers, where most of the tools you enumerate above are still not available today. Compare this with a younger and currently smaller user community, a single part-time developer and no development funding. Code reuse plays a very small role here.
Parker wrote: Also, I'm not clear as to how one can be in favour of modular decomposition and yet not get entangled with Prolog modules.
You and me both.
Parker wrote: How would modular decomposition work in a way that is portable? That's why writing Logtalk in Logtalk seems a viable route.
It is not a worthwhile task, for the reasons I tried to explain in my previous replies. What "seems" and what "is" are often quite different. I'm not going to repeat the arguments here.

Cheers,

Paulo
Paulo Moura
Logtalk developer
