On Mon, 6 Mar 2023 at 14:23, Jonathan Revusky <revusky@congocc.org> wrote:
I haven't posted here before. I'm the lead developer of the Congo Parser Generator project. Congo, in its origins, is actually a full rewrite of the ancient JavaCC, developed at Sun Microsystems in the 90's. So the parser that it generates is in Java. There is also the ability to generate parsers in Python and C#, but that is currently less complete/polished than the Java code generation.
Back in the Before Time, in the Long Long Ago, I worked with JavaCC, and it's pretty nice. It generates the sort of parser you'd create by hand ... if you had that kind of time and expertise.
Hi Frank. Thanks for the note.
The JavaCC situation is kind of strange, you know. Truth be told, JavaCC is an example of a project that, because of timing and position, got a really undue amount of usage and attention -- despite its immaturity and poor implementation... Actually, your mentioning JavaScript is quite apt.
But I think JavaCC became very well known and popular (at least
for a software development tool) mostly because it came along in
the very early days of Java, mid to late 1996, and it had the
implicit credibility of coming from Sun Microsystems. It was
freely downloadable and you had a license to use it for any
purpose, but it was not open source. (Not until mid-2003.)
But it became very popular, mostly because of positioning and timing. Java was so new and there weren't so many tools out there so it became a fairly well known thing and a lot of software was written using it. In fact, a fairly well known template library, FreeMarker, which is mostly my fault, was written using JavaCC. That's when I first used it. Late 2001 or so.
But when it was released it was really just a POC (proof of
concept), not any sort of robust, mature tool. And, really, I can
tell you that basically no meaningful work was ever done on the
thing after maybe early 1998 at the latest. So, it had been
sitting there gathering dust for some years when Sun figured they
would open-source the project in mid-2003. Then some "community"
formed around the thing but... well, I'll say it... it was really
the utterly wrong kind of people...
Well, the bottom line is that it's a very strange situation.
JavaCC ostensibly has all this history and they put out all these
releases, the latest version number being 7.12 or something like
that. But the truth of the matter is that it never had a legitimate
1.0 release. So, as I say, a lot of software, particularly in the
early days of Java, was written using it, but that doesn't mean it
was a very solid, serious tool -- though maybe it was about as
good as anything else in that space...
But man, it always was very buggy. I think the biggest single bug
(though there were certainly others) was that it did not handle
nested/recursive syntactic lookahead. And I can tell you that is
pretty un-serious for a tool of this nature. I nailed that bug in
my own codebase in mid-2020.
https://parsers.org/announcements/nested-syntactic-lookahead-works/
Properly understood, that really is something so basic. So, what I mean to say is that you could use the thing and make rapid progress initially, get a prototype working, but then you were also liable to run into a wall after that, because all sorts of things that should work just never did. But, to make matters worse, the people who controlled the project never admitted that any of the longstanding bugs were really bugs! I took one example of that and wrote a rather sarcastic (but I daresay quite amusing) essay about the situation.
https://parsers.org/rants-non-technical/a-bugs-life/
If you're in the mood for a good laugh, you could read that. I
think it's pretty funny, albeit in a rather sick kind of way.
A Lua interpreter in Java was once my dream when everyone was doing interpreters in Java (Rhino JavaScript, Jython, JRuby, etc.). I figured it would give the language more exposure. Writing the interpreter was one of my big stumbling blocks. Another was compiling Lua into Java bytecode for speed, although the ASM library looks more than capable. (Hint, hint.)
Well, CongoCC doesn't write an interpreter for you. It just
parses the source file, builds a syntax tree. Still, if anybody
wants to do a Lua interpreter, that can give you a leg up!
A Lua interpreter in JavaScript would be kind of ironic, though. In an ideal world the ubiquitous embedded language of browsers and other things would be Lua or something like it, not a pretend Java literally hacked out in a week.
Well, yeah, if you can get a lot of people to use something, then
they become committed to it and... I guess that was my point about
JavaCC. PHP is, I think, another good example. It began as
something pretty horrible in terms of its theoretical
underpinnings (or lack thereof) but if you look at PHP 8, it's not
so horrible a language. There were enough people interested in
making it better. I mean people with deep pockets, like Facebook or
the WordPress people...
(JavaScript has gotten better, but under the hood JavaScript is far more Lua-like than Java-like.) So here's hoping you generate JavaScript parsers sooner rather than later. Anyway, good luck with Congo. I'll definitely check it out.
Thanks for the response, Frank. I wrote the above and put it on the list, but I'm sorry (to everybody) if it's off-topic. Well, it is off-topic surely!
So, maybe if people do want to talk about CongoCC (which would be great) the better place is our discussion forum.
Ciao,
JR
Frank Mitchell