Hi, author here. When I started writing PicoC it was mostly because I was thinking about how small AppleSoft BASIC was back in the day and I was curious how small you could make a C implementation. I also had in mind to use it for robotics/drone scripting on STM32 processors which have about 64KB of RAM.
PicoC runs ok in 64KB although it is a bit cramped. I like that you can write scripts in C on the actual device without needing a host computer of any kind. It's also been fairly popular for embedding as a scripting language in desktop applications, mostly because it's small and easy to integrate. It's really designed for scripting so don't expect it to be fast though.
Hey zik, has anyone tried getting it running through emscripten? It would be very cool to have interactive C purely client-side in the browser. You could build some cool C-based, jsbin-like hosting websites from that.
I tried to port fastcomp (Emscripten's llvm backend) for that purpose, based largely on work Alon Zakai (kripken) already did towards porting llvm and clang to js.
Basically, I was going to compile fastcomp to js then run the commands emscripten would normally run via "arguments" function in the modules corresponding to the tool I needed (clang, llc, llvm-link and opt IIRC). But I stopped short of getting clang to work correctly with the flags I needed it to use.
I know it's possible; I just needed to work around a few system dependencies like posix_spawn. I was going to trace its use with a macro that replaces calls to it with a function that prints the file and line number instead. But I got fed up with the system I was using (Amazon Linux / ssh / tablet) to do all of this on, and by extension with everything else that has to do with programming or troubleshooting.
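The tracing idea above can be sketched in C roughly as follows. This is only an illustration, not what was actually run: it assumes the macro is defined after every system header has been included (so the real declaration in <spawn.h> is not rewritten), and traced_spawn, trace_count and the ENOSYS return value are hypothetical choices.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter so a debugging session can see how often we were hit. */
static int trace_count = 0;

/* Hypothetical stand-in: report where posix_spawn was called from instead of
 * spawning anything. posix_spawn returns an error number on failure, so
 * ENOSYS ("not implemented") sends callers down their error path. */
static int traced_spawn(const char *file, int line)
{
    fprintf(stderr, "posix_spawn called at %s:%d\n", file, line);
    trace_count++;
    return ENOSYS;
}

/* Must come after all system headers. The original arguments are swallowed
 * (and never evaluated); only the call site is recorded. */
#define posix_spawn(...) traced_spawn(__FILE__, __LINE__)
```

Every call site that is compiled after the macro then prints its own file and line to stderr, which is enough to find out which code paths need a workaround.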
How can parsing and string manipulation be easier than just compiling C to Javascript using Emscripten? The interpreter already works in C. Wouldn't translating it manually to Javascript be extra effort?
Parsing and string manipulation are (hugely) easier in Javascript than C. Compiling the interpreter to Javascript via Emscripten is certainly easier than translating to Javascript -- but porting C to Javascript should still be very easy.
Shameless self promotion of my work in progress C compiler https://github.com/andrewchambers/cc which is attempting to create a modern cross platform C compiler.
One of my goals is to make the entire toolchain rapid to port and hack on. I am pretty sick of gcc and llvm taking 20 mins just to build from source.
I am, it is referenced in the readme. Tcc's code is not really so readable in my opinion, which is something I want to address. Tcc also has no desire to become an optimizing compiler in the future. Personally I think 8cc is far superior to tcc in many important ways, but is not as mature as tcc.
I love C so considered using 8cc as a base for expansion, but there is a reason llvm and gcc are in C++ and not C. I dislike C++ for many logical and illogical reasons, and think Go is a fair compromise with keeping close to C roots.
The last time I checked, tcc lacked even a simple AST. This led to some pretty weird emitted code (such as swapping parameters on the stack.) Implementing an AST is not hard, just push and pop nodes on a stack. It also makes a nice front-end/back-end interface.
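A minimal sketch of the "push and pop nodes on a stack" approach, in C. It is not taken from tcc or any other compiler mentioned here; the Node layout, the function names and the fixed stack size of 64 are assumptions made for illustration, and there is no overflow or underflow checking.

```c
#include <stdlib.h>

/* Hypothetical node layout: op == 0 marks a leaf holding a constant. */
struct Node {
    int op;                 /* 0 for a leaf, otherwise '+', '-', '*' or '/' */
    int value;              /* constant value, used only by leaves */
    struct Node *lhs, *rhs; /* operands, used only by operators */
};

static struct Node *node_stack[64];
static size_t node_top;

static struct Node *new_node(int op, int value, struct Node *lhs, struct Node *rhs)
{
    struct Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* The parser calls this when it sees an operand. */
static void push_leaf(int value)
{
    node_stack[node_top++] = new_node(0, value, NULL, NULL);
}

/* The parser calls this once both operands of a binary operator are parsed:
 * pop two nodes, push one node that owns them. */
static void reduce_binary(int op)
{
    struct Node *rhs = node_stack[--node_top];
    struct Node *lhs = node_stack[--node_top];
    node_stack[node_top++] = new_node(op, 0, lhs, rhs);
}

/* A stand-in back end: walk the finished tree. A code generator would
 * emit instructions here instead of computing a value. */
static int eval_node(const struct Node *n)
{
    if (n->op == 0)
        return n->value;
    int a = eval_node(n->lhs);
    int b = eval_node(n->rhs);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}
```

The parser only ever calls push_leaf and reduce_binary, so the tree left on the stack is the whole interface between front end and back end.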
Eliminating memory allocations made tcc extremely fast. They use a value stack rather than an AST. I just think an AST is a bit easier to follow because it means the parser has less code generation logic embedded in it.
I can rephrase - bug reports are fine, but I don't need issues which ask why it can't compile xyz yet (usually the answer is because there aren't enough hours in the day).
edit: I have tweaked the readme, don't let it put you off
I have used this to bring a form of scripting to the graphic calculator I used and hacked on during my high school years. I think this really shines on embedded systems: it's quite easy to map existing syscalls and functions to PicoC functions with minimal overhead, more complex things like struct passing are supported, and memory addresses can be accessed directly just like with "real" compiled C (something that has upsides and downsides, but I like this kind of freedom a lot, and as far as I know, other scripting languages like Lua don't support it).
Unfortunately, function pointers aren't supported, and it's 10x slower than equivalent compiled C, if not more, and perhaps slower than Lua (and we aren't even talking about LuaJIT). There also appear to be some issues yet to be dealt with, as can be seen on the previous project page at Google Code: https://code.google.com/p/picoc/issues/list
> more complex things like struct passing are supported, and memory addresses can be accessed directly just like with "real" compiled C (something that has upsides and downsides, but I like this kind of freedom a lot, and as far as I know, other scripting languages like Lua don't support it).
Obligatory nitpick: C libraries should be careful to prefix all exposed identifiers, to minimize the risk of collision. But when you include picoc.h, which is careful to do this, you also get interpreter.h, which... is not.
You make a good point. To some extent this reflects that PicoC wasn't originally a library at all. It was a standalone interpreter that I later converted to library form.
The other factor is that the code as it stands is designed to be as readable as possible. I'm conscious that prefixing every identifier with "PicoC" will make the whole thing a bit less easy to read. I'm not sure if maximising readability is something that people really care about though. I'd be interested in hearing people's opinions on whether they'd prefer "struct ValueType" or "struct PicoCValueType" in a thousand places throughout the code.
While I'm on the subject, the same also goes for header include guards - when you get a conflict, it's actually quite annoying to track down. (Not least because it's such a rare occurrence that you probably won't expect it and will likely end up on a wild goose chase at some inconvenient moment.)
I stopped using the file name at all in my include guards a few years ago, and use a GUID instead. For example:
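The commenter's own example is not reproduced here. A minimal sketch of such a guard might look like the following, assuming an arbitrary placeholder GUID and an H_ prefix (a C identifier cannot start with a digit, and the GUID's hyphens have to become underscores). The struct and its member are only filler for the header body.

```c
/* Guard name built from a GUID rather than the file name.
 * The GUID below is an arbitrary placeholder; generate a fresh one per header. */
#ifndef H_3F2504E0_4F89_11D3_9A0C_0305E82C3301
#define H_3F2504E0_4F89_11D3_9A0C_0305E82C3301

/* Filler declaration standing in for the real header contents. */
struct ValueType {
    int Kind;
};

#endif /* H_3F2504E0_4F89_11D3_9A0C_0305E82C3301 */
```

Because the guard no longer depends on the file name, renaming the header or having two headers with the same name in different libraries cannot cause a guard collision.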
> Also consider the use of #pragma once - though as far as I can tell, this (still) isn't ISO, so I've decided to avoid it.
Depends on your target platform. GCC, clang/LLVM, Visual C++, and many proprietary compilers all support it. What platform are you targeting that doesn't support it?
Because if you're writing any non-trivial useful code, it's highly likely that your code is not pure ISO C; you're using some library with support for various platforms, or a system call interface, or some other interface to a real system. Once you do that, universal portability no longer applies, so you might as well think about which specific target platforms you care about.
Unless you're working on embedded hardware with proprietary compilers, #pragma once is supported by virtually every compiler you're likely to use (see https://en.wikipedia.org/wiki/Pragma_once). On mainstream platforms you can generally assume it's available.
When I read "very small C" I'm reminded of the BD Software C compiler I used about 35 years ago for the 8080/Z80 and CP/M.
It was relatively complete K&R C except for no floating point support. It comfortably ran in under 64K bytes of memory. That's 64 kilobytes, or as they now say kibibytes!
It was quite fast to compile. It was also quite fast to run, since it generated true object code, no run-time interpreter needed.
As I recall BDS stood for "Brain Damaged Software", a joke made by Leor Zolman, the author. Leor had not taken a compiler construction class, so it's not a recursive descent parser and can be confused by overly complex expressions. It also wrote the generated code into the memory that held the source, expecting that the generated code took fewer bytes than the source it was replacing.
The fun thing about that one is that despite its amazing simplicity it can compile and interpret itself - which is considered a significant milestone toward a "real" language implementation.
I recommend: just read and reread, commenting it as you go, until it all makes sense.
It's written in a very limited dialect of C --- most notably, it doesn't use structs --- because it compiles itself. The expression parser in particular would be clearer with structs rather than array offsets. But once you realize that's why it's so gnarly in places, it's straightforward to mentally translate.
It's a very simple design: a simple lexer with a "pull" API (next()) feeds a precedence-climbing expression parser that spits out bytecode for a simple stack machine.
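For readers who have not seen that design before, here is a minimal sketch of it in C. It is not the code being discussed: apart from next(), every name here (prec, emit, expr, run, eval, the opcodes) is hypothetical, it handles only integers, + - * / and parentheses, and it assumes well-formed input with no division by zero.

```c
#include <ctype.h>
#include <stddef.h>

enum { TOK_NUM, TOK_OP, TOK_END };                /* token kinds */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DIV }; /* bytecode opcodes */

static const char *src;    /* remaining source text */
static int tok, tok_val;   /* current token kind; its number or operator char */
static int code[256];      /* emitted bytecode */
static size_t code_len;

/* "Pull" lexer: the parser calls next() whenever it wants another token. */
static void next(void)
{
    while (*src == ' ')
        src++;
    if (*src == '\0') { tok = TOK_END; return; }
    if (isdigit((unsigned char)*src)) {
        tok = TOK_NUM;
        tok_val = 0;
        while (isdigit((unsigned char)*src))
            tok_val = tok_val * 10 + (*src++ - '0');
        return;
    }
    tok = TOK_OP;
    tok_val = *src++;
}

/* Binding strength of a binary operator; 0 means "not a binary operator". */
static int prec(int op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static void emit(int word)
{
    if (code_len < sizeof code / sizeof code[0])
        code[code_len++] = word;
}

/* Precedence climbing: parse one operand, then keep consuming operators
 * that bind at least as tightly as min_prec. */
static void expr(int min_prec)
{
    if (tok == TOK_NUM) {
        emit(OP_PUSH);
        emit(tok_val);
        next();
    } else if (tok == TOK_OP && tok_val == '(') {
        next();
        expr(1);
        if (tok == TOK_OP && tok_val == ')')
            next();
    }
    while (tok == TOK_OP && prec(tok_val) >= min_prec) {
        int op = tok_val;
        next();
        expr(prec(op) + 1); /* +1 makes the operators left-associative */
        emit(op == '+' ? OP_ADD : op == '-' ? OP_SUB :
             op == '*' ? OP_MUL : OP_DIV);
    }
}

/* Stack machine: execute the emitted bytecode and return the result. */
static int run(void)
{
    int stack[64] = {0};
    size_t sp = 0;
    for (size_t pc = 0; pc < code_len; pc++) {
        if (code[pc] == OP_PUSH) {
            stack[sp++] = code[++pc];
        } else {
            int b = stack[--sp];
            int a = stack[--sp];
            switch (code[pc]) {
            case OP_ADD: stack[sp++] = a + b; break;
            case OP_SUB: stack[sp++] = a - b; break;
            case OP_MUL: stack[sp++] = a * b; break;
            default:     stack[sp++] = a / b; break;
            }
        }
    }
    return stack[0];
}

/* Compile text to bytecode, then run it. */
static int eval(const char *text)
{
    src = text;
    code_len = 0;
    next();
    expr(1);
    return run();
}
```

Calling eval("1 + 2 * 3") emits PUSH 1, PUSH 2, PUSH 3, MUL, ADD and then runs that on the stack machine. The parser never builds a tree; operator precedence is handled entirely by the min_prec argument.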
This is kinda great; I've thought about having something like it for "running" (simulating) embedded code on a PC, where the compiler just ignores anything that isn't available on the host PC (registers, etc.), or asks the user for a value when it hits one.
No pointers to functions, I see.. not nitpicking, just playing :) I like what you've done here. I'm going to have to go through the code and hopefully learn something new.
Unfortunately not - it uses a few C tricks which its implementation doesn't support. You could probably make it self-hosting with a little effort but I can't even imagine how slow the interpreter-within-an-interpreter would be!