PicoC: A very small C interpreter (github.com/zsaleeba)
164 points by adamnemecek on June 1, 2015 | 50 comments


Hi, author here. When I started writing PicoC it was mostly because I was thinking about how small AppleSoft BASIC was back in the day and I was curious how small you could make a C implementation. I also had in mind to use it for robotics/drone scripting on STM32 processors which have about 64KB of RAM.

PicoC runs ok in 64KB although it is a bit cramped. I like that you can write scripts in C on the actual device without needing a host computer of any kind. It's also been fairly popular for embedding as a scripting language in desktop applications, mostly because it's small and easy to integrate. It's really designed for scripting so don't expect it to be fast though.


Hey zik, has anyone tried getting it running through Emscripten? It would be very cool to have interactive C running purely client-side in the browser. You could build some cool C-based, jsbin-like hosting websites on top of that.


I tried to port fastcomp (Emscripten's LLVM backend) for that purpose, based largely on work Alon Zakai (kripken) had already done towards porting LLVM and clang to JS.

Basically, I was going to compile fastcomp to js then run the commands emscripten would normally run via "arguments" function in the modules corresponding to the tool I needed (clang, llc, llvm-link and opt IIRC). But I stopped short of getting clang to work correctly with the flags I needed it to use.

I know it's possible; I just needed to work around a few system dependencies like posix_spawn. I was going to trace its use with a macro that replaces calls to it with a function that prints the file and line number instead. But I got fed up with the system I was using to do all of this (Amazon Linux / ssh / tablet) and, by extension, with everything else that has to do with programming or troubleshooting.
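For anyone wondering what that tracing trick might look like, here is a minimal sketch. It is not the code from the port attempt: `traced_spawn` and `trace_count` are hypothetical names, and the stub simply reports failure instead of spawning anything.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so a caller can check how often the stub was hit. */
static int trace_count = 0;

/* Hypothetical stand-in: logs the call site instead of starting a process. */
static int traced_spawn(const char *file, int line)
{
    assert(file != NULL);
    fprintf(stderr, "posix_spawn called at %s:%d\n", file, line);
    trace_count++;
    return -1; /* report failure, since no process was started */
}

/* Redirect every later use of posix_spawn(...) to the stub.
   The original arguments are discarded and never evaluated.
   This must come after any #include of <spawn.h>, otherwise the
   macro would also rewrite the real declaration. */
#define posix_spawn(...) traced_spawn(__FILE__, __LINE__)
```

With the macro in scope, a call such as `posix_spawn(&pid, path, NULL, NULL, argv, envp)` prints its file and line to stderr and returns -1, which shows exactly which code paths still depend on process spawning.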


I don't think anyone's tried that. It sounds like a pretty cool idea though.


Given it's written in C, it seems like a direct port to Javascript might be a much better option.


Any particular reason for that?


A lot of C will translate very simply to JavaScript, and any parsing or string manipulation will be insanely easier.


How can parsing and string manipulation be easier than just compiling C to Javascript using Emscripten? The interpreter already works in C. Wouldn't translating it manually to Javascript be extra effort?


Parsing and string manipulation are (hugely) easier in Javascript than C. Compiling the interpreter to Javascript via Emscripten is certainly easier than translating to Javascript -- but porting C to Javascript should still be very easy.


Shameless self promotion of my work in progress C compiler https://github.com/andrewchambers/cc which is attempting to create a modern cross platform C compiler.

One of my goals is to make the entire toolchain rapid to port and hack on. I am pretty sick of gcc and llvm taking 20 mins just to build from source.

I would love to find serious collaborators.


Did you consider contributing to libfirm[0]/cparser[1]?

0: http://pp.ipd.kit.edu/firm/ 1: https://github.com/MatzeB/cparser


I've looked at libfirm a fair bit. I do think it is interesting. Thanks for the link to the c frontend.


I am all in favor of hacking for studying and fun. But for practical reasons - are you familiar with Bellard's tcc?


I am, it is referenced in the readme. Tcc's code is not really so readable in my opinion, which is something I want to address. Tcc also has no desire to become an optimizing compiler in the future. Personally I think 8cc is far superior to tcc in many important ways, but is not as mature as tcc.

I love C so considered using 8cc as a base for expansion, but there is a reason llvm and gcc are in C++ and not C. I dislike C++ for many logical and illogical reasons, and think Go is a fair compromise with keeping close to C roots.


From the README:

The code is heavily inspired by https://github.com/rui314/8cc as well as http://bellard.org/tcc/. I recommend studying the source code of 8cc before contributing here, as 8cc is currently far more mature.


The last time I checked, tcc lacked even a simple AST. This led to some pretty weird emitted code (such as swapping parameters on the stack.) Implementing an AST is not hard, just push and pop nodes on a stack. It also makes a nice front-end/back-end interface.


Eliminating memory allocations made tcc extremely fast. They use a value stack rather than an AST. I just think an AST is a bit easier to follow because it means the parser has less code generation logic embedded in it.


No malloc/free necessary. Just pile up nodes on a stack, then "deallocate" to any saved position.
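A rough sketch of that idea, assuming a fixed-size pool and a made-up `struct Node` layout (none of these names come from tcc or any other compiler mentioned here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node; a real compiler would carry more fields. */
struct Node {
    int kind;
    int value;
    struct Node *lhs, *rhs;
};

#define POOL_SIZE 1024

static struct Node pool[POOL_SIZE]; /* fixed backing store, no malloc */
static size_t pool_top = 0;         /* index of the next free node */

/* Allocate by bumping the top of the stack. Returns NULL when full. */
static struct Node *node_new(int kind, int value,
                             struct Node *lhs, struct Node *rhs)
{
    if (pool_top == POOL_SIZE)
        return NULL;
    struct Node *n = &pool[pool_top++];
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Remember the current position... */
static size_t pool_mark(void)
{
    return pool_top;
}

/* ...and later drop every node allocated since that mark in one step. */
static void pool_release(size_t mark)
{
    assert(mark <= pool_top);
    pool_top = mark;
}
```

A parser could take a mark before each statement or function, build the tree, generate code from it, and then release back to the mark. Pointers into the released region are dangling afterwards, so nothing may hold on to them.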


> I don't need bug reports unless you are also willing to fix the bug yourself in a pull request or by emailing a diff.

That sounds very condescending and off-putting.


Sounded quite straightforward to me. How else would you have preferred him to phrase it?


I can rephrase - bug reports are fine, but I don't need issues which ask why it can't compile xyz yet (usually the answer is that there aren't enough hours in the day).

edit: I have tweaked the readme, don't let it put you off


I have used this to bring a form of scripting to the graphic calculator I used and hacked on during my high school years. I think this really shines on embedded systems: it's quite easy to map existing syscalls and functions to PicoC functions with minimal overhead, more complex things like struct passing are supported, and memory addresses can be accessed directly just like with "real" compiled C (something that has upsides and downsides, but I like this kind of freedom a lot, and as far as I know, other scripting languages like Lua don't support it).

Unfortunately, function pointers aren't supported, and it's 10x slower than equivalent compiled C, if not more, and perhaps slower than Lua (and we aren't even talking about LuaJIT). There also appear to be some issues yet to be dealt with, as can be seen on the previous project page at Google Code: https://code.google.com/p/picoc/issues/list


> more complex things like struct passing are supported, and memory addresses can be accessed directly just like with "real" compiled C (something that has upsides and downsides, but I like this kind of freedom a lot, and as far as I know, other scripting languages like Lua don't support it).

Not Lua itself, but at least Luajit's FFI does: http://luajit.org/ext_ffi.html


Obligatory nitpick: C libraries should be careful to prefix all exposed identifiers, to minimize the risk of collision. But when you include picoc.h, which is careful to do this, you also get interpreter.h, which... is not.


You make a good point. To some extent this reflects that PicoC wasn't originally a library at all. It was a standalone interpreter that I later converted to library form.

The other factor is that the code as it stands is designed to be as readable as possible. I'm conscious that prefixing every identifier with "PicoC" will make the whole thing a bit less easy to read. I'm not sure if maximising readability is something that people really care about though. I'd be interested in hearing people's opinions on whether they'd prefer "struct ValueType" or "struct PicoCValueType" in a thousand places throughout the code.


Use a typedef in your own source files.
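Something along these lines, as a sketch. `struct PicoCValueType` is the prefixed name floated above; the fields and the `TypeSize` helper are hypothetical and only there to make the example compile.

```c
#include <assert.h>
#include <stddef.h>

/* What the public header might expose: only the prefixed name. */
struct PicoCValueType {
    int Base;      /* hypothetical field */
    size_t Sizeof; /* hypothetical field */
};

/* Inside the library's own .c files (or a private header):
   a short alias, so the implementation stays readable. */
typedef struct PicoCValueType ValueType;

/* Hypothetical internal helper written against the short name. */
static size_t TypeSize(const ValueType *Typ)
{
    assert(Typ != NULL);
    return Typ->Sizeof;
}
```

The catch is that the alias has to stay out of any header that users include, otherwise the unprefixed name leaks again. Internal code would also write `ValueType` rather than `struct ValueType`.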


While I'm on the subject the same also goes for header include guards - when you get a conflict, it's actually quite annoying to track down. (Not least because it's such a rare occurrence that you probably won't expect it and will likely end up on a wild goose chase at some inconvenient moment.)

I stopped using the file name at all in my include guards a few years ago, and use a GUID instead. For example:

    #ifndef HEADER_6AFF21D71B5B43DEB079AA612E4118B4
    #define HEADER_6AFF21D71B5B43DEB079AA612E4118B4

    #endif//HEADER_6AFF21D71B5B43DEB079AA612E4118B4

Also consider the use of #pragma once - though as far as I can tell, this (still) isn't ISO, so I've decided to avoid it.


> Also consider the use of #pragma once - though as far as I can tell, this (still) isn't ISO, so I've decided to avoid it.

Depends on your target platform. GCC, clang/LLVM, Visual C++, and many proprietary compilers all support it. What platform are you targeting that doesn't support it?

Because if you're writing any non-trivial useful code, it's highly likely that your code is not pure ISO C; you're using some library with support for various platforms, or a system call interface, or some other interface to a real system. Once you do that, universal portability no longer applies, so you might as well think about which specific target platforms you care about.


Good question - probably the main reason (and you don't have to agree that it's a good one) is that I'd never have to think about it again ;)


Unless you're working on embedded hardware with proprietary compilers, #pragma once is supported by all compilers (see https://en.wikipedia.org/wiki/Pragma_once). You can safely assume it'll be available on your platform.


When I read "very small C" I'm reminded of the BD Software C compiler I used about 35 years ago for the 8080/Z80 and CP/M.

It was relatively complete K&R C except for no floating point support. It comfortably ran in under 64K bytes of memory. That's 64 kilobytes, or as they now say kibibytes!

It was quite fast to compile. It was also quite fast to run, since it generated true object code, no run-time interpreter needed.

It's now open source and public domain. http://www.bdsoft.com/resources/bdsc.html


As I recall BDS stood for "Brain Damaged Software", a joke made by Leor Zolman, the author. Leor had not taken a compiler construction class, so it's not a recursive descent parser and can be confused by overly complex expressions. It also wrote the generated code into the memory that held the source, expecting that the generated code took fewer bytes than the source it was replacing.


Very nice. This is the kind of philosophy I was thinking about when I wrote it.


c4 remains the master class in minimal interpreted C implementations:

https://github.com/rswier/c4

It's (of course) less complete than PicoC, but it might be the single most useful introduction to compiler construction I've read.


The fun thing about that one is that despite its amazing simplicity it can compile and interpret itself - which is considered a significant milestone toward a "real" language implementation.


Rather enjoyed flicking through that. I wish there was more of an overview for those of us who haven't thought about this stuff for a few years.


I recommend: just read and reread, commenting it as you go, until it all makes sense.

It's written in a very limited dialect of C --- most notably, it doesn't use structs --- because it compiles itself. The expression parser in particular would be clearer with structs rather than array offsets. But once you realize that's why it's so gnarly in places, it's straightforward to mentally translate.

It's a very simple design: a simple lexer with a "pull" API (next()) feeds a precedence-climbing expression parser that spits out bytecode for a simple stack machine.

If you grok expr(), you grok the whole thing.
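To make that shape concrete, here is a much smaller sketch of the same pipeline: a pull lexer, a precedence-climbing `expr()`, and a stack machine. It is not c4's code. The opcodes, token values and the `eval` helper are invented for illustration, it only handles integer `+ - * /` with parentheses, and error handling is reduced to `assert`.

```c
#include <assert.h>
#include <ctype.h>

enum { PUSH, ADD, SUB, MUL, DIV, HALT }; /* opcodes of the toy stack machine */
enum { T_NUM = 256, T_END };             /* token kinds; operators are their own char */

static const char *src;      /* input cursor */
static int tok, tok_val;     /* current token and, for T_NUM, its value */
static int code[256], ncode; /* emitted bytecode */

/* Pull lexer: each call advances to the next token. */
static void next(void)
{
    while (*src == ' ') src++;
    if (*src == '\0') { tok = T_END; return; }
    if (isdigit((unsigned char)*src)) {
        tok_val = 0;
        while (isdigit((unsigned char)*src))
            tok_val = tok_val * 10 + (*src++ - '0');
        tok = T_NUM;
        return;
    }
    tok = *src++;
}

static void emit(int x) { assert(ncode < 256); code[ncode++] = x; }

/* Binding power of a binary operator, or 0 if t is not one. */
static int prec(int t)
{
    if (t == '+' || t == '-') return 1;
    if (t == '*' || t == '/') return 2;
    return 0;
}

/* Precedence climbing: consume operators that bind at least as
   tightly as min_prec (callers always pass min_prec >= 1). */
static void expr(int min_prec)
{
    if (tok == T_NUM) { emit(PUSH); emit(tok_val); next(); }
    else if (tok == '(') { next(); expr(1); assert(tok == ')'); next(); }
    else assert(!"unexpected token");

    while (prec(tok) >= min_prec) {
        int op = tok, p = prec(tok);
        next();
        expr(p + 1); /* left-associative: right operand must bind tighter */
        emit(op == '+' ? ADD : op == '-' ? SUB : op == '*' ? MUL : DIV);
    }
}

/* Compile a source string, then run the bytecode on a value stack. */
static int eval(const char *s)
{
    int stack[64], sp = 0, pc = 0;
    src = s;
    ncode = 0;
    next();
    expr(1);
    emit(HALT);
    for (;;) {
        int op = code[pc++];
        if (op == HALT) return stack[sp - 1];
        if (op == PUSH) { stack[sp++] = code[pc++]; continue; }
        int b = stack[--sp], a = stack[--sp];
        stack[sp++] = op == ADD ? a + b : op == SUB ? a - b
                    : op == MUL ? a * b : a / b;
    }
}
```

For `1 + 2 * 3` the parser emits PUSH 1, PUSH 2, PUSH 3, MUL, ADD, HALT. The recursive call with `p + 1` is what makes the multiplication bind before the addition and keeps `10 - 4 - 3` left-associative.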


Alright then, I will - though I may have to come back to you if I get stuck :) (I haven't written anything in C for about 15 years)



Some time back I used PicoC to write a C programming visualizer. Such a tool is possible only with PicoC.

http://dev.pointers.io/#filename=test4.c


This is kinda great; I've thought about having something like it for "running" (simulating) embedded code on a PC, where the compiler just ignores (or asks the user for a value) when hitting anything that isn't available on the host pc (registers, etc).


What an amazing thing!

I wish I had this back in the day when I first learnt C and was doing all this by hand with pencil and paper. The VCR-style rewind is a nice touch.


Do people who use this call it "peacock" or "pico see"?


When I first saw it I thought "pico see".

The "peacock" interpretation is funny and went unnoticed by me. Cheers!


I call it "pico-see".


More than 20 years ago, I used a C interpreter called EiC.

Hmm, someone tried to resurrect it on SourceForge a couple of years ago, it seems, but it's dead again.

http://sourceforge.net/projects/eic/

Someone threw it on GitHub:

https://github.com/kungfooman/EiC-C-Interpreter

PDF of doc [2009]:

http://www.mirrorservice.org/sites/downloads.sourceforge.net...


Ah! This is very cool! Strangely enough, I was looking for something exactly like this and staring down having to write my own. Perfect!


Nice!

No pointers to functions, I see.. not nitpicking, just playing :) I like what you've done here. I'm going to have to go through the code and hopefully learn something new.


Can it interpret itself?


Unfortunately not - it uses a few C tricks which its implementation doesn't support. You could probably make it self-hosting with a little effort but I can't even imagine how slow the interpreter-within-an-interpreter would be!



