Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

i've been thinking about a closely related feature in a different context: adding block arguments, as in smalltalk or ruby or especially lobster, to a language more like c, with static types and stack allocation

i think this would be favorable for (among other things) clu-like iterators and imgui libraries, where you often want to do something like

    submenu("&Edit") {
        command("&Cut") { clip_cut(getSelection()); }
        ...
    }
this is especially useful in a context where you're heap-allocating sparingly or not at all, because the subroutine taking the block argument can stack-allocate some resource, pass it to the block, and deallocate it once the block returns; python context managers and win32 paint messages are two cases where people commonly do this sort of thing, but things like save-excursion, with-output-file, transactional memory, and gsave/grestore also provide motivation

the conventional way to do this is to package up the block into a closure, then use a full-fledged function invocation to invoke it, using a calling convention that supports closures. but i suspect a more relaxed and efficient approach is to use an asymmetric coroutine calling convention, in which the callee yields back control to its caller at the entry point to the block, and the block then resumes the callee when it finishes. so instead of merely dividing registers into callee-saved and call-clobbered, as subroutine calling conventions do, we would divide them into callee-saved upon return but upon yield containing callee values the block must have restored upon resumption; caller coroutine context registers, which are callee-saved upon return and also on yield; and call-clobbered. you also need in many cases a way for the block to safely force an early exit from the callee

this allows the caller's local variables to be in registers its blocks can use without further ado, or at least indexed off of such a register, while allowing the yield and resume operations to be, in many cases, just a single machine instruction. and it does not require heap allocation

as an example of taking this to the point of absurdity, here's an untested subroutine for iterating over a nul-terminated string passed in r0 with a block passed in r1, using a hypothetical coroutine convention which passes at least r4 through from its caller to its blocks

    itersz: push {r6, r7, r8, lr}
            mov  r7, r0
            mov  r6, r1
    1:      ldrb r0, [r7], #1
            cbz  r0, 1f
            blx  r1
            b    1b
    1:      pop  {r6, r7, r8, pc}
and here is another untested subroutine which uses it to calculate a string hash

    hashsz: push {r4, r5, r9, lr}
            movs r4, #53
            adr  r1, 1f
            blx  itersz
            mov  r0, r4
            pop  {r4, r5, r9, pc}
    1:      eor  r4, r0, r4, ror #27
            bx   lr
even in this case where both the iteration and the visitor block are utterly trivial, the runtime overhead per item (compared to putting them in the same subroutine) is evidently extremely modest; my estimate is 7 cycles per byte rather than 4 cycles per byte on in-order hardware with simple branch prediction, so, on the order of 1 ns on the hardware russ used as his reference. for anything more complex the overhead should be insignificant

it's less general than the mechanism russ proposes here (it doesn't solve the celebrated samefringe problem), but it's also an order of magnitude more efficient, because the yield and resume operations are less work than a subroutine call, though still more work than, say, decrementing a register and jumping if nonzero



corrected, tested, and commented version of the above example code

            @ hashsz s = {
            @   h := 53u
            @   itersz s, b => {
            @     h = (h >> 27 | h << 5) ^ b
            @   }
            @   return h
            @ }

            .syntax unified
            .thumb
    itersz: push {r6, r7, r8, lr}   @ r6 and r7 are preserved by blocks
            mov  r7, r0             @ so put the string pointer in r7
            mov  r6, r1             @ and the block code pointer in r6
    1:      ldrb r0, [r7], #1       @ loop over string bytes, post-indexing
            cbz  r0, 1f             @ bail out on NUL, but otherwise
            blx  r1                 @ invoke block argument with byte, then
            b    1b                 @ repeat loop
    1:      pop  {r6, r7, r8, pc}   @ restore registers and return

    hashsz: push {r4, lr}           @ r4 is callee-preserved and passed to blocks
            movs r4, #53            @ initial value of hash
            adr  r1, 1f+1  @ load Thumbified pointer for block using gas local label
            bl   itersz             @ invoke iterator subroutine
            mov  r0, r4             @ load completed hash into return value
            pop  {r4, pc}           @ restore and return
    1:      eor  r4, r0, r4, ror #27  @ block argument rotates & xors the byte in r0
            bx   lr                 @ then resumes the iterator


fe de ratas, the above should read

            blx  r6                 @ invoke block argument with byte, then
those responsible for this error have been sacked

years ago, in fact




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: