From the description, one small part of the code: --64[T=1[O=32[L=(X=*Y&7)&1,o=...

lifthrasiir · on March 20, 2024

It is in fact quite reasonable if you know a bit of C obfuscation and have some expectations about what an 8086 interpreter looks like. First, indentation:

    --64[
        T=1[
            O=32[
                L=(X=*Y&7)&1,
                o=X/2&1,
                l
            ]=0,
            t=(c=y)&7,
            a=c/8&7,
            Y
        ]>>6,
        g=~-T?y:(n)y,
        d=BX=y,
        l
    ]

`a, b` evaluates `a` for its side effects and then returns `b`, so anything preceding a `,` can be (recursively) pulled out.

    L=(X=*Y&7)&1,
    o=X/2&1,
    O=32[l]=0,
    t=(c=y)&7,
    a=c/8&7,
    T=1[Y]>>6,
    g=~-T?y:(n)y,
    d=BX=y,
    --64[l]
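To convince yourself that this hoisting is safe, here is a minimal sketch in plain C. The names `nested`, `hoisted` and `log` are made up for illustration and have nothing to do with the original source; the point is only that a comma expression used as an index can be split into "side effect first, index second".

```c
/* The index is a comma expression: the assignment runs first,
   then 2 is used as the actual index. */
static int nested(int *log) {
    int arr[4] = {10, 20, 30, 40};
    return arr[(*log = 7, 2)];
}

/* Same behavior with the side effect pulled out in front. */
static int hoisted(int *log) {
    int arr[4] = {10, 20, 30, 40};
    *log = 7;        /* everything before the comma */
    return arr[2];   /* only the last operand remains as the index */
}
```

Both functions return the same element and leave the same value in `*log`, which is the transformation applied to the snippet above.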

`a[b]` is semantically equivalent to `*(a + b)`, so it is unexpectedly commutative. Let's swap them all and add some spacing. Ah, also some assignment expressions have been reused as a part of other expressions, which I will also pull out:

    X = *Y & 7,
    L = X & 1,
    o = X / 2 & 1,
    O = l[32] = 0,
    c = y,
    t = c & 7,
    a = c / 8 & 7,
    T = Y[1] >> 6,
    g = ~-T ? y : (n) y,
    d = BX = y,
    --l[64]
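If the swapped subscript looks suspicious, this tiny sketch shows both spellings side by side. The function names are hypothetical; only the `p[i]` versus `i[p]` equivalence is the point.

```c
/* p[i] is defined as *(p + i), and addition commutes,
   so i[p] is *(i + p) and names the same element. */
static int index_normal(const int *p, int i)  { return p[i]; }
static int index_swapped(const int *p, int i) { return i[p]; }
```

So `64[l]` and `l[64]` are the same lvalue, and `--64[l]` simply decrements element 64 of `l`.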

Now it should be much clearer that a large chunk of this code is instruction decoding. At this point we need to expand macros to make more sense out of it:

    #define Z short
    #define a(_)*(_*)&
    #define y a(Z)Y[++O]
    #define n char

It would take more reading to determine the exact meaning of `Y`, but given that its type is `unsigned char*`, it has to be some sort of pointer, most likely the instruction pointer. The macro `y` expands to `*(short*)&Y[++O]`; it pre-increments `O`, so after `O` is reset to 0, successive uses read from IP+1, IP+2, ... without actually advancing the IP.
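A rough de-obfuscated equivalent of the `y` macro might look like the sketch below. `ip`, `off` and `fetch16` are my own stand-ins for `Y`, `O` and `y`, and I assemble the two bytes by hand instead of casting the pointer. That assumes the little-endian layout an x86 host would give you, and it sidesteps the unaligned access that the original cast performs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the obfuscated globals. */
static const unsigned char *ip;  /* plays the role of Y */
static unsigned off;             /* plays the role of O */

/* Rough equivalent of *(short*)&Y[++O] on a little-endian host:
   bump the offset first, then read 16 bits starting at the new offset.
   The pointer itself is never advanced. */
static int16_t fetch16(void) {
    ++off;
    return (int16_t)(uint16_t)(ip[off] | (ip[off + 1] << 8));
}
```

With `off` reset to 0, the first call reads the two bytes at `ip[1]` and `ip[2]`, and the second call reads `ip[2]` and `ip[3]`, so consecutive reads overlap by one byte.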

So the mysterious code can be now mostly decoded:

    X = *Y & 7,              // [IP+0] = ?????XXX
    L = X & 1,               //        = ???????L
    o = X / 2 & 1,           //        = ??????o?
    O = l[32] = 0,           // Reset `O` for further uses of `y`

    c = y,                   // [IP+1] = yyyyyyyy
    t = c & 7,               //        = ?????ttt
    a = c / 8 & 7,           //        = ??aaa???
    T = Y[1] >> 6,           //        = TT??????
    g = ~-T ? y : (char) y,  // Read [IP+2] and do something

    d = BX = Y[++O],         // [IP+3] = dddddddd
    --l[64]

[IP+1] surely looks like the ModR/M byte, so you can guess that `g` is a displacement even if you don't know what the heck `~-T` is (it equals `T - 1` in two's complement, so it is zero only when `T` is 1, which is the ModR/M mode with an 8-bit displacement; that fits the `(char)` cast). [IP+0] is too generic to pin down the exact instruction, but my guess is that this code is shared by a lot of similar opcodes to reduce the code footprint.
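For reference, the field extraction and the `~-T` trick can be written out in readable form. This is only a sketch: `struct modrm`, `decode_modrm` and `minus_one` are names I made up, while the bit layout is the standard ModR/M one (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0).

```c
/* Hypothetical container for the three ModR/M fields. */
struct modrm { unsigned mod, reg, rm; };

static struct modrm decode_modrm(unsigned char b) {
    struct modrm m;
    m.rm  = b & 7;       /* low 3 bits,    `t` in the decoded code */
    m.reg = b / 8 & 7;   /* middle 3 bits, `a` in the decoded code */
    m.mod = b >> 6;      /* top 2 bits,    `T` in the decoded code */
    return m;
}

/* ~v == -v - 1 in two's complement, so ~(-x) == x - 1. */
static int minus_one(int x) { return ~-x; }
```

For example, the byte `0x46` (binary `01 000 110`) decodes to mod 1, reg 0, r/m 6, and `minus_one(1)` is 0, which is the case where the `(char)` branch is taken.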

pkphilip · on March 20, 2024

Thank you. That is interesting!