Whitesmith's C

I have to say, I'm not impressed with the Whitesmith's C compiler. It's probably because it is very early C, around the White Bible era, and it was justly famous for originally having a really weird libc, with names that were different from the V7 libc just for the heck of it. Eventually, they came around to a more reasonable libc, but the compiler still had a truly bizarre assembly format, called a-natural.

It looked kinda C-ish. Here's hello world:

    L1:
        0150,0145,0154,0154,0157,054,040,0167
        0157,0162,0154,0144,012,0
        public _printf
        public _main
    _main:
        call c.ent
        hl=&L1
        sp<=hl
        call _printf
        af<=sp
        jmp c.ret
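The data under the L1 label in that listing is just the string "hello, world\n" spelled out as octal bytes, with a trailing 0 as the C string terminator. A quick way to check is to paste the same bytes into a C array; the array and function names below are made up for illustration, but the byte values are copied verbatim from the listing.

```c
#include <string.h>

/* Octal bytes copied verbatim from the L1 data in the a-natural listing.
   The array name is hypothetical. */
static const char l1_bytes[] = {
    0150, 0145, 0154, 0154, 0157, 054, 040, 0167,
    0157, 0162, 0154, 0144, 012, 0
};

/* Returns nonzero if the bytes spell the classic K&R greeting.
   The trailing 0 makes the array an ordinary NUL-terminated string. */
int l1_is_hello_world(void)
{
    return strcmp(l1_bytes, "hello, world\n") == 0;
}
```

So the a-natural code above is simply pushing the address of that string and calling _printf.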

Also odd, and this breaks almost all C code you can get from anywhere: porting even the simplest program means adding a lot of explicit zero initializers to get the thing to link.

The Whitesmith's C compiler is very lame in one important way: BSS symbols (uninitialized globals) never get allocated in the object file. That means that code like

    int foo;
    bar() { foo = 9; }

does not link. This is craptastic beyond belief. The only workaround is to modify the source to move foo into data by giving it an explicit initializer, as in int foo = 0;

FEH.

The object file format, which I needed to crack for the 'exec' call and other tools, is pretty straightforward. There's code for this in src/sgs/wslib.c. The Micronix man page for link is pretty comprehensive, though I only found it after reverse-engineering the format by inspection.

It's got a 16-byte header. The magic number is 0x99, followed by a config byte: 0x14 if relocations are present, 0x94 if not. The byte is actually more general, but these are the values encountered in Micronix. The low 3 bits times 2, plus one, is the symbol name length, so 9 (0x14 & 7 = 4, and 4 * 2 + 1 = 9). The 0x8 bit is set for 4-byte ints, else 2-byte ints, so 0. The 0x10 bit is set if little-endian, so 0x10. The next two bits are the alignment requirement, I think, but 0. Finally, the high bit is set if relocs are absent, which is the only difference between 0x14 and 0x94.
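After those two bytes come seven 16-bit fields, which fill out the 16 bytes:
16 bits of symbol table size
16 bits of text segment size
16 bits of data segment size
16 bits of bss segment size
16 bits of stack + heap size
16 bits of text segment base
16 bits of data segment base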
Then come the text segment, the data segment, the symbol table, and finally the relocation bytes.
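A minimal C sketch of reading that header and locating the sections follows. It assumes the seven 16-bit fields run symbol table, text, data, bss, stack + heap, text base, data base (a data segment size is what makes the header add up to 16 bytes, and it's needed to find the symbol table), and that the fields are little-endian as on Micronix. The struct and function names are hypothetical, not taken from src/sgs/wslib.c.

```c
#include <stddef.h>
#include <stdint.h>

#define WS_MAGIC 0x99

/* Hypothetical in-memory form of the 16-byte object header. */
struct ws_header {
    uint8_t  config;
    uint16_t symsize;   /* bytes of symbol table */
    uint16_t textsize;  /* bytes of text segment */
    uint16_t datasize;  /* bytes of data segment (assumed field position) */
    uint16_t bsssize;   /* bytes of bss */
    uint16_t stacksize; /* bytes of stack + heap */
    uint16_t textbase;  /* text segment base address */
    uint16_t database;  /* data segment base address */
};

/* Assumes little-endian 16-bit fields, as on Micronix. */
static uint16_t ws_get16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the first 16 bytes of buf.
   Returns 0 on success, -1 if buf is too short or the magic is wrong. */
int ws_read_header(const uint8_t *buf, size_t len, struct ws_header *h)
{
    if (len < 16 || buf[0] != WS_MAGIC)
        return -1;
    h->config    = buf[1];
    h->symsize   = ws_get16(buf + 2);
    h->textsize  = ws_get16(buf + 4);
    h->datasize  = ws_get16(buf + 6);
    h->bsssize   = ws_get16(buf + 8);
    h->stacksize = ws_get16(buf + 10);
    h->textbase  = ws_get16(buf + 12);
    h->database  = ws_get16(buf + 14);
    return 0;
}

/* Config byte fields as described in the notes. */
int ws_symlen(uint8_t config)     { return (config & 07) * 2 + 1; }
int ws_int4(uint8_t config)       { return (config & 0x08) != 0; }
int ws_littleend(uint8_t config)  { return (config & 0x10) != 0; }
int ws_has_relocs(uint8_t config) { return (config & 0x80) == 0; }

/* File offsets: header, then text, data, symbol table, relocations. */
long ws_sym_offset(const struct ws_header *h)
{
    return 16L + h->textsize + h->datasize;
}

long ws_reloc_offset(const struct ws_header *h)
{
    return ws_sym_offset(h) + h->symsize;
}
```

With the Micronix config value 0x14, ws_symlen gives 9, ws_int4 is false, ws_littleend is true, and ws_has_relocs is true; 0x94 only flips the relocation test.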
Symbol table entries are 16 bits of value, 8 bits of flags, and 9 characters of name, zero-terminated if shorter than 9 chars, so 12 bytes each.
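Decoding one entry is a few lines of C. This sketch takes the name length from the config byte rather than hard-coding 9, assumes the value is little-endian, and passes the flags byte through untouched, since its bits aren't described here. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define WS_MAXNAME 15  /* largest name length the config byte can encode: 7 * 2 + 1 */

/* Hypothetical decoded symbol table entry. */
struct ws_sym {
    uint16_t value;
    uint8_t  flags;                 /* left uninterpreted */
    char     name[WS_MAXNAME + 1];  /* always NUL-terminated here */
};

/* Size in bytes of one entry for a given name length (12 when symlen is 9). */
size_t ws_sym_entry_size(int symlen)
{
    return 2 + 1 + (size_t)symlen;
}

/* Decodes the entry at p. The name field is symlen bytes and is only
   zero-terminated when the name is shorter, so copy exactly symlen bytes
   into a buffer that always has room for a terminator. */
void ws_read_sym(const uint8_t *p, int symlen, struct ws_sym *s)
{
    if (symlen > WS_MAXNAME)
        symlen = WS_MAXNAME;        /* cannot happen with a valid config byte */
    s->value = (uint16_t)(p[0] | (p[1] << 8));  /* assumes little-endian */
    s->flags = p[2];
    memset(s->name, 0, sizeof s->name);
    memcpy(s->name, p + 3, (size_t)symlen);
}
```

Walking the whole table is then a loop over symsize / ws_sym_entry_size(symlen) entries starting at the symbol table offset.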
Relocation entries are variable length and are decoded byte by byte. The relocation control bytes take the following values:
0 : end of relocs
1 - 31 : skip 1 - 31 bytes of the segment
32 - 63 : skip 0 to 31 pages, plus the value of the next byte
64 : unused
68 : add text offset
72 : add data offset
76 : unused
80 - 248 : symbol table entries 0 - 42, in steps of 4 (entry (c - 80) / 4 for control byte c)
252 : symbol table entries 43 and beyond; read the next byte, n:
    0 - 127 : symbol table entries 43 - 170 (entry 43 + n)
    128 - 255 : symbol table entry 256 * (n - 128) + 175 + the byte after n
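The control-byte table maps onto a small decoder. This sketch follows the table literally, including the 175 constant in the two-byte symbol form, and assumes a "page" is 256 bytes. It only reports events (skips, text/data relocations, symbol references) rather than applying them, because how far the position advances after a relocated word isn't covered here, and it keeps the low two bits of codes 64 and up as an opaque field since only multiples of 4 are listed. The types and function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds; WS_UNUSED covers codes 64 and 76. */
enum ws_rel_kind { WS_SKIP, WS_TEXT, WS_DATA, WS_SYM, WS_UNUSED };

struct ws_rel {
    enum ws_rel_kind kind;
    unsigned long arg;  /* byte count for WS_SKIP, index for WS_SYM */
    unsigned low2;      /* low 2 bits of codes >= 64, left uninterpreted */
};

/* Decodes one relocation stream of len bytes into out[0..max-1].
   Stops at the terminating 0 and stores the event count in *count.
   Returns the number of bytes consumed, or -1 if the stream is
   truncated or out[] is too small. */
long ws_decode_relocs(const uint8_t *p, size_t len,
                      struct ws_rel *out, size_t max, size_t *count)
{
    size_t i = 0, n = 0;

    *count = 0;
    while (i < len) {
        unsigned c = p[i++];
        struct ws_rel r = { WS_SKIP, 0, 0 };

        if (c == 0) {                       /* end of relocs */
            *count = n;
            return (long)i;
        } else if (c < 32) {                /* skip 1-31 bytes */
            r.arg = c;
        } else if (c < 64) {                /* assumed 256-byte pages plus next byte */
            if (i >= len)
                return -1;
            r.arg = (c - 32) * 256UL + p[i++];
        } else {
            unsigned slot = (c - 64) >> 2;  /* 64->0, 68->1, 72->2, 76->3, 80->4 */
            r.low2 = c & 3;
            if (slot == 1) {
                r.kind = WS_TEXT;
            } else if (slot == 2) {
                r.kind = WS_DATA;
            } else if (slot == 0 || slot == 3) {
                r.kind = WS_UNUSED;
            } else if (slot < 47) {         /* 80-248: symbols 0-42 */
                r.kind = WS_SYM;
                r.arg = slot - 4;
            } else {                        /* 252: extended symbol index */
                unsigned b;
                if (i >= len)
                    return -1;
                b = p[i++];
                r.kind = WS_SYM;
                if (b < 128) {
                    r.arg = 43 + b;         /* symbols 43-170 */
                } else {
                    if (i >= len)
                        return -1;
                    /* formula exactly as written in the notes */
                    r.arg = 256UL * (b - 128) + 175 + p[i++];
                }
            }
        }
        if (n >= max)
            return -1;
        out[n++] = r;
    }
    return -1;                              /* no terminating 0 */
}
```

The caller keeps its own position counter in the segment being relocated and decides how far each relocated item advances it, since the notes don't spell that out.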