Whitesmith's C

Morrow needed a C compiler, and they picked up what was, for the time, a pretty complete compiler that generated pretty good code for the 8080. The documentation was stellar, and they shipped the compiler in unlinked form, so you could replace the OS layer and not need to ship source code. As the compiler had 4 passes, it was not exactly fast on the 4 MHz Z80.

I have to say, I'm not impressed with some aspects of the Whitesmith's C compiler. That's probably because it is very early C, from around the White Bible era. It was justly famous for originally having a really weird libc, with names that were different from the V7 libc (putfmt instead of printf, etc.). Eventually, they came around to a more reasonable libc, but the compiler still had a truly bizarre assembly format, called a-natural.

It looked kinda C-ish. Here's hello world:

L1:
0150,0145,0154,0154,0157,054,040,0167
0157,0162,0154,0144,012,0
public _printf
public _main
_main:
call c.ent
hl=&L1
sp<=hl
call _printf
af<=sp
jmp c.ret

There is another oddity, and it breaks almost any C code you can get from anywhere: porting even the simplest program means adding a lot of explicit zero initializers to get the thing to link.

The Whitesmith's C compiler is very lame in one important way: BSS symbols never get allocated in the object file. That means that code like

int foo;
bar() { foo = 9; }

does not link. This was a bad call, I think. The only workaround is to modify the source to move foo to data by giving it an explicit = 0;
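A minimal sketch of that workaround, written with modern prototypes rather than the early-C style above. The only change that matters is the explicit initializer.

```c
/* The explicit initializer moves foo out of BSS and into the data
   segment, so the object file carries a definition to link against. */
int foo = 0;

void bar(void)
{
    foo = 9;
}
```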

FEH.

The object file format, which I needed to crack in the 'exec' call and other tools, is pretty straightforward. There's code for this in src/sgs/wslib.c. The Micronix man page for link is pretty comprehensive, though I only found it after reverse-engineering the format by inspection.

It's got a 16 byte header. The magic number is 0x99, followed by a config byte: 0x14 if relocations are present, 0x94 if not. The config byte is actually more general, but these are the values encountered in Micronix. The low 3 bits, times 2 plus one, give the symbol length, so 9. The 0x8 bit is set for 4 byte ints and clear for 2 byte ints, so 0. The 0x10 bit is set if little-endian, so 0x10. The next two bits are an alignment requirement, I think, but they are 0. Finally, the high bit is set if relocs are absent.
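The bit layout of the config byte can be sketched as a small decoder. The struct and function names here are hypothetical (they are not taken from src/sgs/wslib.c), and the alignment field is only carried through, since its meaning is a guess.

```c
#include <stdint.h>

/* Hypothetical decoded view of the config byte. */
struct ws_config {
    int symbol_len;    /* (low 3 bits) * 2 + 1 */
    int int_size;      /* 4 if bit 0x08 is set, else 2 */
    int little_endian; /* bit 0x10 */
    int alignment;     /* bits 0x60, meaning uncertain */
    int has_relocs;    /* high bit clear means relocs are present */
};

struct ws_config ws_decode_config(uint8_t c)
{
    struct ws_config cfg;

    cfg.symbol_len = (c & 0x07) * 2 + 1;
    cfg.int_size = (c & 0x08) ? 4 : 2;
    cfg.little_endian = (c & 0x10) != 0;
    cfg.alignment = (c >> 5) & 0x03;
    cfg.has_relocs = (c & 0x80) == 0;
    return cfg;
}
```

Feeding it the two Micronix values, 0x14 and 0x94, should give the same settings apart from has_relocs.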
The rest of the header is seven 16 bit fields:
16 bits of symbol table size
16 bits of text segment size
16 bits of data segment size
16 bits of bss segment size
16 bits of stack + heap
16 bits of text segment base
16 bits of data segment base
then we have the text segment
then the data segment
then the symbol table
then relocation bytes
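Here is a rough sketch of reading the header and working out where each part of the file starts. It assumes seven little-endian 16 bit fields after the two leading bytes, including a data segment size between the text and bss sizes, since that is what makes the header come to 16 bytes. Treat that field order as an assumption to check against the Micronix link man page. All names are hypothetical, not the ones used in src/sgs/wslib.c.

```c
#include <stddef.h>
#include <stdint.h>

struct ws_header {
    uint8_t  magic;       /* 0x99 */
    uint8_t  config;
    uint16_t symtab_size;
    uint16_t text_size;
    uint16_t data_size;   /* assumed field, see the note above */
    uint16_t bss_size;
    uint16_t stack_heap;
    uint16_t text_base;
    uint16_t data_base;
};

/* File offsets of the parts that follow the header. */
struct ws_layout {
    uint32_t text_off, data_off, symtab_off, reloc_off;
};

/* Read a 16 bit little-endian value. */
static uint16_t ws_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int ws_parse_header(const uint8_t *buf, size_t len,
                    struct ws_header *h, struct ws_layout *lay)
{
    if (len < 16 || buf[0] != 0x99)
        return -1;

    h->magic = buf[0];
    h->config = buf[1];
    h->symtab_size = ws_le16(buf + 2);
    h->text_size = ws_le16(buf + 4);
    h->data_size = ws_le16(buf + 6);
    h->bss_size = ws_le16(buf + 8);
    h->stack_heap = ws_le16(buf + 10);
    h->text_base = ws_le16(buf + 12);
    h->data_base = ws_le16(buf + 14);

    /* text, data, symbol table, then relocation bytes */
    lay->text_off = 16;
    lay->data_off = lay->text_off + h->text_size;
    lay->symtab_off = lay->data_off + h->data_size;
    lay->reloc_off = lay->symtab_off + h->symtab_size;
    return 0;
}
```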
symbol table entries are 16 bits of value,
8 bits of flags
and 9 characters of name, zero terminated if less than 9 chars.
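With 9 character names that makes each entry 12 bytes. A small sketch of pulling one entry apart, assuming the value is stored little-endian as on the Z80. The names are hypothetical, and the flags byte is passed through uninterpreted.

```c
#include <stdint.h>
#include <string.h>

#define WS_NAME_LEN 9
#define WS_SYM_SIZE (2 + 1 + WS_NAME_LEN)  /* value + flags + name = 12 */

struct ws_symbol {
    uint16_t value;
    uint8_t  flags;
    char     name[WS_NAME_LEN + 1];  /* always NUL terminated here */
};

/* p must point at WS_SYM_SIZE bytes of symbol table. */
void ws_parse_symbol(const uint8_t *p, struct ws_symbol *s)
{
    s->value = (uint16_t)(p[0] | (p[1] << 8));
    s->flags = p[2];
    /* A full 9 character name has no terminator in the file, so add one. */
    memcpy(s->name, p + 3, WS_NAME_LEN);
    s->name[WS_NAME_LEN] = '\0';
}
```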
Relocation entries are variable length, and are decoded byte by byte. Relocation control bytes take the following values:
0 : end of relocs
1 - 31 : skip 1 - 31 bytes of segment
32 - 63 : skip 0 to 31 pages, plus the next byte
64 : unused
68 : add text offset
72 : add data offset
76 : unused
80 - 248 : symbol table entry 0 - 42
252 : symbol table 43 and beyond, read next byte
  0 - 127 : symbol table entries 43 - 170
  128 - 255 : symbol table entries 256 * (n - 128) + 175 + next byte
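The control byte table can be sketched as a decoder that walks one relocation stream and records what to patch and where. Several things here are assumptions rather than facts from the format: that each relocation covers a 2 byte word and advances the position by 2, that a page is 256 bytes, and that symbol control bytes step by 4 (any other value is reported as unused). The formula for the three byte form is taken literally from the table. Note that it skips entries 171 - 174, so the constant may be off by a few. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum ws_reloc_kind { WS_REL_UNUSED, WS_REL_TEXT, WS_REL_DATA, WS_REL_SYMBOL };

struct ws_reloc {
    enum ws_reloc_kind kind;
    uint32_t offset;  /* position in the segment of the word to patch */
    uint32_t symbol;  /* symbol table index, only for WS_REL_SYMBOL */
};

/* Decode one stream. Returns the number of entries stored in out, or -1
   if the stream is truncated or out is too small. *consumed gets the
   number of bytes used, including the terminating 0. */
int ws_decode_relocs(const uint8_t *p, size_t len,
                     struct ws_reloc *out, int max, size_t *consumed)
{
    size_t i = 0;
    uint32_t pos = 0;
    int n = 0;

    while (i < len) {
        uint8_t c = p[i++];
        struct ws_reloc r;

        if (c == 0) {                      /* end of relocs */
            *consumed = i;
            return n;
        }
        if (c < 32) {                      /* skip 1 - 31 bytes */
            pos += c;
            continue;
        }
        if (c < 64) {                      /* skip pages plus next byte */
            if (i >= len)
                return -1;
            pos += (uint32_t)(c - 32) * 256 + p[i++];
            continue;
        }

        r.kind = WS_REL_UNUSED;
        r.offset = pos;
        r.symbol = 0;
        if (c == 68) {
            r.kind = WS_REL_TEXT;
        } else if (c == 72) {
            r.kind = WS_REL_DATA;
        } else if (c >= 80 && c <= 248 && (c & 3) == 0) {
            r.kind = WS_REL_SYMBOL;        /* entries 0 - 42 */
            r.symbol = (uint32_t)(c - 80) / 4;
        } else if (c == 252) {
            uint8_t b;

            if (i >= len)
                return -1;
            b = p[i++];
            r.kind = WS_REL_SYMBOL;
            if (b < 128) {
                r.symbol = 43u + b;        /* entries 43 - 170 */
            } else {
                if (i >= len)
                    return -1;
                r.symbol = 256u * (uint32_t)(b - 128) + 175u + p[i++];
            }
        }
        if (n >= max)
            return -1;
        out[n++] = r;
        pos += 2;                          /* assumed 2 byte word */
    }
    return -1;                             /* no terminating 0 seen */
}
```

For a whole file you would presumably run this once per segment, starting where the symbol table ends, and then add the text base, data base or symbol value to the word at each recorded offset.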

Decompilation:

The NSA has released a stupendous program, Ghidra, which is a wildly sophisticated reverse engineering framework. It is extensible, and supports dozens of processors, including the Z80. I have published an extension that consumes the Whitesmith's object file format, and am working on getting the decompiler to understand the calling sequence and Whitesmith's special worker functions, which are well documented. https://github.com/cm68/ghidra-whitesmiths is where you'll find it; it's a work in progress.