enumer8's blog

[Assembly-x86] - Windows with Windows

Unnamed project discussion

Okay, here's what I want to do: write a renderer in basically 100% asm using the Win32 API. At first, I was pretty set on using GDI alone and being content with software rendering. I mean, I love software rendering, and I've already managed to draw to the screen with X11/Xlib as well as OpenGL (both of these were tested on my laptop with discrete onboard graphics and not an external GPU, so...).

This is about a secret project I've been hammering away at since mid-2023, let's say around July or so. I got the idea for it after hacking through renderr, that X11 software renderer, and managing to get line drawing working, and then getting sick of working with Xlib (it's not that bad, and I'll have to revisit it at some point). Doing this new project entirely in assembly wasn't the goal from the beginning; originally, I wanted it to be a healthy mix of C89 and assembly, using asm to optimize hotspots. But as time went on, I realized that I wasn't content with relegating asm to that role, because ultimately it was still a black box in my brain. Why was asm considered so efficient in these specific problem areas? Couldn't a compiler do the same thing as my handwritten assembly? No, it turns out... it can't. Months passed and I just kept going deeper and deeper into the asm rabbit hole. I wasn't entirely new to programming in assembly, but I hadn't put this many consecutive hours into it before.

My entire interest in programming really started with hex editing ROMs and debugging NES/SNES emulators when I was quite young, messing with values and realizing they affected the programs I was running. From that point onwards, I was never content with high-level abstractions or with letting frameworks and APIs run the show; I had to know how and why every single detail of my program ran the way it did, which led neatly to working with C and assembly.

Which brings us to the crux of this particular application: a 32-bit rendering program that uses the Win32 API to open a window, and then uses both hardware acceleration AND software rendering to display... things. Primitives. OBJ models. Whatever.

The real breakthrough with this project was realizing that not only was I now dedicated to making the entire damn thing work in x86 assembly, but I also wanted to understand SIMD better, and this is where the real understanding started to take shape in my mind. By this time (early 2023) I was fairly comfortable poking around in assembly and writing scalar code. Most of my assembly shenanigans were straightforward and not very remarkable. They're still not remarkable, but I'm having a LOT more fun now, which brings me to SIMD and the realization that the program I'm writing must take advantage of MMX and SSE instructions. In essence, this is a multimedia application that deals with streams of video and sound data as part of rendering, and those domains are perfect for SIMD work, since the same operation gets applied to many pixels or samples at once. The PS1 development work that I'm doing stops at Gouraud shading; with this project, I have the opportunity to tackle Phong shading, which is another exciting step. This is a program that I dream of running on a computer with something like a 500MHz to 1GHz CPU and 512MB to 1GB of total RAM (which is a lot to play around with, don't get me wrong). But the constraints are really what drive the development here.
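To make the SIMD point concrete, here's a minimal C sketch of a classic MMX/SSE-style pixel operation. It is not code from this project. The operation brightens a 32-bit pixel buffer with saturating byte adds. SSE2's `_mm_adds_epu8` intrinsic compiles to `PADDUSB`, which adds 16 bytes at once and clamps each one at 255; MMX has the same instruction on 8 bytes. The names `brighten` and `brighten_scalar` are hypothetical. The SSE2 path is only compiled when the compiler reports SSE2 support, and otherwise everything goes through the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile the SSE2 path only when the target is known to support SSE2. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Scalar reference: add 'amount' to every byte, clamping at 255. */
static void brighten_scalar(uint8_t *bytes, size_t n, uint8_t amount)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned sum = (unsigned)bytes[i] + amount;
        bytes[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Same result, 16 bytes (four 32-bit pixels) per iteration when SSE2 exists. */
static void brighten(uint8_t *bytes, size_t n, uint8_t amount)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i add = _mm_set1_epi8((char)amount);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(bytes + i));
        v = _mm_adds_epu8(v, add);              /* PADDUSB: saturating unsigned add */
        _mm_storeu_si128((__m128i *)(bytes + i), v);
    }
#endif
    brighten_scalar(bytes + i, n - i, amount);  /* leftover bytes (or all of them) */
}
```

This version brightens every byte, including the unused/alpha byte of each pixel. A real blitter would probably mask that byte out, but the point is the shape of the loop: one instruction doing the work of sixteen scalar compare-and-clamp steps.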

The idea of checking which instruction sets a particular CPU supports, so the code can take advantage of them, was taken from the Little Big Adventure 2 engine: set up a 32-bit DWORD and have each bit represent one property for identifying and cataloging the CPU, such as whether x87 FPU instructions or MMX are available, which CPU family it comes from, etc. You can find the specific header file on GitHub. I mean, over half the codebase of the first game is in assembly! That in itself is inspiring to me, because as buggy as the game actually is, the fact that it ran so well and was full of so many cool features implemented in such a granular way really motivates me to work on my own program.
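The "which CPU family" part comes from the processor signature that CPUID leaf 1 returns in EAX. Here's a minimal C sketch of decoding it, following Intel's documented rule for folding in the extended family and extended model fields. The struct and function names are hypothetical and are not the LBA2 header's. The sketch takes the raw EAX value as input, so it doesn't have to execute CPUID itself.

```c
#include <stdint.h>

/* Fields of the processor signature returned in EAX by CPUID leaf 1. */
typedef struct {
    unsigned stepping;
    unsigned model;   /* "display" model: extended model folded in */
    unsigned family;  /* "display" family: extended family folded in */
} cpu_signature;

static cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xFu;    /* bits 11:8  */
    unsigned base_model  = (eax >> 4) & 0xFu;    /* bits 7:4   */
    unsigned ext_model   = (eax >> 16) & 0xFu;   /* bits 19:16 */
    unsigned ext_family  = (eax >> 20) & 0xFFu;  /* bits 27:20 */

    s.stepping = eax & 0xFu;                     /* bits 3:0   */

    /* Extended family only counts when the base family is 0Fh. */
    s.family = base_family;
    if (base_family == 0xFu)
        s.family += ext_family;

    /* Extended model is prepended for families 06h and 0Fh. */
    s.model = base_model;
    if (base_family == 0x6u || base_family == 0xFu)
        s.model += ext_model << 4;

    return s;
}
```

For example, `decode_signature(0x672)` gives family 6, model 7, stepping 2, which is a Pentium III-era signature.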

Here's how the CPU signature function lays out its feature bits, as documented in the codebase, reading from bit 31 at the top down to bit 0 at the bottom:

; Reserved-------------------------++++++++||||||||||||||||||||||||
; Multimedia Extensions (MMX)--------------+|||||||||||||||||||||||
; Reserved----------------------------------+++++++||||||||||||||||
; Conditional Move Instruction---------------------+|||||||||||||||
; Machine Check Architecture------------------------+||||||||||||||
; Global Paging Extension----------------------------+|||||||||||||
; Memory Type Range Registers-------------------------+||||||||||||
; Reserved---------------------------------------------+|||||||||||
; Reserved----------------------------------------------+||||||||||
; APIC---------------------------------------------------+|||||||||
; CMPXCHG8B Instruction-----------------------------------+||||||||
; Machine Check Exception----------------------------------+|||||||
; Physical Address Extensions (2MB Pages/36bit addresses)---+||||||
; Model Specific registers-----------------------------------+|||||
; Time Stamp Counter------------------------------------------+||||
; Page size extensions (4MB pages)-----------------------------+|||
; Debugging Extensions (I/O Breakpoints)------------------------+||
; Virtual Mode Extensions----------------------------------------+|
; Floating Point Unit on Chip-------------------------------------+

You can simply mask off individual bits to figure out which features the hardware you're running on supports, which is remarkable to me, and definitely something I want to implement in my own program. There's more, but let me move on to the other involved part of this project: DirectX.
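Here's a minimal C sketch of that bitmask check. It assumes the flags come straight from the EDX register of CPUID leaf 1, which is the register the diagram describes (bit 0 = FPU, bit 23 = MMX). The macro and function names are hypothetical, and only a few bits are named. Bits 25 and 26, shown as reserved in the diagram, were later defined as SSE and SSE2 on newer processors. `__get_cpuid` comes from GCC/Clang's `<cpuid.h>`; MSVC's equivalent is the `__cpuid` intrinsic, which isn't shown here.

```c
#include <stdint.h>

/* A few CPUID leaf 1 EDX bit positions. */
#define CPU_FPU  (1u << 0)   /* x87 FPU on chip */
#define CPU_TSC  (1u << 4)   /* Time Stamp Counter (RDTSC) */
#define CPU_CX8  (1u << 8)   /* CMPXCHG8B */
#define CPU_CMOV (1u << 15)  /* Conditional move */
#define CPU_MMX  (1u << 23)  /* MMX */
#define CPU_SSE  (1u << 25)  /* SSE (reserved in older documentation) */
#define CPU_SSE2 (1u << 26)  /* SSE2 (reserved in older documentation) */

/* True only if every bit in 'mask' is set in the feature word. */
static int has_feature(uint32_t edx, uint32_t mask)
{
    return (edx & mask) == mask;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Returns CPUID leaf 1 EDX, or 0 if leaf 1 isn't available. */
static uint32_t read_feature_edx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}
#endif
```

A renderer could read the feature word once at startup and, for example, only install an MMX blitter when `has_feature(edx, CPU_MMX)` is true. Checking a combined mask like `CPU_MMX | CPU_SSE` requires all of those bits, not just one of them.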

DirectX

To tell you the truth, I've been going back and forth on whether I want to involve DirectX at all, because it doesn't really fit the vision I have for the program. And to get it out of the way from the beginning, I really, really, really hate VS Code. I don't mind developing on Windows (though it's still not as convenient as Linux) because I use WinAsm; it works fine for my purposes and iteration times are very fast. Get down some scraps of code and press Shift+F8 to assemble, link, and run in one go. If it breaks, the output window shows me the errors and I can work through them before trying again. The workflow itself is so nice.

Success!

Well, for better or for worse, I finally have some results.

ASM!

Granted, I'm only rendering a bitmap using GDI, but that goal feels like it's been reached. The real aim, of course, is to put together a rasterizer, but actually being able to display stuff on the screen is a real win. I'll take it!
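For anyone wondering what "rendering a bitmap using GDI" can look like, here's a minimal C sketch of one common approach, not necessarily the one this project uses. The idea is to keep a 32-bit software framebuffer in memory and hand it to GDI with `StretchDIBits` as a top-down DIB. The names `framebuffer`, `fb_init`, `fb_put` and `fb_present` are hypothetical. The Win32 part only compiles on Windows; the rest is plain C that a rasterizer could draw into.

```c
#include <stdint.h>
#include <stdlib.h>

/* 32-bit software framebuffer. In a 32bpp BI_RGB DIB each DWORD holds blue in
   the low byte, then green, then red, with the high byte unused: 0x00RRGGBB. */
typedef struct {
    int width, height;
    uint32_t *pixels;   /* width * height entries, top row first */
} framebuffer;

static int fb_init(framebuffer *fb, int width, int height)
{
    fb->width = width;
    fb->height = height;
    fb->pixels = calloc((size_t)width * (size_t)height, sizeof *fb->pixels);
    return fb->pixels != NULL;
}

/* Write one pixel, silently clipping anything outside the buffer. */
static void fb_put(framebuffer *fb, int x, int y, uint32_t rgb)
{
    if (x >= 0 && y >= 0 && x < fb->width && y < fb->height)
        fb->pixels[(size_t)y * (size_t)fb->width + (size_t)x] = rgb;
}

#ifdef _WIN32
#include <windows.h>
/* Copy the framebuffer into a device context, e.g. while handling WM_PAINT. */
static void fb_present(const framebuffer *fb, HDC hdc)
{
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof bmi);
    bmi.bmiHeader.biSize = sizeof bmi.bmiHeader;
    bmi.bmiHeader.biWidth = fb->width;
    bmi.bmiHeader.biHeight = -fb->height;   /* negative height = top-down rows */
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    StretchDIBits(hdc, 0, 0, fb->width, fb->height,
                  0, 0, fb->width, fb->height,
                  fb->pixels, &bmi, DIB_RGB_COLORS, SRCCOPY);
}
#endif
```

In a window procedure, `fb_present` would typically be called with the HDC returned by `BeginPaint`, before `EndPaint`. The same gdi32 call is just as reachable from assembly by pushing the thirteen arguments and calling `StretchDIBits`.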