[Langdev] - Tephra, an idea for a new systems programming language - [1]
date: 1 September 2024
Background
So, I've decided to jump on the 'creating a programming language from scratch' bandwagon. It was only a matter of time, but my current setup is conducive to experimenting, and I feel established enough in my programming skills to actually move forward with this decision.
There was no single moment where I really felt like diving into creating my own programming language, and no grand idea behind it. I've been programming in C and assembly for a few years now, basically starting from scratch (my own programming/tech history is tangential and not something I'll really get into here). I've always felt a bit out of place because I never took any computer science courses in either high school or college, and most of the paradigms and concepts were over my head (many still are, but I'm crawling my way through that jungle).
So as a result of jumping into this endeavor headfirst without much fanfare or really anyone paying attention, here's what I want to do:
The language itself
I threw a few words around until one sort of stuck, so I'm calling it Tephra.
As a basic framework, this is the feature checklist for Tephra:
- [ ] statically typed
- [ ] compiled
- [ ] low-level memory access
- [ ] SIMD support
- [ ] a C-like syntax for familiarity
- [ ] smooth C interop because it's vital
- [ ] built-in vector and matrix types for data manipulation and graphical applications
- [ ] immediate-mode, barebones graphics API to get up and running - window on the screen along with color and basic graphics tools like points, lines, 2D and 3D primitives
- [ ] opt-in checks for array bounds and null pointers - allow for setting flags within the code as compiler directives
- [ ] most of all, it should be simple and fast
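To make the opt-in checks item a bit more concrete, here is a rough sketch in plain C (Tephra has no compiler yet) of a bounds check that can be switched on at build time. The `TEPHRA_BOUNDS_CHECK` flag, the `AT` macro and `checked_index` are all made-up names for illustration, not a settled design; a null-pointer check could follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Used only when checks are enabled: validate i, then hand it back. */
static inline size_t checked_index(size_t i, size_t len,
                                   const char *file, int line) {
    if (i >= len) {
        fprintf(stderr, "%s:%d: index %zu out of bounds (length %zu)\n",
                file, line, i, len);
        abort();
    }
    return i;
}

/* TEPHRA_BOUNDS_CHECK is a hypothetical flag standing in for a Tephra
   compiler directive. Build with -DTEPHRA_BOUNDS_CHECK to opt in. */
#ifdef TEPHRA_BOUNDS_CHECK
#define AT(arr, len, i) \
    ((arr)[checked_index((size_t)(i), (len), __FILE__, __LINE__)])
#else
#define AT(arr, len, i) ((arr)[(i)])
#endif

/* Small usage demo: in-range accesses behave the same either way. */
static void bounds_demo(void) {
    int values[4] = {10, 20, 30, 40};
    AT(values, 4, 2) = 99;            /* AT is usable as an lvalue */
    assert(AT(values, 4, 2) == 99);
    assert(AT(values, 4, 0) == 10);
}
```

With the flag off, `AT` compiles down to a plain index; with it on, an out-of-range index prints the source location and aborts. That is the trade-off the checklist item is after: pay for the check only when asked.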
Semantic choices
I want to stick with a C-style syntax given its ubiquity and familiarity, but with a few stylistic choices. I'm laying these out right now just for the sake of helping me consolidate exactly how I want it to operate.
// Example program - Hello, world!
@insert <io>
int main(){
io.write("Hello, world!\n");
exit 0;
}
Why do I want to write a C-like language when C already exists? Honestly, just for the fun of it. I'm a hobbyist and I like reinventing wheels. At this stage it's probably going to end up a tiny subset of C, anyway - something closer to the barest, most essential bits and pieces of embedded C, suited for graphics demos, basic UI, maybe even some simple games.
This means that the most logical roadmap would be to implement a very barebones C compiler, but adapted for Tephra - and there are many guides available for that, thank goodness.
The compiler
Given that I want Tephra to be compiled down to baremetal assembly, this is one area that's going to take a lot of my energy and mental effort - which I already feel like I'm going to enjoy just as a massive challenge. The goal for both the compiler and the language is to be lean and lightweight - not to an extreme degree, but ideally they should both be simple and fast. Easier said than done, but there's no better way to learn than to do.
It seems like those two words will be the guiding principles behind this language - 'simple' and 'fast' do a lot of heavy lifting, but they're pretty helpful at this stage, at least. Additionally, the compiler should ideally be self-hosting, in that it can compile itself.
I'm taking inspiration from a few sources - Fabrice Bellard is a major one, with TCC and FBCC. Ideally, I am leaning towards something like TCC - and the idea of compiling programs down into the native architecture of embedded systems or low-resource ones is very appealing.
Purpose of the language
Now is probably a good time to get this part out of the way - what do I actually want to do with this language that I couldn't do with C, or even with assembly?
This might be the wrong way to look at it. I'm approaching this in an exploratory way because by writing this language from the ground up, in all its components:
- Tokens
- Grammar
- Syntax
- Stylistic choices
- Logical choices
I get a much clearer look at the internals of C and (x86) assembly, and at why they are the way they are. A big deal for me is pointer arithmetic and the logic of pointers, which is something I want to explore in Tephra.
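As an illustration of the first item on that list, here is a minimal tokenizer sketch in C. It is an assumption about how Tephra source might be split up, not a spec: the token kinds, the `Token` struct and `next_token` are hypothetical names, and it ignores comments, multi-character operators and error reporting.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* points into the source buffer, not a copy */
    size_t len;         /* length of the token text in bytes */
} Token;

/* Scan one token starting at *src and advance *src past it. */
static Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;      /* skip whitespace */

    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *src = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;                      /* names and keywords */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;                     /* decimal integers only */
        while (isdigit((unsigned char)*p)) p++;
    } else if (*p == '"') {
        t.kind = TOK_STRING;                     /* token includes the quotes */
        p++;
        while (*p && *p != '"') {
            if (*p == '\\' && p[1]) p++;         /* step over escaped char */
            p++;
        }
        if (*p == '"') p++;
    } else {
        t.kind = TOK_PUNCT;                      /* any other single char */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}

/* Small usage demo on a statement from the hello-world example. */
static void lexer_demo(void) {
    const char *src = "exit 0;";
    Token t = next_token(&src);
    assert(t.kind == TOK_IDENT && t.len == 4 && strncmp(t.start, "exit", 4) == 0);
    t = next_token(&src);
    assert(t.kind == TOK_NUMBER && t.len == 1);
    t = next_token(&src);
    assert(t.kind == TOK_PUNCT && *t.start == ';');
    t = next_token(&src);
    assert(t.kind == TOK_EOF);
}
```

Keeping tokens as pointer-plus-length views into the source avoids any allocation, which fits the "simple and fast" goal; keywords like `exit` would be told apart from ordinary identifiers in a later step.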
So, to set out a limited application - I'd envisioned using it in a DOS context, being compiled to x86 assembly for use as an alternative (maybe?) to a limited subset of C.
- SOURCE LANGUAGE - Tephra
- IMPLEMENTATION - C or C3, with assembly routines
- TARGET ARCH - x86 32-bit machines (for now)
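As a very small taste of what that pipeline could look like at the far end, here is a hedged C sketch of a code generator emitting assembly text for the `exit 0;` statement from the hello-world example. It assumes NASM-style syntax and the DOS terminate-with-return-code service (INT 21h, AH=4Ch); `emit_exit` is a hypothetical helper, and whether Tephra's `exit` should map straight onto this call is still an open question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write assembly for `exit <code>;` into buf.
   DOS return codes are one byte, so only the low 8 bits are kept.
   Returns what snprintf returns (the length the text needs). */
static int emit_exit(char *buf, size_t cap, unsigned code) {
    return snprintf(buf, cap,
                    "    mov ah, 0x4C\n"   /* DOS function: terminate */
                    "    mov al, %u\n"     /* return code */
                    "    int 0x21\n",      /* call DOS */
                    code & 0xFFu);
}

/* Small usage demo: generate the text for `exit 0;`. */
static void emit_demo(void) {
    char out[128];
    int n = emit_exit(out, sizeof out, 0);
    assert(n > 0 && (size_t)n == strlen(out));
    assert(strstr(out, "mov al, 0\n") != NULL);
}
```

Emitting plain text and handing it to an existing assembler keeps the first version of the compiler small; producing machine code directly, the way TCC does, could come later.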
Atomics, multithreading
This isn't currently a priority in my langdesign because the environment I'm implementing this for isn't really one where multithreading operates (i.e. a DOS env) - but it would be interesting to try and integrate pthreads/POSIX-compliance for more modern applications. Still, I have to first get this language off the ground before any of this more complicated stuff gets talked about!