Attribution: This article was based on content by @hellerve on GitHub.
Original: https://github.com/hellerve-pl-experiments/cj

Introduction

A tiny, dependency-free just-in-time (JIT) compiler that spits out machine code for both x86-64 and ARM64 (AArch64) — autogenerated from ISA (instruction set architecture) specs — is an irresistibly nerdy toy and a useful engineering exercise. That’s exactly what the cj project by @hellerve on GitHub attempts: a minimal JIT written in plain C, with backends produced by small JavaScript scripts and a lightweight abstraction layer for common chores like function prologues and emitting trampolines. The repo includes a minimal language example that shows the end-to-end path from a toy IR to executable code.

This article walks through what cj does, the design trade-offs of a no-deps, autogenerated backend, potential pitfalls (ABI and relocation handling, executable memory), and practical steps to make a project like this robust and reusable. I’ll point to real use cases, compare the approach to mainstream toolchains, and offer concrete recommendations for next steps.

Key Takeaways

  • A no-deps, autogenerated backend simplifies distribution and learning, but it pushes the correctness and portability burden onto the generator and its tests.
  • ABI details, relocations, and executable-memory safety are the usual trouble spots for homegrown JITs; plan tests and validation early.
  • If you want production performance and portability, consider integrating a register allocator and systematic lowering, or reusing parts of proven projects such as LLVM or Cranelift.
  • Incremental goals (simple trampolines → prologue templates → register allocator) keep the project manageable and testable.

Background: JIT (just-in-time) compilation generates machine code at runtime from a higher-level representation to improve performance or enable dynamic language features.

Main concepts: what cj implements and why it matters

At its core, a JIT translates some intermediate form into machine code, usually emitting a sequence of opcodes, fixing up references (relocations), and making the memory pages executable. That process exposes several subproblems:

  • Instruction emission: assembling bytes for each instruction according to the target ISA.
  • ABI (application binary interface) handling: ensuring function call conventions, stack layout, and callee/caller-saved registers are correct for the platform.
  • Relocation and addressing: fixing up references to constants, jumps, and external symbols; dealing with PC-relative addressing when needed.
  • Executable memory management: allocating memory, setting permissions (read/write, then execute), and applying mitigations such as W^X (write XOR execute); a minimal end-to-end sketch follows this list.
  • Register allocation and spilling: mapping virtual registers in your IR to physical registers or stack slots.
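
A minimal end-to-end sketch of that flow on Linux/x86-64 (not cj's actual API; the six hand-assembled bytes encode mov eax, 42 followed by ret):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* mov eax, 42 ; ret  --  a function that returns the constant 42 */
        uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        /* Allocate a writable page, copy the bytes in, then flip it to
           read+execute so a writable+executable mapping never exists. */
        void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        memcpy(mem, code, sizeof code);
        if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) != 0) return 1;

        int (*fn)(void) = (int (*)(void))mem;
        printf("%d\n", fn());   /* prints 42 */

        munmap(mem, 4096);
        return 0;
    }

Everything a JIT adds on top of this (relocations, calling conventions, register allocation) is about producing the right bytes for code more interesting than a constant.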

cj focuses on instruction emission and a small abstraction layer for prologues/epilogues; it deliberately leaves out a register allocator and many ABI corner cases. The notable twist is that its backends are autogenerated from ISA descriptions via JavaScript scripts. That reduces external dependencies (no assembler or LLVM needed) and makes the generated code transparent and tweakable — great for learning and experimentation.

Why autogenerated backends?

Autogeneration can be a force-multiplier: if you have a machine-readable description of an ISA, you can produce encoding tables and emission logic automatically rather than hand-writing hundreds of instruction encodings. This reduces human error in boilerplate encodings and allows quick multi-architecture support.
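
For flavor, here is one hypothetical shape such generated output could take (these tables and names are illustrative, not cj's actual generated code): the generator reduces the ISA spec to encoding templates, and a tiny hand-written emitter splices operands into them.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical output of a spec-driven generator: one row per
       register-register instruction form on x86-64. */
    typedef struct {
        const char *mnemonic;
        uint8_t     rex_w;    /* 1 = emit a REX.W prefix (64-bit operands) */
        uint8_t     opcode;
    } insn_form;

    static const insn_form forms[] = {
        { "add", 1, 0x01 },   /* add r/m64, r64 */
        { "sub", 1, 0x29 },   /* sub r/m64, r64 */
        { "mov", 1, 0x89 },   /* mov r/m64, r64 */
    };

    /* Emit one reg-reg instruction into buf; dst/src are register numbers
       0..15. Returns the number of bytes written. */
    static size_t emit_rr(uint8_t *buf, const insn_form *f, int dst, int src) {
        size_t n = 0;
        if (f->rex_w)
            buf[n++] = 0x48 | ((src >> 3) << 2) | (dst >> 3);   /* REX.W|R|B */
        buf[n++] = f->opcode;
        buf[n++] = 0xC0 | ((src & 7) << 3) | (dst & 7);         /* ModRM, mod=11 */
        return n;
    }

The appeal is that the table rows come from the spec rather than from a human transcribing opcode charts; the risk, as the trade-offs below note, is that a single mapping bug in the generator is reproduced across every row.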

Trade-offs:

  • If the spec parser/mapping is buggy, every generated backend will inherit the bug.
  • Generated code can be harder to debug unless the generator emits readable tests and diagnostics.
  • Maintaining generator scripts (here, “horrible, horrible JS scripts” as the author jokes) can become a bottleneck unless cleaned and documented.

Practical applications and examples

  1. Embedded scripting in apps. Embed a tiny JIT to accelerate hot paths of a scripting language or DSL (domain-specific language). The repository’s minimal language example shows how a toy AST/IR can lower to emitted machine code that runs immediately. This is perfect for small VMs, REPLs, or plugins that need low-latency code generation without pulling in LLVM.

  2. Generating specialized numeric kernels. For workloads where inner loops benefit from architecture-specific encodings (SIMD, special instructions), a small JIT can generate tight code paths at runtime for the dataset dimensions or hardware detected at startup. Autogenerated encoders ease targeting multiple ISAs without duplicating hand-written assemblers.
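
As a toy illustration of the specialization idea (a hypothetical helper, x86-64 System V only): embed a value known only at runtime directly into an instruction's immediate field.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Emit the body of: int mul_by_c(int x) { return x * c; } for a c known
       only at runtime (System V x86-64: x arrives in edi, result in eax).
       Returns the number of bytes written into buf. */
    static size_t emit_mul_by(uint8_t *buf, int32_t c) {
        size_t n = 0;
        buf[n++] = 0x69; buf[n++] = 0xC7;    /* imul eax, edi, imm32 */
        memcpy(buf + n, &c, 4); n += 4;
        buf[n++] = 0xC3;                     /* ret */
        return n;
    }

Published to an executable page as in the earlier sketch, the result multiplies by exactly that constant; dimension- or hardware-specialized loop kernels follow the same pattern with more emitted instructions.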

  3. Educational and research tool. For compiler students and systems researchers, a no-deps JIT is a clean environment to experiment with instruction selection, calling conventions, trampolines, or interpreter-to-JIT transitions without the noise of large toolchains.

How cj stacks up against mainstream options

LLVM’s MCJIT/ORC layers and Cranelift provide tested, high-performance backends with robust relocation handling, register allocation, and many targets (Lattner & Adve, 2004). They embody decades of engineering: optimizations, handling of platform quirks, and mature testing. In contrast, cj sacrifices those features for minimalism and ease of inspection.

If your goal is learning, rapid prototyping, or tiny embedded use with controlled ABI expectations, a project like cj is excellent. If you need production-grade performance, advanced optimizations, and cross-platform stability, integrating a larger engine or borrowing its algorithms will pay off.

Best practices and recommendations

  1. Explicit relocation model and test suite. Design a clear relocation table, emitted alongside the code, that captures PC-relative vs. absolute references, external symbol fixups, and literal pools. Add unit tests that exercise forward/backward jumps, large code distances, and inter-procedure references.
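
A sketch of what such a record and its fixup pass might look like (hypothetical types, not cj's):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One pending fixup, recorded during emission and resolved once the
       target address is known. */
    typedef enum { RELOC_ABS64, RELOC_PCREL32 } reloc_kind;

    typedef struct {
        reloc_kind kind;
        size_t     offset;   /* where the field lives inside the code buffer */
        uint64_t   target;   /* resolved absolute address of the referent   */
    } reloc;

    static int apply_reloc(uint8_t *code, const reloc *r) {
        if (r->kind == RELOC_ABS64) {
            memcpy(code + r->offset, &r->target, 8);
            return 0;
        }
        /* PC-relative: displacement measured from the end of the 4-byte field. */
        int64_t disp = (int64_t)r->target
                     - (int64_t)(uintptr_t)(code + r->offset + 4);
        if (disp < INT32_MIN || disp > INT32_MAX) return -1;   /* out of range */
        int32_t d32 = (int32_t)disp;
        memcpy(code + r->offset, &d32, 4);
        return 0;
    }

The range check is exactly the kind of thing the test suite should exercise with deliberately distant jump targets.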

  2. Separate emission from permission changes. Allocate pages with write permissions, emit code, then flip to executable with mprotect or similar. Prefer secure mechanisms available on the platform (e.g., memfd_create on Linux + mmap) and avoid long-lived writable-executable pages. For security context, attacks like return-oriented programming demonstrate why executable memory must be handled carefully (Shacham, 2007).
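
A minimal POSIX sketch of that write-then-execute handoff, including the instruction-cache flush AArch64 needs (the helper name is hypothetical):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Copy freshly emitted bytes into a mapping that is never writable and
       executable at the same time, and return an executable view of it. */
    static void *publish_code(const void *bytes, size_t len) {
        void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) return NULL;
        memcpy(page, bytes, len);
    #if defined(__aarch64__)
        /* AArch64 has split instruction/data caches; make the new code visible. */
        __builtin___clear_cache((char *)page, (char *)page + len);
    #endif
        if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
            munmap(page, len);
            return NULL;
        }
        return page;
    }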

  3. Start simple with ABI and prologues. Create a small set of prologue/epilogue templates for the platform’s calling convention. These templates should be explicit about callee-saved registers and stack alignment. Test them by calling into JIT code from C and vice versa across x86-64 and AArch64.
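
To make that concrete, the standard frame setup such templates boil down to looks like this (shown here as raw encodings; cj's own templates may differ):

    #include <stdint.h>

    /* x86-64 System V:  push rbp ; mov rbp, rsp   /   leave ; ret */
    static const uint8_t  x64_prologue[] = { 0x55, 0x48, 0x89, 0xE5 };
    static const uint8_t  x64_epilogue[] = { 0xC9, 0xC3 };

    /* AArch64 AAPCS64:  stp x29, x30, [sp, #-16]! ; mov x29, sp
                         ldp x29, x30, [sp], #16   ; ret
       x29 is the frame pointer, x30 the link register; the words are copied
       into the code buffer little-endian and sp stays 16-byte aligned. */
    static const uint32_t a64_prologue[] = { 0xA9BF7BFD, 0x910003FD };
    static const uint32_t a64_epilogue[] = { 0xA8C17BFD, 0xD65F03C0 };

Non-leaf functions additionally need to save whichever callee-saved registers the generated body clobbers and keep the stack 16-byte aligned at every call site, on both ABIs.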

  4. Plan for register allocation. Even a greedy or linear-scan register allocator pays off: without one, you are left relying on stack-based temporaries or risky ad-hoc spilling. Classic compiler texts discuss this thoroughly (Muchnick, 1997; Appel, 1998).
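
As a sketch of how little it takes to get started (hypothetical IR types; assumes live intervals are already computed and sorted by start point):

    #include <stdio.h>

    #define NUM_REGS 4            /* pretend four registers are allocatable */
    #define SPILLED  (-1)

    typedef struct { int start, end, reg; } interval;

    /* Classic linear scan (Poletto & Sarkar): walk intervals in order of
       start, free registers whose intervals have ended, and when nothing is
       free spill whichever live interval ends furthest in the future. */
    static void linear_scan(interval *iv, int n) {
        int active[NUM_REGS];     /* index into iv per register, -1 = free */
        for (int r = 0; r < NUM_REGS; r++) active[r] = -1;

        for (int i = 0; i < n; i++) {
            int free_reg = -1, victim = -1, victim_end = -1;
            for (int r = 0; r < NUM_REGS; r++) {
                if (active[r] != -1 && iv[active[r]].end <= iv[i].start)
                    active[r] = -1;                   /* interval expired */
                if (active[r] == -1) free_reg = r;
                else if (iv[active[r]].end > victim_end) {
                    victim_end = iv[active[r]].end;
                    victim = r;
                }
            }
            if (free_reg != -1) {
                iv[i].reg = free_reg;
                active[free_reg] = i;
            } else if (victim_end > iv[i].end) {
                iv[i].reg = iv[active[victim]].reg;   /* steal the register */
                iv[active[victim]].reg = SPILLED;     /* victim goes to a slot */
                active[victim] = i;
            } else {
                iv[i].reg = SPILLED;                  /* current interval spills */
            }
        }
    }

    int main(void) {
        interval iv[] = { {0, 9, 0}, {1, 4, 0}, {2, 7, 0},
                          {3, 11, 0}, {5, 8, 0}, {6, 10, 0} };
        linear_scan(iv, 6);
        for (int i = 0; i < 6; i++) {
            if (iv[i].reg == SPILLED) printf("v%d -> stack slot\n", i);
            else                      printf("v%d -> r%d\n", i, iv[i].reg);
        }
        return 0;
    }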

  5. Embrace fuzzing and cross-architecture CI. Autogenerated backends need tests that run on both architectures. Use fuzzers to generate instruction sequences and validate behavior against a reference (e.g., run generated code that implements a deterministic function and compare results). Continuous integration on x86-64 and ARM64 (cloud CI runners or local devices) will catch ABI and encoding differences early.
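
One cheap differential check that stays independent of the emitter's internals (the compiled function pointer would come from whatever the JIT produced; the reference is plain C):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Reference semantics for the expression under test, written once in C. */
    static uint64_t ref_fn(uint64_t a, uint64_t b) { return (a + b) * 3 - a; }

    /* Differential check: feed many pseudo-random inputs to the JIT-compiled
       version of the same expression and to the C reference, and report the
       first mismatch. */
    static int check(uint64_t (*jit_fn)(uint64_t, uint64_t), unsigned iterations) {
        srand(12345);                          /* fixed seed: failures reproduce */
        for (unsigned i = 0; i < iterations; i++) {
            uint64_t a = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
            uint64_t b = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
            uint64_t want = ref_fn(a, b), got = jit_fn(a, b);
            if (want != got) {
                fprintf(stderr, "mismatch at iteration %u: got %llu, want %llu\n",
                        i, (unsigned long long)got, (unsigned long long)want);
                return 1;
            }
        }
        return 0;
    }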

  6. Instrumentation and validation hooks. Emit optional debug instrumentation (e.g., an instruction sequence that writes a magic value before and after a block) so you can detect crashes, miscompiled code, or stack corruption more easily.
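
For example, the emitter could bracket each compiled block with stores of distinct markers into a side buffer; a hypothetical x86-64 helper (not part of cj) might look like:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Emit x86-64 code that stores a 32-bit marker into *slot, clobbering
       only r11 (caller-saved and never used for argument passing in the
       System V ABI). One marker before a block and a different one after it
       shows, post-crash, which block was entered and whether it completed. */
    static size_t emit_marker_store(uint8_t *buf, uint32_t *slot, uint32_t magic) {
        size_t n = 0;
        uint64_t addr = (uint64_t)(uintptr_t)slot;
        buf[n++] = 0x49; buf[n++] = 0xBB;                   /* movabs r11, slot */
        memcpy(buf + n, &addr, 8); n += 8;
        buf[n++] = 0x41; buf[n++] = 0xC7; buf[n++] = 0x03;  /* mov dword [r11], magic */
        memcpy(buf + n, &magic, 4); n += 4;
        return n;
    }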

Implications & insights

A minimal, autogenerated JIT like cj is an excellent sandbox: it reduces external friction, provides a clear code path from ISA to machine bytes, and lets you experiment with prologue design, trampolines, and code emission concerns without wrestling a huge codebase. That said, the painful bits of any JIT—register allocation, relocation handling, ABI corner cases, and executable memory security—don’t disappear; they simply need to be handled carefully as the project grows.

Academic and industrial experience shows that code generation and register allocation are deep topics with subtle correctness constraints (Muchnick, 1997; Appel, 1998), while executable memory management is a security hot spot (Shacham, 2007). Projects that move beyond experimentation should plan how to either reimplement those features correctly or integrate with proven libraries.

Conclusion & concrete next steps

cj is an engaging project: compact, auditable, and ideal for learning and small-scale JITing. If you want to build on it, consider these actionable next steps:

  • Add a small regression test harness that runs the included minilang on both x86-64 and ARM64.
  • Implement a basic linear-scan register allocator and measure the code-size and speed impact.
  • Formalize relocations and add a verifier that checks PC-relative encodings and immediate ranges.
  • Harden executable memory handling (W^X), add sanitizer-friendly modes, and instrument for crashes.
  • If you need production features later, evaluate integrating Cranelift or LLVM for more advanced backend work while keeping the autogenerated backend as an educational or fallback path.

Original source and further reading

The project repository and examples are available from the author on GitHub: hellerve-pl-experiments/cj (announcement on Hacker News by @hellerve). For deeper reading on compiler backends and code generation, see Lattner & Adve (2004) on LLVM, Muchnick (1997) and Appel (1998) for classical compiler engineering, and Shacham (2007) on exploit techniques that motivate careful executable memory handling.

Further reading / selected references

  • Lattner, C. & Adve, V. (2004). LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation.
  • Muchnick, S. S. (1997). Advanced Compiler Design and Implementation.
  • Appel, A. W. (1998). Modern Compiler Implementation in C/Java/ML.
  • Shacham, H. (2007). The Geometry of Innocent Flesh on the Bone: Return-into-libc Without Function Calls (on the x86).

