As a hobby project, I am writing my own implementation of Metal.framework, backed by a software renderer that does the actual rendering once the MTLCommandBuffer is committed. I am trying to come up with a way to model GPU threads on the CPU (perhaps using pthreads). To see if it's feasible, I compiled a simple shader.metal into LLVM bitcode (an .air file) and then into ARM assembly using LLVM’s llc tool with the command below:
llc -march=arm64 shader.air -o shader.s
I also managed to pass the ARM assembly through an assembler and produce an object file:
gcc -c shader.s -o shader.o
Alternatively, I can hand-edit shader.s, rename the shader's entry point to _main, and then produce an executable with:
gcc shader.s -o shader
At this point, I think my next step should be to use Apple’s Hypervisor framework, with each pthread running a virtual CPU that executes the shader compiled using the steps above. I have sample hypervisor code that runs a small ARM snippet, and I’m currently figuring out whether I can write my own “driver” that reads the shader's Mach-O file, extracts its segments, and maps them into the hypervisor's guest memory so the vCPU can execute from it.
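To make the threading model concrete, here is a minimal sketch of the dispatch side only, with the virtual CPU replaced by a direct C function call. Everything named here is hypothetical: kernel_fn, worker_args, dispatch and the stand-in double_id_kernel are names I made up for illustration, and the (buffer, thread_id) signature is an assumption, since I have not verified what entry point or argument layout llc actually emits for a Metal shader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: hypothetical entry-point signature. The real compiled shader
   may take its buffers and thread id in a completely different way. */
typedef void (*kernel_fn)(void *buffer, uint32_t thread_id);

typedef struct {
    kernel_fn kernel;
    void *buffer;
    uint32_t first, last; /* half-open range of GPU thread ids */
} worker_args;

/* Each pthread runs the kernel once for every GPU thread id in its range. */
static void *worker_main(void *p) {
    worker_args *w = p;
    for (uint32_t id = w->first; id < w->last; id++)
        w->kernel(w->buffer, id);
    return NULL;
}

/* Runs kernel for ids [0, grid_size), split across n_workers pthreads.
   Returns 0 on success, -1 on allocation or thread-creation failure. */
int dispatch(kernel_fn kernel, void *buffer, uint32_t grid_size,
             uint32_t n_workers) {
    assert(n_workers > 0);
    pthread_t *threads = malloc(n_workers * sizeof *threads);
    worker_args *args = malloc(n_workers * sizeof *args);
    if (!threads || !args) {
        free(threads);
        free(args);
        return -1;
    }

    /* Ceiling division so the last worker picks up the remainder. */
    uint32_t chunk = (grid_size + n_workers - 1) / n_workers;
    uint32_t started = 0;
    int rc = 0;
    for (uint32_t i = 0; i < n_workers; i++) {
        uint32_t first = i * chunk;
        if (first > grid_size) first = grid_size;
        uint32_t last = first + chunk;
        if (last > grid_size) last = grid_size;
        args[i] = (worker_args){ kernel, buffer, first, last };
        if (pthread_create(&threads[i], NULL, worker_main, &args[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Wait only for the workers that were actually started. */
    for (uint32_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    free(args);
    return rc;
}

/* Stand-in for the compiled shader: writes 2 * id into a uint32_t buffer. */
static void double_id_kernel(void *buffer, uint32_t id) {
    ((uint32_t *)buffer)[id] = id * 2;
}
```

Calling dispatch(double_id_kernel, out, 10, 4) on a 10-element uint32_t array fills it with twice each index, using four pthreads. In the hypervisor version, the direct call inside worker_main is the part that would be replaced by setting up and running a vCPU.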
My questions are:
- Is there an obvious way to do this that I'm not seeing? Specifically, is there an easier way of launching multiple threads on the CPU that do vertex/fragment/kernel processing from the shader executable I just compiled above?