The other day I read an interesting article by Jonathan Rentzsch, on the implementation of mach_override and mach_inject. I think it's interesting, because last year I wrote almost identical code for Windows. Reading this gave me a funny sense of déjà vu.
The main difference between Windows and OSX for this code, is that the mechanisms for doing it in Windows are not properly documented. There are articles for doing it on the net, but they all suffer from problems that I had to overcome.
For a start, Windows does not document that fundamental DLL files (like system32.dll) are mapped into the address space of every process at the same location. This is essential, or else it would not be possible to know where the function to override could be found. Fortunately, it has been well established by people like Matt Pietrik that this can be relied upon, though future versions of Windows could theoretically break this assumption.
More importantly, the only establish technique of code injection in Windows is to create a new thread in the remote process and have it load the code. Unfortunately, we discovered that this causes a race condition that can occasionally cause applications to crash. Since we were building commercial code I needed a better solution.
The solution here was to create the process with the primary thread already paused. Once the program was loaded I could allocate some extra memory (in the same way that mach_override uses
vm_allocate) and write some of my own code into it. Then I could look for the entry point and overwrite the start of the program with a
JMP instruction to go to my freshly created code.
This all sounds straight forward, but it isn't. There are a number of APIs for looking at a remote process's address space, but it turns out that none of them work properly when the process is started in a paused state. It seems that all of the information describing a process's memory is initialised at some point by the application itself.
Fortunately, the memory is partitioned into modules, even though they are not correctly labeled. The solution is to iterate through these modules, and look at the data in them. Since an executable program gets mapped into memory, I realised that one of the modules must start with the head of the file, so I went looking for a Windows executable file header. Once I found it, I was able to trace through the various headers and pointers to find the address of the program entry point. Adding this to the offset of the executable image gave me the entry point of the program.
So while all of the APIs which are supposed to help here were actually useless, it was still possible to find the entry point, and modify the code found there. The code that the program was instructed to jump to would load a DLL which did the rest of the work during its intialisation later on. The end of this code then replaced the original instructions at the program entry point, and jumped back to it. Then when the application was run, it would start with my injected code before doing anything else. It didn't require threads and was therefore free of the errors from the other technique. It also prevented a program from setting anything up which might preclude a function interceptor from working.
The initialisation of the DLL that was loaded ended up doing the work of overwriting any functions that we needed to intercept. This worked almost identically to the mach_override code with one minor exception. mach_override works on a RISC architecture, and so the instructions are guaranteed to be a particular width. Unfortunately, the CISC instructions on Intel chips need to be decoded in order to work out how many bytes wide they were. Luckily, there is a short list of opcodes which are used to start all the functions in Windows, and DavidM spent a few days tracking them all down. This meant that we only needed a small table to work out just how many bytes were needed to be copied from the beginning of a function, so they could be replaced with a
JMP to the replacement code. Fortunately, none of those instructions referred to relative addresses, so no re-writing of these opcodes was necessary.
The weirdest thing I discovered was when various memory pages started changing the permissions. By default, memory with executable code will generate a fault when it is written to, so its permissions have to be updated first. However, I was finding between my loading of my DLL, and the initialisation of the DLL, some pages which I set to being writable had reverted back to being read-only/executable, but only for certain "Office 2000" applications. This was how I discovered that these applications contained statically linked DLLs (which were loaded by the OS after the program was started, but before the entry point of the process was reached) which were modifying the pages of the entry point as well. In other words, someone at Microsoft was also dynamically re-writing the binary at the start of the executable, just like I was.
This kind of dynamic re-writing is simply patching the code after compilation. Given that it comes from Microsoft, I'm surprised that the writer didn't just go and ask to modify the original code! People like me, outside of Redmond, have to perform hacks like this, but Microsoft shouldn't have to! :-)
So the details were very different, but the principles were the same. I always liked this code, and was looking forward to releasing it, but I never seemed to get around to it. DavidW asked that I write all of this up, and I've been very slack about doing it. I've written it down here mostly as a reminder that I need to get around to it one day. Maybe I'll do it while procrastinating about writing my thesis. :-)
Tuesday, April 05, 2005