====== AOT Compiler ======

The Ahead-of-Time (AOT) compiler that was originally used to compile SharpOS did its job, but it suffered from a few architectural weaknesses that made it difficult to extend. The SharpOS developers therefore decided to concentrate on a new compiler with a more compartmentalized design. The AOT compiler is nevertheless still a valuable learning tool for anyone interested in what it takes to generate machine code from Common Intermediate Language (CIL) instructions.

===== Phases =====

==== IR Generation ====

The IR generation phase of the AOT compiler builds a tree of IR information rooted in the SharpOS.AOT.IR.Engine instance. It creates a SharpOS.AOT.IR.Class instance for each type and a SharpOS.AOT.IR.Method instance for each method. The AOT skips methods that are marked as ADC stubs and replaces calls to them with calls to the architecture-specific implementation selected for the current compilation. It also skips methods that are marked as unmanaged (for now, as there is currently no way to compile native code into the kernel). The AOT also generates IR for itself, but only for classes that have SharpOS.AOT.Attributes.IncludeAttribute applied.

==== IR Processing ====

==== Encoding ====

In the Run method, after the IR has been processed, asm.Encode (this, options.OutputFilename) is called, where this is the engine containing all the IR data and the second parameter is the name of the binary file that will contain the encoded data (currently a PE file). Note that asm is only an interface (IAssembly), which every encoder has to implement in order to be used by the engine. The x86 encoder starts by adding the PE header (AddPEHeader), which also contains the Multiboot structure (AddMultiBootHeader), so that GRUB recognizes the binary as a valid kernel and runs it.
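For context on why the Multiboot structure makes the binary bootable: a Multiboot (version 1) header begins with a magic value, a flags word, and a checksum chosen so that the three sum to zero modulo 2^32, and the header must lie within the first 8192 bytes of the image. The Python sketch below packs such a header. It is an illustration only and not the code in AddMultiBootHeader; the function name, the parameter names and the example addresses are assumptions, and Python is used purely for brevity (the AOT itself is written in C#).

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002   # magic value defined by the Multiboot 1 specification
FLAG_ADDRESS_FIELDS = 1 << 16  # bit 16: the address fields in the header are valid


def multiboot_header(header_addr, load_addr, load_end_addr, bss_end_addr,
                     entry_addr, flags=FLAG_ADDRESS_FIELDS):
    """Pack a 32-byte Multiboot 1 header (hypothetical helper, illustration only)."""
    # The checksum makes magic + flags + checksum equal 0 modulo 2**32.
    checksum = -(MULTIBOOT_MAGIC + flags) & 0xFFFFFFFF
    # Eight little-endian unsigned 32-bit fields, in the order the spec lists them.
    return struct.pack("<8I", MULTIBOOT_MAGIC, flags, checksum,
                       header_addr, load_addr, load_end_addr,
                       bss_end_addr, entry_addr)


# Example addresses are made up; a real encoder derives them from the image layout.
header = multiboot_header(header_addr=0x100080, load_addr=0x100000,
                          load_end_addr=0x180000, bss_end_addr=0x180000,
                          entry_addr=0x100200)
magic, flags, checksum = struct.unpack_from("<3I", header)
print(len(header), hex(checksum))
```

The specification only lets a loader skip the address fields for ELF images, so a PE output file would normally need flag bit 16 set. This is consistent with the remark in Step by Step that the header carries the entry point address.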
The encoder then calls AddEntryPoint, which adds the code that GRUB calls once the kernel has been loaded; this code also sets up the x86 stack and calls the type initializers (.cctor) of all types defined in the kernel. After that, the encoder hands control to AssemblyMethod for every defined method. Once all the methods have been encoded, AddHelperFunctions is called; it adds support for arithmetic operations that need long operands.

===== Architecture Dependent Code =====

SharpOS.AOT provides a way of switching between implementations (layers) of a common class interface. The AOT looks for an assembly attribute of type SharpOS.AOT.Attributes.ADCInterfaceAttribute, which tells the AOT which namespace represents the ADC interface. This is the namespace that other kernel code uses to make calls into the ADC implementation. Each method that is supposed to be implemented by the underlying ADC layer should be marked with SharpOS.AOT.Attributes.ADCStubAttribute. Each implementation must also provide an ADCLayerAttribute that names the namespace containing the implementation code and the processor that the layer applies to. SharpOS.AOT chooses which ADC layer to use based on the processor it is compiling for.

When the AOT finds a reference to an ADC stub method, it translates the reference into an equivalent one that points directly to the ADC implementation method. It does this by chopping off the ADC interface namespace (provided by the assembly-level SharpOS.AOT.Attributes.ADCInterfaceAttribute) and replacing it with the ADC layer namespace (provided by the assembly-level ADCLayerAttribute). The code is thus compiled as if the original code had referenced the processor-specific implementation. This translation is done before the IL is converted to the internal representation (while SharpOS.AOT is still working with data provided by Mono.Cecil).

===== AOT Source Code =====

  * IR: Intermediate Representation related classes.
    * IAssembly.cs: Provides the IAssembly interface, used to encode the IR code into the final executable. An implementation of IAssembly applies to a specific processor type, so this is the primary interface to implement when porting the AOT. For now there is only SharpOS.AOT.X86.Assembly.
    * Engine.cs: This is where most of the magic happens. It selects the IAssembly implementation based on the target architecture string provided by the frontend (in the current frontend this is controlled by command-line options). It then takes the list of source assemblies provided by the frontend (already compiled as .NET DLLs) and selects the classes and methods that are going to be processed and compiled into the kernel. It creates an IR object for each class and method being compiled, then processes each method separately (using SharpOS.AOT.IR.Method.Process()). Finally, the kernel binary is encoded (created) using the selected IAssembly implementation.
    * Class.cs: Provides the SharpOS.AOT.IR.Class class, which represents a type in the IR. A “Class” instance can also represent a structure. The class enumerates the IR methods (represented by the Method class in the same namespace) that belong to it. The “Engine” class (which is running the show) maintains an internal list of the Classes that are going to be processed and compiled into the kernel binary.
    * Method.cs: Methods are represented in the IR using this class. All method-specific data is contained here. This class is responsible for splitting its list of IL instructions into discrete code blocks (represented by the “Block” class) and linking them to each other. It is also responsible for transforming the IR to and from its Static Single Assignment (SSA) form, and it performs all processing related to its instructions. It also takes care of allocating registers for the identifiers used in its instructions.
    * Block.cs: Each Block object contains a chunk of IL and its IR instructions. The class handles conversion from IL to the IR representation. The code blocks are split along potential control flow, so that it is always possible to move from block to block while following every possible path through the code. This is called control flow analysis.
    * Operators: The files in this directory represent the different types of operators, such as unary, binary and boolean.
      * Binary.cs: Represents a binary operator (one with two operands). The operators handled by this class are defined by the SharpOS.AOT.IR.Operators.Operator.BinaryType enumeration:
        * Addition (+), in three flavors: unchecked addition (no overflow check), and signed/unsigned addition with overflow check
        * Subtraction (-), in three flavors: unchecked, and signed/unsigned checked subtraction
        * Multiplication (*), in three flavors: unchecked, and signed/unsigned checked multiplication
        * Division (/), in two flavors: signed/unsigned
        * Remainder (%, “modulus”), in two flavors: signed/unsigned
        * SHL (<<, “shift left”)
        * SHR (>>, “shift right”), in two flavors: signed/unsigned
        * And (&, “bitwise AND”)
        * Or (|, “bitwise OR”)
        * Xor (^, “bitwise exclusive-OR”)
      * Boolean.cs: Represents a boolean operator. These include TRUE, FALSE, AND (&&), OR (||), and “Conditional”.
      * TODO more
    * Operands: Everything in this directory is related to IR operands, such as constants, registers and fields. (TODO: Describe every file in that directory.)
    * Instructions: The classes related to the IR instructions are contained in this directory. (TODO: Describe every file in that directory.)
  * Attributes: This directory contains all the attributes that the kernel needs in order to change its behavior. (TODO: Describe every file in that directory.)
  * X86: The files related to x86 encoding, which takes place after the IR has been processed, are contained in this directory. (TODO: Describe every file in that directory.)

===== Step by Step =====

1. Loads the IL using Mono.Cecil (AOT/Core/IR/Engine.cs:448)
2. Creates intermediate representation (IR) objects to represent types and methods (AOT/Core/IR/Engine.cs:521)
3. Converts the IL bytecode into the intermediate representation, filling in the methods (AOT/Core/IR/Engine.cs:587)
4. Performs a number of IR optimizations on each method (AOT/Core/IR/Method.cs:2712)
5. Calls an IAssembly implementation to handle encoding to native code (AOT/Core/IR/Engine.cs:622)
6. IAssembly prepares the IR for encoding as native code. The rest of the sequence depends on the IAssembly implementation (that is, the architecture) being compiled for. For x86, the encoding order is:
   1. PE/COFF header: provides method names when debugging
   2. Multiboot header: hints for the boot loader, including the entry point address
   3. Methods
   4. Helper functions
   5. Data
   6. Symbols

The encoder then saves all this encoded data to the output file. This process happens in X86.Assembly.Encode() (AOT/Core/X86/Assembly.cs:1271).
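The x86 encoding order listed above can be pictured as a simple concatenation in which each part's file offset is recorded as it is written. The following Python sketch is an illustration under stated assumptions, not the logic of X86.Assembly.Encode(): the part names, the layout function and the placeholder byte strings are all hypothetical, and Python is used only for brevity.

```python
# Order in which the x86 encoder emits the parts of the image (from the list above).
X86_ORDER = ["pe_header", "multiboot_header", "methods",
             "helper_functions", "data", "symbols"]


def layout(parts):
    """Concatenate the parts in X86_ORDER and return (image, offsets).

    `parts` maps a part name to its already-encoded bytes (hypothetical input).
    """
    image = bytearray()
    offsets = {}
    for name in X86_ORDER:
        offsets[name] = len(image)   # where this part starts in the file
        image += parts[name]
    return bytes(image), offsets


# Placeholder contents; the sizes are made up for the example.
parts = {
    "pe_header": b"\x00" * 128,
    "multiboot_header": b"\x00" * 32,
    "methods": b"\x90" * 64,
    "helper_functions": b"\x90" * 16,
    "data": b"\x00" * 8,
    "symbols": b"\x00" * 4,
}
image, offsets = layout(parts)

# Multiboot 1 requires the header to sit entirely within the first 8192 bytes,
# which an early position like this satisfies as long as the PE header is small.
assert offsets["multiboot_header"] + len(parts["multiboot_header"]) <= 8192
```

Recording offsets while writing is one simple way to learn where each method ends up, which is the kind of information a symbol table or an entry point field needs; whether the AOT does it this way is not stated here.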
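One detail the step list does not spell out is the ADC stub translation described under Architecture Dependent Code, which takes place while the AOT is still working with Mono.Cecil data, that is, before the IL is converted to IR. Stripped of the Cecil details, it amounts to a namespace prefix swap. The Python sketch below shows only that idea; the function name and the example namespaces (Example.ADC and Example.ADC.X86) are hypothetical and are not taken from the SharpOS sources.

```python
def translate_adc_reference(full_name, interface_ns, layer_ns):
    """Redirect a reference from the ADC interface namespace to the chosen layer.

    The caller is assumed to pass only references to methods marked as ADC
    stubs; names outside the interface namespace are returned unchanged.
    """
    prefix = interface_ns + "."
    if not full_name.startswith(prefix):
        return full_name
    # Chop off the interface namespace and put the layer namespace in its place.
    return layer_ns + "." + full_name[len(prefix):]


# Hypothetical namespaces, standing in for the values carried by
# ADCInterfaceAttribute and ADCLayerAttribute.
INTERFACE_NS = "Example.ADC"
LAYER_NS = "Example.ADC.X86"

print(translate_adc_reference("Example.ADC.Pager.Setup", INTERFACE_NS, LAYER_NS))
print(translate_adc_reference("Example.Console.Write", INTERFACE_NS, LAYER_NS))
```

Because the rewrite happens on names before any IR exists, the later phases never see the stub and need no special handling for it.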