I've seen the terms "frontend" and "backend" thrown around for compilers, but never got their precise meaning. I get that frontend encompasses parsing, and backend includes code generation, but what's the dividing line? Is type checking part of the frontend or backend? What about optimization?
5 Answers
what's the dividing line?
It's a bit fuzzy, and you don't need to draw a crisp line. The way I think about it is: traditionally a compiler can be thought of as a partial function from strings to strings. A string goes in, either the string is accepted, and another string comes out, or the string is rejected and an error comes out. Moreover, we think of the traditional compiler as translating a string in one language into an equivalent string in another language.
The portions of the compiler that deal with the input language make up the "front" end, and the portions of the compiler that deal with the output language make up the "back end".
Is type checking part of the frontend or backend? What about optimization?
I would typically think of type checking as part of the front end, since it deals with the rules of the input language. There are lots of ways to optimize a program, but the "traditional" optimizations like peepholes or whatever are typically taking advantage of details of the output language, so I'd put them in the back end.
As Simon Farnsworth points out in his answer, optimizations where you're reorganizing some intermediate form might be part of the fuzzy middle ground between the front and back end.
This question is particularly fuzzy for me because the majority of the compiler work I've done in my career has been on the semantic analysis after lexing and parsing but before code generation; from my perspective, the middle is where the fun stuff goes.
Moreover, there's a frame challenge to your question. I deliberately said that traditionally a compiler is a translation function between languages. But modern compilers are better thought of as a suite of code analysis services, one of which is translation. In the "compiler as a service" model the "back end" service which does translation is still vitally important, but the analysis services provided by the front end become more prominent.
Is it useful to talk about "front" and "back" ends of the compiler when the kind of problem you're throwing at the compiler every few seconds is "what's the most likely completion of Console.Wri"? When we were designing Roslyn we did not think so much about "who is writing the front end? who is writing the back end?" Rather, we framed it as "what services do all of our stakeholders need? who is available to design and implement those services?"
- Seggan (Aug 29 at 16:00): So backend can be thought of as another "service" for the frontend whose job is to do the codegen?

- Eric Lippert (Aug 29 at 17:40, +3): @Seggan: Well the point of my frame challenge is to say maybe front end and back end aren't useful buckets to put the code into. I think of a modern compiler as providing services such as lexing, syntax coloring, code formatting, parsing, completion generation, type analysis, taint analysis, linting. Translation to IL is just another service.
- Ohm's Lawman (Aug 31 at 19:20, +1): @Seggan, the back end of a compiler does not "serve" any need of the front end, and the front end does not "serve" any need of the back end. They have different jobs to do, and in principle those jobs could be done at completely different times in completely different processes. Rather than thinking of either one as "serving", or as being driven by, the other, you should think of them as different stages in a pipeline.
A typical optimizing compiler can be considered as three parts: a front end, a fuzzily defined (and possibly absent) middle, and a back end.
Front end
The front end of a compiler knows about the syntax and semantics of the language you're compiling, and handles translating that into the internal representation the middle uses.
It takes your code, parses it (generating errors here if your code is syntactically invalid), does semantic analysis of the parsed code (generating errors here if your code doesn't type check, or otherwise has language-specific errors in it), and outputs an internal representation that has the same semantic content as the source code, or an error if it's impossible to do that.
In LLVM land, something like rustc or clang can be driven as a pure front-end, if you tell it to only output LLVM bitcode (which is a serialization form of LLVM's internal representation).
Middle
This is the bit that's target-independent. It takes the internal representation from the front end, analyzes it to find interesting facts for optimizations to work with, optimizes it (in passes, one after the other), and outputs internal representation for the back end to work with.
In some compilers (notably any LLVM-based compilers), the optimizations in the middle are very focused, and you use heuristics to determine which optimizations might be worth running at this point in the process.
Some optimizations are also worth rerunning. For example, dead code elimination can be worth running once on the input from the front end, to remove code that's obviously dead but was generated from generics, templates, macros or similar mechanisms (including obvious tautologies like "is this unsigned value negative?"), and again after something like value range analysis, where the compiler can now infer that more code is dead: because it knows the possible values of some variables, it can see that a conditional always goes one way (for example, if you know that a variable is between 1 and 2^30, you now know that any check for it being zero is false).
In simpler compilers, there may be no clear middle; if you're only supporting one language and one target system, it's not worth separating these optimizations from the back end, and you can mingle the two.
LLVM's opt command is a standalone middle.
Back end
This is the bit that depends on the final target. It takes internal representation from the middle, applies any target-specific optimizations (those which are only applicable to this target, such as converting arithmetic to the form needed for x86's LEA instruction where possible), does register allocation (choosing which registers to use for which bits of data) and spilling (moving data to and from memory to make up for not having enough registers), converts to instructions, and does instruction scheduling.
These bits all end up intertwined; on an out-of-order CPU, for example, instruction scheduling wants to push instructions as far away as possible from the instructions that generate their input values (so that the previous instructions are more likely to have completed by the time the CPU considers this instruction), while register allocation wants to keep instructions close together so that registers are live for the smallest time (and thus reusable for other data).
Most CPUs have more than one sequence that can implement the same outcome, and the "right" one to use varies by CPU and code details; for example, do you use a conditional instruction (like CSEL or CMOV), or a branch? Additionally, some (mostly older or special purpose) ISAs have delay slots or other oddities that have to be considered during instruction scheduling; you may not be able to read the destination of a multiply instruction until a set number of instructions after the multiply itself, or might have delay slots after branches (instructions that come after a branch in program order, but which are unconditionally executed before the branch - these are especially nasty when they apply to conditional branches).
LLVM's llc is a standalone back end.
- Eric Lippert (Aug 28 at 21:03, +19): Your point about dead code elimination in the "middle" passes reminds me of a nasty bug. There was such an optimization much too early in the C# 2 front end which could trim certain kinds of erroneous code before the error checking pass. I moved the optimization to after the error checking pass and some programs that compiled before stopped compiling, but those breaking changes were almost certainly in programs that were already wrong, so we didn't feel too bad about it.
I think you have got some good answers from people actually involved in modern compilers. I will take a slightly different approach: economics. Let's start with some assumptions:
There are a lot of different programming languages. Off the top of my head I can recall 8 different ones: Ada, C, C++, Fortran, Cobol, Pascal, Algol, Rust. Wikipedia has a long list of languages, but I have selectively pointed out a few.
There are a lot of different computer architectures. Off the top of my head I can recall 4 different ones: x86, ARM, IBM Z, RISC-V. Again, there are a lot more.
So if you want to support all languages on all architectures, you need 32 different compilers (8 × 4). But if you draw a line somewhere between the language and the architecture, you only need to create 12 components (8 + 4). We may call the parts on one side of the line the front end, and the parts on the other side the back end, why not? That could be a lot less expensive.
As it happens, this is what is done by, for example, LLVM. It has several front ends, one per language, and several back ends, one per architecture. Exactly where that line is drawn, whether there are additional middle layers, and how the different parts communicate are design decisions specific to LLVM. Other compilers have made different decisions over time.
So, to come back to your question: frontend and backend are general concepts, but the exact implementation depends on which compiler you use. Optimizations, for example, can be done in several steps, perhaps partly in the front end, partly in the middle, and partly in the back end, all depending on the exact compiler.
- Jörg W Mittag (Aug 29 at 19:05, +3): The most famous example of this is probably GCC, which started its life as the GNU C Compiler but quickly became the GNU Compiler Collection, supporting such diverse frontends as C, Objective-C, C++, Go, D, Fortran, Ada, Pascal, Java, JVM byte code, and practically every backend under the sun, including every OS and CPU architecture you have ever heard of, and an order of magnitude more that you haven't. Also pertinent to this discussion, GCC has evolved to include not one but multiple layers of "middle-end".
The term “intermediate representation” implies that it sits between front end and back end, and many compiler textbooks describe the “front” as converting from source to some kind of abstract syntax tree, the “middle” as transforming that intermediate representation into another, and the “back” as transforming that into the target language. “Crafting Interpreters,” for one, uses climbing up one side of a mountain and down the other as a recurring metaphor for compilation:
You start off at the bottom with the program as raw source text, literally just a string of characters. Each phase analyzes the program and transforms it to some higher-level representation where the semantics—what the author wants the computer to do—become more apparent.
Eventually we reach the peak. We have a bird’s-eye view of the user’s program and can see what their code means. We begin our descent down the other side of the mountain. We transform this highest-level representation down to successively lower-level forms to get closer and closer to something we know how to make the CPU actually execute.
In many software systems, the part of the application users interact with (the frontend) is built in different programming languages than the part that runs in the background (the backend). Because of this, they each use their own tools for building and processing the code.
For example, the frontend is often created with languages like JavaScript or TypeScript, which use tools to convert code into a format that browsers can understand. On the other hand, the backend might be developed in languages like Java, Python, Go, or C#. These backend languages also have their own specific tools to handle the code.
Since each programming language has its own unique way of writing and processing code, the frontend and backend work separately. Even though they are all part of the same application, they are built, updated, and deployed independently based on the different languages used to create them.
- Seggan (Nov 25 at 13:43): I have yet to see a compiler/interpreter where the frontend and backend are written in 2 different languages
- Barmar (Nov 25 at 17:14, +1): This answer doesn't really explain the relationship that's specific to compilers.