
An Introduction to Compilers
Most machines are pretty simple things, hardwired to carry out a small set of tasks. Think toaster, TV, or radio. Computers, however, have always been the machine you could communicate with and, through a mysterious ‘coding’ language, bend to your specific commands.
But how does a piece of English-based code, whether it’s the printf() function in C or the print() function in Python, cause a machine to do anything?
This is the moment every student discovers the even more mysterious notion of the ‘compiler’. It is this esoteric piece of software that makes the magic happen between human and machine.
So what is a compiler? How does it work? What the heck is an interpreter? And how do you use them?
What is a compiler?
Computers are digital machines, which means they communicate in 1s and 0s, referred to as ‘bits’ of information. Physically, these bits are electrical charges or voltages held in transistors and other components of modern computers. They are the lowest level of communication with a computer, and they are manipulated using machine code.
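To make the idea of bits concrete, this short Python sketch prints the 8-bit patterns for the number 5 and for the letter A (whose Unicode code point is 65). It only shows how values are written out as 1s and 0s; how a particular processor physically stores them is not something a few lines of code can show.

```python
# Show the binary (bit-level) representation of a small number and a character.
number = 5
letter = "A"

# format(value, "08b") renders an integer as 8 binary digits, i.e. one byte.
number_bits = format(number, "08b")
# ord() returns the character's Unicode code point (65 for "A").
letter_bits = format(ord(letter), "08b")

print(number, "->", number_bits)  # 5 -> 00000101
print(letter, "->", letter_bits)  # A -> 01000001
```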
Machine code is the lowest level of all: binary instructions that a computer’s processor executes directly. The software that translates the high-level code written by software developers into this low-level machine code is called a compiler.
Believe it or not, high-level programming languages are designed to be as intuitive and human-readable as possible, although it doesn’t always seem that way. This is one of the reasons that very readable languages like Python have become so popular.
This high-level code is meaningless to your machine and needs to be translated, or ‘compiled’, into something it can understand. A compiler, therefore, is a piece of software that converts code written in a programming language into machine code for a computer.
Or, according to Merriam-Webster, a compiler is a:
‘computer program that translates an entire set of instructions written in a higher-level symbolic language (such as C) into machine language before the instructions can be executed’
Merriam-Webster Dictionary
How do compilers work?
Between the moment you hit the compile button and the creation of your executable code, what exactly is happening? Your code goes through a number of steps before it is usable. The first three involve a deep analysis of the code you have written. After this, a first, intermediate version of the code is generated. At this point, a good compiler will go through an additional optimization step before creating the final machine code instructions for your machine. Some describe this process in four steps, others in seven. Here we’ve broken it down into six steps, but these varying approaches all describe essentially the same process.
- Lexing or lexical analysis – the compiler breaks your code up into smaller units called lexemes and creates tokens that represent these lexemes. Lexemes include things like function names, variable names, operators, etc. This process is also referred to as tokenisation.
- Parsing – the resulting tokens are arranged into a ‘parse tree’ that represents the syntactic structure of your code, including loops, expressions, etc. This phase is also called syntax analysis, and the parse tree is usually condensed into an abstract syntax tree, or AST.
- Semantic analysis – the abstract syntax tree is now examined for any semantic errors such as use of undeclared variables, wrong use of variable types, etc.
- Intermediate code generation – this is machine-independent code created for an ‘abstract machine’. This type of code is easier to optimize than actual machine code.
- Optimization – the compiler examines the intermediate code for inefficiencies such as unused variables, dead code (code that can never run), etc. The focus is on using as few resources as possible while retaining the original meaning of the high-level code.
- Machine code generation – the optimized intermediate code is translated into instructions for the target processor, which the hardware in your computer executes directly.
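The phases above can be sketched end to end for a toy language of integer arithmetic expressions. This Python sketch is a hypothetical, heavily simplified illustration, not how any production compiler is built: `lex`, `Parser`, `check`, `fold`, `codegen` and `compile_expr` are names invented for this example, the ‘machine code’ is a list of instructions for an imaginary stack machine, and optimization is limited to constant folding (working out `2 + 3` at compile time).

```python
import operator
import re

# Hypothetical toy compiler for integer expressions such as "x * (2 + 3)".
TOKEN = re.compile(r"\s*(?:(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()]))")
# "/" is floor division here so that every value stays an int.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def lex(source):
    """Lexing: turn source text into a list of (kind, text) tokens."""
    source, pos, tokens = source.rstrip(), 0, []
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

class Parser:
    """Parsing: recursive descent; term() binds * and / tighter than + and -."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("eof", "")

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            node = ("bin", self.take()[1], node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            node = ("bin", self.take()[1], node, self.factor())
        return node

    def factor(self):
        kind, text = self.take()
        if kind == "num":
            return ("num", int(text))
        if kind == "name":
            return ("var", text)
        if text == "(":
            node = self.expr()
            if self.take()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

def check(node, declared):
    """Semantic analysis: reject variables that were never declared."""
    if node[0] == "var" and node[1] not in declared:
        raise NameError(f"undeclared variable {node[1]!r}")
    if node[0] == "bin":
        check(node[2], declared)
        check(node[3], declared)

def fold(node):
    """Optimization: constant folding, so 2 + 3 is computed at compile time."""
    if node[0] != "bin":
        return node
    op, left, right = node[1], fold(node[2]), fold(node[3])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))
    return ("bin", op, left, right)

def codegen(node, out):
    """Code generation: emit instructions for an imaginary stack machine."""
    if node[0] == "num":
        out.append(("PUSH", node[1]))
    elif node[0] == "var":
        out.append(("LOAD", node[1]))
    else:  # binary operation: both operands first, then the operator
        codegen(node[2], out)
        codegen(node[3], out)
        out.append(("OP", node[1]))
    return out

def compile_expr(source, declared):
    """Run every phase in order and return the generated instructions."""
    parser = Parser(lex(source))
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("unexpected input after the expression")
    check(tree, declared)
    return codegen(fold(tree), [])

print(compile_expr("x * (2 + 3)", {"x"}))  # [('LOAD', 'x'), ('PUSH', 5), ('OP', '*')]
```

Because `2 + 3` has been folded into a single `PUSH 5`, the generated code does less work at run time while meaning exactly the same thing, which is the goal of the optimization phase.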
The main types of compiler and their applications
Here are nine different types that you are likely to encounter in your career as a developer.
- Traditional Compilers are old-style compilers typically used with languages like C++ and Pascal. They convert your high-level code directly into your computer’s native machine code.
- Converters (also called source-to-source compilers or transpilers) translate code directly from one high-level programming language to another, saving the time spent rewriting it by hand and avoiding the errors that rewriting can introduce.
- Incremental Compilers save time and resources by recompiling only the parts of the source code that have changed since the last build. This can save an enormous amount of time compared to compiling the entire body of code every time a change is made.
- Cross-Compilers operate on one machine but are capable of creating code for another computer. This allows a developer to create code for multiple platforms from one machine.
- Single-Pass Compilers go through the source code only once, compiling as they go. This makes them more efficient and smaller than multi-pass compilers.
- Two-Pass or Multi-Pass Compilers scan the source code more than once, with specific goals each time. Typically, the first pass will involve lexical, syntactic, and semantic analysis, while the second pass will handle optimization and code generation.
- Just-In-Time (JIT) Compilers translate code into machine code while the program is running, focusing optimization on the sections that are executed most often. The Java Virtual Machine in the Java Runtime Environment (JRE) uses JIT compilation so that, each time a program runs, it can make the best use of the resources available to it.
- Ahead-of-Time (AOT) Compilers translate code into machine code before the program runs, so that less compilation work is left to do at startup. Both .NET and the Java platform offer AOT compilation as an alternative to their JIT compilers.
- Binary Compilers or Recompilers take the already compiled binary code of a program and either analyze and optimize it into better code or translate code built for one hardware platform into machine code for another.
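Of the compiler types above, the incremental approach is the easiest to illustrate in a few lines. This sketch is a hypothetical simplification: it keeps a hash of each source file’s contents from the previous build and reports only the files whose hash has changed. The in-memory `sources` dictionary stands in for files on disk, and the function names are invented for this example. Real incremental compilers and build tools also track dependencies between files (for example, a C file that includes a changed header), which this sketch ignores.

```python
import hashlib


def content_hash(text):
    """Fingerprint a source file's contents; any edit changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_build(sources, cache):
    """Return the files that need recompiling, updating `cache` in place.

    sources: hypothetical in-memory project, mapping file name -> source text
    cache:   file name -> content hash recorded at the previous build
    """
    rebuilt = []
    for name, text in sources.items():
        digest = content_hash(text)
        if cache.get(name) != digest:  # new or changed since the last build
            rebuilt.append(name)       # a real tool would invoke the compiler here
            cache[name] = digest
    for name in list(cache):           # forget files removed from the project
        if name not in sources:
            del cache[name]
    return rebuilt


cache = {}
project = {"main.c": "int main(void) { return 0; }",
           "util.c": "int add(int a, int b) { return a + b; }"}
print(incremental_build(project, cache))  # ['main.c', 'util.c'] (first build: everything)
project["util.c"] = "int add(int a, int b) { return b + a; }"
print(incremental_build(project, cache))  # ['util.c']
print(incremental_build(project, cache))  # [] (nothing changed)
```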
What is an interpreter?
Sometimes the term interpreter is used interchangeably with compiler, and the two are certainly easy to confuse. So what are the main differences between them?
- Compilers generally convert all of your code into machine code in one go, although, as we have seen, this is not always the case. Interpreters translate and execute your code one statement at a time.
- Compilers generally produce intermediate (object) code first, while interpreters typically do not. This makes a difference in memory usage.
- Compilers report all the errors they find in a program at the same time. Interpreters stop at the first error they encounter, so you can deal with each issue as a separate problem before you continue.
- Compilation takes more time up front, but the executables that compilers produce run more efficiently. Interpreters are the reverse: they start running your code quickly but execute it more slowly.
- More traditional languages like C, C++, and Java tend to use compilers while languages like Python, PHP, and Ruby tend to use interpreters.
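The difference in how errors surface can be imitated with Python’s built-in `compile()` and `exec()` functions. This sketch is only an analogy: `run_compiled` and `run_line_by_line` are hypothetical helpers, the first translating a whole program before running any of it (the way a compiler works) and the second translating and running one line at a time (the way an interpreter is described above). The line-by-line version only handles programs whose statements each fit on a single line.

```python
def run_compiled(source, env):
    """Compiler-style: translate the whole program first, then run it."""
    code = compile(source, "<program>", "exec")  # a SyntaxError here means nothing has run
    exec(code, env)


def run_line_by_line(source, env):
    """Interpreter-style (simplified): translate and run one line at a time."""
    for number, line in enumerate(source.splitlines(), start=1):
        exec(compile(line, f"<line {number}>", "exec"), env)


# A three-line program whose last line is missing its closing parenthesis.
program = "log.append('first')\nlog.append('second')\nlog.append('third'"

for runner in (run_compiled, run_line_by_line):
    env = {"log": []}
    try:
        runner(program, env)
    except SyntaxError as error:
        print(f"{runner.__name__}: ran {env['log']} before the error in {error.filename}")
# run_compiled: ran [] before the error in <program>
# run_line_by_line: ran ['first', 'second'] before the error in <line 3>
```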
Compilers and IDEs for the Python programming language
You are spoiled for choice when it comes to tools for writing and running Python, and each has its advantages and disadvantages. Strictly speaking, the tools below are integrated development environments (IDEs) rather than compilers: they hand your code to a Python interpreter such as CPython, the reference implementation. Rather than just choosing one you love, it’s worth looking at the project’s needs and deciding which will be best for the job. Here are some of the most popular in use today.
- PyCharm – one of the most popular.
- Spyder – very popular among engineers and scientists due to its excellent scientific environment.
- PyDev – a Python IDE for Eclipse, valued for its usability.
- Rodeo – aimed at data scientists and analysts, thanks to its data extraction tools.
- IDLE – a great simple interface that is perfect for beginners.
These are just some of the most popular, but there are many more choices when it comes to running your Python code, so don’t be afraid to look around for what works best for your needs.
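Whichever tool you choose, your code is ultimately run by the Python interpreter that the tool has been configured to use. Running this short snippet from inside the IDE is a quick way to check which one that is. It uses only the standard `sys` and `platform` modules, and the exact output depends on your installation.

```python
import platform
import sys

# Path of the interpreter executable the IDE is configured to use.
print("Executable:    ", sys.executable)
# Implementation name, e.g. "cpython" for the reference implementation.
print("Implementation:", sys.implementation.name)
# Version of that interpreter, in "major.minor.patch" form.
print("Version:       ", platform.python_version())
```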
The Python Compilation Process
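You will see a couple of file types when you work with Python code, so what’s going on? The truth is that Python source code is both compiled and interpreted, and here’s how that works. The first step is to save your human-readable files with the .py extension. These files are then compiled into bytecode, a lower-level, platform-independent set of instructions; for modules you import, this bytecode is cached in files with the .pyc extension inside a __pycache__ directory. Bytecode is not machine code: it is executed by the Python Virtual Machine (PVM), the part of the interpreter that carries out each instruction in turn. There is also no separate linking step as there is in C. Instead, names such as functions are looked up and bound to their definitions at run time, as the bytecode executes.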
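You can trigger the bytecode step yourself with the standard library’s `py_compile` module. This sketch writes a throwaway module (the name `greet.py` is arbitrary) to a temporary directory, compiles it, and confirms that the resulting .pyc file sits where the import system expects to find cached bytecode. The first four bytes of the file are a ‘magic number’ identifying the Python bytecode format, which is a reminder that the file holds bytecode for the interpreter rather than instructions for your processor.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

# Write a throwaway module ("greet.py" is an arbitrary name) to a temporary directory.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "greet.py"
source.write_text('def hello():\n    return "hello"\n')

# Compile it to bytecode explicitly; importing the module would normally do this for you.
pyc_path = py_compile.compile(str(source), doraise=True)

# The .pyc lands in __pycache__, at the location the import system checks.
print(pyc_path)
print(pyc_path == importlib.util.cache_from_source(str(source)))  # True

# The file starts with a "magic number" identifying the bytecode format.
header = pathlib.Path(pyc_path).read_bytes()[:4]
print(header == importlib.util.MAGIC_NUMBER)  # True
```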
The compilation part of the process is essentially hidden from you, the programmer, and what you see on screen as you run and debug your code behaves like an interpreter. For this reason, Python is said to be an ‘interpreted’ language. When you program in Python, you get some of the benefits of compilation, since cached bytecode means unchanged modules do not have to be recompiled every time, along with the ease of use of an interpreted language.
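Because this compilation step is hidden, it can be instructive to run its stages by hand. The standard library exposes several of them: `tokenize` for lexing, `ast` for parsing, the built-in `compile()` for turning the syntax tree into bytecode, and `dis` for viewing that bytecode. This loosely mirrors the compiler phases described earlier, although CPython’s compiler performs further internal steps that these modules do not expose, and the exact output of `ast.dump` and `dis` varies between Python versions.

```python
import ast
import dis
import io
import tokenize

source = "total = price * 2\n"

# 1. Lexing: tokenize splits the source into tokens (blank NEWLINE/ENDMARKER tokens skipped).
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)  # ['total', '=', 'price', '*', '2']

# 2. Parsing: ast.parse builds the abstract syntax tree (here, a single Assign node).
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. Code generation: compile() turns the tree into a code object holding bytecode.
code = compile(tree, "<example>", "exec")

# 4. Inspection: dis lists the bytecode instructions the interpreter will execute.
dis.dis(code)
```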