Day 30: My Journey into Assembly Begins

I started out trying to install a new CLI program, and when it told me to copy the binary into my PATH, I figured I should just start reading a book that was recommended to me about the basics of programming and assembly code. Now it might seem like I took an extreme turn instead of learning a small topic, and you would be right. However, it’s more about understanding how everything is connected than about learning how to perform specific tasks. I figure if I keep looking to understand the connections between basic concepts in computing, I will develop further and be more flexible when learning new technologies. I expect that eventually I will have connected all the dots and solved my problems along the way. I hope…

TL;DR

Okay, so here are the highlights of what I did:

  • I read two chapters of the book called “Programming from the Ground Up” by Jonathan Bartlett. It started off cool but got intense once it started talking about the different ways of accessing data. I will very likely need to go over the material more than once, but I just really want to jump into the assembly code and gain exposure to the rest of the book as soon as possible.
  • I tried to add the pup program to Git Bash only to fail LOL. I will try again tomorrow after looking into how programs are connected to the computer / operating system.

Notes from “Programming from the Ground Up”


GCC Tool Set

The GCC (GNU Compiler Collection) tool set is a collection of compilers and related development tools from the GNU Project. It is commonly installed on many GNU/Linux distributions.

The Kernel

The kernel is the core part of an operating system that keeps track of everything. The kernel is both a fence and a gate. As a gate, it allows programs to access hardware in a uniform way. Without the kernel, you would have to write programs to deal with every device model ever made. The kernel handles all device-specific interactions so you don’t have to. It also handles file access and interaction between processes. As a fence, it keeps programs from interfering with each other’s data. For example, when you type, your typing goes through several programs before it hits your editor. The kernel handles your hardware, so it is the first to receive notice about the keypress. The keyboard sends scancodes to the kernel, which then converts them to the actual letters, numbers, and symbols they represent.

Keyboard -> Kernel -> Windowing system -> Application program
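
To make the scancode step concrete for myself, here is a rough sketch of that pipeline. The scancode numbers are the ones from the classic PC "set 1" keyboard table, but the function names are made up for illustration and are not real kernel APIs. The windowing system step is left out to keep it short.

```python
# Toy model of the keypress path: Keyboard -> Kernel -> Application.
# Scancode values come from the classic PC "set 1" table; the function
# names are hypothetical and only exist for this sketch.

SCANCODE_TO_CHAR = {
    0x23: "h",
    0x12: "e",
    0x26: "l",
    0x18: "o",
}

def kernel_translate(scancodes):
    """Convert raw scancodes into the characters they represent."""
    return [SCANCODE_TO_CHAR[code] for code in scancodes]

def application_receive(chars):
    """Stand-in for an editor, which only ever sees characters."""
    return "".join(chars)

# What the keyboard sends when you type "hello".
keyboard_output = [0x23, 0x12, 0x26, 0x26, 0x18]
text = application_receive(kernel_translate(keyboard_output))
print(text)
```

The point is that the editor never deals with scancodes at all. It only sees the result of the kernel's translation.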

In our case, the kernel is Linux. Now, the kernel all by itself won’t do anything. You can’t even boot up a computer with just a kernel. Think of the kernel as the water pipes for a house. Without the pipes, the faucets won’t work, but the pipes are pretty useless if there are no faucets. Together, the user applications (from the GNU project and other places) and the kernel (Linux) make up the entire operating system, GNU/Linux.

Machine language

Machine language is what the computer actually sees and deals with. Every command the computer sees is given as a number or sequence of numbers.

Assembly Language

This is the same as machine language, except the command numbers have been replaced by letter sequences which are easier to memorize. Other small things are done to make it easier as well.
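
Here is a small sketch of what "replacing numbers with letter sequences" means in practice. It turns one assembly instruction, `movl $1, %eax`, into the bytes a 32-bit x86 processor would actually see. The encoding (opcode 0xB8 plus the register number, followed by a 4-byte little-endian value) is the real x86 one, but the helper name `assemble_movl_immediate` is made up and this handles only this single instruction form.

```python
import struct

# Register numbers used by the x86 "move immediate into 32-bit register" encoding.
REGISTER_NUMBER = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3}

def assemble_movl_immediate(value, register):
    """Encode `movl $value, register` as 32-bit x86 machine code bytes."""
    opcode = 0xB8 + REGISTER_NUMBER[register]
    # One opcode byte, then the value as a 4-byte little-endian integer.
    return bytes([opcode]) + struct.pack("<I", value)

machine_code = assemble_movl_immediate(1, "%eax")
print(" ".join(f"{byte:02x}" for byte in machine_code))
```

So the friendly text `movl $1, %eax` and the byte sequence `b8 01 00 00 00` are the same command. One is just easier for humans to remember.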

High-Level Language

High-level languages are there to make programming easier. Assembly language requires you to work with the machine itself. High-level languages allow you to describe the program in a more natural language. A single command in a high-level language usually is equivalent to several commands in an assembly language.

Von Neumann architecture

Modern computer architecture is based on an architecture called the Von Neumann architecture, named after its creator, John von Neumann. The Von Neumann architecture divides the computer up into two main parts – the CPU (for Central Processing Unit) and the memory. This architecture is used in all modern computers, including personal computers, supercomputers, mainframes, and even cell phones.

In addition to all of this, the Von Neumann architecture specifies that not only computer data should live in memory, but the programs that control the computer’s operation should live there, too. In fact, in a computer, there is no difference between a program and a program’s data except how it is used by the computer. They are both stored and accessed the same way.
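
A quick way to see this for yourself: take the same five bytes and read them two ways. This is only a sketch. The x86 encoding it relies on (opcodes 0xB8 through 0xBF move a 4-byte value into a register) is real, but the `decode_mov_immediate` helper is made up and understands nothing else.

```python
# Five bytes sitting in memory. Nothing about them says "code" or "data".
memory = bytes([0xB8, 0x01, 0x00, 0x00, 0x00])

# Use 1: treat the bytes as data, here a single little-endian integer.
as_number = int.from_bytes(memory, "little")

# Use 2: treat the same bytes as a 32-bit x86 instruction.
REGISTER_NAME = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]

def decode_mov_immediate(raw):
    """Decode a 5-byte 'move immediate into register' instruction."""
    register = REGISTER_NAME[raw[0] - 0xB8]
    value = int.from_bytes(raw[1:5], "little")
    return f"movl ${value}, {register}"

as_instruction = decode_mov_immediate(memory)
print(as_number)
print(as_instruction)
```

The bytes never change. Only what the computer does with them decides whether they are a number or a command.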

The CPU

The CPU (Central Processing Unit) reads in instructions from memory one at a time and executes them. This is known as the fetch-execute cycle. A computer functions not by simply storing data but by using the CPU to access, manipulate, and move the data. The CPU contains the following elements to accomplish this:

  • Program Counter -> used to tell the computer where to fetch the next instruction from. It holds the memory address of the next instruction to be executed.
  • Instruction Decoder -> Figures out what the instruction means. This includes what process needs to take place (addition, subtraction, multiplication, data movement, etc.) and what memory locations are going to be involved in this process. Computer instructions usually consist of both the actual instruction and the list of memory locations that are used to carry it out. The instruction comes from the memory address provided by the ‘Program Counter’.
  • Data bus -> used to fetch the memory locations to be used in the calculation. The data bus is the connection between the CPU and memory. It is the actual wire that connects them. If you look at the motherboard of the computer, the wires that go out from the memory are your data bus.
  • General Purpose registers -> where the main action happens (addition, subtraction, multiplication, comparisons, and other operations). In addition to the memory on the outside of the processor, the processor itself has some special, high-speed memory locations called registers. There are two kinds of registers – general-purpose registers and special-purpose registers. However, computers have very few general-purpose registers (Is this like RAM??). Most information is stored in main memory, brought into the registers for processing, and then put back into memory when the processing is completed. Special-purpose registers are registers which have very specific purposes. We will discuss these as we come to them.
  • Arithmetic and logic unit -> further processes the relevant data and decoded instructions it receives from the ‘General Purpose Registers’. Here the instruction is actually executed. After the results of the computation have been calculated, the results are then placed on the data bus and sent to the appropriate location in memory or in a register, as specified by the instruction.
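
To tie the pieces above together, here is a minimal sketch of the fetch-execute cycle on a completely made-up machine (this is not x86, and the LOAD / ADD / STORE / HALT instructions are invented for the example). It has a program counter, a decode step, two general-purpose registers, and one memory that holds both the program and its data.

```python
# A made-up machine: each instruction is a tuple (operation, operands...).
# Program and data share one memory, as in the Von Neumann design.
memory = [
    ("LOAD", "r0", 6),     # address 0: r0 = memory[6]
    ("LOAD", "r1", 7),     # address 1: r1 = memory[7]
    ("ADD", "r0", "r1"),   # address 2: r0 = r0 + r1
    ("STORE", "r0", 8),    # address 3: memory[8] = r0
    ("HALT",),             # address 4: stop
    None,                  # address 5: unused
    20,                    # address 6: data
    22,                    # address 7: data
    0,                     # address 8: the result goes here
]

def run(memory):
    registers = {"r0": 0, "r1": 0}  # general-purpose registers
    program_counter = 0             # address of the next instruction
    while True:
        instruction = memory[program_counter]  # fetch
        program_counter += 1
        operation = instruction[0]             # decode
        if operation == "LOAD":
            _, register, address = instruction
            registers[register] = memory[address]
        elif operation == "ADD":
            _, target, source = instruction
            # This addition is the part the ALU would do.
            registers[target] = registers[target] + registers[source]
        elif operation == "STORE":
            _, register, address = instruction
            memory[address] = registers[register]
        elif operation == "HALT":
            return registers
        else:
            raise ValueError(f"unknown operation: {operation}")

final_registers = run(memory)
print(memory[8])
```

Notice that the data is pulled from memory into registers, worked on there, and then put back into memory, which is the pattern the registers bullet describes.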

Data Accessing Modes

Processors have a number of different ways of accessing data, known as addressing modes. There are many forms but the most important ones are listed below.

  • Immediate Mode -> The data to access is embedded in the instruction itself. It is the simplest mode. For example, if we want to initialize a register to 0, instead of giving the computer an address to read the 0 from, we would specify immediate mode, and give it the number 0.
  • Register Addressing Mode -> the instruction contains a register to access, rather than a memory location.
  • Direct Addressing Mode -> the instruction contains the memory address to access. For example, I could say, please load this register with the data at address 2002. The computer would go directly to byte number 2002 and copy the contents into our register.
  • Indexed Addressing Mode -> the instruction contains a memory address to access, and also specifies an index register to offset that address. For example, we could specify address 2002 and an index register. If the index register contains the number 4, the actual address the data is loaded from would be 2006. This way, if you have a set of numbers starting at location 2002, you can cycle between each of them using an index register. On x86 processors, you can also specify a multiplier for the index. This allows you to access memory a byte at a time or a word at a time (4 bytes).
  • Indirect Addressing Mode -> the instruction contains a register that contains a pointer to where the data should be accessed. For example, if we used indirect addressing mode and specified the %eax register, and the %eax register contained the value 4, whatever value was at memory location 4 would be used. In register addressing, we would just use the value 4 held in %eax, but in indirect addressing, we use 4 as the address to use to find the data we want.
  • Base Pointer Addressing Mode -> This is similar to indirect addressing, but you also include a number called the offset to add to the register’s value before using it for lookup.
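
Since this was the part that got intense for me, here is a rough sketch that models each mode as a tiny function over a pretend memory and register set. All the addresses, values, and function names are invented for illustration. The comments show how I believe each mode is written in the AT&T assembly syntax the book uses.

```python
# Toy memory (address -> value) and registers. All values are made up.
memory = {4: 99, 2002: 10, 2006: 20, 2010: 30}
registers = {"%eax": 4, "%ebx": 2002, "%edi": 1}

def immediate(value):
    # $0 : the data is embedded in the instruction itself.
    return value

def register_mode(name):
    # %eax : the data is whatever the register holds.
    return registers[name]

def direct(address):
    # 2002 : the instruction holds the memory address.
    return memory[address]

def indexed(address, index_register, multiplier=1):
    # 2002(,%edi,4) : address plus index register times multiplier.
    return memory[address + registers[index_register] * multiplier]

def indirect(pointer_register):
    # (%eax) : the register holds the address of the data.
    return memory[registers[pointer_register]]

def base_pointer(offset, base_register):
    # 8(%ebx) : the register's value plus a fixed offset.
    return memory[registers[base_register] + offset]

print(immediate(0))
print(register_mode("%eax"))
print(direct(2002))
print(indexed(2002, "%edi", 4))
print(indirect("%eax"))
print(base_pointer(8, "%ebx"))
```

Comparing `register_mode("%eax")` with `indirect("%eax")` helped me the most: the first gives back the 4 that is in the register, the second uses that 4 as an address and gives back what is stored there.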

Conclusion

That’s all for today. If you are interested in the MIT course you can check out the video lecture I’m currently going through. The lecture is helpful but isn’t sufficient by itself. Anyways, until next time PEACE!