Now that exams and projects are all finished for this semester, I finally have the chance to present some of this semester’s most exciting accomplishments to the world. First on this list is YAMS. This was a collaborative course project for EECS 314, Computer Architecture. I worked with Katherine Cass, Jeff Copeland, Andrew Mason, Aaron Neyer, and Thomas Murphy in order to write an HTTP 1.0 server in MIPS assembly.
If you don’t know what assembly language or HTTP are, read this section. Otherwise, you can skip it!
MIPS is an instruction set architecture. In a nutshell, that means that it defines a set of low-level operations for a processor to do. These operations are (very) simple arithmetic, memory access, conditional branching, etc. The operations specified by the instruction set are actually performed by circuits in the processor. A while back, when processor speed was much slower (and compilers were not very sophisticated), programmers would write code directly in assembly language, which is just a text representation of these instructions. This allowed programmers to have fine control over what the hardware was doing.
However, compilers and processor speeds have changed most of that. Compilers are programs that translate other programs from one language to another. Frequently, compilers translate from a high level language into assembly. As compiler technology has improved, compilers have gotten better than humans at optimizing assembly, and so now programmers mainly write in higher level languages. Since this was a course project, we had to write a program in MIPS assembly. The fun and interesting thing about this is that everything is a little harder in assembly.
HTTP is the protocol of the World Wide Web. When you run a web browser, you want to view pages that are on other computers. So, your computer has to talk to those computers to get the pages. When your computer does that, it’s using HTTP. In a nutshell, HTTP has commands like GET, where the client asks for a page, and POST, where it sends some information (like data from a form) and gets a response back from the server.
Just like you need to run a program (a web browser) to get webpages, the server computers have to run programs to serve them to you. These programs are called web servers. The life of a web server is pretty simple. It listens for requests from web browsers, and then sends responses back.
There are versions of HTTP too. The important ones are 1.0, 1.1, and 2. HTTP 1.0 is rather old, but some still use it. Most sites use 1.1, which is a bit more sophisticated (which means more difficult to implement). In both of those versions, the client sends a message in plain text to the server that looks roughly like this:
GET /blog/index.html HTTP/1.1
Host: brennan.io
HTTP 2 is an up and coming standard that aims to speed up HTTP a lot. It’s out of the scope of this discussion, but it’s also pretty interesting. We decided to implement an HTTP 1.0 server because it’s simpler than HTTP 1.1, and that matters a lot when doing assembly programming.
Our team consisted of six extremely geeky EECS students, so we had to make sure the name of our project measured up. YAMS stands for YAMS: the Awesome MIPS Server. Not only is this acronym recursive, but it also contains another acronym (MIPS) inside it. Plus, it brings to mind images of sweet potatoes, and it has served as the source for plenty of dorky puns.
MIPS is not the most common instruction set on the block, at least for personal computers. So, you can’t run a MIPS program on your computer without a MIPS simulator. We used a simulator named MARS.
Since MIPS doesn’t actually have any capability to interact with the Internet, we had to extend MARS by adding system calls. System calls are a really important concept in computers. When a program needs to do anything with the computer that involves outside resources (input, output, dealing with files, etc), it could simply start using that resource. But, if another process was using that resource at the same time, then all sorts of bad things would happen. Plus, the programmer would have to know how to use that resource, when ideally they only should have to know what they want to do with it. So, one of the main functions of an operating system is to provide an interface to those outside resources. When a program needs them, it talks to the operating system by using a system call. The operating system does what the program asks and returns control to the program when it’s done.
In a typical operating system (I’ll talk about UNIX, but there are equivalents
for Windows), there are system calls for networking. We use something called a
“socket”, which is (put simply) just a connection you can throw data into.
Typical system calls for sockets include: creating sockets (socket(3)
),
binding them to IP addresses (bind(3)
), accepting connections on them
(listen(3)
, accept(3)
), and of course reading and writing to/from them
(read(3)
, write(3)
).
MARS allows us to create our own system calls in Java. So, we implemented those system calls (using their Java equivalents) to allow our MIPS program to access the Internet through a similar socket interface to UNIX’s.
Once we had implemented the socket system calls, getting a simple program that
listened for connections and sent a canned response wasn’t too hard. However, a
web server needs to be able to understand requests and send back the correct
file. So, we had to “parse” the requests. This wouldn’t be too difficult in
most programming languages. But in assembly, there are no functions for working
with strings (that is, text). So, we had to implement our own versions of
several common string functions. This was one of my tasks, so I know the most
about it. I wrote 12 functions that are typically found in the C header file
string.h
.
In order to make sure the code worked, I wrote tests for each function. Many programming languages have tools for writing tests quickly and easily. Of course, assembly does not. I came up with a “unit testing framework” that would run a series of tests, and print a report on whether each one passed or failed. With this, I was able to make sure that the string functions worked before we ever used any of them.
Using these string functions, we were able to parse two types of requests: GET and POST. Since I didn’t write any code for parsing requests, I can’t go into very many details. But, the main point was that this code took a request and returned the request type, URL, important headers, and any data included in the request.
A GET request contains no data. In order to respond to these, we simply opened the file. If the file didn’t exist, we sent back an HTTP 404 error. If it did, we sent it to the client. One important issue is that you normally need to know the length of a file before sending it back to the client. This would mean reading the whole file into a buffer which may not be big enough. Instead of doing that, we simply used the HTTP header “Transfer-Encoding: identity”. This basically means that the response will be completed when you close the connection, and so the server doesn’t need to declare the size of the file beforehand. Therefore, we had a very simple implementation in which we read sections of the file into a buffer, wrote that into the socket, and repeated until the file ended.
A POST request contains data, but other than that it is pretty similar to a GET request. Semantically, POST means that the client is sending data which will change the server’s state. Since our server only serves files (i.e., it’s static), supporting POST requests doesn’t really make sense. The reason we supported it is for our little Easter Egg, a Brainf*** interpreter.
Brainf*** is a very silly little programming language. It’s meant to be the simplest language that can be Turing-complete (i.e. compute any computable function). It consists of just 8 different commands (that are one character each). Basically, this language simulates a Turing machine. You are given a “tape” of memory that is very large. You have commands to move left and right along the tape, increment and decrement the value (a single byte) stored at that location, and to input/output the value at that location. Finally, there are branching instructions that allow conditional branching based on the current value in the cell.
Writing any program in Brainf* is very difficult (even more so than
assembly!). However, many people have taken the challenge and produced
interesting Brainf* programs, which can be found all over the Internet.
However, the trade-off with such a difficult language is that it is very easy to
implement! As an Easter-Egg feature in our web server, we created a Brainf***
interpreter (which was the other thing I wrote the code for). The public
interface had two parts. First, you load a program by sending it in a POST
request to the URL /load
. Then, you run it by sending a POST request with any
input to the URL /run
. The interpreter returns its output in the POST
response.
The code loading function performs a few tasks. Its primary task is to take code and copy it into the server’s internal BF code buffer. As it does so, it removes all text that isn’t one of the 8 control characters of BF, and it makes sure that all looping brackets are balanced. In a sense, this is a syntax checking phase of the interpreter.
The /run
function is a loop that processes each command in order. It
maintains a pointer to the current instruction, input and output buffers, and
the current tape cell. Increment, decrement, input, output, left, and right
instructions are all fairly simple – they modify the pointers and values in the
tape cell. The looping constructs are slightly more complex. First, they test
the loop condition. In the case that they need to skip the block of code, or
loop back to the beginning, they read through each instruction until they find
the matching bracket. All in all, the /run
command is implemented in only 140
lines of assembly code (including comments and whitespace).
In order to use the Brainf*** interpreter, we included a webpage that serves as
a user interface for the interpreter. It sends AJAX requests to the web server
to load and run code. However, you could use something as simple as the curl
command to send programs and input to the server.
If you want to try all this fun stuff out for yourself, you can, since all the code is on GitHub! There are instructions on the project homepage for patching MARS and running it. Since building the modded version of MARS requires Unix, you can download a prebuilt version at the release page.
If you decide to look at the presentation, you’ll need to get the “reveal.js”
code placed in the html/reveal.js
folder. If you cloned our repository using
Git, you can just git submodule init
and git submodule update
. Otherwise,
you’ll have to download their ZIP and extract it into that folder.
To run everything, you just fire up MARS, load mips/main.asm
, and hit the run
button. Then, point your browser at http://localhost:19001/. You can see the
presentation at http://localhost:19001/presentation.html and the interpreter
at http://localhost:19001/brainfuck.html.
I’m also hosting my own copy of the YAMS website content here. You can look at our presentation here. Not all the links work correctly (and the interpreter won’t work since the server isn’t running). This way you don’t need to get reveal.js to see our presentation.
We spent a lot of time toiling away at YAMS, and the end result is something we think is pretty darn cool. There’s a lot I didn’t get the chance to present in this blog post. I focused mostly on my own code because I knew it best, but you can see it all at GitHub. Hopefully somebody out there will find it pretty interesting!