Zephyr

Zephyr is an 8-bit CPU written in Verilog. The goal of this project was to explore the entire RTL-to-GDSII flow and how ASICs are made.

This helped me learn many ECE skills along the way and learn about the industry in which I plan to pursue a career.

The Plan

I wanted to learn about the entire process, from transistors to (relatively) high-level assembly code, and everything in between.

This meant creating my own assembly language (zasm) along with my own compiler, interpreter, and assembler; writing all the Verilog code needed to describe the CPU; and using open-source RTL-to-GDSII flows to generate a fab-ready .gds file.

I'll go bottom-up, starting at Verilog.

Verilog

Zephyr is 8-bit, so I first had to think about what actions I wanted it to be able to perform. With two bits, I could define the CPU's four functions as NOP, LOAD, STR, and ALU. NOP is no operation, LOAD loads an 8-bit value from RAM into one of Zephyr's four CPU registers (more on this later), STR stores an 8-bit value from one of Zephyr's registers into RAM, and ALU performs addition, subtraction, multiplication, or division.

Zephyr is equipped with a bank of four zregisters, R0-R3:

module zregister (
    input [7:0] IN,
    input OPCODE,  // reading or writing
    input [1:0] REG_SEL,  // which register to read/write

    output reg [7:0] OUT
);

  // registers 8 bits wide, an array of 4 of them
  reg [7:0] registers[3:0];

  // read from or write to the register file, depending on OPCODE
  always @(*) begin
    case (OPCODE)
      1'b0:  // read
      OUT = registers[REG_SEL];
      1'b1:  // write
      registers[REG_SEL] = IN;
      default: OUT = 8'b0;
    endcase
  end

endmodule

Each register can be read from or written to, depending on OPCODE. These registers give Zephyr quick access to commonly used values and keep those values separate from program memory.

Zephyr also has access to a block of RAM, zram, for a whopping total of 16 words of program memory:

module zram (
    input [3:0] ADDRESS,
    input [7:0] DATA_IN,
    input OPCODE,  // reading or writing

    output reg [7:0] DATA_OUT
);

  reg [7:0] registers[15:0];

  always @(*) begin
    case (OPCODE)
      1'b0:  // read
      DATA_OUT = registers[ADDRESS];
      1'b1:  // write
      registers[ADDRESS] = DATA_IN;
      default: DATA_OUT = 8'b0;
    endcase
  end

endmodule

As with zregister, ADDRESS selects a word, and data can be read from or written to that address.

Zephyr is also equipped with an ALU to perform one of four math operations:

module alu (
    input [1:0] OPCODE,
    input [7:0] DATA_A,
    input [7:0] DATA_B,
    output reg [7:0] DATA_OUT
);

  always @(*) begin
    case (OPCODE)
      2'b00:  // Addition
      DATA_OUT = DATA_A + DATA_B;
      2'b01:  // Subtraction
      DATA_OUT = DATA_A - DATA_B;
      2'b10:  // Multiplication
      DATA_OUT = DATA_A * DATA_B;
      2'b11:  // Division
      DATA_OUT = DATA_A / DATA_B;
      default: DATA_OUT = DATA_A + DATA_B;
    endcase
  end

endmodule

It takes a two-bit input, OPCODE, which selects the mathematical operation to perform, and 8 bits of data from each of two registers. The result appears on DATA_OUT and is stored back in the first register, the one that supplied DATA_A.
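Because DATA_OUT is only 8 bits wide, any result that does not fit is truncated to its low 8 bits. The following is a minimal Python sketch of that behaviour, not part of Zephyr itself. The function name `alu` is hypothetical, and returning `None` for division by zero is an assumption standing in for the unknown (X) value a Verilog simulator produces; what synthesized hardware does in that case depends on the tool.

```python
def alu(opcode: int, a: int, b: int):
    """Software model of the 8-bit ALU; returns None where simulation would give X."""
    a &= 0xFF  # inputs are 8 bits wide
    b &= 0xFF
    if opcode == 0b00:      # Addition
        result = a + b
    elif opcode == 0b01:    # Subtraction
        result = a - b
    elif opcode == 0b10:    # Multiplication
        result = a * b
    elif opcode == 0b11:    # Division (unsigned, integer)
        if b == 0:
            return None     # assumption: stand-in for Verilog's X result
        result = a // b
    else:
        raise ValueError("opcode must fit in two bits")
    return result & 0xFF    # DATA_OUT keeps only the low 8 bits


print(alu(0b00, 200, 100))  # 300 wraps to 44
print(alu(0b01, 2, 5))      # -3 wraps to 253
print(alu(0b10, 16, 16))    # 256 wraps to 0
print(alu(0b11, 5, 2))      # integer division gives 2
```

The wraparound matters in practice: with 8-bit registers, even modest multiplications overflow without any indication.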

The main Verilog file describes the functionality of Zephyr, which involves a state machine to synchronize every action on clock edges. Zephyr has a program counter, PC, and an instruction register, IR, to read from zram (program memory). It performs actions through three main states: FETCH, DECODE, and EXECUTE, with several others to manage ALU operations and memory writeback.

  always @(posedge CLK or posedge RESET) begin
    if (RESET) begin
      // Set up all default values
      ...
      zstate <= FETCH;
    end else begin
      case (zstate)
        FETCH: begin
          RAM_OP   <= 1'b0;  // read
          RAM_ADDR <= PC;
          ZREG_OP  <= 1'b0;  // default to read
          zstate   <= DECODE;
        end

        DECODE: begin
          IR <= RAM_DATA_OUT;  // store fetched instruction
          OP <= RAM_DATA_OUT[7:6];  // extract opcode
          zstate <= EXECUTE;
        end

        EXECUTE: begin
          case (OP)
            2'b00: begin  // NOP
              PC <= PC + 1;
              zstate <= FETCH;
            end

            2'b01: begin  // LOAD
              // Set up the memory read
              RAM_ADDR <= IR[3:0];  // Get target addr
              RAM_OP   <= 1'b0;  // Read operation
              ZREG_SEL <= IR[5:4];  // Select target register
              zstate   <= MEMREAD;  // Go to memory read state
            end

            2'b10: begin  // STR
              RAM_ADDR <= IR[3:0];  // Set RAM addr
              RAM_OP <= 1'b1;  // Write operation
              ZREG_SEL <= IR[5:4];  // Select source register
              ZREG_OP <= 1'b0;  // Read from register
              zstate <= MEMWRITE;  // Go to memory write state
            end

            2'b11: begin  // ALU
              ALU_OPCODE <= IR[5:4];  // Set ALU op
              ZREG_SEL <= IR[3:2];  // Select source register A
              ZREG_OP <= 1'b0;  // Read from register

              zstate <= FETCHDATAB;  // Go to fetch data B state
            end

            default: begin
              PC <= PC + 1;
              zstate <= FETCH;
            end
          endcase
        end
    ...

This is just a snippet of the entire state machine.
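The bit slices used in the EXECUTE state imply an instruction layout: the opcode in bits 7:6, then either a register in bits 5:4 and an address in bits 3:0 (LOAD, STR), or an ALU operation in bits 5:4 and register A in bits 3:2 (ALU). The Python sketch below decodes a byte along those lines. The `decode` helper and its dictionary keys are hypothetical, and placing register B in bits 1:0 is an assumption, since the FETCHDATAB state is not shown in the snippet.

```python
OPCODES = {0b00: "NOP", 0b01: "LOAD", 0b10: "STR", 0b11: "ALU"}
ALU_OPS = {0b00: "ADD", 0b01: "SUB", 0b10: "MUL", 0b11: "DIV"}


def decode(ir: int) -> dict:
    """Split an 8-bit instruction into the fields the state machine reads."""
    op = OPCODES[(ir >> 6) & 0b11]            # IR[7:6]
    if op in ("LOAD", "STR"):
        return {
            "op": op,
            "reg": (ir >> 4) & 0b11,          # IR[5:4]
            "addr": ir & 0b1111,              # IR[3:0]
        }
    if op == "ALU":
        return {
            "op": op,
            "alu_op": ALU_OPS[(ir >> 4) & 0b11],  # IR[5:4]
            "reg_a": (ir >> 2) & 0b11,            # IR[3:2]
            "reg_b": ir & 0b11,                   # IR[1:0], assumed
        }
    return {"op": op}                         # NOP carries no operands


print(decode(0b01001111))  # LOAD into register 0 from address 15
print(decode(0b11100001))  # ALU multiply with register A = 0, register B = 1
```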

Zasm

What good is a CPU if it can't be programmed? I wrote my own flavor of assembly so that it could be highly customized to my needs. Let's walk through a simple program.

.ORG 0
    LOAD R0, [Y]
    LOAD R1, [X]

.ORG 14
    DATA X, 0xDA
    DATA Y, 0xD0

The .ORG directive defines what location in program memory to start writing to. The program begins at address 0, and each new instruction will be located at the next memory location. So, LOAD R1, [X] will be stored at memory location 1.

DATA is a directive that works like a variable in zasm. Essentially, it writes the hex value to the current memory location and gives that location a name. Here, address 14 (X) will hold 0xDA and address 15 (Y) will hold 0xD0.
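To make the address assignment concrete, here is a small Python sketch of how .ORG and DATA could be resolved into a symbol table and a memory map. It is an illustration only, not the actual zasm assembler. The `layout` function is hypothetical, and it leaves instructions as text instead of encoding them.

```python
PROGRAM = """
.ORG 0
    LOAD R0, [Y]
    LOAD R1, [X]

.ORG 14
    DATA X, 0xDA
    DATA Y, 0xD0
"""


def layout(source: str):
    """Assign an address to every instruction and DATA word."""
    symbols = {}   # label name -> address
    memory = {}    # address -> instruction text or data value
    address = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".ORG"):
            address = int(line.split()[1], 0)   # move the write position
            continue
        if line.startswith("DATA"):
            name, value = [part.strip() for part in line[len("DATA"):].split(",")]
            symbols[name] = address
            memory[address] = int(value, 16)
        else:
            memory[address] = line              # instruction, left unencoded
        address += 1                            # each entry occupies one word
    return symbols, memory


symbols, memory = layout(PROGRAM)
print(symbols)  # X is at address 14, Y at address 15
print(memory)
```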

The instruction LOAD R0, [Y] can be interpreted as: "load into register R0 the value stored at the memory location labeled Y." The corresponding machine code looks like this:

LOAD = 01
R0   = 00
R1   = 01
X    = 1110
Y    = 1111

LOAD R0, [Y]  => 01001111
LOAD R1, [X]  => 01011110

It is this machine code that is written to zram, which Zephyr executes instructions from. Let's look at a more complicated program:

.ORG 0
	LOAD R0, [X]
	LOAD R1, [Y]
	ADD  R0, R1
	LOAD R0, [X]
	MUL  R0, R1
	LOAD R0, [X]
	SUB  R1, R0
	LOAD R0, [Y]
	LOAD R1, [Y]
	DIV  R0, R1

.ORG 14
	DATA X, 0x02
	DATA Y, 0x05

This program showcases the ALU's functionality.
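To see what the program computes, the sketch below steps through it at the assembly level in Python. This is not the author's interpreter. The `run` function is hypothetical, and it assumes that registers start at zero, that each ALU result is written back to the first operand register, and that results are truncated to 8 bits.

```python
PROGRAM = """
.ORG 0
    LOAD R0, [X]
    LOAD R1, [Y]
    ADD  R0, R1
    LOAD R0, [X]
    MUL  R0, R1
    LOAD R0, [X]
    SUB  R1, R0
    LOAD R0, [Y]
    LOAD R1, [Y]
    DIV  R0, R1

.ORG 14
    DATA X, 0x02
    DATA Y, 0x05
"""

ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
    "DIV": lambda a, b: a // b,
}


def run(source: str):
    """Execute the listed instructions in order and record R0/R1 after each."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    data = {}
    for ln in lines:                       # first pass: collect DATA values
        if ln.startswith("DATA"):
            name, value = [p.strip() for p in ln[len("DATA"):].split(",")]
            data[name] = int(value, 16)
    regs = {"R0": 0, "R1": 0, "R2": 0, "R3": 0}   # assumed reset values
    trace = []
    for ln in lines:                       # second pass: run instructions
        if ln.startswith((".ORG", "DATA")):
            continue
        mnemonic, operands = ln.split(None, 1)
        dst, src = [p.strip() for p in operands.split(",")]
        if mnemonic == "LOAD":
            regs[dst] = data[src.strip("[]")]
        else:
            regs[dst] = ALU[mnemonic](regs[dst], regs[src]) & 0xFF
        trace.append((ln, regs["R0"], regs["R1"]))
    return regs, trace


regs, trace = run(PROGRAM)
for line, r0, r1 in trace:
    print(f"{line:<14} R0={r0} R1={r1}")
```

Under those assumptions, ADD leaves 7 in R0, MUL leaves 10 in R0, SUB leaves 3 in R1, and the final DIV leaves 1 in R0.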

RTL-to-GDSII Flow

I went with OpenROAD, an open-source RTL-to-GDSII tool that supports a variety of platforms. For Zephyr, I ran the flow on nangate45, a 45-nanometer process design kit, primarily because of its wide support and tooling.

After several minutes of computation, I was greeted with this glorious design:

[Image: GDS view of Zephyr]

I've labeled the large blocks of data latches used for the four zregisters and for the zram. The rest of the components and wires relate to Zephyr's state machine and ALU functionality.

Reflection

It was an amazing experience to create this entire design, from programming language to Verilog, to transistor schematics, and I learned a ton (probably a whole class's worth of information).

I had never understood what a transistor looked like at the silicon level, but I was able to zoom right in and see the various layers of n- and p-type semiconductor material, contacts, metal, vias, wires, and so forth.

[Image: silicon-level inverter design compared to its schematic]

A diagram comparing the silicon-level design of an inverter to the commonly taught transistor schematic, with the corresponding components highlighted.

I gained a whole new insight into the microscopic world of semiconductor devices, and it only further motivated me to pursue my passions and this career.