Summary: This module describes the basic architecture of the Texas Instruments TMS320C6211 CPU.
The C62x consists of internal memory, peripherals (serial port, external memory interface, etc.), and most importantly, the CPU that has the registers and the functional units for execution of instructions. Figure 1-1 on the next page illustrates the internal structure of the CPU and the relation with the peripherals outside the CPU. Although you don't need to care about the internal architecture of the CPU for compiling and running programs, it is necessary to understand how the CPU fetches and executes the assembly instructions to write a highly optimized assembly program.
We demonstrate the architecture and basic function of each CPU unit through the development of simple assembly language programs.
In many DSP algorithms, sum-of-products or multiply-accumulate (MAC) operations are very common. A DSP CPU is designed to handle the math-intensive calculations these algorithms require. To implement MAC operations efficiently, the C6211 CPU has two multipliers, each of which can perform a 16-bit multiplication in each clock cycle. For example, to compute the dot product of two length-40 vectors, $y = \sum_{n=1}^{40} a_n x_n$, each term of the sum needs one multiplication and one addition, which can be written as the following pair of assembly instructions:
MPY .M a,x,prod
ADD .L y,prod,y
Ignore .M and .L for now. Here, a, x, prod, and y are numbers stored in memory. The MPY instruction multiplies the two numbers a and x and stores the result in prod. The ADD instruction adds the two numbers y and prod and stores the result back in y.
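The multiply-then-add pattern above can be sketched in C to show how the dot product is built up one term at a time. This is an illustrative host-side model, not C62x code: the function name dot_product, the 16-bit input type, and the 32-bit running sum are assumptions chosen to match the 16-bit multipliers mentioned in the text.

```c
#include <stdint.h>

/* Dot product as repeated multiply-accumulate (illustrative model only).
   a and x hold 16-bit samples; the running sum is kept in 32 bits. */
int32_t dot_product(const int16_t *a, const int16_t *x, int n)
{
    int32_t y = 0;                               /* running sum */
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)a[i] * x[i];     /* MPY a,x,prod */
        y = y + prod;                            /* ADD y,prod,y */
    }
    return y;
}
```

For example, with a = 1, 2, ..., 40 and every element of x equal to 2, dot_product(a, x, 40) returns 1640.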
Where are the numbers stored in the CPU? In the C62x, the numbers used in operations are stored in registers. Because the registers are directly accessible through the data bus of the CPU, accessing a register is much faster than accessing data in external memory.
The C62x CPU has two separate general-purpose register files, A and B, each containing sixteen 32-bit registers (A0-A15 for file A and B0-B15 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.
The general-purpose register files support data ranging in size from 16-bit data through 40-bit fixed-point. Values larger than 32 bits, such as 40-bit long quantities, are stored in register pairs. In a register pair, the 32 LSBs of the data are placed in an even-numbered register and the remaining 8 MSBs in the next higher register (which is always odd-numbered). In assembly language syntax, a colon between two register names denotes a register pair, with the odd-numbered register specified first. For example, A1:A0 represents the register pair consisting of A0 and A1. You don't need to be too concerned with 40-bit numbers, however: throughout this course, you will mostly be handling 16-bit or 32-bit values stored in a single register.
For now, let's
focus on file A only. The registers in register file A are named A0 to A15, and each register can store a 32-bit binary number. Numbers such as a, x, prod, and y above are stored in these registers; for example, register A0 stores a. Let's also assume that we interpret all 32-bit numbers stored in registers as unsigned integers, so the range of values we can represent is 0 to $2^{32}-1$. Assume that a, x, prod, and y are in the registers A0, A1, A3, and A4, respectively. Then the above assembly instructions can be written specifically as:
MPY .M1 A0,A1,A3
ADD .L1 A4,A3,A4
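To make the register-level view concrete, the following C sketch models register file A as an array of sixteen 32-bit words and mimics the two instructions above, along with the A1:A0 register-pair layout described earlier. The array A and the helper names (reg_mpy, reg_add, write_pair40, read_pair40) are hypothetical, introduced only for this illustration. The sketch assumes that MPY multiplies the signed 16 LSBs of each source register; for small non-negative values this agrees with the unsigned interpretation used in the text.

```c
#include <stdint.h>

/* Register file A modelled as sixteen 32-bit words: A[0] is A0, ..., A[15] is A15. */
static uint32_t A[16];

/* Interpret the 16 LSBs of a register as a signed 16-bit value. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* MPY src1,src2,dst: multiply the 16 LSBs of two registers into a 32-bit result. */
void reg_mpy(int src1, int src2, int dst)
{
    A[dst] = (uint32_t)(sext16(A[src1]) * sext16(A[src2]));
}

/* ADD src1,src2,dst: 32-bit addition, wrapping modulo 2^32. */
void reg_add(int src1, int src2, int dst)
{
    A[dst] = A[src1] + A[src2];
}

/* Store a 40-bit value in a register pair: the 32 LSBs go to the even
   register, the 8 MSBs to the next (odd) register. */
void write_pair40(int even, uint64_t value)
{
    A[even]     = (uint32_t)(value & 0xFFFFFFFFu);
    A[even + 1] = (uint32_t)((value >> 32) & 0xFFu);
}

/* Read the 40-bit value back from the pair A[even+1]:A[even]. */
uint64_t read_pair40(int even)
{
    return ((uint64_t)(A[even + 1] & 0xFFu) << 32) | A[even];
}
```

With A0 = 3, A1 = 5, and A4 = 10, calling reg_mpy(0, 1, 3) followed by reg_add(4, 3, 4) leaves 15 in A3 and 25 in A4, mirroring MPY .M1 A0,A1,A3 and ADD .L1 A4,A3,A4.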
The TI C62x CPU has a load/store architecture. This means that all numbers must be in registers before they can be used as operands of instructions such as MPY and ADD. Numbers can be read from a memory location into a register (using, for example, the LDW or LDB instructions), or a register can be loaded with a constant value. The content of a register can be stored to a memory location (using, for example, the STW or STB instructions).
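The load/store discipline described above can be sketched in C as follows: operands are first copied from memory into registers, the arithmetic touches registers only, and the result is then copied back. The arrays mem and reg, the 64-byte memory size, and the helpers ldw, stw, and add_words are assumptions made for this model; it does not reproduce the actual LDW/STW operand syntax or timing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64

static uint8_t  mem[MEM_BYTES];  /* hypothetical byte-addressable data memory */
static uint32_t reg[16];         /* register file A: reg[0] is A0, ... */

/* Word accesses in this model must be 4-byte aligned and inside mem. */
static void check_word_addr(uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= MEM_BYTES);
}

/* LDW: copy a 32-bit word from memory into a register. */
void ldw(uint32_t addr, int dst)
{
    check_word_addr(addr);
    memcpy(&reg[dst], &mem[addr], 4);
}

/* STW: copy a 32-bit register out to memory. */
void stw(int src, uint32_t addr)
{
    check_word_addr(addr);
    memcpy(&mem[addr], &reg[src], 4);
}

/* Add two words that live in memory: load both, add in registers, store. */
void add_words(uint32_t addr_x, uint32_t addr_y, uint32_t addr_sum)
{
    ldw(addr_x, 0);               /* memory -> A0 */
    ldw(addr_y, 1);               /* memory -> A1 */
    reg[2] = reg[0] + reg[1];     /* ADD A0,A1,A2 (registers only) */
    stw(2, addr_sum);             /* A2 -> memory */
}
```

The point of the design is that the addition never reads memory directly; every operand passes through a register first.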
In addition to the general-purpose register files, the CPU has a separate register file for the control registers. The control registers are used to control various CPU functions such as addressing mode, interrupts, etc. You will learn more about some of the control registers when we learn each individual topic.
Where, then, do the actual operations such as multiplication and addition take place? The C62x CPU has several functional units that perform the actual operations. Each register file has four functional units, named .M, .L, .S, and .D (see Figure 1-1). The four functional units connected to register file A are named .L1, .S1, .D1, and .M1; those connected to register file B are named .L2, .S2, .D2, and .M2. For example, the functional unit .M1 performs multiplication on operands that are in register file A. When the CPU executes the MPY .M1 A0,A1,A3 instruction above, the .M1 unit takes the values stored in A0 and A1, multiplies them together, and stores the result in A3. The .M1 in MPY .M1 A0,A1,A3 indicates that this operation is performed in the .M1 unit. The .M1 unit has a 16-bit multiplier, and all multiplications are performed by the .M1 (or .M2) unit.
Similarly, the ADD operation can be executed by the .L1 unit. The .L1 unit can perform all the logical operations, such as bitwise AND (the AND instruction), as well as basic addition (the ADD instruction) and subtraction (the SUB instruction).
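The pairing of instructions with functional units described above can be sketched in C as a small dispatch function. This is a deliberately partial model: it covers only the .M1 and .L1 pairings named in the text, and the type and function names (unit_t, op_t, execute) are hypothetical. As an assumption, MPY is modelled as a signed multiplication of the 16 LSBs of each operand.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { UNIT_L1, UNIT_M1 } unit_t;            /* only the units discussed here */
typedef enum { OP_MPY, OP_ADD, OP_SUB, OP_AND } op_t;

/* Interpret the 16 LSBs of a value as a signed 16-bit number. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Execute op on the given unit. Returns false, leaving *dst untouched,
   if this partial model does not pair the operation with that unit. */
bool execute(unit_t unit, op_t op, uint32_t src1, uint32_t src2, uint32_t *dst)
{
    if (unit == UNIT_M1 && op == OP_MPY) {
        *dst = (uint32_t)(sext16(src1) * sext16(src2));  /* 16-bit multiply */
        return true;
    }
    if (unit == UNIT_L1) {
        switch (op) {
        case OP_ADD: *dst = src1 + src2; return true;
        case OP_SUB: *dst = src1 - src2; return true;
        case OP_AND: *dst = src1 & src2; return true;
        default:     break;                              /* MPY is not an .L1 operation */
        }
    }
    return false;
}
```

For instance, execute(UNIT_M1, OP_MPY, 6, 7, &r) succeeds and sets r to 42, while execute(UNIT_L1, OP_MPY, 6, 7, &r) returns false because multiplication belongs to the .M unit.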
For a complete list of the instructions executed by each functional unit, see Table 3-2 in the handout TMS320C62x/C64x/C67x Fixed-Point Instruction Set. We will later learn more about assigning functional units to assembly instructions.
Read the descriptions of the ADD and MPY instructions in the TI manual handout. Then write an assembly program that computes A0*(A1+A2)+A3.