
Then why didn't the CPU designers simply make the LDW instruction take 5 clock cycles, rather than have the programmer insert 4 NOPs? The answer is that you can place instructions other than NOPs in those slots, as long as they do not use the result of the preceding LDW instruction. The CPU can then execute useful instructions while it waits for the LDW result to become valid, which can greatly reduce the total execution time of the program.
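
As a minimal sketch of this idea, compare the two fragments below. The register choices (A5 through A9 and A11 through A13) and the surrounding arithmetic are made up for illustration; only the LDW timing comes from the text above. In the first fragment the 4 delay slots are filled with NOPs:

    LDW .D1 *A10, A4      ; start loading the word at the address in A10 into A4
    NOP 4                 ; 4 delay slots spent doing nothing
    ADD .L1 A4, A5, A6    ; A4 is valid now: A6 = A4 + A5

In the second fragment, four independent instructions occupy the same slots. None of them reads or writes A4:

    LDW .D1 *A10, A4      ; start loading the word at the address in A10 into A4
    ADD .L1 A7, A8, A9    ; delay slot 1: unrelated work
    SUB .L1 A7, A8, A11   ; delay slot 2: unrelated work
    AND .L1 A7, A8, A12   ; delay slot 3: unrelated work
    OR  .L1 A7, A8, A13   ; delay slot 4: unrelated work
    ADD .L1 A4, A5, A6    ; A4 is valid now: A6 = A4 + A5

Both fragments reach the final ADD in the same number of clock cycles, but the second one has completed four additional operations in that time.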

More on instructions with delay slots

Table 3-5 in TI's instruction set description shows the execution of instructions with delay slots in more detail. The instructions with delay slots are the multiply (MPY, 1 delay slot), the load (LDB, LDW, etc., 4 delay slots), and the branch (B, 5 delay slots) instructions.

The functional unit latency indicates how many clock cycles an instruction actually occupies a functional unit. All C62x instructions have a functional unit latency of 1, meaning that each functional unit is ready to execute the next instruction after 1 clock cycle, regardless of the delay slots of the instruction. Therefore, the following instructions are valid:

    LDW .D1 *A10, A4
    ADD .D1 A1, A2, A3

Although the LDW instruction has not yet loaded the A4 register when the ADD is executed, the .D1 functional unit becomes available in the clock cycle right after the one in which the LDW is issued.
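
For the same reason, two loads can be issued on .D1 in consecutive clock cycles. The sketch below assumes that A11 holds a second valid address and that A5 and A6 are free; these register choices are illustrative and do not come from the original example.

    LDW .D1 *A10, A4      ; cycle 1: .D1 starts the first load
    LDW .D1 *A11, A5      ; cycle 2: .D1 is free again and starts the second load
    NOP 4                 ; cycles 3 to 6: remaining delay slots of both loads
    ADD .L1 A4, A5, A6    ; cycle 7: A4 and A5 are both valid

The first load needs cycles 2 to 5 as delay slots and the second needs cycles 3 to 6, so the two waiting periods overlap and the pair costs only one cycle more than a single load.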

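To clarify the execution of instructions with delay slots, consider the following LDW example. Assume A10 = 0x0100 and A2 = 1. The LDW below uses the post-increment form *A10++[A2], so it loads A9 with the 32-bit word at the address held in A10 (0x0100) and then advances A10 by A2 words, to 0x0104. (To load the word at 0x0104 itself, you would use the pre-increment form *++A10[A2].) The three MV instructions are unrelated to the LDW instruction; they do something else.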

1    LDW .D1 *A10++[A2], A9
2    MV .L1 A10, A8
3    MV .L1 A1, A10
4    MV .L1 A1, A2
5    ...

We can ask several interesting questions at this point:

  • What value is loaded into A8? That is, in which clock cycle is the address pointer updated?
  • Can we load the address offset register A2 before the LDW instruction finishes the actual loading?
  • Is it legal to load A10 before the LDW finishes loading the memory content into A9? That is, can we change the address pointer before the 4 delay slots elapse?
Here are the answers:
  • Although the LDW instruction takes 4 extra clock cycles to load the memory content into A9, the address pointer and offset registers (A10 and A2) are read, and the pointer A10 is updated, in the clock cycle in which the LDW is issued. Therefore, in line 2, A8 is loaded with the updated A10, that is, A10 = A8 = 0x0104.
  • Yes. Because the LDW reads the A10 and A2 registers in its first clock cycle, you are free to change these registers afterwards without affecting the operation of the LDW.
  • Yes, for the same reason: the LDW has already read A10 in its first clock cycle, so line 3 may overwrite A10 before the 4 delay slots elapse.
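
These answers can be summarized as a cycle-by-cycle trace of the same code. The trace is a sketch that assumes one instruction issues per clock cycle with no stalls. The NOP and the final MV into A3 are hypothetical additions, included only to show the first cycle in which A9 can be used; they are not part of the original example.

    LDW .D1 *A10++[A2], A9  ; cycle 1: reads A10 (0x0100) and A2 (1); A10 becomes 0x0104
    MV  .L1 A10, A8         ; cycle 2, delay slot 1: A8 = 0x0104, the updated pointer
    MV  .L1 A1, A10         ; cycle 3, delay slot 2: overwriting A10 does not disturb the load
    MV  .L1 A1, A2          ; cycle 4, delay slot 3: overwriting A2 does not disturb the load
    NOP                     ; cycle 5, delay slot 4: hypothetical filler
    MV  .L1 A9, A3          ; cycle 6: first cycle in which A9 holds the loaded word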

The same reasoning applies to the MPY and B (when using a register as the branch address) instructions. The MPY reads its source values in the first clock cycle and writes the multiplication result after the 2nd clock cycle. For B, the address pointer is read in the first clock cycle, and the actual branch occurs after the 5th clock cycle. Thus, after the first clock cycle, you are free to modify the source or address pointer registers. For more details, refer to Table 3-5 in the instruction set description or read the description of the individual instruction.
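
The two fragments below sketch this behavior. The registers A5, A6, A7, and B5 are arbitrary choices made for illustration, and the fragments assume one instruction issues per clock cycle. For MPY, a source register can be overwritten in the single delay slot:

    MPY .M1 A1, A2, A3    ; reads A1 and A2 in this cycle
    MV  .L1 A5, A1        ; delay slot: A1 is overwritten, the product still uses the old A1
    ADD .L1 A3, A6, A7    ; A3 now holds the product

For B with a register operand, the register holding the branch address can be overwritten in the first delay slot:

    B   .S2 B3            ; reads the branch address from B3 in this cycle
    MV  .L2 B5, B3        ; delay slot 1: B3 is overwritten, the branch target is unchanged
    NOP 4                 ; delay slots 2 to 5
                          ; execution continues at the address B3 held when B was issued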

Source:  OpenStax, Finite impulse response. OpenStax CNX. Feb 16, 2004 Download for free at http://cnx.org/content/col10226/1.1