+ Reply to Thread
Results 1 to 1 of 1
Discuss VHDL Component: Brent-Kung Adder (Generic) at the VHDL within the OpenHDL - Verilog and VHDL Discussion Forums; Hi all, Now let's take a look at the Brent-Kung adder, a fairly advanced design ...
  1. #1
    Administrator
    Join Date
    Oct 2011
    Posts
    69
    Blog Entries
    19

    Default VHDL Component: Brent-Kung Adder (Generic)

    Hi all,

    Now let's take a look at the Brent-Kung adder, a fairly advanced design prefix-tree adder I previously discussed with the Kogge-Stone adder. The Brent-Kung (BK) adder is a good balance between area and power cost (where the Kogge-Stone adder lacks) and performance. This adder has a complex carry and inverse-carry tree that can be quite challenging to implement generically using VHDL. Here's a look at the overall carry tree for the Brent-Kung adder:



    An examination of the tree reveals that the tree can be divided into a tree and an inverse tree. The upper tree is sparse based on periodic powers of 2. The inverse tree is offset 1, beginning from the bottom of the matrix and expanding outward at powers of 2 (not to interfere with previously carried bits). The rest of the code is nearly identical to the Kogge-Stone adder. Let's take a look at the VHDL code for the Brent-Kung adder:

    VHDL Code:
    1. LIBRARY ieee;
    2. USE ieee.std_logic_1164.all;
    3. USE work.my_funs.all;
    4.  
    5. ENTITY bk_adder IS
    6. GENERIC (
    7. width : INTEGER := 7
    8. );
    9. PORT (
    10. a : IN STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    11. b : IN STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    12. c_in : IN STD_LOGIC;
    13. sum : OUT STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    14. c_out : OUT STD_LOGIC
    15. );
    16. END bk_adder;
    17.  
    18. ARCHITECTURE behavioral OF bk_adder IS
    19. CONSTANT nn: INTEGER := clogb2(width);
    20. CONSTANT inv_nn: INTEGER := clogb2(width+2**(nn-2))-2;
    21. TYPE T_type IS ARRAY(nn+inv_nn-1 DOWNTO 0, width-1 DOWNTO 0) OF STD_LOGIC_VECTOR(1 DOWNTO 0);
    22. SIGNAL T: T_type;
    23. BEGIN
    24.  
    25. -- Carry tree with maximum number of stages
    26. tree_proc: PROCESS(T,a,b,c_in)
    27. BEGIN
    28. -- First bit is a full adder
    29. T(0,0)(0) <= (a(0) AND b(0)) OR (c_in AND (a(0) XOR b(0)));
    30. T(0,0)(1) <= a(0) XOR b(0) XOR c_in;
    31.  
    32. -- Leaves of tree
    33. FOR j IN width-1 DOWNTO 1 LOOP
    34. T(0,j)(0) <= a(j) AND b(j); -- Generate bit base
    35. T(0,j)(1) <= a(j) XOR b(j); -- Propagate bit base
    36. END LOOP;
    37.  
    38. -- Carry tree
    39. FOR i IN 1 TO nn-1 LOOP
    40. FOR j IN width-1 DOWNTO 0 LOOP
    41. IF(j mod 2**i = (2**i)-1) THEN
    42. IF((j-2**(i-1)) >= 0) THEN
    43. T(i,j)(0) <= (T(i-1,j)(1) AND T(i-1,j-2**(i-1))(0)) OR T(i-1,j)(0); -- G = (P_i and G_i_prev) or G_i
    44. T(i,j)(1) <= T(i-1,j)(1) AND T(i-1,j-2**(i-1))(1); -- P = P_i and P_i_prev
    45. ELSE
    46. T(i,j)(0) <= T(i-1,j)(0); -- G = G_i (since we are at tree's edge, there is no G_i_prev)
    47. T(i,j)(1) <= T(i-1,j)(1); -- P = P_i (since we are at tree's edge, there is no P_i_prev)
    48. END IF;
    49. ELSE
    50. T(i,j)(0) <= T(i-1,j)(0);
    51. T(i,j)(1) <= T(i-1,j)(1);
    52. END IF;
    53. END LOOP;
    54. END LOOP;
    55.  
    56. -- Inverse carry tree
    57. FOR i IN nn+inv_nn DOWNTO nn+1 LOOP
    58. FOR j IN width-1 DOWNTO 0 LOOP
    59. IF((j-2**(nn+inv_nn-(i))) mod 2**((nn+inv_nn-(i))+1) = 2**((nn+inv_nn-(i))+1)-1) THEN
    60. IF(j >= 2**(nn+inv_nn-i)) THEN
    61. T(i-1,j)(0) <= (T(i-2,j)(1) AND T(i-2,j-2**((nn+inv_nn-(i))))(0)) OR T(i-2,j)(0); -- G = (P_i and G_i_prev) or G_i
    62. T(i-1,j)(1) <= T(i-2,j)(1) AND T(i-2,j-2**((nn+inv_nn-(i))))(1); -- P = P_i and P_i_prev
    63. ELSE
    64. T(i-1,j)(0) <= T(i-2,j)(0);
    65. T(i-1,j)(1) <= T(i-2,j)(1);
    66. END IF;
    67. ELSE
    68. T(i-1,j)(0) <= T(i-2,j)(0);
    69. T(i-1,j)(1) <= T(i-2,j)(1);
    70. END IF;
    71. END LOOP;
    72. END LOOP;
    73. END PROCESS;
    74.  
    75. -- Basic summation for carry tree
    76. sum_proc: PROCESS(T)
    77. BEGIN
    78. sum(0) <= T(0,0)(1);
    79. FOR i IN width-1 DOWNTO 1 LOOP
    80. sum(i) <= T(0,i)(1) XOR T(nn+inv_nn-1,i-1)(0);
    81. END LOOP;
    82. END PROCESS;
    83.  
    84. c_out <= T(nn+inv_nn-1,width-1)(0) OR (T(nn+inv_nn-1,width-1)(1) AND T(nn+inv_nn-1,width-2)(0));
    85.  
    86. END behavioral;


    I created two separate processes for the tree and inverse tree. The inverse tree can be challenging to understand but functions correctly (however, if you find any bugs, please let me know!). Next, I decided to write the test bench for the most tricky implementation of the Brent-Kung adder using operands of width 6. Here's a look at the bench VHDL code:

    VHDL Code:
    1. LIBRARY ieee;
    2. USE ieee.std_logic_1164.all;
    3. USE ieee.numeric_std.all;
    4. USE std.textio.all;
    5.  
    6. ENTITY bk_adder_tb IS
    7. GENERIC (
    8. width: INTEGER := 6
    9. );
    10. END bk_adder_tb;
    11.  
    12. ARCHITECTURE tb OF bk_adder_tb IS
    13. SIGNAL t_a: STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    14. SIGNAL t_b: STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    15. SIGNAL t_sum: STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    16. SIGNAL t_c_in: STD_LOGIC := '1';
    17. SIGNAL t_c_out: STD_LOGIC;
    18.  
    19. COMPONENT bk_adder
    20. GENERIC (
    21. width: INTEGER := 16
    22. );
    23. PORT (
    24. a : IN STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    25. b : IN STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    26. c_in : IN STD_LOGIC;
    27. sum : OUT STD_LOGIC_VECTOR(width-1 DOWNTO 0);
    28. c_out : OUT STD_LOGIC
    29. );
    30. END COMPONENT;
    31.  
    32. FUNCTION to_string(sv: Std_Logic_Vector) return string is
    33. USE Std.TextIO.all;
    34. USE ieee.std_logic_textio.all;
    35. VARIABLE lp: line;
    36. BEGIN
    37. write(lp, to_integer(unsigned(sv)));
    38. RETURN lp.all;
    39. END;
    40. BEGIN
    41. U_bk_adder: bk_adder
    42. GENERIC MAP (
    43. width => width
    44. )
    45. PORT MAP (
    46. a => t_a,
    47. b => t_b,
    48. c_in => t_c_in,
    49. sum => t_sum,
    50. c_out => t_c_out
    51. );
    52.  
    53. -- Input Processes
    54. inp_prc: PROCESS
    55. VARIABLE v_a: INTEGER := 0;
    56. VARIABLE v_b: INTEGER := 2**(width-1);
    57. VARIABLE v_c_in: INTEGER := 0;
    58. BEGIN
    59. FOR i IN 0 TO 2**width LOOP
    60. v_a := v_a + 1;
    61. v_b := v_b - i;
    62. IF(v_b < 0) THEN
    63. v_b := v_b + 2**width-1;
    64. END IF;
    65. t_a <= std_logic_vector(to_unsigned(v_a,width));
    66. t_b <= std_logic_vector(to_unsigned(v_b,width));
    67. WAIT FOR 1 ns;
    68. IF t_c_in = '1' THEN
    69. v_c_in := 1;
    70. ELSE
    71. v_c_in := 0;
    72. END IF;
    73. ASSERT TO_INTEGER(UNSIGNED(t_sum)) = (v_a + v_b + v_c_in)
    74. REPORT "Invalid sum! "&to_string(t_sum)&" != "&to_string(std_logic_vector(to_unsigned(v_a + v_b + v_c_in,width)))&"!\n"&to_string(std_logic_vector(t o_unsigned(v_a,width)))&" + "&to_string(std_logic_vector(to_unsigned(v_b,width )))&" + "&to_string(std_logic_vector(to_unsigned(v_c_in,wi dth)));
    75. WAIT FOR 9 ns;
    76.  
    77. END LOOP;
    78. END PROCESS;
    79.  
    80. c_in_process: PROCESS
    81. BEGIN
    82. t_c_in <= not(t_c_in);
    83. WAIT FOR 10 ns;
    84. END PROCESS;
    85. END tb;


    And, running the test bench through Modelsim, we can verify proper operation of the design:



    For our 6-bit operand BK adder, here's the synthesized logic:




    Synthesis of this design set to 64-bit operands, using Synplify Pro and choosing Xilinx Virtex2 XC2V40 with CS144 Package and -6 Speed yields the following performance data:

    Code:
    Performance Summary
    *******************
    
    
    Worst slack in design: -1.028
    
                       Requested     Estimated     Requested     Estimated                Clock        Clock                
    Starting Clock     Frequency     Frequency     Period        Period        Slack      Type         Group                
    ------------------------------------------------------------------------------------------------------------------------
    top|clk            167.6 MHz     143.0 MHz     5.967         6.994         -1.028     inferred     Autoconstr_clkgroup_0
    ========================================================================================================================
    
    ...
    
    
    Resource Usage Report for bk_adder
    
    Mapping to part: xc2v40cs144-6
    Cell usage:
    FD              229 uses
    MUXF5           1 use
    LUT2            52 uses
    LUT3            86 uses
    LUT4            270 uses
    
    Mapping Summary:
    Total  LUTs: 408 (79%)
    Upon inspection, we can see that this 64-bit Brent-Kung adder runs slightly slower (due to the greater number of stages) but occupies about half the area and significantly less routing than the 64-bit Kogge-Stone adder previously discussed.

    Take care!

    VHDLCoder
    Attached Images Attached Images    
    Attached Files Attached Files
    Last edited by VHDLCoder; 10-14-2011 at 06:39 PM.

 

 

Quick Reply Quick Reply

If you are already a member, please login above.

Please enter the six letters or digits that appear in the image opposite.

Posting Permissions

  • You may post new threads
  • You may post replies
  • You may not post attachments
  • You may not edit your posts
  •