FSM Verilog Example (Final)

`timescale 1ns / 1ps

//////////////////////////////////////////////////////////////////////////////////
// Company:         Christopher D. Nagy
// Engineer:        Christopher D. Nagy
// 
// File Name:        uart_baud_rate_generator.v
// Create Date:     05/19/2025 12:51:02 PM
// Design Name: 
// Module Name:     uart_baud_rate_generator
// Project Name:    Nagy_Example_Project 
// Target Devices:  Artix-7 [xc7a75tfgg484-1]
// Tool Versions:   Vivado 2024.2 
// Description: 
//         
//        Takes in a system clock signal and divides it down to a desired 16x oversampled baud rate signal meant for use in serial UART based tx/rx driver modules.
//        Given a desired baud rate (passed in as a parameter, default: 115200 bps), this module will output a single clock pulse [tick_16x] at a rate 16 times the desired baud rate value. 
//        The output tick can be fed to Tx and Rx driver modules, where it is counted and used to frame serial data transmissions and to time the sampling of received data.
//
//        Includes enable and asynchronous-clear input signal bits to allow resetting and disabling of this finite state machine module.
//        Default driving clock signal value is 150MHz. 
//
// Dependencies: 
// 
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
//        This example is very verbose with comments. They are meant to act as an example and explanation of what goes on in my head.
//        I don't typically write it all out in comments like I do here. 
//
//
//        A fixed point accumulator is used for counting the clock pulses rather than a simple counter register. 
//        The number of clock cycles one needs to count to obtain the desired framing tick rate is not a whole number; at least not with the default values. It's fractional; approximately 81.38 clock cycles.
//            150,000,000 Hz / (115,200 bps * 16) = 81.3802083333 clock cycles
//        If you round this value down and always count 81 clock cycles between output ticks, the ticks arrive slightly too often and the timing drifts fast (effective baud rate rises above 115200 bps, by about 0.47%).
//        If you round this value up and always count 82 clock cycles between output ticks, the ticks arrive slightly too seldom and the timing drifts slow (effective baud rate drops below 115200 bps, by about 0.76%).
//        By using a fixed point accumulator for fractional counting and using the most significant bit [msb] for the threshold of when to issue a framing tick, drift can be compensated for. 
//        Sometimes the msb will go high at 81 clock cycles, sometimes it will go high at 82; it should all average out to be about 81.38 and a baud rate of 115200 is maintained.
//        The most significant bit of the accumulator acts as an overflow indicator and is cleared every clock cycle (via bit-masking) except for the one in which the overflow occurs. Watching this bit results in a single clock cycle tick. 
//        I chose a 32-bit wide accumulator. 21 bits would be the minimum to keep a 115200 baud rate compensated*, and 32 might be overkill, but I like powers of 2 and FPGAs handle 32-bit adders nicely.
//        *Math/Data Science Tangent:
//            In general, to calculate the required number of bits to compensate: the error should be less than 1 part in the number of desired ticks per second (115200 * 16).
//            115200 * 16 = 1843200 ==> The bit-width we seek must be able to resolve the value of 1843200 or greater. In other words, the fractional bits should be able to represent 1/1843200 [~5.425e-7].
//            Now I come to a crossroad:
//                I can simply plug this resolution value into the calculator on the windows machine I'm using, change to binary mode and count the number of significant bits (21 in this case)
//                I can also use a proper equation. Though it still takes me a calculator to solve (I'm sure there are others that don't need to)
//                    If:     N is the number of bits needed to resolve value R.
//                    Then:    N = log2(1/R).
//                    With R = 1/1843200, log2(1/R) = log2(1843200) is ~20.81 bits, but since bits are "whole" and we want a resolution finer than R so that the error is less than 1 part, we round up: 21 bits is the minimum number of bits.
//                I'm not sure which method is better, since my bit-width is hardcoded as a local parameter...I guess it's about preference once you know the reasoning.
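//
//                A hedged sketch of the same calculation done at elaboration time. It is not used by this module, and MIN_ACC_FRAC_BITS is a hypothetical name:
//                    localparam MIN_ACC_FRAC_BITS = $clog2(BAUDRATE * 16);    // $clog2(1843200) = 21 with the default BAUDRATE
//                $clog2 is the Verilog-2005 system function that returns the ceiling of log2 of its argument, so it lands on the same 21 bits as the two calculator methods above.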
//
//        This Module is very much overkill in terms of the state machine detail and number of states needed.
//        That being said, it's been written with purpose. 
//        It generalizes the 2-always, state-registered output finite state machine technique and includes many debugging helpers that I use in more complex systems.
//        I find that generalizing how I write modules makes development time shorter and debugging easier.
//        Some things can certainly be removed to tailor certain modules for size constraints...
//        but if you're running out of flip-flops, you probably have bigger issues.
//
//        I HIGHLY RECOMMEND checking out the published papers of C. Cummings.
//        In particular: "Coding And Scripting Techniques For FSM Designs With Synthesis-Optimized, Glitch-Free Outputs" 
//            available at: http://www.sunburst-design.com/papers/CummingsSNUG2000Boston_FSM.pdf    
//        Mr. Cummings' publications are absolute gold and my style is derived heavily from his writings.
//        I don't follow the described structure 100% faithfully, as I've added my own experience to my formatting and encoding style. 
//        I certainly follow the technique of registered output for glitch-free output.
//            For example, wider outputs, such as a counter's value that might be used as an input to another module, are not explicitly covered when it comes to creating state-registered outputs.
//            It doesn't make sense to create a state for every possible value of the counter (that could be hundreds of states), and the state machine's combinational logic would be atrocious to write.
//            So instead, I assign wide variable outputs to registers of matching width that are assigned only in the sequential always block and that depend only on registered conditions (i.e. the state value and registered inputs).
//
//////////////////////////////////////////////////////////////////////////////////

/*    
    Example of Instantiation:

        wire             clk150;
        wire             reset_n;
        wire            uart_baud_rate_generator_enable;
        wire             uart_tick_16x;
        wire            uart_baud_rate_generator_running;
        wire            uart_baud_rate_generator_done;
        wire    [7:0]    uart_baud_rate_generator_status;
        
        uart_baud_rate_generator    #(    .BAUDRATE(115200), 
                                        .CLK_FREQ(150_000_000))    UART_BAUD_RATE_GENERATOR(
                                                                    .main_clk        (clk150),
                                                                    .areset_n        (reset_n),
                                                                    .enable            (uart_baud_rate_generator_enable),
                                                                    .tick_16x        (uart_tick_16x),
                                                                    .running        (uart_baud_rate_generator_running),
                                                                    .done            (uart_baud_rate_generator_done),
                                                                    .status            (uart_baud_rate_generator_status)
                                                                );
*/

(* fsm_encoding = "user" *)
module uart_baud_rate_generator #(
    parameter BAUDRATE = 115200, 
    parameter CLK_FREQ = 150_000_000
)     (

        input            main_clk,
        input            areset_n,
        
        input            enable,
        
        output            tick_16x,
        
        output            running,
        output            done,
        output    [7:0]    status
    );


    // Local Parameters for the accumulator. 1.31 Fixed point (32 bits total)
    localparam                  ACC_WIDTH              =    32;
    
    //    Calculating the fractional "STEP" value requires intermediate values wider than 32 bits.
    /*
        Method and Reasoning:
            We know we want CLK_FREQ/(BAUDRATE * 16) clock cycles per tick
            We figured out earlier that 32 bits is more than enough to compensate for drift if we use fractional accumulation...
            We will use 1.31 Fixed point notation such that the most significant bit of a 32 bit register represents the integer and the remaining 31 bits represent the fractional
            We want some value that when we add it to a 32 bit register each clock cycle, the msb bit gets set after (CLK_FREQ/(BAUDRATE * 16)) clock cycles
            The value of (2^31 - 1) is exactly 31 bits if unsigned [31'h7FFF_FFFF]. Adding 1 would mean an overflow into the 32nd bit.
            So the "STEP" value we seek would be 32'h8000_0000 [2^31] divided by the number of clock cycles per tick [CLK_FREQ/(BAUDRATE * 16)]
            i.e. STEP = 2^31 / (CLK_FREQ/(BAUDRATE * 16)), which is equivalent to 2^31 * (1 / (CLK_FREQ/(BAUDRATE * 16)))  [multiplying by the reciprocal of the divisor]
            And (1 / (CLK_FREQ/(BAUDRATE * 16))) is equal to ((BAUDRATE * 16) / CLK_FREQ)  [inverting the fraction]
            Therefore, STEP = 2^31 * ((BAUDRATE * 16) / CLK_FREQ)
            I want to allow different baud rates, so I need to code this STEP value as an elaboration-time (compile-time) constant.
            Using default values would be: (115200 * 16 * 2^31) / 150000000 = approximately 26,388,279
            However, (115200 * 16 * 2^31) = 3,958,241,859,993,600 which is 52 bits! [52'hE_1000_0000_0000]
            If we don't temporarily expand the number of bits for the sake of calculating things at elaboration time, the STEP value parameter will be truncated to 0!
            I widen things using a placeholder parameter that forces things to 64 bits (power of 2 is always nice) and then use another placeholder for the 32-bit value

    */
    localparam    [63:0]             BAUD_TICK_STEP_64     =     ( (BAUDRATE * 16) * (64'd1 << ACC_WIDTH-1) ) / CLK_FREQ ;    
    localparam     [ACC_WIDTH-1:0]    BAUD_TICK_STEP        =     BAUD_TICK_STEP_64[ACC_WIDTH-1:0];
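
    //    Worked check with the default parameters. This is hand arithmetic added for illustration; it looks at the accumulator in isolation, starting from 0, and ignores the state machine:
    //        BAUD_TICK_STEP = floor(3,958,241,859,993,600 / 150,000,000) = 26,388,279
    //        81 * 26,388,279              = 2,137,450,599    (below 2^31 = 2,147,483,648, so the msb is still clear)
    //        82 * 26,388,279              = 2,163,838,878    (msb set on the 82nd add; 16,355,230 remains after masking)
    //        16,355,230 + 81 * 26,388,279 = 2,153,805,829    (msb set on the 81st add this time; 6,322,181 remains after masking)
    //    The gap between msb pulses is therefore a mix of 81 and 82 clock cycles, and 2^31 / 26,388,279 averages out to approximately 81.38.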
    
    //    Knowing that I'm going to need to keep the counter going once I pass the threshold for issuing a tick
    //    I will want to clear the most significant bit the clock cycle after it gets set. 
    //    I chose to set up a bitmask parameter for making the bitwise operation more readable later on. 
    //    Bitmask will be 32'h7FFF_FFFF which is the inverse of 32'h8000_0000 [2^31] ==> ~(1 << 31)
    localparam     [ACC_WIDTH-1:0]    CLEAR_ACC_MSB         =     ~(1 << (ACC_WIDTH - 1));
    
    // The following list of local parameters describe the output possibilities for most or all of the module's outputs. 
    // They, along with 'status' values are used to create the encoding that describes the state machine's states. 
    localparam    TICK_H                                =    1'b1,
                TICK_L                                =    1'b0;
                
    localparam  RUNNING_H                             =     1'b1,
                RUNNING_L                              =     1'b0;
            
    localparam  DONE_H                                =     1'b1,
                DONE_L                                 =     1'b0;    

    /* 
    *    Notes about the local parameter list making up state-machine state encoding [state names]:
    *
    *     The values that make up each state name embed the module's output values for that state. 
    *     All state names/encoding values are guaranteed to be unique, given that each 8-bit 'status' value attached to the most significant bits of the state is unique.
    *    In most cases, the 'status' value does not need to be 8-bits, and they certainly don't need to be sequential. I make it that way for ease of development.
    *        A one-hot style encoding could save a few registers but would usually need to be tailored for each module written. Same with only adding minimum state encoding bits for states that have a non-unique output value.
    *        Using a generalized 8-bit sequential decimal count allows for quick encoding and easy lookup of the state machine's current state when using a logic analyzer [ila] or during simulation testing.
    *    The starting 'status' value [IDLE state] is 8'd1 rather than 8'd0
    *        This is because I've run into issues when testing/debugging where the state machine falls into an unknown/unaccounted-for state and the status shows up as 8'd0.
    *        Starting at 8'd1 lets me know that I'm at least in the IDLE state and not in something completely unexpected or ambiguous.
    *        In other words, a 'status' value of 8'd0 will always mean there is something wrong with the module's logic or how it's written, and it needs correcting during development.
    *    I always set the end 'status' value [DONE state] to 8'd255.
    *        Since the 8-bit register already exists, and 'DONE' is the last state, I just set it to 0xFF for easy identification when debugging.
    *        If for some reason more than 255 states are needed (256 if you include the implied error state of status == 8'd0), then I know where to start and end rebasing things.
    *        Also: I can't imagine a single module that needs 255 states. 
    *            I've made some very large state machines before during early prototype development stages (just to get my thoughts out of my mind and into Verilog)
    *            I always found timing closure and synthesis time to be much better after I redesign and divide the module into multiple modules [Divide and Conquer for the win]
    *            The cost is sometimes a clock cycle or two, but this overhead can be removed [if necessary] by acknowledging that the inputs of one module are guaranteed to be registered due to coming from a known registered output of the connected module.
    */
    
    localparam  IDLE                                =     {     8'd1,
                                                            TICK_L,
                                                            RUNNING_L,
                                                            DONE_L},
                                                      
                COUNTING                            =     {     8'd2,
                                                            TICK_L,
                                                            RUNNING_H,
                                                            DONE_L},        
                                                      
                THRESHOLD_REACHED                    =    {    8'd3,
                                                            TICK_H,
                                                            RUNNING_H,
                                                            DONE_L},

                DONE                                =     {     8'd255,
                                                            TICK_L,
                                                            RUNNING_H,
                                                            DONE_H};
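
    //    Resulting 11-bit encodings {status[7:0], tick_16x, running, done}, worked out by hand from the list above as a lookup aid for ILA captures or simulation waveforms:
    //        IDLE              = 11'b00000001_0_0_0 = 11'h008
    //        COUNTING          = 11'b00000010_0_1_0 = 11'h012
    //        THRESHOLD_REACHED = 11'b00000011_1_1_0 = 11'h01E
    //        DONE              = 11'b11111111_0_1_1 = 11'h7FB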


    //     STATE_SIZE local parameter is declared here and used in elaboration time calculations to determine the state register's bit width
    //    The value is the total number of output bits, which is the same for all states. 
    //     I used to omit this local parameter, but found I was constantly forgetting to update the bit width of either the state register or the output assignment when adding outputs or state bits during development.
    //     The use of the local parameter means I only change one declared value when the width of the state encoding changes.
    //    It's actually been a pretty good time saver due to mitigating human error when it comes time to generate the bitstream.
    localparam    STATE_SIZE                             =     11;    

    //    State Register and State Register Output Assignments
    reg        [STATE_SIZE-1:0]    state, next;     
    assign     {    status,
                tick_16x,
                running,
                done}                                 =    state[STATE_SIZE-1:0];
    
    // Other Registers for Inputs, Additional Outputs and Internals
    reg        ENABLE;        

    reg        [ACC_WIDTH-1:0]        BAUD_TICK_ACCUMULATOR;

    // always block #1 - Sequential Logic [sets/updates the registers at the time of a clock or reset edge given the current values of particular registers and inputs]
    always @(posedge main_clk or negedge areset_n)
    begin
        if (!areset_n) 
        begin
            state                                    <=    IDLE;
            
            ENABLE                                    <=     1'b0;
            
            BAUD_TICK_ACCUMULATOR                   <=  {ACC_WIDTH{1'b0}};
        end
        else
        begin
            state                                     <=    next;
            
            ENABLE                                     <=     enable;

            if( (state == IDLE) )
                BAUD_TICK_ACCUMULATOR               <=  {ACC_WIDTH{1'b0}};
            else if( (state == COUNTING) || (state == THRESHOLD_REACHED) )
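                BAUD_TICK_ACCUMULATOR               <= (BAUD_TICK_ACCUMULATOR & CLEAR_ACC_MSB) + BAUD_TICK_STEP;    // mask off last cycle's msb [CLEAR_ACC_MSB is the bitmask declared above], then add the fractional step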
            else
                BAUD_TICK_ACCUMULATOR                <=    BAUD_TICK_ACCUMULATOR;

        end
    end    

    // always block #2 - Combinational Logic [describes the conditions to go from one state to another]
    always @(*)
    begin
        case(state)
            IDLE                                    :    if( (ENABLE == 1'b1) )
                                                            next = COUNTING;
                                                        else                                                                                        
                                                            next = IDLE;

            COUNTING                                :    if( (BAUD_TICK_ACCUMULATOR[ACC_WIDTH - 1] == 1'b1) )
                                                            next = THRESHOLD_REACHED;
                                                        else                                                                                        
                                                            next = COUNTING;
                                                        
            THRESHOLD_REACHED                        :    if( (ENABLE == 1'b1) )
                                                            next = COUNTING;
                                                        else                                                                                        
                                                            next = DONE;                                                                                                                         
        
            DONE                                    :    next = IDLE;
                
            default                                 :   next = state;
        endcase 
    end                
    
endmodule
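
// ---------------------------------------------------------------------------------
// Minimal simulation-only testbench sketch. It is not part of the original design.
// Assumptions (all illustrative choices): the module name uart_baud_rate_generator_tb,
// the 6.666 ns clock period (roughly 150 MHz), the reset/enable timing, and the 200 us run time.
// It measures the average spacing between tick_16x pulses in clock cycles, which should
// come out close to CLK_FREQ / (BAUDRATE * 16), i.e. roughly 81.38 with the defaults.
// ---------------------------------------------------------------------------------
module uart_baud_rate_generator_tb;

    reg             main_clk    =    1'b0;
    reg             areset_n    =    1'b0;    // start in reset
    reg             enable      =    1'b0;
    wire            tick_16x;
    wire            running;
    wire            done;
    wire    [7:0]   status;

    integer         cycle_count =    0;       // clock edges seen so far
    integer         first_tick  =    -1;      // cycle_count at the first observed tick (-1 = none yet)
    integer         last_tick   =    0;       // cycle_count at the most recent tick
    integer         ticks_seen  =    0;

    uart_baud_rate_generator #( .BAUDRATE(115200),
                                .CLK_FREQ(150_000_000)) DUT (
                                    .main_clk   (main_clk),
                                    .areset_n   (areset_n),
                                    .enable     (enable),
                                    .tick_16x   (tick_16x),
                                    .running    (running),
                                    .done       (done),
                                    .status     (status)
                                );

    // Free-running clock; relies on the `timescale 1ns / 1ps at the top of this file
    always #3.333 main_clk = ~main_clk;

    // tick_16x is a registered output, so each one-cycle pulse is sampled exactly once here
    always @(posedge main_clk)
    begin
        cycle_count         <=    cycle_count + 1;
        if (tick_16x)
        begin
            if (first_tick < 0)
                first_tick  <=    cycle_count;
            last_tick       <=    cycle_count;
            ticks_seen      <=    ticks_seen + 1;
        end
    end

    initial
    begin
        #100    areset_n    =    1'b1;    // release the asynchronous reset
        #100    enable      =    1'b1;    // start the generator
        #200_000;                         // run for 200 us of simulated time
        if (ticks_seen > 1)
            $display("ticks = %0d, average spacing = %f clock cycles",
                     ticks_seen, (last_tick - first_tick) * 1.0 / (ticks_seen - 1));
        else
            $display("fewer than two ticks observed");
        $finish;
    end

endmodule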
FSM Verilog Example (Final with Corrected Alignment and Mobile View)