rust-hdl-core 0.1.0

## Visitor pattern overhead

The visitor pattern, while elegant, imposes a larger overhead than I expected.
When everything was in a single translation unit, the performance was fine
(2.6 seconds for the standard 100M clock benchmark).  However, when things were
split up into multiple translation units, the run time rose to 13+ seconds, a
slowdown of roughly 5x.  That is probably too much to tolerate at this stage.
It is a given that we will give away performance over time as we stretch the
core design to handle more scenarios and more sophisticated features.

However, I don't feel comfortable giving away the performance up front.  Using
the macro engine to generate the `update_all` and `has_changed` functions seems
like a small cost to pay to keep most of the performance gains in eliminating the
complex ARC/RC based design I had before.
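
As a rough illustration of what macro generation buys here, the sketch below uses a plain `macro_rules!` macro to emit statically dispatched `update_all` and `has_changed` methods that walk every field of a struct.  The `logic_block!` macro, the `Sig` type, and the `Counter` struct are hypothetical stand-ins invented for this example; they are not the macro or the signal types this crate actually uses.

```rust
// Hypothetical stand-in for a signal: a current value, a pending value,
// and a flag recording whether the last update changed anything.
#[derive(Default)]
struct Sig {
    val: u32,
    next: u32,
    changed: bool,
}

impl Sig {
    // Latch the pending value and remember whether it differed.
    fn update_all(&mut self) {
        self.changed = self.val != self.next;
        self.val = self.next;
    }

    fn has_changed(&self) -> bool {
        self.changed
    }
}

// Declares a struct and generates `update_all` / `has_changed` methods
// that call the same method on each field, with no visitor indirection.
macro_rules! logic_block {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Default)]
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            fn update_all(&mut self) {
                $( self.$field.update_all(); )*
            }

            fn has_changed(&self) -> bool {
                false $( || self.$field.has_changed() )*
            }
        }
    };
}

// Expands to a struct with two fields plus the two generated methods.
logic_block!(Counter { count: Sig, strobe: Sig });
```

Because the calls are expanded per field at compile time, the compiler sees ordinary direct method calls that it can inline, which is the property the visitor-based version appeared to lose across translation units.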

## Performance and multi-threading

To achieve true multi-threading of the simulation, we must separate out
the state from the manipulation of the state.  For example, consider a widget like this:

```rust
struct Widget {
    clock: Signal<In, Clock>,
    enable: Signal<In, Bit>,
    counter: DFF<Bits<6>>,
    strobe: Signal<Out, Bit>,
}
```

Now suppose that the logic does not mutate, but instead creates a
new structure for the signals.  Since the signals do not have wiring
internal to them anymore (they are just signals), we can do something
like this for `update`:

```rust
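// Takes the widget by value and returns the updated copy.
fn update(mut w: Widget) -> Widget {
    // Drive the flip-flop clock from the widget clock.
    w.counter.clk.next = w.clock.val;
    if w.enable.val {
        w.counter.d.next = w.counter.q.val + 1;
    }
    // Hand the new state back to the caller.
    w
}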
```
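
To show why the by-value form helps with threading, here is a minimal sketch using a flattened stand-in for the widget.  `WidgetState` and `step_all` are hypothetical names made up for this example, the clock is dropped, and the 6-bit counter is modelled as a masked `u8`.  Spawning one thread per widget is only there to show that ownership moves cleanly; it is not meant as an efficient scheduling strategy.

```rust
use std::thread;

// Hypothetical flattened widget state: plain data with no internal wiring.
#[derive(Clone, Copy, Debug, PartialEq)]
struct WidgetState {
    enable: bool,
    counter: u8, // stand-in for a 6-bit counter
}

// Pure update: consumes the old state and returns the new one.
fn update(mut w: WidgetState) -> WidgetState {
    if w.enable {
        // Wrap at 6 bits.
        w.counter = (w.counter + 1) & 0x3F;
    }
    w
}

// Because `update` owns its input and shares nothing, each widget can be
// stepped on its own thread without locks or reference counting.
fn step_all(states: Vec<WidgetState>) -> Vec<WidgetState> {
    let handles: Vec<_> = states
        .into_iter()
        .map(|s| thread::spawn(move || update(s)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```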

Next steps...

- Need to add constants, state enums, and local signal support.

## Async test benches

An async test bench looks something like this:

```rust
use rand::Rng; // brings `gen` into scope for `thread_rng()`

pub async fn fifo_packet_drain<T: Synthesizable>(
    clock: Clock,
    mut reader: FIFOReaderClient<T>,
    len: usize,
    packet_len: usize,
    prob_pause: f64,
    pause_len: u64,
) -> Result<Vec<T>, HDLError> {
    let mut ret = vec![];
    let mut index = 0;
    reader.read.set(false);
    clock.next_negedge().await;
    while index < len {
        while reader.almost_empty.get() {
            reader.read.set(false);
            clock.next_negedge().await;
        }
        if rand::thread_rng().gen::<f64>() < prob_pause {
            reader.read.set(false);
            clock.delay_negedge(pause_len).await;
        }
        for _p in 0..packet_len {
            ret.push(reader.output.get());
            reader.read.set(true);
            index += 1;
            clock.next_negedge().await;
        }
        reader.read.set(false);
    }
    reader.read.set(false);
    clock.next_negedge().await;
    Ok(ret)
}
```

There are several characteristics of an async test bench that are important:

- The use of `async/await` syntax means that you can write sequential logic in 
a polled context (the resulting FSM is auto-generated by Rust)
  
- You can use local temporary values that are preserved as part of the state
for you.
  
- Test benches are composable!
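
The first two points can be seen in a standard-library-only sketch.  `NextEdge` is a hypothetical stand-in for something like `clock.next_negedge()`, and `drive` is a toy executor that polls once per simulated edge; none of these names come from this crate.  The local `count` survives across each `.await` because the compiler stores it in the generated state machine.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; the driver below simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for waiting on a clock edge:
// pending on the first poll, ready on the second.
struct NextEdge {
    polled: bool,
}

impl Future for NextEdge {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            Poll::Pending
        }
    }
}

// Sequential-looking test bench; `count` is kept in the generated FSM.
async fn bench(edges: u32) -> u32 {
    let mut count = 0;
    for _ in 0..edges {
        NextEdge { polled: false }.await;
        count += 1;
    }
    count
}

// Toy executor: polls until the future completes and reports how many
// polls it took alongside the result.
fn drive<F: Future>(fut: F) -> (u32, F::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

Calling `drive(bench(3))` polls the test bench four times: three polls that stop at an edge, and a final one that runs to completion.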

These are all fine, but I can get the same with threads, and at a fraction
of the intellectual burden.  Here is the idea.  First, define an action function as
something that has a signature like

```rust
type Action<T> = fn(time: u64, x: T) -> T;
```

1.  We have a top level `Simulation` struct that is generic over a type `T`.

2.  We can register `Action` functions to be called at the beginning, when the simulation
    is initialized (i.e., `time = 0`).
    
3.  We can register `Action` functions to be called at the end, when the simulation
    is complete (i.e., `time = end_time`).
    
4.  We can register `Action` functions to be called periodically with a specified 
    interval of time `DeltaT`.  These will be called with the time, and must 
    consume and return `T`.
    
5.  We can register `Action` functions to be called at a specific time `t_0`.
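
The registration scheme in the list above could be sketched roughly as follows.  The method names (`at_start`, `at_end`, `every`, `at`, `run`), the field layout, and the naive one-tick-at-a-time loop are all assumptions made for illustration, and `record_time` is just an example action.  A real scheduler would more likely use a priority queue than visit every time step.

```rust
// Function pointer that receives the time and the state, and returns the state.
type Action<T> = fn(u64, T) -> T;

struct Simulation<T> {
    end_time: u64,
    on_start: Vec<Action<T>>,
    on_end: Vec<Action<T>>,
    periodic: Vec<(u64, Action<T>)>, // (interval DeltaT, action)
    at_time: Vec<(u64, Action<T>)>,  // (t_0, action)
}

impl<T> Simulation<T> {
    fn new(end_time: u64) -> Self {
        Simulation {
            end_time,
            on_start: Vec::new(),
            on_end: Vec::new(),
            periodic: Vec::new(),
            at_time: Vec::new(),
        }
    }

    fn at_start(&mut self, action: Action<T>) {
        self.on_start.push(action);
    }

    fn at_end(&mut self, action: Action<T>) {
        self.on_end.push(action);
    }

    fn every(&mut self, delta_t: u64, action: Action<T>) {
        assert!(delta_t > 0, "interval must be nonzero");
        self.periodic.push((delta_t, action));
    }

    fn at(&mut self, t0: u64, action: Action<T>) {
        self.at_time.push((t0, action));
    }

    // Naive loop: visit every tick and thread the state through each
    // action that is due at that tick.
    fn run(&self, mut x: T) -> T {
        for t in 0..=self.end_time {
            if t == 0 {
                for a in &self.on_start {
                    x = a(t, x);
                }
            }
            for (dt, a) in &self.periodic {
                if t > 0 && t % *dt == 0 {
                    x = a(t, x);
                }
            }
            for (t0, a) in &self.at_time {
                if *t0 == t {
                    x = a(t, x);
                }
            }
            if t == self.end_time {
                for a in &self.on_end {
                    x = a(t, x);
                }
            }
        }
        x
    }
}

// Example action: append the current time to a log carried as the state.
fn record_time(t: u64, mut log: Vec<u64>) -> Vec<u64> {
    log.push(t);
    log
}
```

Since each action consumes and returns `T`, the simulation never needs shared mutable access to the state; it just passes ownership along the chain.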

This allows various stateless operations, but we also need to provide stateful
operation.  As such, we want a thread to be able to:

1. Pause itself and indicate when it wants to be woken up.

2. Be woken up at that time, with the object to mutate.

3. Return the object to mutate along with some indication of when it wants
to be re-awoken.
   
There seem to be multiple ways to accomplish this.  We want something like

```rust
fn my_rust_testbench<T>(sim: &Sim<T>, t0: u64, t1: u64) {
    let x = sim.start();
    // Do stuff to x
    let x = sim.at_time(t0, x);
    // Do stuff to x
    let x = sim.at_time(t1, x);
    // Do stuff to x
    sim.finish(x)
}
```

Done.  Need to add an error type and insert `Result` return types.
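
For reference, one way such a hand-off could be built with standard-library channels is sketched below.  This is not necessarily how the crate does it: the `Request` enum, the field names, and the `run` function are assumptions made for this example, and it handles a single test bench thread only.  With several test benches, the scheduler would have to order the pending wake-ups by time instead of answering immediately.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Messages a test bench sends to the scheduler (hypothetical protocol).
enum Request<T> {
    WakeAt(u64, T), // "wake me at this time, here is the state back"
    Finish(T),      // "I am done, here is the final state"
}

// Handle held by the test bench thread.
struct Sim<T> {
    to_sched: Sender<Request<T>>,
    from_sched: Receiver<T>,
}

impl<T> Sim<T> {
    // Block until the scheduler hands over the initial state.
    fn start(&self) -> T {
        self.from_sched.recv().expect("scheduler hung up")
    }

    // Give the state back, sleep, and receive it again at time `t`.
    fn at_time(&self, t: u64, x: T) -> T {
        self.to_sched
            .send(Request::WakeAt(t, x))
            .expect("scheduler hung up");
        self.from_sched.recv().expect("scheduler hung up")
    }

    fn finish(&self, x: T) {
        self.to_sched
            .send(Request::Finish(x))
            .expect("scheduler hung up");
    }
}

// Runs one test bench to completion; returns the last wake-up time
// requested and the final state.
fn run<T, F>(initial: T, testbench: F) -> (u64, T)
where
    T: Send + 'static,
    F: FnOnce(&Sim<T>) + Send + 'static,
{
    let (to_sched, from_bench) = channel();
    let (to_bench, from_sched) = channel();
    let worker = thread::spawn(move || testbench(&Sim { to_sched, from_sched }));
    to_bench.send(initial).expect("test bench hung up");
    let mut now: u64 = 0;
    let result = loop {
        match from_bench.recv().expect("test bench exited without finish") {
            Request::WakeAt(t, x) => {
                // Advance simulated time, then wake the test bench.
                now = now.max(t);
                to_bench.send(x).expect("test bench hung up");
            }
            Request::Finish(x) => break x,
        }
    };
    worker.join().expect("test bench panicked");
    (now, result)
}

// Example test bench in the style shown above, with a `u32` as the state.
fn example_testbench(sim: &Sim<u32>) {
    let x = sim.start();
    let x = sim.at_time(10, x + 1);
    let x = sim.at_time(25, x + 1);
    sim.finish(x + 1)
}
```

The state is owned by exactly one side at any moment, so the test bench can keep ordinary local variables across wake-ups without any `async` machinery.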

## TODO

- Add auto enum handling/code generation to the macro
- Finish VCD probe