Monday, October 1, 2012

Why Hardware Doesn’t Fit the Agile Model | AgileSoC (Guest Blog: A Heretic Speaks)

Guest Blog: A Heretic Speaks (Why Hardware Doesn’t Fit the Agile Model)

Posted on September 30, 2012 by nosnhojn

Fair to say that what we’ve posted on AgileSoC.com to date is decidedly pro-agile. Bryan, myself and the guest bloggers we’ve had thus far believe in agile hardware development so we haven’t spent much time talking about why agile hardware wouldn’t work. No surprise there. But when you’re getting a steady diet of opinions from one side of an argument, it can be easy to forget that there can be some very practical arguments on the flip side of the coin. Today – after a little cajoling from Bryan over the past year – Mike Thompson from Huawei in Ottawa brings a little balance to AgileSoC.com by examining that other side.

In A Heretic Speaks, Mike talks about reasons why agile and hardware just don’t go together. Being that Mike brings years of knowledge to the discussion, it’s hard to call this anything but a fair assessment. But I’d love to know what you think. Do you agree with Mike’s assessment? Extra points to those involved with physical design that jump into the discussion!

Take it away, Mike!

A Heretic Speaks

For a couple of years now, I’ve been reading the AgileSoC blogs. They are great reading, and each update provides an opportunity to see SoC development from a different perspective. Every now and then a tidbit from one of the blog entries you’ll find at this site will make its way into my own work – which is probably the whole point. Having said that, I simply do not agree with the idea that the Agile software development model can be applied to SoC development. There. I said it. I can feel the virtual slings and arrows of an outraged Agile community allied against me.

There is a contradiction in what I just said. If I do not believe that Agile can be applied to SoC development, then why do I continue to read this blog and even adopt some of the concepts discussed here? The answer is simple: I am a Verification geek, and more and more, SoC verification has become a software activity. So Agile can teach verifiers a thing or two about their craft. However, apart from Verification, SoC development is not software development. SoC development is hardware development. That distinction is real and it matters.

Hardware is not Software

As Verifiers, our view of the overall SoC[1] development cycle can be somewhat stunted. Our view of the design is mostly formed by our interactions with the RTL and the designers who write the RTL. Of course, RTL is just a specific type of software, right? It can be compiled into an executable for simulation, or it can be synthesized into a gate-level netlist of logic cells from a library. Sounds like software to me. The trouble is, RTL is just the tip of the iceberg. It’s the part of the design that is the easiest for us to see, but it is not all there is to the design – and it’s not the biggest part – not by a long shot. Let’s have a look at a few reasons why.

Hardware is not Virtual

One of the coolest features of software is that it can use virtual resources. Need more memory – just call malloc()! Need another object – just call new()! We do this all the time in Verification. Heck, if you are a SystemVerilog user, you can even have virtual interfaces. You can’t do this in hardware. All resources – the amount and type of memory needed, the number and type of interfaces, the amount of logic to perform tasks – all of this is fixed and is not easily changed. Why? Because you are building hardware and all of these resources are physical. Worse, all of them cost money. Before your organization lets you build a chip, the-powers-that-be will want to know what it costs. As in how-many-dollars-per-chip? In order to answer this question, you’ll need to answer a bunch of important questions such as what process node you will use (e.g. 32nm) and how big the die will be (e.g. 10mm x 10mm). In order to answer those questions, you’ll need to know things like how many and how big the I/O interface macros are (e.g. high-speed SerDes and DDR PHYs), and the types and geometries of memories you can use. In short, you will need to know a lot about your design. And you’ll need to know it long before you start writing any RTL. This is one of the reasons that Test Driven Development (TDD) doesn’t fit RTL coding very well… but I’m getting ahead of myself…
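To make the dollars-per-chip question concrete, here is a rough back-of-the-envelope sketch in Python. It uses a common textbook approximation for gross dies per wafer and ignores scribe lines, edge exclusion and packaging. The 300mm wafer, the $5000 wafer cost and the 80% yield are made-up placeholder numbers, not figures from any real process; only the 10mm x 10mm die comes from the example above.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_w_mm: float, die_h_mm: float) -> int:
    """Approximate gross dies per wafer.

    Uses the common estimate: wafer_area / die_area minus an edge-loss term
    of pi * diameter / sqrt(2 * die_area). Scribe lines are ignored.
    """
    die_area = die_w_mm * die_h_mm
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area)
    return int(wafer_area / die_area - edge_loss)

def cost_per_good_die(wafer_cost: float, gross_dies: int, yield_fraction: float) -> float:
    """Spread the wafer cost over the dies that actually work."""
    return wafer_cost / (gross_dies * yield_fraction)

# Hypothetical inputs: 300mm wafer, 10mm x 10mm die, $5000 wafer, 80% yield.
gross = dies_per_wafer(300, 10, 10)
cost = cost_per_good_die(5000.0, gross, 0.80)
print(f"gross dies per wafer: {gross}")      # 640
print(f"cost per good die:    ${cost:.2f}")  # $9.77
```

Notice that every input here (die size, wafer size, yield) is a physical decision. Change the die from 10mm to 12mm on a side and the answer moves, which is why these questions get asked before any RTL exists.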

RTL is not Abstract

Describing digital logic at the Register Transfer Level is probably the single biggest advance in hardware development ever. RTL reduces effort and improves quality by allowing digital logic to be described at a functional, as opposed to structural, level of abstraction. But it’s not that abstract – not really. An RTL designer still needs to understand and deal with clocks, resets, complex interfaces to memories, clock gating and clock-domain crossings. She needs to know how many levels of logic she can get away with between flops, which is a function of the clock period and the cell library. Also, RTL doesn’t allow designers to explicitly control things like area, timing and power. A lot of very fancy tools have been deployed to improve this situation, but for the most part, getting all this right still involves a lot of blood, sweat and tears. The implication here is that designers often spend more time on structural issues, such as clock-domain crossings and clock-gating, than they do on functional issues. Strike 2 for TDD…
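The “levels of logic between flops” budget can be sketched as simple arithmetic. The Python below is only a first-order illustration: it assumes every gate has the same average delay and ignores wire delay and clock skew, both of which matter a great deal in a real design. All the delay values are hypothetical and do not come from any actual cell library.

```python
def max_logic_levels(clock_period_ps: int,
                     clk_to_q_ps: int,
                     setup_ps: int,
                     avg_gate_delay_ps: int,
                     margin_ps: int = 0) -> int:
    """First-order estimate of how many gates fit between two flops.

    The time left for combinational logic is the clock period minus the
    launching flop's clock-to-Q delay, the capturing flop's setup time and
    any margin held back for skew/jitter. Integer picoseconds are used to
    avoid floating-point rounding surprises.
    """
    budget_ps = clock_period_ps - clk_to_q_ps - setup_ps - margin_ps
    if budget_ps <= 0:
        return 0
    return budget_ps // avg_gate_delay_ps

# Hypothetical library numbers: 100ps clock-to-Q, 80ps setup, 60ps per gate,
# with 200ps of margin held back.
print(max_logic_levels(2000, 100, 80, 60, margin_ps=200))  # 500 MHz -> 27
print(max_logic_levels(1000, 100, 80, 60, margin_ps=200))  # 1 GHz   -> 10
```

Doubling the clock frequency cuts the budget by far more than half, because the flop overhead and margin do not shrink with the period. That is the kind of structural constraint the designer carries around while writing “functional” RTL.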

Physical Design Takes a Loooong Time

One of the biggest differences between software and hardware is how they are mapped into a product that actually does something useful. Software is compiled. For a large software product (tens of millions of lines of code), a compile/build cycle can be many hours or days. Long enough, but not too bad and most of it is fully-automated. The implication is that software can cut a release at just about any time. The equivalent task in SoC development is called Physical Design, and it’s a big task. For a large SoC (tens of millions of gates), the physical design – getting from a synthesized netlist to working hardware – can take several weeks or even months. To get it right, several iterations may be required. Much of the work cannot be automated. The implication here is that SoC development time is driven primarily by Physical Design, not Functional Verification. The PD cycle is set very early on in the project and it is extremely difficult, if not impossible, to change it. One of the things a good Program Manager will try to do is fit the Verification cycle entirely within the PD cycle, so that Verification doesn’t drive the schedule’s critical path[2].

Why Hardware Doesn’t Fit the Agile Model

OK, hopefully by now you’re convinced that hardware development is a distinct task from software development, and SoC development is mostly a hardware development activity. That still doesn’t explain why Agile can’t be applied to SoC development. I’ll try to do that by making reference to a couple of AgileSoC posts from last year: “When Done Actually Means DONE” and “TDD And A New Paradigm For Hardware Verification”.

You’re not Done until PD Says you’re DONE.

In “When Done Actually Means DONE”, a contrast is made between the Waterfall development model and the Agile model:

Figure 1: Waterfall vs. Agile Development Models

I doubt that anyone is working on a team that is using the waterfall model anymore. All development teams that were doing this have long ago gone out of business because their products were either very late to market, full of defects, or both. The Agile model is much better – except that it doesn’t work for hardware. There are a couple of big reasons why this is so. First, since hardware is not virtual, some aspects of the physical design must start first. Yes, even before the Specification and certainly before the RTL coding. Second, the pace of the development is set by PD. This effectively negates one of the biggest advantages of Agile development: the ability to quit development and generate a release of your product whenever it suits the project.

So most large SoC projects look more like a hybrid of the Waterfall and Agile methods as shown in Figure 2. Let’s call this model “Concurrent Engineering”. Now you could say that this is a visual description of how SoC development can take advantage of some aspects of Agile development. Getting rid of that waterfall and replacing it with concurrent tasks really does improve quality and reduce schedules. But we cannot build the SoC at the 50% mark. Software can do that. Not hardware.

Figure 2: Concurrent Engineering of SoC Tasks

TDD is not a good fit for RTL coding

In Figure 2, the RTL coding task has been explicitly broken out. This was done to illustrate an important point that was alluded to earlier: designers actually do not spend that much time coding RTL. There are two big reasons for this:

1. Most of the real design work such as spec’ing out memories, clocking, resets, resource allocation/management, etc. must be done before RTL coding starts.
2. In order to begin PD trials, the bulk of the RTL must be available early – long before it is fully verified. This is an important point: at this stage, the RTL need not be functionally correct, but it must be ‘complete’ in that all functions are implemented[3].

So, the actual RTL coding doesn’t take that much time (3 to 6 weeks is typical). Given that, a Test Driven Development (TDD) process for RTL is a bad idea since it slows down the delivery of the code that is needed for PD. Note that this is not true for the verification code. TDD for testbenches, testcases, coverage models, etc. is very beneficial and should be part of all verification flows[4].
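For readers who have not seen TDD applied to verification code, here is a minimal sketch of the idea. It is written in Python purely for illustration; in a real SystemVerilog flow the same pattern would live in a unit-test framework for the testbench language. The Scoreboard class and its method names are hypothetical, invented for this example. In TDD the three test functions would be written first, and the scoreboard would be filled in until they pass.

```python
from collections import deque

class Scoreboard:
    """Hypothetical in-order scoreboard: queue expected items, check observed ones."""

    def __init__(self):
        self._expected = deque()
        self.mismatches = []  # list of (expected, observed) pairs

    def expect(self, item):
        self._expected.append(item)

    def observe(self, item):
        """Compare an observed item with the oldest expected one."""
        if not self._expected:
            self.mismatches.append((None, item))  # unexpected extra item
            return False
        want = self._expected.popleft()
        if want != item:
            self.mismatches.append((want, item))
            return False
        return True

    def is_clean(self):
        """Clean means no mismatches and nothing still waiting to be seen."""
        return not self.mismatches and not self._expected

# Unit tests for the verification component itself.
def test_matching_stream_is_clean():
    sb = Scoreboard()
    for word in (0x11, 0x22):
        sb.expect(word)
    assert sb.observe(0x11) and sb.observe(0x22)
    assert sb.is_clean()

def test_wrong_data_is_flagged():
    sb = Scoreboard()
    sb.expect(0xAA)
    assert not sb.observe(0xAB)
    assert sb.mismatches == [(0xAA, 0xAB)]

def test_missing_item_is_flagged():
    sb = Scoreboard()
    sb.expect(0x55)
    assert not sb.is_clean()  # expected item never arrived

if __name__ == "__main__":
    test_matching_stream_is_clean()
    test_wrong_data_is_flagged()
    test_missing_item_is_flagged()
    print("all scoreboard unit tests passed")
```

The point is that a scoreboard that silently passes a bad stream is worse than no scoreboard, and tests like these cost minutes to write. None of this delays RTL delivery to PD, which is why the objection to TDD above does not carry over to the testbench.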

Conclusion

Well, if you’ve gotten this far then you are pretty open-minded. After all, if you are even a semi-regular reader of this blog, then you probably think Agile is a pretty darn good idea. I do too – but just not for the hardware side of SoC development. For Verification and Emulation, it’s a different matter. In our Verification shop we cherry-pick many Agile concepts (such as unit-testing of verification code and continuous integration of the verification environment and RTL) and for Emulation we pretty much go all out with Agile. So I hope that this blog continues to be an open forum for applying Agile to these activities.

Thanks to Neil for setting up this blog and continuing to drive it, and special thanks for permission to re-use his slide from “When Done Actually Means DONE”. Not many people have the self-confidence to provide ammo for the opposition in a public forum. :) Also, if you haven’t already – get a copy of SVUNIT and use it!

–Michael Thompson
ASIC Verification Manager
Huawei Technologies
Ottawa, Ontario, Canada
michael.thompson@huawei.com

[1] Actually, I’d prefer to use the term “integrated circuit” or “IC” in this discussion. By “IC”, I mean any large scale integrated circuit implemented as an ASIC or ASSP. However, SoC is used throughout this site, so I’ll continue to use the term, or sometimes I’ll just say “chip”.

[2] I know, I know. Verification usually ends up on the critical path for at least part of the project.

[3] It shouldn’t be brain-dead either. In our shop, PD trials start on a netlist that has passed at least some basic functional testing.

[4] If you do not already have an in-house methodology for TDD of your verification code, I strongly recommend that you take a good look at SVUNIT.