Not a Success Story
Home page....: Frigus Primore home page
Calculator.....: Thermal territory for a heat source
Figure 1
Thermal territory for the communication device.
Introduction
It is always nice to read a success story but from a
professional point of view I find failure stories more interesting.
They are often truly sad but they do push things forward.
Experience is also to a large extent based on mistakes made.
A thermal designer who has a large stack of failure stories
is therefore well equipped for the job.
Speaking in very general terms, there are thousands of ways to do
things wrong but only a few ways to do them right. From that
perspective it is surprising that more mistakes are not made. It
is nevertheless inevitable that things go wrong now and then, and
on those occasions it is important to learn as much as possible
about the reasons.
Like any other professional group, thermal designers get their
share of failures. One reason why these are not told more frequently
is that they are often embarrassing, both from the personal point
of view and from that of the company. Another reason is that one
tends not to make the same mistake twice and certainly not three
times. Repeated mistakes of the same kind are therefore
exceptional and the ones that remain are consequently difficult
to present as typical.
What might appear a bit strange in this context is that I have
never seen a case where poor calculation accuracy has been the
cause of an important failure. I do not deny that such cases exist,
because I have seen many calculations that have been terribly
wrong, but the error was in all those cases detected so soon
that it did not cause much harm. The idea that calculation
accuracy is the key to successful thermal design is therefore
not at all in line with my experience. Since I do not want to
be misinterpreted on this point, I do stress that it is important
to make accurate calculations, but only to the degree that they
reasonably well match all other uncertainties. Quite a few of
the failures that I have seen have, however, been caused by a total
neglect of thermal considerations, particularly in the early
phases of the design process.
A very obvious example of a thermal failure is when a system
dies while exposed to extreme temperatures. I have seen half a
dozen cases. They are always dramatic because they come
unexpectedly and they also tend to create panic in the sales
department. These failures are either caused by extreme component
temperatures or by poor component quality. I have heard several
stories about the former but I have only experienced the latter.
Somewhat surprisingly a few of those were actually provoked by
temperatures at the low end of the scale.
Preventing system failures is the classical argument for thermal
design. It has however lost much of its bite as the number of
incidents, for many different reasons, has declined. It is
therefore high time to focus on what thermal design can do to
make the design process smooth and, in particular, how to avoid
costly time delays and redesigns. The problems that appear in
the design process form a very heterogeneous group. There are
nevertheless a couple of common denominators: they are always
discovered too late and they are often caused by an unfortunate
combination of several mistakes. The story that follows is a
typical example.
The problem
A system often consists of several PCBs that need to communicate.
As far as an amateur in electronics understands it, this
communication is always problematic but there are several
electrical tricks that can facilitate the matter. In one particular
project it was accordingly decided to create a custom-designed
circuit that would substantially ease the communication between
the boards. The solution apparently had many advantages but one
disadvantage was that the device had to be placed on every PCB in
the system. To speed up the design process it was further decided
to run the design of the custom circuit and the PCBs
simultaneously. The thermal aspects were, for reasons unknown,
forgotten at the project launch.
The project leader consulted me some weeks later. He had a vague
feeling that the thermal issues had to be considered. At that time
the project had advanced to the point that a decision had been
taken on the package for the custom circuit. A heat dissipation
estimate was also at hand: 3 W.
Given just the data for the component it is of course impossible
to make any reasonable thermal prediction. With a bit of additional
information about the PCBs and the air velocity one can however
produce an interpretable estimate. A method that is convenient for
the case discussed is the thermal territory method. A thermal
territory can briefly be described as the smallest rectangular
surface of a PCB that a component needs for its cooling. Those
who are not familiar with the method can try a simplified
calculation procedure in the link provided at the beginning of
this article.
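
To give a feeling for the sizes involved, the sketch below makes a
crude lower-bound estimate of such a territory. It is not the
thermal territory method itself: it simply assumes that the whole
area sits at one uniform temperature rise and loses heat by
convection from both sides of the board. The heat transfer
coefficient (10 W/m2K) and the allowed temperature rise (30 K) are
assumptions chosen for illustration; only the 3 W comes from the
story. Since a real PCB spreads heat imperfectly, the true
territory would be larger.

```python
def square_territory_side_m(power_w, h_w_per_m2k, delta_t_k, sides=2):
    """Side length of a square board area that sheds power_w by
    convection alone, assuming a uniform temperature rise delta_t_k.

    This is Newton's law of cooling, A = P / (h * dT), and nothing more.
    It ignores in-plane conduction limits, so it is a lower bound.
    """
    area_m2 = power_w / (h_w_per_m2k * delta_t_k * sides)
    return area_m2 ** 0.5


# 3 W is from the article; h and delta_t are illustrative assumptions.
side_m = square_territory_side_m(power_w=3.0, h_w_per_m2k=10.0, delta_t_k=30.0)
print(f"square territory side: {side_m * 100:.1f} cm")
```

With these assumed numbers the component claims a square of about
7 cm on each side, which is a considerable part of most boards.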
My first insight into the matter resulted in an image that looked
something like figure 1. A more precise assessment would have
been a great help at this point but it could not be produced on
the basis of the data available. A swift glance at the image was
in this particular case sufficient to realise that it would be
difficult to keep other heat dissipating components away from the
thermal territory of the custom circuit. The project leader
and I therefore agreed that we were dealing with a grey zone
case.
What we did not agree on was how to deal with the problem. The
project leader argued that the difficulty could be managed if
everything was done to focus on it in the layout phase. I argued
that the only safe solution was to prepare for the eventuality
that a heat sink would become necessary. The reason for this
difference of opinion soon became apparent to me.
Figure 2
Cavity up and cavity down packages favour different
heat flow paths.
Cavity up or cavity down
The package that had been selected was of the ceramic pad array
type with a cavity facing upwards. This kind of package strongly
favours the heat flow path from the chip to the PCB and does not
perform at all well with a heat sink, figure 2. An alternative
could have been a cavity down type of package, for which these
tendencies are reversed.
Changing the package at this stage was however not that easy. The
signal configuration of the pads had already been decided and
changing the package would essentially mirror this configuration.
Since several of the PCB projects were already working on their
preliminary layouts, such a change would inevitably result in some
redesign and consequent loss of time.
After the PCB designers had been consulted it was decided not to
make any changes. I had done my best to explain the problem
and was in addition sure that I had been understood. There was
not much else to do but accept the decision.
An additional problem
The project leader came to see me several months later. An
additional problem had appeared. Measurements had revealed that
the maximum junction operating temperature for the custom circuit
was 85 degC and not 100 degC as had been specified. It is
apparent that a design that is already on the margin of what
can be done cannot put up with such a radical decline in a
thermal property.
I nevertheless made several efforts to save the project with a
heat sink. In vain: the junction-to-case thermal resistance was
just too large. It could not be done. What I could establish,
however, was that a cavity down package would have provided the
margin needed to deal with the setback.
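
The size of the setback can be put into numbers with a simple
resistance budget. The sketch below uses the 3 W, 100 degC and
85 degC from the story; the air temperature and the junction-to-case
resistance are not given and are purely hypothetical values chosen
to illustrate the reasoning. Note that the lost budget, 15 K divided
by 3 W, is 5 K/W whatever air temperature is assumed.

```python
POWER_W = 3.0  # heat dissipation estimate from the article


def max_total_resistance(t_junction_max_c, t_air_c, power_w=POWER_W):
    """Largest junction-to-air thermal resistance (K/W) that keeps the
    junction at or below t_junction_max_c."""
    return (t_junction_max_c - t_air_c) / power_w


def heat_sink_can_help(r_junction_case, r_budget):
    """A heat sink only lowers the case-to-air part of the path.
    If the junction-to-case part alone uses up the budget, no heat
    sink, however large, can rescue the design."""
    return r_junction_case < r_budget


T_AIR_C = 55.0            # assumption, not from the article
R_JUNCTION_CASE = 12.0    # hypothetical value in K/W, not from the article

r_specified = max_total_resistance(100.0, T_AIR_C)  # budget as specified
r_measured = max_total_resistance(85.0, T_AIR_C)    # budget after measurement
lost_budget = r_specified - r_measured               # 15 K / 3 W

print(f"budget as specified: {r_specified:.1f} K/W")
print(f"budget as measured:  {r_measured:.1f} K/W")
print(f"lost budget:         {lost_budget:.1f} K/W")
print("heat sink can help:", heat_sink_can_help(R_JUNCTION_CASE, r_measured))
```

With these assumed values a heat sink would have been workable
against the specified limit but not against the measured one, which
is the kind of situation described above.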
The consequences
The incident caused a 3-week project delay. It is true that
the major cause was a mistake made in the chip-level design and
that it was difficult to predict. The second mistake, however, was
not to use thermal design from the project start. One can only
speculate about the outcome if this had been done but one
possibility is that it would have been different.
It is not easy to calculate the cost of an incident like this.
If one disregards the market costs and assumes that 20 people
were involved it is nevertheless possible to make a rough
estimate. It indicates a cost that is 50 times the cost of getting
started with thermal territory calculations, software and training
included. Another way to express this ratio is that if an incident
of this kind can be avoided every 5 years, the payback time for
the software is of the order of 5 weeks. This is 30 times better
than what is usually regarded as a good investment. It is thus
possible to lower the assumptions in this cost analysis by a
factor of 10 and still get a very favourable outcome for a
front-end thermal design software investment.
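
The arithmetic behind these figures can be retraced in a few lines.
The sketch below takes the software and training cost as one unit
and spreads the avoided incident cost evenly over the years between
incidents. Reading "a factor of 10" as a cost ratio of 5 instead of
50 is an interpretation, not something stated above.

```python
WEEKS_PER_YEAR = 52


def payback_weeks(cost_ratio, years_between_incidents):
    """Weeks until the avoided incident cost equals the investment.

    cost_ratio is the incident cost divided by the investment cost,
    so the investment itself counts as 1 unit.
    """
    weeks_between = years_between_incidents * WEEKS_PER_YEAR
    saving_per_week = cost_ratio / weeks_between
    return 1.0 / saving_per_week


base_case = payback_weeks(cost_ratio=50, years_between_incidents=5)
cautious_case = payback_weeks(cost_ratio=5, years_between_incidents=5)

print(f"base case:     {base_case:.1f} weeks")
print(f"cautious case: {cautious_case:.1f} weeks")
print(f"implied 'good investment': {30 * base_case / WEEKS_PER_YEAR:.1f} years")
```

The base case gives 5.2 weeks, and 30 times that is 3 years, which
suggests the benchmark for a good investment used in the comparison.
With the assumptions lowered by a factor of 10 the payback time
is still only one year.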
This story unfortunately did not have a happy ending. It does however
show that it is important to practise front-end thermal design
methods, however simple or primitive they may seem.
Ake Malhammar