Designs for Reliable (Low V) Cell Monitoring Electronics

Alan B

Some folks have had problems with Battery Management Systems (BMS) causing damage to their battery packs. In some cases this has led to a very negative response to all BMS electronics. So I am considering what it would take to make a reliable cell monitoring system, primarily focused on the essential on-board ebike systems.

Gathering more accurate data about actual BMS failures was suggested below and is being handled in another thread. This design is evolving as issues are identified and worked through. Here is the data gathering thread:

http://www.endless-sphere.com/forums/viewtopic.php?f=5&t=22382

One consideration in maximizing reliability is to minimize the electronics on the bike. This argues for charging and balancing off-bike. Failures in the balancing circuits often cause the associated cell to be overdischarged. All that really must remain on-bike is monitoring cells for informational purposes and for low voltage shutdown. So this design splits the BMS into on-bike cell monitoring and protection, and off-bike charging and balancing.

So how simple and reliable can the on-bike cell monitoring and protection system be made?

The traditional circuits commonly found for this use a voltage monitor chip (such as a TC54) and an optical isolator per cell. A zener may be added to provide some protection for the somewhat fragile voltage monitor part (the fact that this was done indicates there was experience with damage to the units from connecting in the wrong order). This circuit will not work below some voltage around 1.5 volts due to the optical isolator LED. The low voltage cutout threshold of this design is not easily adjustable. If the voltage monitor chip fails (and they are susceptible to too much voltage from incorrect wiring, spikes or ESD) there is no indication that a cell is no longer protected. It is difficult to test the individual undervoltage circuits. There is no overvoltage protection.

What are the risks to the system? The batteries have potentially significant transients, especially during connect cycles where high currents flow to charge the controller capacitors. Connecting the input wires to the wrong cell junction presents another challenge to the electronics. The voltage monitor chips have a fairly limited input voltage range. ESD from the user during connecting, or touching the circuit board is another issue. The initial wiring process presents a significant risk of incorrect connections as well. The connect/disconnect cycle presents risks such as random order make-break of the connectors.

Another problem is protection when the bike is not in use (or when there is a failure in the ebrake circuit). Sending a signal to the ebrake line doesn't work when the bike is stored as no one will notice it.

The question is, can we do better without getting too costly or complicated?

So here are some requirements (version 0.7):

1) Low cell voltage triggers optical isolator to ebrake (same functionality as standard circuits).
2) Robust against voltage spikes and connecting to the wrong cell or connecting in any order.
3) Self detection of correct wiring and operation, providing visual indication of proper operation.
4) Continue to operate if any one or two cells go to 0 volts. (perhaps expand this to all but 2 cells at 0V?)
5) Protect pack against overdischarge even when ebike is not in use.
6) Consume less than half of a 5 Ah charge per year if left plugged in (about 285 uA average).
7) Avoid failure modes that "drain" the pack. Semiconductors that short should not draw the pack down.
8) Accurate enough to catch the 1V-plus transition from "normal" cell voltage to "nearly empty", so about 0.05 volt.
9) Have a low parts count (and a low cost).
10) Be easily unplugged from the pack for extended storage.
11) Use robust wiring and connectors.
12) Avoid high currents running through the monitor.
13) Components should be readily available.

The common designs based on the TC54 voltage monitor fail to meet requirements 2, 3, 4, 5, and 9. These designs can be improved, but it is unlikely that all requirements can be met, and the parts count is already getting quite high.

The CellLogs fail to meet requirement 6 (and require external circuitry to meet 1). There is at least one report of a CellLog getting "hung" and failing to meet requirement 3.

So how can we meet these requirements?

I sketched out a design for an 8S cell monitor based on a single-chip microcontroller that appears to meet these requirements. This requires:

8 analog inputs (for 8 cell monitoring)
1 output for optical isolator to ebrake
1 output for piezoelectric beeper (to chirp like a smoke alarm when the battery gets low, in case it is not on the bike)
4 pins for programming
2-4 pins for ground and vcc

Why 8 cells? The common ADCs on these parts are 10 bits, or 1 part in 1024. At 35V full scale that will give 35 millivolt resolution. Going higher in cell count will reduce the resolution and make it difficult to meet requirement 8. (Also my pack is 16S in two 8S sections)

I did a parts search and found that the Atmel ATTiny261 (20 pins) meets these requirements (with 2 pins left over for future communications). In 25 quantity this part is only $1.82!

So what does it take to protect the micro? The ADC inputs are the most at-risk by virtue of coming from offboard. The ADC has a 1V full scale range, so a divider network is required to bring the +30V of the 8S pack into that range. A capacitor to provide filtering also provides transient protection, and the impedance of the divider network controls the current as it scales the voltage. The micro has diode protection internally as well. Since the resistive divider is reducing 30V to 1V and the chip can handle 5V, a transient would have to exceed 150V to even begin to trigger the protection diodes.
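For concreteness, here is the arithmetic behind those resolution and headroom claims, as a throwaway C calculation. The resistor values are made up for illustration and are not the schematic's actual parts:

    #include <stdio.h>

    int main(void)
    {
        /* assumed divider: ~30 V at the top tap scaled to about 1 V at the ADC pin */
        const double r_top = 2.0e6, r_bot = 68.0e3;            /* illustrative values only */
        const double ratio = (r_top + r_bot) / r_bot;          /* ~30:1 */

        printf("LSB at 35 V full scale, 10 bits: %.0f mV\n", 35.0 / 1024.0 * 1000.0);
        printf("divider ratio: %.1f : 1\n", ratio);
        printf("top tap current at 30 V: %.1f uA\n", 30.0 / (r_top + r_bot) * 1e6);
        printf("input needed for 5 V at the pin: about %.0f V\n", 5.0 * ratio);
        return 0;
    }

With numbers in that ballpark the divider keeps the tap current in the tens of microamps while leaving well over 100V of headroom before the internal diodes even start to conduct.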

The voltage regulator is a type intended for battery operation and is well protected, with such features as reverse polarity protection. It is augmented with capacitors and a series surge-limiting resistor.

Power consumption can be very low. Setting the resistance of the voltage dividers high helps minimize current there. The micro can be put into sleep on a timer to minimize CPU power consumption. The regulator is an ultra low power type with very low idle current.
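Here is a minimal avr-libc sketch of the sleep-on-watchdog idea. Register, bit, and vector names are per the ATtiny261 datasheet as I read it (WDTCR, WDT_vect), so verify them against the headers before trusting this; the 8 second period and the empty work section are placeholders:

    #include <avr/io.h>
    #include <avr/interrupt.h>
    #include <avr/sleep.h>
    #include <avr/wdt.h>

    ISR(WDT_vect)
    {
        WDTCR |= _BV(WDIE);     /* re-arm interrupt mode in case hardware cleared WDIE on time-out */
    }

    static void wdt_interrupt_every_8s(void)
    {
        wdt_reset();
        WDTCR = _BV(WDCE) | _BV(WDE);               /* timed change-enable sequence */
        WDTCR = _BV(WDIE) | _BV(WDP3) | _BV(WDP0);  /* interrupt-only mode, ~8 s period */
    }

    int main(void)
    {
        wdt_interrupt_every_8s();
        sei();
        for (;;) {
            /* ...sample cells, run self tests, blink the LED here... */
            ADCSRA &= ~_BV(ADEN);                   /* ADC off while asleep */
            set_sleep_mode(SLEEP_MODE_PWR_DOWN);    /* only the watchdog oscillator keeps running */
            sleep_mode();                           /* a few uA until the next watchdog tick */
        }
    }

With the micro sleeping nearly all the time it draws only a few microamps on average, so the divider strings and the regulator's quiescent current end up dominating the roughly 285 uA budget in requirement 6.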

If an ADC input becomes damaged it "fails safe", as the reading won't fall into the expected valid range and the beeper/ebrake will indicate a problem. So it can self-check.

A reverse bias zener diode across the optical isolator output protects it against reverse voltage and locks out the controller until it is polarized correctly.

An onboard LED that periodically blinks (again taking after smoke alarms, providing an indication at very low power consumption) indicates proper CPU operation. This can be visually checked each time the pack is charged to ensure the monitoring is working.

The watchdog timer in the CPU reboots it if the program gets "hung" and fails to respond within the timer's limit.

The CPU brownout detector ensures that more than 2 cells at 0V will be indicated by the lack of blinking on the operating LED, and that the CPU won't try to operate when the voltage is too low for reliable functionality.

If a cell (or two) goes to 0V there is still enough voltage to power the monitor, so it will continue to disable the controller and chirp the piezo. This is an improvement over the traditional design that stops protecting when a cell falls below about 1.5V. Most traditional designs also do not have the audible alarm.

There are two types of connections to the battery pack in this design. One provides power to the cell monitor. The other samples the pack voltage at each cell juncture. Sampling is done through a resistive voltage divider; a failure of the silicon in the micro could at worst short the 1V portion of the 30V divider and would not appreciably increase the current. The regulator failure modes, especially a short in the regulator, present a more significant risk to the battery. If the regulator fails the LED will stop blinking, so it might be noticed. The choice of fuse and series impedance for the regulator will be important and will be reviewed in this light as the design progresses.

A good connector is important; perhaps a DB9M on the board, with commercial DB9 cables cut and re-terminated to the batteries on the other end. For even more protection against shorts and KFF (Kentucky Fried Fingers), small fuses can be installed at the battery. This same connector can be used by the charge balancing system off-bike.

Installing clear heatshrink over the PCB protects it from various short circuit problems while allowing visibility of the LEDs. It is more serviceable than a conformal coating (though not as waterproof). Protection from water should be handled by the system packaging.

The remaining ESD risk is through the programming pins, which are less protected. Once programmed, these pins should be covered with the heatshrink, or a connector placed on them, to avoid direct contact. A programming fixture could be used that would avoid installing the pins on the board; just pushing the connector into the holes is generally adequate for a programming cycle of a few seconds. Program verification ensures that it is properly programmed.

So we have a basic 8 cell monitor in three chips (instead of the usual 12-16 chip solutions that have no self-checking). For higher cell counts, use more of the 8 cell modules. For lower cell counts (or remainders), adjust the software to ignore the unused input channels. With a bit more software and a couple more optical isolators it could even read out individual cell voltages to a remote display, but I will leave that for an upgrade... though I will arrange the micro pinout so the serial I/O pins are available. The final pinout will be settled during PCB layout; flexibility helps in the layout process.

[Schematic image: lvm8s004.jpg]


Link to larger schematic:
http://picasaweb.google.com/lh/photo/4CXPkVY13XaJinAj1zZ7WIPSvVPFxOvt9gQSg1eZ7bg?feat=directlink


Software

Risks from software errors can be minimized by keeping it simple, testing, and making the source code public for review by others.

Requirements

Essential

Wakeup on Watchdog Timer periodically
Perform self tests
Read cell voltages, average, apply calibration factor
Difference adjacent readings to get per-cell voltages, compare against window (low, high)
If all selftests and voltages okay then periodically blink green LED
If out of range then set ebrake and squeak piezo and blink red LED
Configure for low power and go to sleep
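A minimal sketch of the essential loop above, in C. All of the names, ratios, and thresholds here are placeholders/assumptions, and the ADC and output routines are stubs, but it shows the difference-and-window idea:

    #define NCELLS        8
    #define DIVIDER_RATIO 30.4f    /* assumed; must match the actual divider */
    #define CELL_LOW_V    2.70f    /* example low-voltage cutoff */
    #define CELL_HIGH_V   4.30f    /* anything above this looks like a wiring or ADC fault */

    /* stubs for the real hardware access */
    extern float adc_read_volts(int channel);   /* averaged and calibrated, 0..1 V at the pin */
    extern void  set_ebrake(int on);
    extern void  chirp_piezo(void);
    extern void  blink_led(int green);

    static void check_cells(void)
    {
        float tap[NCELLS + 1] = { 0.0f };       /* tap[0] is the pack negative */
        int fault = 0;

        for (int i = 1; i <= NCELLS; i++)
            tap[i] = adc_read_volts(i - 1) * DIVIDER_RATIO;   /* cumulative stack voltage */

        for (int i = 1; i <= NCELLS; i++) {
            float cell = tap[i] - tap[i - 1];   /* difference adjacent taps = one cell */
            if (cell < CELL_LOW_V || cell > CELL_HIGH_V)
                fault = 1;                      /* low cell, or an implausible reading */
        }

        if (fault) { set_ebrake(1); chirp_piezo(); blink_led(0); }   /* red blink */
        else       { set_ebrake(0); blink_led(1); }                  /* green blink */
    }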

Nice to Have

Test for "motor running" and adjust thresholds, sample less often, conserve power


Design architectures to review:

TC54 power on reset voltage monitor chip
MAX11068, MAX11080 high cell count monitor/management chips
DS2726 cell monitor/balancing chip
LTC6802 battery management chip
ATTINY261 microcontroller chip



I am not certain at this point if I will build this. If a clearly worthwhile design emerges from the process and there is interest, perhaps a collaborative build might be done. My time is limited; perhaps someone wants to help with PCB layout and production. It could be a through-hole kit. It could be surface mount and very small. I can handle the software. Programmers for this chip start at about $15, so that's not an impediment for those who want to play with the code or make their own.

Comments on this evolving design (and others that meet these requirements) are welcome in this thread.

Thanks for your input!
 
Alan I wonder if your approach would benefit from an open discussion on the known failure modes of current systems and their approximate probability of demonstrated occurrence first? This would then provide a sort of statistic based design to guide.
 
bigmoose said:
Alan I wonder if your approach would benefit from an open discussion on the known failure modes of current systems and their approximate probability of demonstrated occurrence first? This would then provide a sort of statistic based design to guide.

It just might.

Data on failure modes experienced could be useful.
 
bigmoose said:
Alan I wonder if your approach would benefit from an open discussion on the known failure modes of current systems and their approximate probability of demonstrated occurrence first? This would then provide a sort of statistic based design to guide.

Very good thinking, a truly excellent idea. The thing that always seems to concern me is the creation of complex solutions to problems that may not be as significant as they might seem.

I've worked with high reliability systems most of my working life (as I suspect you have) and have learned, from bitter experience, that it's very easy to let technology dictate the development of a supposed 'solution' when a bit of clear thinking and some unbiased analysis often gives a different answer. I remember a calibration problem we had with some airborne instrumentation. The initial fix was to increase the frequency and apparent accuracy of the pre-flight calibration process. It turned out that too much calibration was the problem, as the disturbance this caused to the sensors increased the errors. The real fix was to decrease the frequency of calibration events, which markedly improved both the quality and reliability of the system (and gave us more flying time, to boot). The, perhaps apocryphal, tale of the million pound development of the space pen, versus the USSR solution of using a pencil, tends to float to the top of my consciousness when complex solutions to simple problems are proposed.

A good starting point might be to try and do a FMECA on the failure modes of battery packs, although I think getting hard data to work with will be challenging. I bet the outcome would surprise a few people (may be me, too!).

Jeremy
 
Spot on Jeremy! You put into words what was rattling around my head. I was thinking that if connectors, crimps and poor conformal coating were the high probability events (just naming a few of the failure modes that may or may not be statistically significant), that fixing silicon wouldn't be the answer.

I do not have enough personal experience on BMS's to start the list (perhaps we should do it in an open poll for board members to comment on?).

BTW, my apologies to Alan, I sure don't want to run your thread into the ditch, so perhaps we should start a new one? I'll let Alan make the call.

PS: For those that might not know, FMECA stands for Failure Modes, Effects and Criticality Analysis (at least that is what we call it on this side of the pond :p ) and it is quite a powerful tool to design out known failure modes in systems.
 
bigmoose said:
Spot on Jeremy! You put into words what was rattling around my head. I was thinking that if connectors, crimps and poor conformal coating were the high probability events (just naming a few of the failure modes that may or may not be statistically significant), that fixing silicon wouldn't be the answer.

I do not have enough personal experience on BMS's to start the list (perhaps we should do it in an open poll for board members to comment on?).

BTW, my apologies to Alan, I sure don't want to run your thread into the ditch, so perhaps we should start a new one? I'll let Alan make the call.

It would probably collect more data in a better-named thread for the purpose than in this thread. Why don't you start the data thread, bigmoose?
 
Sounds like a plan Alan! I'll work on figuring out how to do a poll in a new thread, and it is located here:

http://endless-sphere.com/forums/viewtopic.php?f=5&t=22382

PS: Thanks Alan for letting us sidetrack your thread a bit!
 
Good move. FMECA has the same definition on both sides of the pond - at least in mil circles, where we tend to use common standards and terminology (it's a NATO thing, I sat on one of their interoperability working groups for years). I think it's important to maintain a systems focus on the outcome, treating the cells, wiring, BMS, charger etc as a power provision system.

Sorry to re-direct/interrupt your thread, Alan, but I'm sure it'll be all the better for having some hard data to focus a solution on.

Jeremy
 
Alan,

I applaud your design that separates the balancing circuit from the monitoring circuit. Although I have not experienced any failures related to BMS use (I just got my first ones), current designs have an additional FET stage controlling my discharge and they try to balance my pack while in use. Consequently I will be using my BMS only for charging purposes, and will disconnect it and rely on my bulk low voltage cutoff until another plug-and-play device is available for cell monitoring.
 
Just an observation, Alan. As drawn, your circuit may have quite a big variation in resolution across the bank of cells, perhaps too big to work reliably as intended, or not quite enough resolution to be really useful (I'm not sure of which from the schematic).

I'm assuming that you want it to work on all lithium chemistries, so are aiming for a max voltage per cell of around 4.2V (it'd be slightly better if only designed to work on LiFePO4).
This illustrates my concern (it's an issue we've discussed on here a few times over the years):

If the PD on the bottom cell in the stack is set for around 4.2V FS then the bit resolution will be around 4.1mV or so (assuming a 10 bit A/D). This is just about OK, as allowing for quantisation errors etc you'll get an effective accuracy of maybe 8mV, which should be adequate for ensuring reasonable balance (I've tended to consider about 10mV to be the minimum needed in the past).

If the PD on the top cell in the stack is set for around 34V, then the bit resolution will be around 33mV, or a real-world accuracy figure of maybe 66mV, allowing for the same +/- 1 bit likely quantisation error etc. 66mV just isn't really good enough I believe.
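To put the two scalings side by side (a throwaway C check, same 10 bit assumption and illustrative full-scale values):

    #include <stdio.h>

    int main(void)
    {
        const double counts = 1024.0;   /* 10 bit A/D */
        printf("bottom cell, 4.2 V full scale: %.1f mV/bit\n", 4.2 / counts * 1000.0);   /* ~4.1 mV */
        printf("top of stack, 34 V full scale: %.1f mV/bit\n", 34.0 / counts * 1000.0);  /* ~33 mV */
        return 0;
    }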

You could just choose to accept the likely 66mV error across all the cells, by making all the PDs the same value (which is what I believe you may have done from your schematic, unfortunately it's a bit small for me to see clearly). This keeps the errors constant across all cells, but I'm still quite concerned at their magnitude. Just a little bit better resolution, with accuracy to match, would be a big improvement, in my opinion.

When we've discussed this digital resolution problem in the past, we've looked at using some of the chips designed to deal with this problem. I don't have the reference to hand, but believe it was a TI chip - there'll be some info in one of the BMS threads here somewhere. This chip is designed with offset, or maybe isolated, A/Ds I believe, so the resolution remains constant across the stack of cells.

The other solutions are to use more than one ucontroller to cover the 8 cell range, or to try and find a ucontroller with a 12 bit A/D.

Jeremy
 
Your schematic closely resembles the CellLog8.
The CellLog can detect when one cell is reading zero (bad cell or wire connection), and is somewhat tolerant to miswiring. They just draw too much power to leave connected. A 'sleep mode' is needed.

Most failures of my analog BMS have been due to the voltage not dividing equally when making a connection to the pack. This was addressed by adding zener diodes across each cell circuit to prevent overvoltage to the chips during pack connection. Miswiring is another failure mode. That's a bit harder to deal with.
 
fechter said:
Your schematic closely resembles the CellLog8.
The CellLog can detect when one cell is reading zero (bad cell or wire connection), and is somewhat tolerant to miswiring. They just draw too much power to leave connected. A 'sleep mode' is needed.

Most failures of my analog BMS have been due to the voltage not dividing equally when making a connection to the pack. This was addressed by adding zener diodes across each cell circuit to prevent overvoltage to the chips during pack connection. Miswiring is another failure mode. That's a bit harder to deal with.

I agree, except that the Cellog8 uses op amps in front of the A/D converter to enhance the resolution at the top end. This partially overcomes the 10 bit limitation in the ATMega 32 series chips, but increases the current drawn by the unit. Another way around this, with lower quiescent power consumption, as I suggested above, would be to switch to the XMEGA chips with their 12 bit A/D converter and use Alan's schematic pretty much as-is.

Jeremy
 
Thanks for the comments!

What I've been thinking is that, since the purpose of this is primarily to detect low voltage, 30mV resolution is okay. Since we're not balancing with this, we're just trying to detect the 1V drop from nominal to done.

From several perspectives it is better to make all inputs the same voltage: it is safer, keeps the parts simpler and cheaper (higher quantity, easier inventory), keeps the math simpler, and keeps miswiring from overvolting the ADC inputs.

It won't provide as much precision as the CellLog. Is that a requirement?

Using the same input ranges would also allow the software to handle the miswiring case - and just work. Not sure that is worth the trouble, but at least nothing would be damaged.
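Something like this sketch is what I have in mind for catching miswiring in software (hypothetical names; it assumes identical dividers on every input, so the tap readings should rise monotonically up the stack):

    /* A swapped or open sense wire shows up as a non-increasing pair of taps,
       which can be flagged as a wiring fault rather than reported as a bogus
       cell voltage. */
    static int wiring_looks_ok(const float tap_v[], int ntaps)
    {
        for (int i = 1; i < ntaps; i++)
            if (tap_v[i] < tap_v[i - 1])
                return 0;
        return 1;
    }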
 
+/- 30mV (so ~ 60mV in effect) is just about enough resolution for a simple warning device, but if that is the intention then I'd guess that 8 off TC54s would do the job with a much lower quiescent current and a much lower parts count too (taking into account the many thousands of junctions in the ucontroller). Based on experience, I'd say that the TC54 solution would most probably be far more reliable than the ucontroller solution and have the added bonus of having a quiescent current of around 1uA per channel, low enough to not have to worry about leaving it connected, as the current drain is almost certainly lower than the self-discharge rate for any reasonably sized battery pack (say, over about 2 or 3Ah).

The ucontroller gives an opportunity to do much more than just be a dumb warning device, but to be effective in that role it needs better A/D resolution (which is probably why the Cellog8 uses those op amps on some channels to up the effective resolution on some channels). Just my opinion, but having spent 20 odd years designing, building and flying with instrumentation systems I tend to focus very much on keeping stuff simple and reliable!

Jeremy
 
We differ on some things.

I was planning to do quad reads on the ADC, which gives a slight resolution increase. We could look at 12 bit ADC chips but they aren't required to meet the design goals.
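Roughly what I mean by quad reads, as a sketch (adc_read_raw() is a placeholder for the real conversion routine):

    extern unsigned int adc_read_raw(int channel);   /* placeholder, returns 0..1023 */

    /* Average four back-to-back conversions.  With a little noise present this
       smooths the reading and gains a fraction of a bit of resolution; it does
       not help reference drift or sag under load. */
    static unsigned int adc_read_x4(int channel)
    {
        unsigned int sum = 0;
        for (int i = 0; i < 4; i++)
            sum += adc_read_raw(channel);
        return (sum + 2) / 4;                        /* rounded average */
    }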

I don't think the voltage monitor chip solution is as robust. If it is wired wrong the voltage easily exceeds the ratings for the chips, so it is not protected against that. Transient protection is not easy to add. Failure detection is absent.

The active device count is about 3 for the micro compared to about 20 for the voltage monitor design. The total component count is about 43 for the micro and about 30 for the TC54. While the TC54's component count is slightly lower, its count of the most easily damaged parts (semiconductors) is much higher. The micro design is significantly more "armored" against incorrect wiring or voltage spikes.

Counting internal junctions doesn't track with reliability data. Interconnects are too unreliable. As far as I can tell, component reliability hasn't been much of a factor in the BMS failures, it is the protection from the electrical environment that is the problem. It is difficult to protect the TC54's as they need to connect to each cell with low enough impedance to drive the optical isolator LED.

Another requirement of a reliable system is that it self-checks and that is not practical with the voltage monitors.

Power consumption with a properly programmed sleeping micro is very low. Easily adequate for this application.
 
Jeremy Harris said:
...

The ucontroller gives an opportunity to do much more than just be a dumb warning device, but to be effective in that role it needs better A/D resolution (which is probably why the Cellog8 uses those op amps on some channels to up the effective resolution on some channels). Just my opinion, but having spent 20 odd years designing, building and flying with instrumentation systems I tend to focus very much on keeping stuff simple and reliable!

Jeremy

There is a tendency with microprocessors to get caught up in feature creep. It happens to all of us. Sometimes the requirements are better met by resisting the urge and keeping the design simple. Cannot get much simpler than 3 chips to meet these requirements, and the voltage monitors don't even meet the requirements at more than 12 chips.

Are the requirements correct?
 
Unfortunately, over-sampling doesn't get rid of quantisation error for the sort of noisy signal that you're sampling on a battery under load, although it might appear to at first sight, as it's easy to assume that you're dealing with a DC signal. To illustrate this, take a look at the effect of typical cell Ri on terminal voltage. Let's assume that a decent LiFePO4 cell will have an Ri of around 10 mOhm. Pulling 10 amps from this cell will cause the voltage to drop by 100mV, at a randomly varying frequency that's dependent on the instantaneous load on the motor and throttle setting. Add in the inevitable reference error/drift and I don't think that even +/- 30mV is anything like realistic - my estimate is that you'll be lucky to get 60 to 100mV accuracy in practice.

I suspect that the overall system reliability of a stack of TC54s, versus a ucontroller and some peripherals, with home brew code, would be markedly better. Notwithstanding the issues over connectivity reliability, the failure modes for a TC54 solution are likely to be more graceful than those for a ucontroller. My concern with a ucontroller approach to a simple warning device is that failure may not cause a fail-safe outcome, whereas a TC54 solution would be more likely to meet that desired outcome.

In case anyone thinks I'm anti-MCU use, I should, perhaps, add that I designed a 16 bit, 100 samples per second, 8 channel airborne data acquisition system back in 1978, using a Motorola 6800 and eight external 12 bit A/Ds, with 4 bit gain controlled amplifiers to get full 16 bit resolution on each channel. The code was all written in assembler, using MASM. The unit drew around 13 amps from a 5V supply when logging and had a mighty 32K of RAM for data storage. It was fully qualified to operate from -20 deg C to +100 deg C in a shock/vibration environment up to 500g/second. BTW, 100 samples per second was as fast as I could sample and store the data into the RAM that was available at the time. The 6800 was running at 1MHz as I recall....................

Jeremy
 
I built a few systems with those 6800's way back, and that was hard work. Especially when cross-assembling using COMPASS on a CDC machine. Yuck.

The modern 1 chip micros are a lot more reliable than the old multi-chip systems (and tremendously easier to work with). There's probably one on your vehicle's airbag deployment system with an explosive to control. Modern cars have dozens of them and even most microwaves and toasters have them. They are everywhere.

Some noise is required for oversampling to increase resolution, and averaging multiple readings always improves things a bit when noise is present, but as Jeremy points out that doesn't help other things like reference stability. These bandgap references are pretty good, but at some point that would be a limitation. Probably not until beyond 12 bits, though.

If a TC54 gets popped by a little transient or even a momentary wiring error, you'll never know until it is too late, unless you regularly test each one - which is not easy in the usual design. With the micro you can do self testing continuously. Seems to be no comparison.

Open source code is often better than commercial, and this is a pretty simple application, if we keep it that way.

Jeremy and I are not likely to agree on the TC54 vs micro question, so let's drop that for now. We are looking for feedback on the micro design that might improve its reliability, and on the requirements and the failure modes that we need to protect against. We have protected against a number of failure modes that the TC54 solution does not.

While we are exploring this area, what are the requirements for adequate cell monitoring resolution and accuracy? What drives those requirements??

If we were charging, then the manufacturers' ratings for end of charge voltage would give us some guidance, and those tolerances seem to be about 50mV for some cells. But if we are just monitoring, what is the need?

I'm not sure we want to meet those requirements in this cell (low voltage) monitor design, but it is reasonable to investigate, as Jeremy has suggested.
 
Jeremy Harris said:
Unfortunately, over-sampling doesn't get rid of quantisation error for the sort of noisy signal that you're sampling on a battery under load, although it might appear to at first sight, as it's easy to assume that you're dealing with a DC signal. To illustrate this, take a look at the effect of typical cell Ri on terminal voltage. Let's assume that a decent LiFePO4 cell will have an Ri of around 10 mOhm. Pulling 10 amps from this cell will cause the voltage to drop by 100mV, at a randomly varying frequency that's dependent on the instantaneous load on the motor and throttle setting. Add in the inevitable reference error/drift and I don't think that even +/- 30mv is anything like realistic - my estimate is that you'll be lucky to get 60 to 100mV accuracy in practice..

............

Jeremy

This would seem to argue that resolution beyond 50mV would be useless when the motor was running?
 
Once permanently mounted inside a pack, I've never had one of the TC54-based LVC circuits fail, and I've built a ton of these since 2007. The only failures I've had were when they were initially mis-connected, or if a wire from the board to the cells broke/came off, but adding a 5.1V zener on each cell fixed the latter problem. With LiPo-based packs, I've never seen one fail yet, mainly because each 5Ah "sub-pack" comes with a pre-wired balance tap.

To get around the problem of the opto/ebrake not stopping a pack from killing itself, if a controller is left on, for instance, you could always add a very simple active cutoff circuit. The one below was originally posted here by a "brief" member, Randomly, and it works extremely well and has very little standby current drain.

[Attachment: 4 Channel LVC-Active Cutoff-v4.3.0b.png]

I'm working on a very simple "BMS" for a 12V SLA replacement motorcycle/marine genset starter battery that will use a similar version of this circuit that will handle about 250-300A.

-- Gary
 