Life is good again

because it looks like we figured out what the suspend/resume problem was. And as suspected, the actual resource code had nothing whatsoever to do with it, and apparently just changed the timing enough to trigger it.

Bugs like that are frustrating, but on the other hand it's a big relief when they get resolved, and in this case we also ended up going through a lot of code, and I think we'll be much better off as a result. It's also a huge relief to find the actual root cause, rather than just things that could be used to paper over and hide the problem.

And kudos to the people who actually saw the problem (Rafael and Frans), and who spent a lot of time trying out different things and sending out logs and looking at the resource allocation. The real clue was in a log from a successful suspend/resume cycle that showed some questionable behaviour despite not actually failing.
