* Boeing 787 integer overflow
From: Robert Love @ 2015-05-02 23:34 UTC

Ars Technica has this article:

http://arstechnica.com/information-technology/2015/05/01/boeing-787-dreamliners-contain-a-potentially-catastrophic-software-bug/

Can anyone comment on what language Boeing used for this? If Ada, would a modular integer be more appropriate? Is there an exception handler for this integer?

* Re: Boeing 787 integer overflow
From: Maciej Sobczak @ 2015-05-03 11:23 UTC

On Sunday, 3 May 2015 at 01:34:59 UTC+2, Robert Love wrote:

> Ars Technica has this article:
>
> http://arstechnica.com/information-technology/2015/05/01/boeing-787-dreamliners-contain-a-potentially-catastrophic-software-bug/
>
> Can anyone comment on what language Boeing used for this?

It does not matter. The ability to run continuously for 8 months was most likely not in the requirements (planes have to be switched off for maintenance more frequently than that anyway), so there was no need to implement a solution for this. You can safely argue that the capacity of the counter allows proper operation within the given bounds, and you could even have that tested with 100% coverage of the *required* data/time domain and (why not?) formally verified as well.

> If Ada, would a modular integer be more appropriate?

Why? Are you aware of a requirement that the counter has to automatically reset after (let's say) half a year? I guess not, and even if you attempt to make it up as a derived requirement, it might be superfluous or even contradictory to other requirements.

> Is there an exception handler for this integer?

Why? Are there any requirements that explicitly state the plane has to work continuously for longer than 8 months?

Ada is not a solution to this problem, because this is really not a problem (unless shown at the level of requirements). The whole article is only an opportunity for journalists to write something exciting, and then Boeing has to react somehow purely for PR reasons, even if, from the engineering perspective, they don't actually have to.

Of course, if it appears that this part of the system was indeed written in Ada, you can expect Ada skeptics to have a similar ride as with Ariane 5.

--
Maciej Sobczak * http://www.inspirel.com

* Re: Boeing 787 integer overflow
From: Georg Bauhaus @ 2015-05-03 15:27 UTC

On 03.05.15 13:23, Maciej Sobczak wrote:

> Ada is not a solution to this problem, because this is really not a problem (unless shown at the level of requirements).

Nor is any programming language, I imagine: given any counting thing of limited size in bits, counting will hit the limit.

* Re: Boeing 787 integer overflow
From: Peter Chapin @ 2015-05-03 16:03 UTC

On Sun, 3 May 2015, Maciej Sobczak wrote:

>> Can anyone comment on what language Boeing used for this?
>
> It does not matter. The ability to run continuously for 8 months was most likely not in the requirements (planes have to be switched off for maintenance more frequently than that anyway), so there was no need to implement a solution for this.

I guess it depends on whether there is a *requirement* in the maintenance plan to reboot the system periodically (more often than every 8 months). The matter should be handled somewhere, and it seems like it wasn't. In other words, it was just "luck" that these systems have been getting restarted frequently enough.

Planes obviously don't fly for 8 months straight, and I'm sure they get maintained, in general, more regularly than that as well. I don't know precisely which system this issue is connected with, but it seems possible to me that, in some cases at least, some systems would be left up and running while others are being maintained. In other words, a daily maintenance schedule may not imply that the counter in question is getting reset daily.

A software fix might be nice, such as increasing the counter to 64 bits to push the overflow time out to something ridiculous, but just adding an item to the maintenance checklist might also be sufficient.

Peter
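
To put a number on "something ridiculous", here is a rough sketch of how long a signed counter of a given width lasts before it overflows. The centisecond tick rate is the figure mentioned elsewhere in this thread; the function name and the 365.25-day year are assumptions made for illustration only and say nothing about how the actual unit is built.

```python
# Hypothetical helper: time until a signed N-bit tick counter overflows.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year length


def years_to_overflow(bits: int, ticks_per_second: int = 100) -> float:
    """Years until a signed counter of `bits` bits exceeds its maximum value."""
    max_ticks = 2 ** (bits - 1)  # first value that no longer fits
    return max_ticks / ticks_per_second / SECONDS_PER_YEAR


for width in (32, 64):
    print(f"{width}-bit signed counter: {years_to_overflow(width):.3g} years")
```

With these assumptions the 32-bit counter lasts well under a year, while the 64-bit one outlives the airframe by billions of years.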

* Re: Boeing 787 integer overflow
From: Dennis Lee Bieber @ 2015-05-03 23:34 UTC

On Sun, 3 May 2015 12:03:51 -0400, Peter Chapin <PChapin@vtc.vsc.edu> declaimed the following:

> I guess it depends on if there is a *requirement* to reboot the system periodically (less than 8 months) in the maintenance plan. The matter should be handled somewhere and it seems like it wasn't. In other words it was just "luck" that these systems have been getting restarted frequently enough.

Also depends upon just what "reboot" means in this environment... If this is some sort of elapsed time counter, then it is something saved in flash memory and will survive a normal power-cycle operation.

"Reboot" in this case may mean erasing and reloading the operational flight program, databases, and other stuff in "permanent" memory.

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

* Re: Boeing 787 integer overflow
From: robin.vowels @ 2015-05-04 0:00 UTC

On Monday, May 4, 2015 at 9:34:59 AM UTC+10, Dennis Lee Bieber wrote:

> On Sun, 3 May 2015 12:03:51 -0400, Peter Chapin <P.nospam@vtc.vsc.edu> declaimed the following:
>
>> I guess it depends on if there is a *requirement* to reboot the system periodically (less than 8 months) in the maintenance plan. The matter should be handled somewhere and it seems like it wasn't. In other words it was just "luck" that these systems have been getting restarted frequently enough.
>
> Also depends upon just what "reboot" means in this environment... If this is some sort of elapsed time counter, then it is something saved in flash memory and will survive a normal power-cycle operation.

The article has pointed out that 248 days corresponds to 2**31 centiseconds, which suggests that the integer is associated with a timer that is running continuously. Timers usually have a lithium battery to keep them going, just as they do in the humble PC.

> "Reboot" in this case may mean erasing and reloading the operational flight program, databases, and other stuff in "permanent" memory.

Reboot of the humble PC doesn't usually change the timer's value.
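
The 248-day figure can be checked with a few lines of arithmetic. This is only a worked calculation of 2**31 centiseconds expressed in days; the variable names are made up for the example.

```python
# Worked arithmetic: how many days is 2**31 centiseconds?
TICKS_PER_SECOND = 100          # one tick per centisecond
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

max_ticks = 2 ** 31             # first value that no longer fits in a signed 32-bit integer
seconds_to_overflow = max_ticks / TICKS_PER_SECOND
days_to_overflow = seconds_to_overflow / SECONDS_PER_DAY

print(f"{seconds_to_overflow:,.2f} s = {days_to_overflow:.2f} days")
```

The result is a little over 248 and a half days, which matches the "8 months" quoted throughout the thread.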

* Re: Boeing 787 integer overflow
From: Jeffrey R. Carter @ 2015-05-04 0:38 UTC

On 05/03/2015 04:34 PM, Dennis Lee Bieber wrote:

> On Sun, 3 May 2015 12:03:51 -0400, Peter Chapin <PChapin@vtc.vsc.edu> declaimed the following:
>
>> I guess it depends on if there is a *requirement* to reboot the system periodically (less than 8 months) in the maintenance plan. The matter should be handled somewhere and it seems like it wasn't. In other words it was just "luck" that these systems have been getting restarted frequently enough.

Apparently there isn't, since the AD is to restart the GCUs more frequently.

> Also depends upon just what "reboot" means in this environment... If this is some sort of elapsed time counter, then it is something saved in flash memory and will survive a normal power-cycle operation.
>
> "Reboot" in this case may mean erasing and reloading the operational flight program, databases, and other stuff in "permanent" memory.

Since the AD is to restart the GCUs more frequently, it doesn't appear to be that complicated. It also says that the effect of the overflow is for the S/W to go into a special mode, so it's clear the S/W detects the overflow somehow.

--
Jeff Carter
"My little plum, I am like Robin Hood. I take from the rich, and I give to the poor. ... Us poor."
Poppy
96

* Re: Boeing 787 integer overflow
From: robin.vowels @ 2015-05-04 1:55 UTC

On Monday, May 4, 2015 at 10:38:45 AM UTC+10, Jeffrey R. Carter wrote:

> On 05/03/2015 04:34 PM, Dennis Lee Bieber wrote:
>> On Sun, 3 May 2015 12:03:51 -0400, Peter Chapin <P.nospam@vtc.vsc.edu> declaimed the following:
>>
>>> I guess it depends on if there is a *requirement* to reboot the system periodically (less than 8 months) in the maintenance plan. The matter should be handled somewhere and it seems like it wasn't. In other words it was just "luck" that these systems have been getting restarted frequently enough.
>
> Apparently there isn't, since the AD is to restart the GCUs more frequently.
>
>> Also depends upon just what "reboot" means in this environment... If this is some sort of elapsed time counter, then it is something saved in flash memory and will survive a normal power-cycle operation.
>>
>> "Reboot" in this case may mean erasing and reloading the operational flight program, databases, and other stuff in "permanent" memory.
>
> Since the AD is to restart the GCUs more frequently, it doesn't appear to be that complicated. It also says that the effect of the overflow is for the S/W to go into a special mode,

That's failsafe mode.

> so it's clear the S/W detects the overflow somehow.

Indeed, but the overflow handler was a general one for all overflows in the software. A specific one for that particular timer is clearly needed. I can't imagine why they'd want to shut everything down when there's clearly an error. In a plane, you'd want to continue, if possible, and obviously, this one is continuable. In fact, it's essential that it continue.

Sounds like a repeat of the Ariane 5 failure, where they trapped an interrupt and shut down (placing an error code on the data bus, which data was then interpreted as a direction (attitude) change).

* Re: Boeing 787 integer overflow
From: robin.vowels @ 2015-05-03 23:54 UTC

On Monday, May 4, 2015 at 2:03:55 AM UTC+10, Peter Chapin wrote:

> On Sun, 3 May 2015, Maciej Sobczak wrote:
>
>>> Can anyone comment on what language Boeing used for this?
>>
>> It does not matter. The ability to run continuously for 8 months was most likely not in the requirements (planes have to be switched off for maintenance more frequently than that anyway), so there was no need to implement a solution for this.
>
> I guess it depends on if there is a *requirement* to reboot the system periodically (less than 8 months) in the maintenance plan. The matter should be handled somewhere and it seems like it wasn't. In other words it was just "luck" that these systems have been getting restarted frequently enough.
>
> Planes obviously don't fly for 8 months straight and I'm sure they get maintained, in general, more regularly than that as well. I don't know precisely which system this issue is connected with, but it seems possible to me that, in some cases at least, some systems would be left up and running while others are being maintained. In other words a daily maintenance schedule may not imply that the counter in question is getting reset daily.
>
> A software fix might be nice, such as increasing the counter to 64 bits to push the overflow time out to something ridiculous,

That isn't the solution. The solution is to have an appropriate error handler.

> but just adding an item to the maintenance checklist might also be sufficient.

Things like that get overlooked. Maybe not this year, but in 10 years, when everyone has forgotten about it.

* Re: Boeing 787 integer overflow
From: Georg Bauhaus @ 2015-05-04 8:28 UTC

On 04.05.15 01:54, robin.vowels@gmail.com wrote:

>> A software fix might be nice, such as increasing the counter to 64 bits to push the overflow time out to something ridiculous,
>
> That isn't the solution. The solution is to have an appropriate error handler.

If the variable has system-wide effects, I think error handling would require a little more than a few lines of code for overflow. Also provided that detecting overflow can be made fast enough in the first place.

So, if the bit size could be increased without affecting the system in other ways, this looks like a much cheaper solution to me.

* Re: Boeing 787 integer overflow
From: robin.vowels @ 2015-05-04 8:45 UTC

On Monday, May 4, 2015 at 6:29:00 PM UTC+10, Georg Bauhaus wrote:

> On 04.05.15 01:54, r.nospam@gmail.com wrote:
>>> A software fix might be nice, such as increasing the counter to 64 bits to push the overflow time out to something ridiculous,
>>
>> That isn't the solution. The solution is to have an appropriate error handler.
>
> If the variable has system-wide effects, I think error handling would require a little more than a few lines of code for overflow. Also provided that detecting overflow can be made fast enough in the first place.

Apparently there is already an error handler. Detecting overflow is typically very fast. On some systems, detection is automatic by hardware, while on others, an instruction is executed following the relevant arithmetic operation, to raise an interrupt should overflow occur.

> So, if the bit size could be increased without affecting the system in other ways, this looks like a much cheaper solution to me.

Might not be possible. Dedicated processors have limits on the size of a word, or size of the arithmetic register(s).

Merely using more bits in a register [even if permitted by hardware] is not the solution. It merely defers it.

The solution is to fix the software, by handling the interrupt with a different error handler from the general one.

An alternative might be to initialize the timer automatically before every take-off.
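
The options being argued over can be sketched roughly as follows: a range-checked increment that raises on overflow (comparable in spirit to a constrained Ada integer raising Constraint_Error), a wrapping increment (comparable to a modular type), and a handler dedicated to this one counter. All names are hypothetical, and restarting the count at zero is purely an illustrative recovery policy; whether it would be safe in a real system is exactly the open question in this thread.

```python
INT32_MAX = 2 ** 31 - 1
INT32_MIN = -(2 ** 31)


class CounterOverflow(Exception):
    """Raised when a checked increment would leave the signed 32-bit range."""


def increment_checked(value: int) -> int:
    """Range-checked increment: refuse to produce an out-of-range value."""
    if value == INT32_MAX:
        raise CounterOverflow("tick counter would exceed 2**31 - 1")
    return value + 1


def increment_wrapping(value: int) -> int:
    """Two's-complement wraparound, as unchecked 32-bit arithmetic behaves."""
    return (value + 1 - INT32_MIN) % 2 ** 32 + INT32_MIN


def tick_with_dedicated_handler(value: int) -> int:
    """Handler specific to this counter: keep running instead of shutting down."""
    try:
        return increment_checked(value)
    except CounterOverflow:
        return 0  # hypothetical policy: restart the count


print(increment_wrapping(INT32_MAX))           # wraps to the most negative value
print(tick_with_dedicated_handler(INT32_MAX))  # recovers and restarts the count
```

The check itself is a single comparison; the hard part, as the replies point out, is deciding what the rest of the system should do once it fires.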

* Re: Boeing 787 integer overflow
From: G.B. @ 2015-05-04 11:26 UTC

On 04.05.15 10:45, robin.vowels@gmail.com wrote:

>>> The solution is to have an appropriate error handler.
>>
>> If the variable has system-wide effects, I think error handling would require a little more than a few lines of code for overflow. Also provided that detecting overflow can be made fast enough in the first place.
>
> Apparently there is already an error handler.

It doesn't handle the error, if it is one, hence the report.

> Detecting overflow is typically very fast.

Is it fast? The GNAT UG explains overflow handling options at length, and its default choice of not handling all overflowing integer computations.

> On some systems,

For the system at hand, we'd have to know what overflow handling means here, in terms of cost.

>> So, if the bit size could be increased without affecting the system in other ways, this looks like a much cheaper solution to me.
>
> Might not be possible. Dedicated processors have limits on the size of a word, or size of the arithmetic register(s).

If this processor is not one with dedicated overflow checking support, and efficient at that, then using two words instead of one for the counting variable might actually be faster than any overflow handlers.

> Merely using more bits in a register [even if permitted by hardware] is not the solution. It merely defers it.

More bits (64 instead of 32) is a solution, since "deferred" is then equivalent to "never", as Peter Chapin has explained when using "ridiculous" for times far into the future, I think.

> The solution is to fix the software, by handling the interrupt with a different error handler from the general one.

That's a rephrasing of the original claim. But how to handle?

> An alternative might be to initialize the timer automatically before every take-off.

Might it? Is this a known possibility? Generalizing: some counting variables should _never_ be re-initialized without accounting: aging parameters would be restored to "good" by an error in bureaucracy.

* Re: Boeing 787 integer overflow
From: Dmitry A. Kazakov @ 2015-05-04 12:17 UTC

On Mon, 04 May 2015 13:26:50 +0200, G.B. wrote:

> On 04.05.15 10:45, robin.vowels@gmail.com wrote:
>
>> Detecting overflow is typically very fast.
>
> Is it fast?

It is not trivial to handle hardware counter's overflows, e.g. by extending it, like 32->64, a lot of ugly issues with race conditions.

As a programmer I always suggest to scrap garbage hardware. Engineers usually propose the opposite - to fix hardware by software means. That is how people get hurt! (:-()

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
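
One of the race conditions alluded to above can be illustrated with a toy model: a 32-bit hardware register extended to 64 bits by a software-maintained high word. If the low word rolls over between reading the two halves, a naive read combines a stale high word with a fresh low word. The usual remedy is to read high, low, high again and retry if the high word changed. Everything below is a single-threaded simulation with made-up names, not a description of any real timer hardware.

```python
class SimulatedTimer:
    """Toy model: 32-bit hardware register plus a software-maintained high word."""

    def __init__(self, low: int, high: int = 0) -> None:
        self.low = low
        self.high = high

    def tick(self) -> None:
        self.low = (self.low + 1) % 2 ** 32
        if self.low == 0:
            self.high += 1  # what an overflow interrupt handler would do


def make_racy_reads(timer: SimulatedTimer):
    """Return (read_high, read_low); the timer ticks right after the first high read."""
    calls = {"n": 0}

    def read_high() -> int:
        value = timer.high
        if calls["n"] == 0:
            timer.tick()  # simulate a rollover sneaking in between the two reads
        calls["n"] += 1
        return value

    def read_low() -> int:
        return timer.low

    return read_high, read_low


def read_64_naive(read_high, read_low) -> int:
    return (read_high() << 32) | read_low()


def read_64_safe(read_high, read_low) -> int:
    while True:
        high_before = read_high()
        low = read_low()
        if read_high() == high_before:  # no rollover happened during the read
            return (high_before << 32) | low


naive_value = read_64_naive(*make_racy_reads(SimulatedTimer(low=2 ** 32 - 1)))
safe_value = read_64_safe(*make_racy_reads(SimulatedTimer(low=2 ** 32 - 1)))
print(naive_value, safe_value)
```

In this simulation the naive read reports a value about 2**32 ticks too small, while the retrying read returns the post-rollover value. A real implementation would also have to deal with interrupt latency and atomicity, which this sketch ignores.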

* Re: Boeing 787 integer overflow
From: G.B. @ 2015-05-04 12:53 UTC

On 04.05.15 14:17, Dmitry A. Kazakov wrote:

> On Mon, 04 May 2015 13:26:50 +0200, G.B. wrote:
>
>> On 04.05.15 10:45, robin.vowels@gmail.com wrote:
>>
>>> Detecting overflow is typically very fast.
>>
>> Is it fast?
>
> It is not trivial to handle hardware counter's overflows, e.g. by extending it, like 32->64, a lot of ugly issues with race conditions.

I thought so, in particular remembering your use of 64 bit FPT words in non-sequential 32 bit programs?

Yet, I think that one cannot simply assess the complexity of handlers or simply dismiss using components of larger (sums of) bit sizes by invoking ceteris paribus. That would be like assuming the system to have FPT hardware, and also permission to use it.

Which is why it seems good if a language design does address the handling of overflow and atomic operations, even if only in basic ways.

> As a programmer I always suggest to scrap garbage hardware. Engineers usually propose the opposite - to fix hardware by software means. That is how people get hurt! (:-()

One more argument in favor of clarifying the word "system": the result of human thought. Again, the problem turns out to be one of goal directed organization of units of industry. An error handler achieving that would be quite good, wouldn't it?

* Re: Boeing 787 integer overflow
From: Dennis Lee Bieber @ 2015-05-04 13:28 UTC

On Mon, 04 May 2015 10:28:59 +0200, Georg Bauhaus <bauhaus@futureapps.invalid> declaimed the following:

> On 04.05.15 01:54, robin.vowels@gmail.com wrote:
>>> A software fix might be nice, such as increasing the counter to 64 bits to push the overflow time out to something ridiculous,
>>
>> That isn't the solution. The solution is to have an appropriate error handler.
>
> If the variable has system-wide effects, I think error handling would require a little more than a few lines of code for overflow. Also provided that detecting overflow can be made fast enough in the first place.
>
> So, if the bit size could be increased without affecting the system in other ways, this looks like a much cheaper solution to me.

Hardware changes are unlikely... Would probably entail a multiple man-year effort to design and certify a new processor board, along with new software on the board to handle the changes in the hardware...

I wouldn't be overly surprised to find the actual flight operations don't care about the roll-over... It could be something as simple as a poorly written built-in test algorithm that gets run periodically (every 5 minutes, say). Maybe a test for counter operation that relies upon "current value > previous value"; which, upon finding a current value suddenly less than the value the last time it ran, declares the unit faulty and shuts it down.

The stop-gap directive seems to be to perform an action that resets the counter and the test values (since just restarting the software would pick up the values that had been in play).

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
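
The speculated built-in test can be sketched to show why a plain "current > previous" comparison trips on rollover, and how comparing the modular difference avoids that. This is purely an illustration of the hypothesis above: the function names and the `max_step` tolerance are invented, and nothing here is known about the actual GCU software.

```python
INT32_MAX = 2 ** 31 - 1
INT32_MIN = -(2 ** 31)
MODULUS = 2 ** 32


def naive_counter_ok(previous: int, current: int) -> bool:
    """The speculated test: the counter is healthy only if it went up."""
    return current > previous


def elapsed_ticks(previous: int, current: int) -> int:
    """Ticks elapsed between two readings, correct across one wraparound."""
    return (current - previous) % MODULUS


def wrap_safe_counter_ok(previous: int, current: int, max_step: int = 60_000) -> bool:
    """Healthy if the counter advanced by a plausible amount since the last test.

    max_step is a hypothetical tolerance: roughly twice the 30,000 centisecond
    ticks expected between tests that run every 5 minutes.
    """
    return 0 < elapsed_ticks(previous, current) <= max_step


# Readings taken just before and just after the signed 32-bit rollover.
before, after = INT32_MAX - 9, INT32_MIN + 20

print(naive_counter_ok(before, after))      # False: would declare the unit faulty
print(elapsed_ticks(before, after))         # 30 ticks actually elapsed
print(wrap_safe_counter_ok(before, after))  # True: counter is still advancing
```

The modular difference stays meaningful as long as the test runs much more often than the counter's full period, which is easily the case for a 5-minute test and a 248-day period.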

* Re: Boeing 787 integer overflow
From: robin.vowels @ 2015-05-03 23:49 UTC

On Sunday, May 3, 2015 at 9:23:45 PM UTC+10, Maciej Sobczak wrote:

> On Sunday, 3 May 2015 at 01:34:59 UTC+2, Robert Love wrote:
>
>> Ars Technica has this article:
>>
>> http://arstechnica.com/information-technology/2015/05/01/boeing-787-dreamliners-contain-a-potentially-catastrophic-software-bug/
>>
>> Can anyone comment on what language Boeing used for this?
>
> It does not matter. The ability to run continuously for 8 months was most likely not in the requirements (planes have to be switched off for maintenance more frequently than that anyway), so there was no need to implement a solution for this. You can safely argue that the capacity of the counter allows proper operation within the given bounds and you could even have that tested with 100% coverage of the *required* data/time domain and (why not?) formally verified as well.
>
>> If Ada, would a modular integer be more appropriate?
>
> Why? Are you aware of the requirement that the counter has to automatically reset after (let's say) half a year? I guess not and even if you attempt to make it up as a derived requirement, it might be superfluous or even contradictory to other requirements.
>
>> Is there an exception handler for this integer?
>
> Why? Are there any requirements that explicitly state the plane has to work continuously for longer than 8 months?

It won't be in the air for 6 months, but the software may be running for that time, or the counter is running continuously.