comp.lang.ada
* [OT] Spirit - Software failure
@ 2004-01-26 10:15 Jano
  2004-01-26 10:42 ` Preben Randhol
  0 siblings, 1 reply; 18+ messages in thread
From: Jano @ 2004-01-26 10:15 UTC (permalink / raw)


The latest theory is that the failure was in software:

"Encouraging developments continued for Opportunity's twin, Spirit,
too. Engineers have determined that Spirit's flash memory hardware is
functional, strengthening a theory that Spirit's main problem is in
software that controls file management of the memory. "I think we've
got a patient that's well on the way to recovery," said Mars
Exploration Rover Project Manager Pete Theisinger at NASA's Jet
Propulsion Laboratory, Pasadena, Calif."

http://www.jpl.nasa.gov/releases/2004/37.cfm

DISCLAIMER: I'm not sure if the software of the Spirit is in Ada or C
(it was mentioned here recently, but I can't remember). I'm not posting
this to start another language flame war or another round of
back-patting; I simply think people in this group may be interested in
this kind of news.

If you know of another group where this is being discussed, I would be
grateful if you could point me to it.

Kind regards!



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [OT] Spirit - Software failure
  2004-01-26 10:15 [OT] Spirit - Software failure Jano
@ 2004-01-26 10:42 ` Preben Randhol
  2004-01-26 13:26   ` Larry Kilgallen
  0 siblings, 1 reply; 18+ messages in thread
From: Preben Randhol @ 2004-01-26 10:42 UTC (permalink / raw)


On 2004-01-26, Jano <402450@cepsz.unizar.es> wrote:
> DISCLAIMER: I'm not sure if the software of the Spirit is in Ada or C

I don't know either, except that it was said to be programmed in C.


-- 
"Saving keystrokes is the job of the text editor, not the programming
 language."




* Re: [OT] Spirit - Software failure
  2004-01-26 10:42 ` Preben Randhol
@ 2004-01-26 13:26   ` Larry Kilgallen
  2004-01-26 13:57     ` Ludovic Brenta
  2004-01-26 14:13     ` fdebruin
  0 siblings, 2 replies; 18+ messages in thread
From: Larry Kilgallen @ 2004-01-26 13:26 UTC (permalink / raw)


In article <slrnc19roj.4bn.randhol+valid_for_reply_from_news@k-083152.nt.ntnu.no>, Preben Randhol <randhol+valid_for_reply_from_news@pvv.org> writes:
> On 2004-01-26, Jano <402450@cepsz.unizar.es> wrote:
>> DISCLAIMER: I'm not sure if the software of the Spirit is in Ada or C
> 
> I don't know either, except that it was said to be programmed in C.

Even if it was programmed in C, I would think it could have been tested
more thoroughly.  Massive, repeated, long-term, variegated testing of
memory management software should be one of the easier black-box tests
to arrange.





* Re: [OT] Spirit - Software failure
  2004-01-26 13:26   ` Larry Kilgallen
@ 2004-01-26 13:57     ` Ludovic Brenta
  2004-01-26 14:15       ` Preben Randhol
  2004-01-26 14:13     ` fdebruin
  1 sibling, 1 reply; 18+ messages in thread
From: Ludovic Brenta @ 2004-01-26 13:57 UTC (permalink / raw)


Kilgallen@SpamCop.net (Larry Kilgallen) writes:

> Preben Randhol writes:
> > On 2004-01-26, Jano wrote:
> >> DISCLAIMER: I'm not sure if the software of the Spirit is in Ada or C
> > 
> > I don't know either, except that it was said to be programmed in C.
> 
> Even if it was programmed in C, I would think it could have been tested
> more thoroughly.  Massive, repeated, long-term, variegated testing of
> memory management software should be one of the easier black-box tests
> to arrange.

Any details on the failure itself?  Is it related to memory
management?  A buffer overflow for example?

Besides, for this kind of mission I would favour verification over
testing.

-- 
Ludovic Brenta.




* Re: [OT] Spirit - Software failure
  2004-01-26 13:26   ` Larry Kilgallen
  2004-01-26 13:57     ` Ludovic Brenta
@ 2004-01-26 14:13     ` fdebruin
  2004-01-26 23:46       ` Robert A Duff
  1 sibling, 1 reply; 18+ messages in thread
From: fdebruin @ 2004-01-26 14:13 UTC (permalink / raw)


Kilgallen@SpamCop.net (Larry Kilgallen) writes:
>Even if it was programmed in C, I would think it could have been tested
>more thoroughly.  

More than what? It is difficult to judge whether the test effort can
be considered sufficient if one does not have insight into what has been
performed.

In addition, with testing you can prove the presence of errors, but you
cannot prove their absence.

Frank de Bruin




* Re: [OT] Spirit - Software failure
  2004-01-26 13:57     ` Ludovic Brenta
@ 2004-01-26 14:15       ` Preben Randhol
  2004-01-26 23:17         ` Hyman Rosen
  0 siblings, 1 reply; 18+ messages in thread
From: Preben Randhol @ 2004-01-26 14:15 UTC (permalink / raw)


On 2004-01-26, Ludovic Brenta <ludovic.brenta@insalien.org> wrote:
>
> Any details on the failure itself?  Is it related to memory
> management?  A buffer overflow for example?

Last I read, it had something to do with the file management in the flash
memory, but that is just what the NASA press release says. What the real
bug is, I don't know ...


-- 
"Saving keystrokes is the job of the text editor, not the programming
 language."




* Re: [OT] Spirit - Software failure
  2004-01-26 14:15       ` Preben Randhol
@ 2004-01-26 23:17         ` Hyman Rosen
  2004-01-27  0:40           ` Alexandre E. Kopilovitch
  0 siblings, 1 reply; 18+ messages in thread
From: Hyman Rosen @ 2004-01-26 23:17 UTC (permalink / raw)


Preben Randhol wrote:
> Last I read, it had something to do with the file management in the flash
> memory, but that is just what the NASA press release says. What the real
> bug is, I don't know ...

There's this now: <http://abcnews.go.com/wire/US/ap20040126_1641.html>

	Spirit began acting up last week, when it stopped sending
	data and began rebooting its computer, eventually resetting
	it roughly 130 times. At one point, the rover thought it was
	the year 2053, Trosper said.

	To tame Spirit's computer, engineers temporarily disabled its
	flash memory.

	Engineers believe they gave Spirit too little random-access memory,
	or RAM, to adequately manage its file-packed flash memory, which is
	similar to the memory used by digital cameras to store photographs.

	Cutting off the flash memory eased the burden on Spirit's RAM and
	ended the rebooting loop that had plagued the spacecraft, Trosper said.

	Engineers planned to do a health check on Spirit's flash memory on
	Tuesday, and then begin deleting hundreds of unneeded files to make
	the memory more manageable for the rover's RAM, Trosper said.

"Too little RAM"? Sure sounds like a buffer overrun.





* Re: [OT] Spirit - Software failure
  2004-01-26 14:13     ` fdebruin
@ 2004-01-26 23:46       ` Robert A Duff
  2004-01-27  4:24         ` Larry Kilgallen
  0 siblings, 1 reply; 18+ messages in thread
From: Robert A Duff @ 2004-01-26 23:46 UTC (permalink / raw)


fdebruin <fdebruin@xs4all.nl> writes:

> In addition, with testing you can prove the presence of errors, but you
> cannot prove their absence.

With X, where X is any method whatsoever you can name, you can prove the
presence of errors, but you cannot prove their absence.

- Bob




* Re: [OT] Spirit - Software failure
  2004-01-26 23:17         ` Hyman Rosen
@ 2004-01-27  0:40           ` Alexandre E. Kopilovitch
  0 siblings, 0 replies; 18+ messages in thread
From: Alexandre E. Kopilovitch @ 2004-01-27  0:40 UTC (permalink / raw)
  To: comp.lang.ada

Hyman Rosen wrote:

> There's this now: <http://abcnews.go.com/wire/US/ap20040126_1641.html>
>
>	Spirit began acting up last week, when it stopped sending
>	data and began rebooting its computer, eventually resetting
>	it roughly 130 times. At one point, the rover thought it was
>	the year 2053, Trosper said.
>...
>
> "Too little RAM"? Sure sounds like a buffer overrun.

These days one can catch a computer virus even on Mars ;-)




Alexander Kopilovitch                      aek@vib.usr.pu.ru
Saint-Petersburg
Russia





* Re: [OT] Spirit - Software failure
  2004-01-26 23:46       ` Robert A Duff
@ 2004-01-27  4:24         ` Larry Kilgallen
  2004-01-27  8:30           ` Stephen Leake
  0 siblings, 1 reply; 18+ messages in thread
From: Larry Kilgallen @ 2004-01-27  4:24 UTC (permalink / raw)


In article <wccznca1dfh.fsf@shell01.TheWorld.com>, Robert A Duff <bobduff@shell01.TheWorld.com> writes:
> fdebruin <fdebruin@xs4all.nl> writes:
> 
>> In addition, with testing you can prove the presence of errors, but you
>> cannot prove their absence.
> 
> With X, where X is any method whatsoever you can name, you can prove the
> presence of errors, but you cannot prove their absence.

Or if you don't try hard enough, you can't even prove the presence of errors.

        <<< EISNER::DRA1:[NOTES$LIBRARY]HOBBIES_AND_INTERESTS.NOTE;1 >>>
                           -< HOBBIES_AND_INTERESTS >-
=============================================================================
Note 226.85                 NASA, Space Flight, etc.                 85 of 86
EISNER::SCOPELLITI                                18 lines  26-JAN-2004 21:45
                         -< Would ODS-2 have helped? >-
-----------------------------------------------------------------------------
    Detailing the Spirit rover's problems.  From:
    http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/index.html
    
    "Trosper said the problem appeared to be that the rover's flash memory
    couldn't handle the number of files it was storing. The jam-up, she
    said, apparently kept Spirit from shutting down properly and performing
    a number of functions that normally originated in its flash memory. 
    
    "Scientists are still analyzing the data, she said, but would begin
    deleting unnecessary files to test that theory.
    
    "She pointed out that the scientists had thoroughly tested the rover's
    systems on Earth, but that the longest trial for the file system was
    nine days, half of the 18 days Spirit operated before running into the
    problem."
    --------------------
    Perhaps this is the first time that a defrag actually fixes something?
    <GRIN>




* Re: [OT] Spirit - Software failure
  2004-01-27  4:24         ` Larry Kilgallen
@ 2004-01-27  8:30           ` Stephen Leake
  2004-01-27 10:59             ` Larry Kilgallen
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stephen Leake @ 2004-01-27  8:30 UTC (permalink / raw)
  To: comp.lang.ada

Kilgallen@SpamCop.net (Larry Kilgallen) writes:

>     Detailing the Spirit rover's problems.  From:
>     http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/index.html
>     
>     "Trosper said the problem appeared to be that the rover's flash memory
>     couldn't handle the number of files it was storing. The jam-up, she
>     said, apparently kept Spirit from shutting down properly and performing
>     a number of functions that normally originated in its flash memory. 
>     
>     "Scientists are still analyzing the data, she said, but would begin
>     deleting unnecessary files to test that theory.
>     
>     "She pointed out that the scientists had thoroughly tested the rover's
>     systems on Earth, but that the longest trial for the file system was
>     nine days, half of the 18 days Spirit operated before running into the
>     problem."
>     --------------------
>     Perhaps this is the first time that a defrag actually fixes something?
>     <GRIN>

Deleting files isn't defrag. This report describes a plain old memory leak.

Which probably would have been checked for if they had called the
software managing the flash ram a "memory management system" rather
than a "file system"; JPL programmers know they need to check for
memory leaks. But apparently they don't know they need to check for
full disks?

How long does a test need to be to be "thorough"? Longer than 9 days,
apparently :).

I doubt using Ada would have fixed this. They'd have just mapped
Ada.Text_IO to the flash memory, and had the same problem :).

Hiding dynamic memory management under a file system metaphor is a bad
idea in so many ways ...

-- 
-- Stephe





* Re: [OT] Spirit - Software failure
  2004-01-27  8:30           ` Stephen Leake
@ 2004-01-27 10:59             ` Larry Kilgallen
  2004-01-27 11:47             ` Preben Randhol
  2004-01-27 13:53             ` Dmitry A. Kazakov
  2 siblings, 0 replies; 18+ messages in thread
From: Larry Kilgallen @ 2004-01-27 10:59 UTC (permalink / raw)


In article <mailman.36.1075192268.2270.comp.lang.ada@ada-france.org>, Stephen Leake <stephen_leake@acm.org> writes:

> How long does a test need to be to be "thorough"? Longer than 9 days,
> apparently :).

Ideally a test should put _more_ stress on a component than it would
undergo in actual use.  It is often impossible to achieve that ideal,
but only 9 days of testing on a component that will get considerably
more than 9 days of use seems bad.

> I doubt using Ada would have fixed this.

Certainly not, but this is certainly relevant to the issue of management
shortchanging the non-programming tasks.




* Re: [OT] Spirit - Software failure
  2004-01-27  8:30           ` Stephen Leake
  2004-01-27 10:59             ` Larry Kilgallen
@ 2004-01-27 11:47             ` Preben Randhol
  2004-01-27 12:30               ` Jeff C,
  2004-01-27 12:41               ` Preben Randhol
  2004-01-27 13:53             ` Dmitry A. Kazakov
  2 siblings, 2 replies; 18+ messages in thread
From: Preben Randhol @ 2004-01-27 11:47 UTC (permalink / raw)


On 2004-01-27, Stephen Leake <stephen_leake@acm.org> wrote:
> Deleting files isn't defrag. This report describes a plain old memory leak.
>
> Which probably would have been checked for if they had called the
> software managing the flash ram a "memory management system" rather
> than a "file system"; JPL programmers know they need to check for
> memory leaks. But apparently they don't know they need to check for
> full disks?

I just read:

http://www.washingtonpost.com/wp-dyn/articles/A50645-2004Jan26.html

and it says:

   To ease the burden on the rover's data collection and storage system,
   engineers plan in the next day or so to start deleting hundreds of
   the unnecessary files, she said. Opportunity's handlers will see to
   it that any similar backlog in Opportunity's memory is also purged.

I don't understand this. Did they think they could take pictures forever
without the 256 MB being used up, or is there a memory leak in the
software, so that when they now take pictures (perhaps with new
instruments?) there isn't enough room to store the large pictures?


> Hiding dynamic memory management under a file system metaphor is a bad
> idea in so many ways ...

Yes.

Btw, does anybody know how the colours on the photos are calibrated? If
one looks at the panorama pictures, one can see that the colours on the
rover are very strange, especially the blue. It seems to me that the
hue does not match what one would expect from the photos of the rover
on the NASA site (taken on Earth).

-- 
"Saving keystrokes is the job of the text editor, not the programming
 language."




* Re: [OT] Spirit - Software failure
  2004-01-27 11:47             ` Preben Randhol
@ 2004-01-27 12:30               ` Jeff C,
  2004-01-27 12:41               ` Preben Randhol
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff C, @ 2004-01-27 12:30 UTC (permalink / raw)



"Preben Randhol" <randhol+valid_for_reply_from_news@pvv.org> wrote in
message:
> I don't understand this. Did they think they could take pictures forever
> without the 256 MB being used up, or is there a memory leak in the
> software, so that when they now take pictures (perhaps with new
> instruments?) there isn't enough room to store the large pictures?
>

I still have yet to see an article that REALLY explains what the failure
mode here is. People keep assuming that they know the real details based
on these high-level summaries, by trying to apply the little information
we get to their own real-world experience or biases.

If I put on that same hat, I would GUESS that what is happening here is
that they are using the DosFS component of VxWorks, which keeps (at least
in prior versions) all of the FAT tables in RAM, and that while there is
still flash space left, the sheer number of files is causing the RAM
allocated to the FAT table to be used up.

Do I have any facts to back this up? No. I have seen similar problems
with VxWorks file systems, but making determinations based on half-truths
is as naive as thinking that the Ariane 5 blew up because Ada exceptions
are bad.
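To make the guessed-at failure mode concrete, here is a minimal C sketch, assuming a hypothetical file system that keeps every file's metadata in a fixed-size table in RAM while the file data lives in flash. The sizes and the names `create_file`, `delete_file`, and `RAM_ENTRIES` are invented for illustration; this is not VxWorks DosFS code and not the rover's software. It only shows how file creation can fail for lack of RAM while the flash still has plenty of free space.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes chosen for illustration only. */
#define FLASH_BYTES (256L * 1024 * 1024) /* capacity of the simulated flash */
#define RAM_ENTRIES 64                   /* file-table slots held in RAM */

struct entry {
    char name[16];
    long size;
    int used;
};

/* All file metadata lives in this fixed RAM table; only the byte count of
   file data is tracked against the simulated flash. */
static struct entry table[RAM_ENTRIES];
static long flash_used = 0;

/* Returns 0 on success, -1 if the flash is full, -2 if the RAM table is full. */
int create_file(const char *name, long size)
{
    if (flash_used + size > FLASH_BYTES)
        return -1;
    for (int i = 0; i < RAM_ENTRIES; i++) {
        if (!table[i].used) {
            strncpy(table[i].name, name, sizeof table[i].name - 1);
            table[i].name[sizeof table[i].name - 1] = '\0';
            table[i].size = size;
            table[i].used = 1;
            flash_used += size;
            return 0;
        }
    }
    return -2; /* out of RAM for metadata, although flash still has room */
}

/* Frees both the RAM slot and the flash space; returns -1 if not found. */
int delete_file(const char *name)
{
    for (int i = 0; i < RAM_ENTRIES; i++) {
        if (table[i].used && strcmp(table[i].name, name) == 0) {
            table[i].used = 0;
            flash_used -= table[i].size;
            assert(flash_used >= 0);
            return 0;
        }
    }
    return -1;
}
```

Creating 64 small files fills the table, so a 65th call returns -2 even though almost none of the flash is in use; deleting a file frees a slot again. That resembles the reported plan of deleting unneeded files, but whether Spirit actually failed this way remains, as the post says, a guess.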







* Re: [OT] Spirit - Software failure
  2004-01-27 11:47             ` Preben Randhol
  2004-01-27 12:30               ` Jeff C,
@ 2004-01-27 12:41               ` Preben Randhol
  2004-01-27 16:52                 ` Hyman Rosen
  1 sibling, 1 reply; 18+ messages in thread
From: Preben Randhol @ 2004-01-27 12:41 UTC (permalink / raw)


On 2004-01-27, Preben Randhol <randhol+valid_for_reply_from_news@pvv.org> wrote:
> On 2004-01-27, Stephen Leake <stephen_leake@acm.org> wrote:
>> Deleting files isn't defrag. This report describes a plain old memory leak.
>>
>> Which probably would have been checked for if they had called the
>> software managing the flash ram a "memory management system" rather
>> than a "file system"; JPL programmers know they need to check for
>> memory leaks. But apparently they don't know they need to check for
>> full disks?
>
> I just read:
>
> http://www.washingtonpost.com/wp-dyn/articles/A50645-2004Jan26.html
>
> and it says:
>
>    To ease the burden on the rover's data collection and storage system,
>    engineers plan in the next day or so to start deleting hundreds of
>    the unnecessary files, she said. Opportunity's handlers will see to
>    it that any similar backlog in Opportunity's memory is also purged.
>
> I don't understand this. Did they think they could take pictures forever
> without the 256 MB being used up, or is there a memory leak in the
> software, so that when they now take pictures (perhaps with new
> instruments?) there isn't enough room to store the large pictures?
>
>
>> Hiding dynamic memory management under a file system metaphor is a bad
>> idea in so many ways ...
>
> Yes.
>
> Btw, does anybody know how the colours on the photos are calibrated? If
> one looks at the panorama pictures, one can see that the colours on the
> rover are very strange, especially the blue. It seems to me that the
> hue does not match what one would expect from the photos of the rover
> on the NASA site (taken on Earth).

I didn't mean the rover; I meant the landing pad. If you look at the
picture here:

   http://marsrovers.jpl.nasa.gov/mission/spacecraft_instru_calibr.html

or better here

   http://marsrovers.jpl.nasa.gov/gallery/press/spirit/20040110a.html

and then compare it with this one:

http://marsrovers.jpl.nasa.gov/gallery/press/opportunity/20040126a/MERB_Sol1_Postcard-B002R1_br.jpg

the colours don't match.

-- 
"Saving keystrokes is the job of the text editor, not the programming
 language."




* Re: [OT] Spirit - Software failure
  2004-01-27  8:30           ` Stephen Leake
  2004-01-27 10:59             ` Larry Kilgallen
  2004-01-27 11:47             ` Preben Randhol
@ 2004-01-27 13:53             ` Dmitry A. Kazakov
  2 siblings, 0 replies; 18+ messages in thread
From: Dmitry A. Kazakov @ 2004-01-27 13:53 UTC (permalink / raw)


On 27 Jan 2004 03:30:55 -0500, Stephen Leake <stephen_leake@acm.org>
wrote:

>Kilgallen@SpamCop.net (Larry Kilgallen) writes:
>
>>     Detailing the Spirit rover's problems.  From:
>>     http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/index.html
>>     
>>     "Trosper said the problem appeared to be that the rover's flash memory
>>     couldn't handle the number of files it was storing. The jam-up, she
>>     said, apparently kept Spirit from shutting down properly and performing
>>     a number of functions that normally originated in its flash memory. 
>>     
>>     "Scientists are still analyzing the data, she said, but would begin
>>     deleting unnecessary files to test that theory.
>>     
>>     "She pointed out that the scientists had thoroughly tested the rover's
>>     systems on Earth, but that the longest trial for the file system was
>>     nine days, half of the 18 days Spirit operated before running into the
>>     problem."
>>     --------------------
>>     Perhaps this is the first time that a defrag actually fixes something?
>>     <GRIN>
>
>Deleting files isn't defrag. This report describes a plain old memory leak.
>
>Which probably would have been checked for if they had called the
>software managing the flash ram a "memory management system" rather
>than a "file system"; JPL programmers know they need to check for
>memory leaks. But apparently they don't know they need to check for
>full disks?
>
>How long does a test need to be to be "thorough"? Longer than 9 days,
>apparently :).
>
>I doubt using Ada would have fixed this. They'd have just mapped
>Ada.Text_IO to the flash memory, and had the same problem :).
>
>Hiding dynamic memory management under a file system metaphor is a bad
>idea in so many ways ...

Almost as bad as not to use Ada... (:-))

--
Regards,
Dmitry A. Kazakov
www.dmitry-kazakov.de




* Re: [OT] Spirit - Software failure
  2004-01-27 12:41               ` Preben Randhol
@ 2004-01-27 16:52                 ` Hyman Rosen
  2004-01-27 17:01                   ` Preben Randhol
  0 siblings, 1 reply; 18+ messages in thread
From: Hyman Rosen @ 2004-01-27 16:52 UTC (permalink / raw)


Preben Randhol wrote:
>>Btw does anybody know how the colours on the photos are calibrated?
> the colours don't match.

Here's a very thorough explanation of the Mars colors:
<http://www.atsnn.com/marscolors.html>. Some of the
sundial chips are bright infra-red, for example.





* Re: [OT] Spirit - Software failure
  2004-01-27 16:52                 ` Hyman Rosen
@ 2004-01-27 17:01                   ` Preben Randhol
  0 siblings, 0 replies; 18+ messages in thread
From: Preben Randhol @ 2004-01-27 17:01 UTC (permalink / raw)


On 2004-01-27, Hyman Rosen <hyrosen@mail.com> wrote:
> Preben Randhol wrote:
>>>Btw does anybody know how the colours on the photos are calibrated?
>> the colours don't match.
>
> Here's a very thorough explanation of the Mars colors:
><http://www.atsnn.com/marscolors.html>. Some of the
> sundial chips are bright infra-red, for example.

Ah thanks I see now.

-- 
"Saving keystrokes is the job of the text editor, not the programming
 language."




end of thread (last message 2004-01-27 17:01 UTC)

Thread overview: 18+ messages
2004-01-26 10:15 [OT] Spirit - Software failure Jano
2004-01-26 10:42 ` Preben Randhol
2004-01-26 13:26   ` Larry Kilgallen
2004-01-26 13:57     ` Ludovic Brenta
2004-01-26 14:15       ` Preben Randhol
2004-01-26 23:17         ` Hyman Rosen
2004-01-27  0:40           ` Alexandre E. Kopilovitch
2004-01-26 14:13     ` fdebruin
2004-01-26 23:46       ` Robert A Duff
2004-01-27  4:24         ` Larry Kilgallen
2004-01-27  8:30           ` Stephen Leake
2004-01-27 10:59             ` Larry Kilgallen
2004-01-27 11:47             ` Preben Randhol
2004-01-27 12:30               ` Jeff C,
2004-01-27 12:41               ` Preben Randhol
2004-01-27 16:52                 ` Hyman Rosen
2004-01-27 17:01                   ` Preben Randhol
2004-01-27 13:53             ` Dmitry A. Kazakov
