Ravenscar and context switching for Cortex-M4

comp.lang.ada

* Ravenscar and context switching for Cortex-M4
@ 2015-02-12 20:25 Patrick Noffke
  2015-02-12 21:28 ` Niklas Holsti
  2015-02-16 16:27 ` Patrick Noffke
  0 siblings, 2 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-02-12 20:25 UTC (permalink / raw)


I am porting the GNAT Ravenscar-sfp runtime to work with the TI TM4C MCU, using the STM32F4 implementation as a starting point.  I am having a problem where one of two tasks blocked on entries (each in separate protected objects) is not getting activated.

Here is the situation:

I have two POs, each with an interrupt handler and an entry.  PO1 services UART interrupts and PO2 services SPI interrupts.  When each interrupt fires, the barrier in the corresponding entry is released.

Then I have two tasks, T1 and T2.  T1 is blocked on PO1's entry and T2 is blocked on PO2's entry.

The problem happens when the interrupts happen very near in time to each other.  Then, both entries are still executed, but only one task runs after the last entry completes.  I have instrumented the run-time as well as my code to toggle GPIOs and see what's happening when.  Here is the timing that fails:

1. SPI interrupt triggers.
2. Interrupt_Request_Handler in s-bbcppr-armv7m.adb is executed.  It calls my interrupt handler for PO2, followed by the entry (entry called in the interrupt context), and triggers a context switch for T2.
3. Before the task T2 can run, the UART interrupt triggers.
4. Interrupt_Request_Handler calls my interrupt handler for PO1 and its entry, and triggers a context switch for T1.
5. T1 then runs until it blocks again on the entry for PO1.

T2 never runs.  Furthermore, after this occurs, the entry for PO2 is never executed again (though its interrupt handler is).  T2 also never runs again.

I'm not yet familiar enough with the runtime to know what's happening.  But perhaps the issue is related to using Pend_SV_Handler to trigger the context switch.  Does the "pending" context switch for T1 never get executed since T2 is switched in before the Pend_SV_Handler can execute?  I'm reluctant to muck with Pend_SV_Handler to instrument the code since I don't want to perturb the processor registers and break that which I'm trying to instrument.

If it's useful, T1's priority is 10 and T2's priority is 190.  So even though T1 is lower priority than T2, T1 is switched in before T2 can run.

Both interrupt handlers are the same priority.

Also, if the second interrupt does not happen before the first task can run, then everything is fine (i.e. both tasks get their turn to run).

According to http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles, at most one task may be queued on an entry.  I take this to mean *on a single entry* and that two tasks may be simultaneously queued on separate entries.  Is that correct?  If not, then this is the problem.  But I've seen other suggestions in this list that my interpretation is correct.

Best regards,
Patrick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ravenscar and context switching for Cortex-M4
  2015-02-12 20:25 Ravenscar and context switching for Cortex-M4 Patrick Noffke
@ 2015-02-12 21:28 ` Niklas Holsti
  2015-02-13 12:41   ` G.B.
  2015-02-16 16:27 ` Patrick Noffke
  1 sibling, 1 reply; 20+ messages in thread
From: Niklas Holsti @ 2015-02-12 21:28 UTC (permalink / raw)

On 15-02-12 22:25 , Patrick Noffke wrote:
> I am porting the GNAT Ravenscar-sfp runtime to work with the TI TM4C
> MCU, using the STM32F4 implementation as a starting point.  I am
> having a problem where one of two tasks blocked on entries (each in
> separate protected objects) is not getting activated.

Can't help you with that, sorry... but hurrah! for working on Ada 
run-times for microcontrollers.

> According to
> http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles,
> at most one task may be queued on an entry.  I take this to mean *on
> a single entry* and that two tasks may be simultaneously queued on
> separate entries.  Is that correct?

Definitely correct. In typical Ravenscar applications, at any one time 
there are many tasks queued on entries, but at most one task is queued 
on each entry at the same time.

In some Ravenscar design guidelines, it is suggested that there should 
be a static association between task and entry, so that for each entry 
there is exactly one task that ever queues on this entry. This guideline 
ensures that there is no risk of two or more tasks ever queuing on the 
same entry at the same time, but I believe that the profile does not 
require this design rule -- it is allowed for more than one task to 
queue on the same entry, as long as they do not do it at the same time. 
In other words, the entry-queue length should never be more than one in 
a Ravenscar system (which means that it is not necessary to implement 
actual entry queues in the run-time).

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-12 21:28 ` Niklas Holsti
@ 2015-02-13 12:41   ` G.B.
  2015-02-13 16:25     ` Simon Wright
  2015-02-13 18:08     ` Niklas Holsti
  0 siblings, 2 replies; 20+ messages in thread
From: G.B. @ 2015-02-13 12:41 UTC (permalink / raw)


On 12.02.15 22:28, Niklas Holsti wrote:
> On 15-02-12 22:25 , Patrick Noffke wrote:

>> According to
>> http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles,
>>
>> at most one task may be queued on an entry.  I take this to mean *on
>> a single entry* and that two tasks may be simultaneously queued on
>> separate entries.  Is that correct?
>
> Definitely correct.

Are you sure this is correct? The profile includes

                    Simple_Barriers,
                    Max_Entry_Queue_Length => 1,
                    Max_Protected_Entries => 1,
                    Max_Task_Entries => 0,

I thought "essentially no queuing" (hence no more than one
simple entry per object) is a good rule of thumb for Ravenscar
profile based programming.



* Re: Ravenscar and context switching for Cortex-M4
  2015-02-13 12:41   ` G.B.
@ 2015-02-13 16:25     ` Simon Wright
  2015-02-13 18:08     ` Niklas Holsti
  1 sibling, 0 replies; 20+ messages in thread
From: Simon Wright @ 2015-02-13 16:25 UTC (permalink / raw)


"G.B." <bauhaus@futureapps.invalid> writes:

> On 12.02.15 22:28, Niklas Holsti wrote:
>> On 15-02-12 22:25 , Patrick Noffke wrote:
>
>>> According to
>>> http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles,
>>>
>>> at most one task may be queued on an entry.  I take this to mean *on
>>> a single entry* and that two tasks may be simultaneously queued on
>>> separate entries.  Is that correct?
>>
>> Definitely correct.
>
> Are you sure this is correct? The profile includes
>
>                    Simple_Barriers,
>                    Max_Entry_Queue_Length => 1,
>                    Max_Protected_Entries => 1,
>                    Max_Task_Entries => 0,
>
> I thought "essentially no queuing" (hence no more than one
> simple entry per object) is a good rule of thumb for Ravenscar
> profile based programming.

Not just a rule of thumb!

   Tasks may not have entries
   POs may have zero or one entries
   Only one task may be waiting on a PO's entry

but

   you can have 42 tasks each waiting on 42 different POs' single
   entries (if you have enough RAM :-)

I don't know what would happen in my FreeRTOS-based RTS if I let 2 tasks
try to queue on the same entry ... I'd get Program_Error with "entry
call already queued". The variable called Object.Entry_Queue is in fact
a single pointer.



* Re: Ravenscar and context switching for Cortex-M4
  2015-02-13 12:41   ` G.B.
  2015-02-13 16:25     ` Simon Wright
@ 2015-02-13 18:08     ` Niklas Holsti
  2015-02-13 19:01       ` Simon Wright
  2015-02-13 23:45       ` Georg Bauhaus
  1 sibling, 2 replies; 20+ messages in thread
From: Niklas Holsti @ 2015-02-13 18:08 UTC (permalink / raw)

On 15-02-13 14:41 , G.B. wrote:
> On 12.02.15 22:28, Niklas Holsti wrote:
>> On 15-02-12 22:25 , Patrick Noffke wrote:
>
>>> According to
>>> http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles,
>>>
>>>
>>> at most one task may be queued on an entry.  I take this to mean *on
>>> a single entry* and that two tasks may be simultaneously queued on
>>> separate entries.  Is that correct?
>>
>> Definitely correct.
>
> Are you sure this is correct?

Yes, as I understood the question.

> The profile includes
>
>                     Simple_Barriers,
>                     Max_Entry_Queue_Length => 1,
>                     Max_Protected_Entries => 1,
>                     Max_Task_Entries => 0,

So?

> I thought "essentially no queuing" (hence no more than one
> simple entry per object) is a good rule of thumb for Ravenscar
> profile based programming.

Ravenscar allows at most one entry per protected object; that is what 
Max_Protected_Entries => 1 means. It is a fixed limitation, not a rule 
of thumb.

Two or more tasks can be simultaneously queued (i.e. blocked) on 
separate entries, as long as no more than one task is queued on any 
given entry. These separate entries must be in as many separate 
protected objects, because each PO can have at most one entry.

However, the number of entries per PO is not the essential factor for 
queuing. In some more permissive tasking profile, there could be 
multiple entries per PO, and there would still be no queues of blocked 
tasks if the constraint Max_Entry_Queue_Length => 1 is retained.

The OP said:

 > I have two POs, each with an interrupt handler and an entry.
 > [snip]
 > Then I have two tasks, T1 and T2.  T1 is blocked on
 > PO1's entry and T2 is blocked on PO2's entry.

Entirely within Ravenscar rules, and the very pattern for handling two 
different interrupts in a Ravenscar application.
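
For readers who have not seen this pattern written out, here is a minimal sketch of one such PO and its task. It is an illustration only, not the OP's code: the names UART_Driver, UART_Events, Handler, Pending and UART_Task are made up, and the interrupt number is a placeholder that would really come from the target's Ada.Interrupts.Names. The task priority of 10 is the one the OP gave for T1. A second package of the same shape would serve the SPI interrupt.

```ada
--  Sketch only.  Assumes pragma Profile (Ravenscar) is in effect
--  (for example via gnat.adc) and a bare-board Ravenscar run-time.

with Ada.Interrupts;
with System;

package UART_Driver is

   --  Hypothetical interrupt number; the real value is target-specific.
   UART_IRQ : constant Ada.Interrupts.Interrupt_ID := 5;

   protected UART_Events
     with Interrupt_Priority => System.Interrupt_Priority'Last
   is
      entry Wait;  --  the single entry that Ravenscar allows per PO
   private
      procedure Handler
        with Attach_Handler => UART_IRQ;

      Pending : Boolean := False;  --  simple barrier: a plain Boolean
   end UART_Events;

   task UART_Task with Priority => 10;

end UART_Driver;

package body UART_Driver is

   protected body UART_Events is

      entry Wait when Pending is
      begin
         Pending := False;  --  close the barrier again
      end Wait;

      procedure Handler is
      begin
         Pending := True;   --  open the barrier for the waiting task
      end Handler;

   end UART_Events;

   task body UART_Task is
   begin
      loop
         UART_Events.Wait;  --  the only task that ever queues here
         --  process the received data here
      end loop;
   end UART_Task;

end UART_Driver;
```

Because only UART_Task ever calls UART_Events.Wait, the entry queue length can never exceed one, which is the static task-to-entry association mentioned in some Ravenscar guidelines.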

AIUI, the Ravenscar rules mean that in a PO with an entry, the entry is 
basically implemented by:
- a Boolean flag, the barrier for the entry
- one pointer to the (at most one) task waiting on the closed entry.

It would IMO not be a great complication to allow more than one entry 
per PO; it seems to be enough to duplicate the above implementation 
components for each entry. Still needs no queues of tasks, as long as 
Max_Entry_Queue_Length => 1.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-13 18:08     ` Niklas Holsti
@ 2015-02-13 19:01       ` Simon Wright
  2015-02-13 23:45       ` Georg Bauhaus
  1 sibling, 0 replies; 20+ messages in thread
From: Simon Wright @ 2015-02-13 19:01 UTC (permalink / raw)

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> AIUI, the Ravenscar rules mean that in a PO with an entry, the entry
> is basically implemented by:
> - a Boolean flag, the barrier for the entry
> - one pointer to the (at most one) task waiting on the closed entry.
>
> It would IMO not be a great complication to allow more than one entry
> per PO; it seems to be enough to duplicate the above implementation
> components for each entry. Still needs no queues of tasks, as long as
> Max_Entry_Queue_Length => 1.

I think that would be OK in principle, but GNAT uses - for example - the
type System.Tasking.Protected_Objects.Single_Entry.Protection_Entry, so
might get a little confused if asked to support more than one!

As for the barrier, which has to be a plain Boolean in Ravenscar (not
even "not Closed"), that's implemented using
System.Tasking.Protected_Objects.Entry_Body, which is a record
containing two accesses-to-subprogram; 'Barrier' designates a function
returning Boolean, and 'Action' implements the body of the entry. So I
think one could have more complex conditions at negligible cost.
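
To make the point about cost concrete, here is a simplified model of such a record. It is a sketch under assumptions, not the GNAT declaration: the real profiles of the designated subprograms are different, and State, Has_Items and Take are invented for the example. It only shows that once the barrier is called through an access-to-function, the condition inside it can be anything.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Entry_Body_Demo is

   --  Stand-in for the protected object's data (hypothetical).
   type State is record
      Count : Natural := 0;
   end record;

   --  Simplified profiles; the run-time's real ones differ.
   type Barrier_Function is access function (S : State) return Boolean;
   type Entry_Action     is access procedure (S : in out State);

   type Entry_Body is record
      Barrier : Barrier_Function;  --  evaluates the barrier
      Action  : Entry_Action;      --  runs the body of the entry
   end record;

   --  Not a simple barrier in the Ravenscar sense, but it costs
   --  the same indirect call as a plain Boolean would.
   function Has_Items (S : State) return Boolean is (S.Count > 0);

   procedure Take (S : in out State) is
   begin
      S.Count := S.Count - 1;
   end Take;

   The_Entry : constant Entry_Body :=
     (Barrier => Has_Items'Access, Action => Take'Access);

   Data : State := (Count => 1);

begin
   --  Service the entry the way a run-time might: test, then act.
   if The_Entry.Barrier (Data) then
      The_Entry.Action (Data);
   end if;

   Put_Line ("barrier open after one call: "
             & Boolean'Image (The_Entry.Barrier (Data)));
end Entry_Body_Demo;
```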

GNAT has Profile (Restricted), see s-rident.ads; the restrictions added
to Restricted to make Ravenscar are (at 4.9.1, haven't checked whether
this has changed in 5)

   No_Calendar                      
   No_Implicit_Heap_Allocations     
   No_Local_Timing_Events           
   No_Relative_Delay                
   No_Select_Statements             
   No_Specific_Termination_Handlers 
   No_Task_Termination              
   Simple_Barriers                  


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-13 18:08     ` Niklas Holsti
  2015-02-13 19:01       ` Simon Wright
@ 2015-02-13 23:45       ` Georg Bauhaus
  1 sibling, 0 replies; 20+ messages in thread
From: Georg Bauhaus @ 2015-02-13 23:45 UTC (permalink / raw)


On 13.02.15 19:08, Niklas Holsti wrote:
> On 15-02-13 14:41 , G.B. wrote:
>> On 12.02.15 22:28, Niklas Holsti wrote:
>>> On 15-02-12 22:25 , Patrick Noffke wrote:
>>
>>>> According to
>>>> http://docs.adacore.com/gnathie_ug-docs/html/gnathie_ug/gnathie_ug/the_predefined_profiles.html#ada-restrictions-in-the-ravenscar-profiles,
>>>>
>>>>
>>>> at most one task may be queued on an entry.  I take this to mean *on
>>>> a single entry* and that two tasks may be simultaneously queued on
>>>> separate entries.

> Two or more tasks can be simultaneously queued (i.e. blocked) on separate entries, as long as no more than one task is queued on any given entry. These separate entries must be in as many separate protected objects, because each PO can have at most one entry.

I guess I had misread "simultaneously queued on separate entries"
to mean the separate queues of a PO with multiple entries.
Likely because if going to the stadium, "standing in queues at
separate entries" won't usually make me think of one queue being in
Liverpool, Anfield and the other in Barcelona, Camp Nou.
I'm sorry. It now will.




* Re: Ravenscar and context switching for Cortex-M4
  2015-02-12 20:25 Ravenscar and context switching for Cortex-M4 Patrick Noffke
  2015-02-12 21:28 ` Niklas Holsti
@ 2015-02-16 16:27 ` Patrick Noffke
  2015-02-16 16:34   ` Patrick Noffke
  2015-02-16 21:28   ` Simon Wright
  1 sibling, 2 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-02-16 16:27 UTC (permalink / raw)

On Thursday, February 12, 2015 at 2:25:40 PM UTC-6, Patrick Noffke wrote:
> I am porting the GNAT Ravenscar-sfp runtime to work with the TI TM4C MCU, using the STM32F4 implementation as a starting point.  I am having a problem where one of two tasks blocked on entries (each in separate protected objects) is not getting activated.
> 
> Here is the situation:
> 
> I have two POs, each with an interrupt handler and an entry.  PO1 services UART interrupts and PO2 services SPI interrupts.  When each interrupt fires, the barrier in the corresponding entry is released.
> 
> Then I have two tasks, T1 and T2.  T1 is blocked on PO1's entry and T2 is blocked on PO2's entry.
> 

I have changed things up a bit now just to see if I can get any more insight.  Now T2 is the main program (no longer an Ada task, but AFAICT the runtime doesn't distinguish between main and a task with regard to context switching).

> The problem happens when the interrupts happen very near in time to each other.  Then, both entries are still executed, but only one task runs after the last entry completes.  I have instrumented the run-time as well as my code to toggle GPIOs and see what's happening when.  Here is the timing that fails:
> 
> 1. SPI interrupt triggers.
> 2. Interrupt_Request_Handler in s-bbcppr-armv7m.adb is executed.  It calls my interrupt handler for PO2, followed by the entry (entry called in the interrupt context), and triggers a context switch for T2.
> 3. Before the task T2 can run, the UART interrupt triggers.
> 4. Interrupt_Request_Handler calls my interrupt handler for PO1 and its entry, and triggers a context switch for T1.
> 5. T1 then runs until it blocks again on the entry for PO1.
> 
> T2 never runs.  Furthermore, after this occurs, the entry for PO2 is never executed again (though its interrupt handler is).  T2 also never runs again.
> 

Here's what happens now (the order of the interrupts may change between runs, but this is for one capture):

1. UART interrupt triggers.
2. PO1's entry executes.
3. SPI interrupt triggers twice (see below).
4. PO2's entry executes.
5. T1 (UART task) executes.  This is the first thing wrong.  T2 is higher priority than T1 so T2 should run first.
6. T2 (SPI task) executes twice.  Upon the second execution, I get a program error because Object.Entry_Queue is null.  The exception is raised in s-tposen-raven.adb (line 167 in my copy) in Protected_Single_Entry_Call.

This may be relevant -- the SPI interrupt triggers twice.  This is because the interrupt is for a DMA completion, and it fires both when TX and RX complete (since it's SPI, they complete at the same time).  I take care in my interrupt handler to release the entry from only one of the two interrupts.  Perhaps with the interrupt firing twice, the runtime may get confused and activate the task twice (even though the entry only executes once).  But for the above run, the entry was released during the second SPI interrupt.

Please let me know if you have any suggestions.

Best regards,
Patrick


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-16 16:27 ` Patrick Noffke
@ 2015-02-16 16:34   ` Patrick Noffke
  2015-02-16 21:28   ` Simon Wright
  1 sibling, 0 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-02-16 16:34 UTC (permalink / raw)


On Monday, February 16, 2015 at 10:28:00 AM UTC-6, Patrick Noffke wrote:
> 6. T2 (SPI task) executes twice.  Upon the second execution, I get a program error because Object.Entry_Queue is null.  The exception is raised in s-tposen-raven.adb (line 167 in my copy) in Protected_Single_Entry_Call.

Just to clarify -- the exception is because Object.Entry_Queue is *not* null.

Patrick


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-16 16:27 ` Patrick Noffke
  2015-02-16 16:34   ` Patrick Noffke
@ 2015-02-16 21:28   ` Simon Wright
  2015-02-19 20:14     ` Patrick Noffke
  2015-08-06 21:05     ` Patrick Noffke
  1 sibling, 2 replies; 20+ messages in thread
From: Simon Wright @ 2015-02-16 21:28 UTC (permalink / raw)


Patrick Noffke <patrick.noffke@gmail.com> writes: 
 
> Here's what happens now (the order of the interrupts may change 
> between runs, but this is for one capture): 
> 
> 1. UART interrupt triggers.  2. PO1's entry executes. 
 
because the entry body is executed in interrupt context. See 
below. 
 
> 3. SPI interrupt triggers twice (see below).  4. PO2's entry 
> executes.  5. T1 (UART task) executes.  This is the first thing 
> wrong.  T2 is higher priority than T1 so T2 should run first. 
> 6. T2 (SPI task) executes twice.  Upon the second execution, I 
> get a program error because Object.Entry_Queue is null.  The 
> exception is 
 
Entry_Queue is *not* null, as you said in the next post. 
 
> raised in s-tposen-raven.adb (line 167 in my copy) in 
> Protected_Single_Entry_Call. 
> 
> This may be relevant -- the SPI interrupt triggers twice.  This 
> is because the interrupt is for a DMA completion, and it fires 
> both when TX and RX complete (since it's SPI, they complete at 
> the same time).  I take care in my interrupt handler to release 
> the entry from only one of the two interrupts.  Perhaps with the 
> interrupt firing twice, the runtime may get confused and 
> activate the task twice (even though the entry only executes 
> once).  But for the above run, the entry was released during the 
> second SPI interrupt. 
 
The RTS does this (I hope I have it right): 
 
   The entry call (Protected_Single_Entry_Call):

     locks the entry
     if the barrier is open then
       asserts that Call_In_Progress isn't set
       sets Call_In_Progress
       calls the entry body wrapper
       clears Call_In_Progress
       unlocks the entry
     else
       if the Entry_Queue isn't null then
         unlocks the entry
         raises PE
       end if
       sets the Entry_Queue
       unlocks the entry
       sleeps
     end if

   The handler wrapper:

     locks the entry
     calls another wrapper for the handler itself
     calls Service_Entry
     exits

   Service_Entry:
     if the Entry_Queue is set and the barrier is open then
       clears the Entry_Queue
       asserts that Call_In_Progress isn't set
       sets Call_In_Progress
       calls the entry body wrapper
       clears Call_In_Progress
       saves the caller task_id
       unlocks the entry
       wakes the caller
     else
       unlocks the entry
     end if
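
The steps above can also be written as a small single-threaded model that runs on a host. This is a sketch under assumptions, not the run-time source: locking, sleeping and waking are left out, Task_Id is just a number, and the modelled entry body simply closes the barrier. It does show why a task that reaches the entry call a second time while it is still recorded in Entry_Queue gets Program_Error.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Entry_Model is

   type Task_Id is new Natural;
   No_Task : constant Task_Id := 0;

   type Protection_Entry is record
      Barrier_Open     : Boolean := False;
      Call_In_Progress : Boolean := False;
      Entry_Queue      : Task_Id := No_Task;  --  one slot, not a real queue
   end record;

   procedure Run_Entry_Body (Object : in out Protection_Entry) is
   begin
      pragma Assert (not Object.Call_In_Progress);
      Object.Call_In_Progress := True;
      Object.Barrier_Open     := False;  --  modelled body closes the barrier
      Object.Call_In_Progress := False;
   end Run_Entry_Body;

   --  Models Protected_Single_Entry_Call without lock/unlock/sleep.
   procedure Entry_Call
     (Object  : in out Protection_Entry;
      Caller  : Task_Id;
      Blocked : out Boolean) is
   begin
      if Object.Barrier_Open then
         Run_Entry_Body (Object);
         Blocked := False;
      elsif Object.Entry_Queue /= No_Task then
         raise Program_Error with "entry call already queued";
      else
         Object.Entry_Queue := Caller;
         Blocked := True;                --  the caller would sleep here
      end if;
   end Entry_Call;

   --  Models Service_Entry, called from the handler wrapper.
   procedure Service_Entry
     (Object : in out Protection_Entry;
      Woken  : out Task_Id) is
   begin
      if Object.Entry_Queue /= No_Task and then Object.Barrier_Open then
         Woken := Object.Entry_Queue;    --  save the caller
         Object.Entry_Queue := No_Task;
         Run_Entry_Body (Object);        --  runs in handler context
      else
         Woken := No_Task;
      end if;
   end Service_Entry;

   PO      : Protection_Entry;
   Blocked : Boolean;
   Woken   : Task_Id;

begin
   Entry_Call (PO, 1, Blocked);          --  barrier closed: task 1 queued
   PO.Barrier_Open := True;              --  the handler opens the barrier
   Service_Entry (PO, Woken);            --  task 1 is the one to wake
   Put_Line ("woken task:" & Task_Id'Image (Woken));

   Entry_Call (PO, 1, Blocked);          --  task 1 queues again, as normal
   begin
      Entry_Call (PO, 1, Blocked);       --  task 1 run a second time by mistake
   exception
      when Program_Error =>
         Put_Line ("second call rejected: entry call already queued");
   end;
end Entry_Model;
```

In this model the exception can only come from a caller arriving while Entry_Queue is still set, which matches the correction that Entry_Queue was *not* null.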

I really don't see how the sequence you describe happens!

One thing that puzzles me is the locking/unlocking of the entry: this is
done (in that RTS) by raising the caller task's priority to the ceiling
priority of the PO, if necessary. So what about interrupts? And when
the handler wrapper (you can see this by compiling the package with the
PO in with -gnatdg) locks the entry, it seems to raise the current
task's priority, where the current task has nothing to do with the PO at
all!

My version of lock does

   procedure Lock (Object : Protection_Access) is
   begin
      if FreeRTOS.Tasks.In_ISR then
         null;
      elsif Object.Ceiling in System.Interrupt_Priority then
         FreeRTOS.Tasks.Disable_Interrupts;
      else
        --  do ceiling priority stuff


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-16 21:28   ` Simon Wright
@ 2015-02-19 20:14     ` Patrick Noffke
  2015-02-19 21:03       ` Bob Duff
  2015-02-19 22:13       ` Patrick Noffke
  2015-08-06 21:05     ` Patrick Noffke
  1 sibling, 2 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-02-19 20:14 UTC (permalink / raw)


On Monday, February 16, 2015 at 3:28:08 PM UTC-6, Simon Wright wrote:
> Patrick Noffke writes: 
>  
> > Here's what happens now (the order of the interrupts may change 
> > between runs, but this is for one capture): 
> > 
> > 1. UART interrupt triggers.  2. PO1's entry executes. 
>  
> because the entry body is executed in interrupt context. See 
> below. 
>  
> > 3. SPI interrupt triggers twice (see below).  4. PO2's entry 
> > executes.  5. T1 (UART task) executes.  This is the first thing 
> > wrong.  T2 is higher priority than T1 so T2 should run first. 
> > 6. T2 (SPI task) executes twice.  Upon the second execution, I 
> > get a program error because Object.Entry_Queue is null.  The 
> > exception is 
>  
> Entry_Queue is *not* null, as you said in the next post. 
>  
> > raised in s-tposen-raven.adb (line 167 in my copy) in 
> > Protected_Single_Entry_Call. 
> > 
> > This may be relevant -- the SPI interrupt triggers twice.  This 
> > is because the interrupt is for a DMA completion, and it fires 
> > both when TX and RX complete (since it's SPI, they complete at 
> > the same time).  I take care in my interrupt handler to release 
> > the entry from only one of the two interrupts.  Perhaps with the 
> > interrupt firing twice, the runtime may get confused and 
> > activate the task twice (even though the entry only executes 
> > once).  But for the above run, the entry was released during the 
> > second SPI interrupt. 
>  
> The RTS does this (I hope I have it right): 
>  
>    The entry call (Protected_Single_Entry_Call):
> 
>      locks the entry
>      if the barrier is open then
>        asserts that Call_In_Progress isn't set
>        sets Call_In_Progress
>        calls the entry body wrapper
>        clears Call_In_Progress
>        unlocks the entry
>      else
>        if the Entry_Queue isn't null then
>          unlocks the entry
>          raises PE
>        end if
>        sets the Entry_Queue
>        unlocks the entry
>        sleeps
>      end if
> 
>    The handler wrapper:
> 
>      locks the entry
>      calls another wrapper for the handler itself
>      calls Service_Entry
>      exits
> 
>    Service_Entry:
>      if the Entry_Queue is set and the barrier is open then
>        clears the Entry_Queue
>        asserts that Call_In_Progress isn't set
>        sets Call_In_Progress
>        calls the entry body wrapper
>        clears Call_In_Progress
>        saves the caller task_id
>        unlocks the entry
>        wakes the caller
>      else
>        unlocks the entry
>      end if
> 
> I really don't see how the sequence you describe happens!
> 
> One thing that puzzles me is the locking/unlocking of the entry: this is
> done (in that RTS) by raising the caller task's priority to the ceiling
> priority of the task, if necessary. So what about interrupts? And when
> the handler wrapper (you can see this by compiling the package with the
> PO in with -gnatdg) locks the entry, it seems to raise the current
> task's priority, where the current task has nothing to do with the PO at
> all!
> 

Thank you for all this!  It helps a lot.  I didn't know about -gnatdg -- very useful.

I suspect the problem may stem from the fact that Leave_Kernel in s-bbprot.adb can insert a suspended task into the thread queue.  I am guessing somehow this is tripping up the runtime when multiple tasks become runnable at the same time.

I stepped through the debugger at startup, and I can see the suspended task going into the queue.  Then another task is woken up and put at the front of the queue, so the first task is Runnable and the Next task is Suspended.  Then when Leave_Kernel resumes running (it enables interrupts after inserting the suspended task) it may not call Extract (since the running thread state is Runnable) -- it never considers that the Suspended task it inserted might be later in the queue.

I haven't yet been able to directly correlate the Leave_Kernel behavior with the task incorrectly waking up twice.  I'll keep trying things, but I wanted to share what I found so far.

What I have confirmed is that I can get the system into a state where there are two Runnable tasks in the thread queue, and the "Next" field of the last one points to the first one.  That is:

First_Thread_Table (CPU_Id) = First_Thread_Table (CPU_Id).Next.Next

Right before this happens is when the two tasks are woken up at the same time.

It appears the task that runs twice (when I get the Program_Error I reported earlier) is the one that gets put in the queue in the suspended state at startup (an empirical data point after changing priorities of my tasks).

Best regards,
Pat



* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 20:14     ` Patrick Noffke
@ 2015-02-19 21:03       ` Bob Duff
  2015-02-20 13:05         ` Simon Wright
  2015-02-19 22:13       ` Patrick Noffke
  1 sibling, 1 reply; 20+ messages in thread
From: Bob Duff @ 2015-02-19 21:03 UTC (permalink / raw)


Patrick Noffke <patrick.noffke@gmail.com> writes:

> Thank you for all this!  It helps a lot.  I didn't know about -gnatdg
> -- very useful.

Also try -gnatDG.

- Bob


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 20:14     ` Patrick Noffke
  2015-02-19 21:03       ` Bob Duff
@ 2015-02-19 22:13       ` Patrick Noffke
  2015-02-19 22:44         ` Patrick Noffke
  1 sibling, 1 reply; 20+ messages in thread
From: Patrick Noffke @ 2015-02-19 22:13 UTC (permalink / raw)


On Thursday, February 19, 2015 at 2:14:45 PM UTC-6, Patrick Noffke wrote:
> On Monday, February 16, 2015 at 3:28:08 PM UTC-6, Simon Wright wrote:
> > Patrick Noffke writes: 
> >  
> > > Here's what happens now (the order of the interrupts may change 
> > > between runs, but this is for one capture): 
> > > 
> > > 1. UART interrupt triggers.  2. PO1's entry executes. 
> >  
> > because the entry body is executed in interrupt context. See 
> > below. 
> >  
> > > 3. SPI interrupt triggers twice (see below).  4. PO2's entry 
> > > executes.  5. T1 (UART task) executes.  This is the first thing 
> > > wrong.  T2 is higher priority than T1 so T2 should run first. 
> > > 6. T2 (SPI task) executes twice.  Upon the second execution, I 
> > > get a program error because Object.Entry_Queue is null.  The 
> > > exception is 
> >  
> > Entry_Queue is *not* null, as you said in the next post. 
> >  
> > > raised in s-tposen-raven.adb (line 167 in my copy) in 
> > > Protected_Single_Entry_Call. 
> > > 
> > > This may be relevant -- the SPI interrupt triggers twice.  This 
> > > is because the interrupt is for a DMA completion, and it fires 
> > > both when TX and RX complete (since it's SPI, they complete at 
> > > the same time).  I take care in my interrupt handler to release 
> > > the entry from only one of the two interrupts.  Perhaps with the 
> > > interrupt firing twice, the runtime may get confused and 
> > > activate the task twice (even though the entry only executes 
> > > once).  But for the above run, the entry was released during the 
> > > second SPI interrupt. 
> >  
> > The RTS does this (I hope I have it right): 
> >  
> >    The entry call (Protected_Single_Entry_Call):
> > 
> >      locks the entry
> >      if the barrier is open then
> >        asserts that Call_In_Progress isn't set
> >        sets Call_In_Progress
> >        calls the entry body wrapper
> >        clears Call_In_Progress
> >        unlocks the entry
> >      else
> >        if the Entry_Queue isn't null then
> >          unlocks the entry
> >          raises PE
> >        end if
> >        sets the Entry_Queue
> >        unlocks the entry
> >        sleeps
> >      end if
> > 
> >    The handler wrapper:
> > 
> >      locks the entry
> >      calls another wrapper for the handler itself
> >      calls Service_Entry
> >      exits
> > 
> >    Service_Entry:
> >      if the Entry_Queue is set and the barrier is open then
> >        clears the Entry_Queue
> >        asserts that Call_In_Progress isn't set
> >        sets Call_In_Progress
> >        calls the entry body wrapper
> >        clears Call_In_Progress
> >        saves the caller task_id
> >        unlocks the entry
> >        wakes the caller
> >      else
> >        unlocks the entry
> >      end if
> > 
> > I really don't see how the sequence you describe happens!
> > 
> > One thing that puzzles me is the locking/unlocking of the entry: this is
> > done (in that RTS) by raising the caller task's priority to the ceiling
> > priority of the task, if necessary. So what about interrupts? And when
> > the handler wrapper (you can see this by compiling the package with the
> > PO in with -gnatdg) locks the entry, it seems to raise the current
> > task's priority, where the current task has nothing to do with the PO at
> > all!
> > 
> 
> Thank you for all this!  It helps a lot.  I didn't know about -gnatdg -- very useful.
> 
> I suspect the problem may stem from the fact that Leave_Kernel in s-bbprot.adb can insert a suspended task into the thread queue.  I am guessing somehow this is tripping up the runtime when multiple tasks become runnable at the same time.
> 
> I stepped through the debugger at startup, and I can see the suspended task going into the queue.  Then another task is woken up and put at the front of the queue, so the first task is Runnable and the Next task is Suspended.  Then when Leave_Kernel resumes running (it enables interrupts after inserting the suspended task) it may not call Extract (since the running thread state is Runnable) -- it never considers that the Suspended task it inserted might be later in the queue.
> 
> I haven't yet been able to directly correlate the Leave_Kernel behavior with the task incorrectly waking up twice.  I'll keep trying things, but I wanted to share what I found so far.
> 
> What I have confirmed is that I can get the system into a state where there are two Runnable tasks in the thread queue, and the "Next" field of the last one points to the first one.  That is:
> 
> First_Thread_Table (CPU_Id) = First_Thread_Table (CPU_Id).Next.Next
> 
> Right before this happens is when the two tasks are woken up at the same time.
> 
> It appears the task that runs twice (when I get the Program_Error I reported earlier) is the one that gets put in the queue in the suspended state at startup (an empirical data point after changing priorities of my tasks).
> 

I think I know what's going on.  There is a problem with the Insert procedure in s-bbthqu.adb.  If (1) a thread is already in the queue, (2) it is not at the head of the queue, and (3) its priority is greater than that of the thread at the head of the queue, it can get inserted twice.  The error is in the "elsif" condition of this code:

      if First_Thread_Table (CPU_Id) = Thread then
         null;

      --  Insert at the head of queue if there is no other thread with a higher
      --  priority.

      elsif First_Thread_Table (CPU_Id) = Null_Thread_Id
        or else
          Thread.Active_Priority > First_Thread_Table (CPU_Id).Active_Priority
      then
         Thread.Next := First_Thread_Table (CPU_Id);
         First_Thread_Table (CPU_Id) := Thread;

      --  Middle or tail insertion

      else
         --  Look for the Aux_Pointer to insert the thread just after it
         ...

Here is what's happening in my case:

(1) UART task is at the head of the queue in the Suspended state with priority 10.  The rest of Leave_Kernel is still pending after it called Enable_Interrupts.
(2) SPI interrupt fires.
(3) SPI task priority is 252 (interrupt priority).
(4) SPI task gets inserted at the head of the queue in the Runnable state.
(5) SPI task priority gets adjusted to 15 when Unlock_Entry is called.  It is left at the head of the queue.  SPI_task.Next = UART_task.
(6) UART interrupt fires.
(7) UART task priority is 250, and the task is set to Runnable.
(8) UART task gets inserted at the head of the queue (even though it is already in the queue) since its priority is greater than that of SPI_task.  Now UART_task.Next = SPI_task, and SPI_task.Next = UART_task.
(9) UART task priority is adjusted to 10 when Unlock_Entry is called.  Now SPI_task is head of queue, and SPI_task.Next = UART_task.  UART_task.Next = UART_task.
(10) SPI task runs.
(11) UART task runs twice.  Boom.

I think the Insert procedure needs to be modified to look through the entire queue for the thread to be inserted.  Might be simplest to just remove it if it exists before entering the if statement.
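
To make the sequence above concrete, here is a minimal host-side sketch in Ada that replays steps (1) to (9) on a two-element list.  It is not the runtime's code: Thread_Descriptor, First and Lower_Priority are simplified stand-ins assumed here for the thread record, First_Thread_Table (CPU_Id) and the priority drop done when Unlock_Entry is called.  Only the head-insertion test is copied from the excerpt; the middle-insertion loop is the usual walk-until-lower-priority pattern and is not exercised by this sequence.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Queue_Cycle_Demo is

   --  Hypothetical, stripped-down thread record (model only)
   type Thread_Descriptor;
   type Thread_Id is access all Thread_Descriptor;

   type Thread_Descriptor is record
      Active_Priority : Integer   := 0;
      Next            : Thread_Id := null;
   end record;

   UART_Task : aliased Thread_Descriptor;
   SPI_Task  : aliased Thread_Descriptor;

   UART : constant Thread_Id := UART_Task'Access;
   SPI  : constant Thread_Id := SPI_Task'Access;

   --  Stands in for First_Thread_Table (CPU_Id)
   First : Thread_Id := null;

   --  Same head-insertion condition as the excerpt above
   procedure Insert_Original (Thread : Thread_Id) is
      Aux : Thread_Id;
   begin
      if First = Thread then
         null;
      elsif First = null
        or else Thread.Active_Priority > First.Active_Priority
      then
         Thread.Next := First;
         First := Thread;
      else
         Aux := First;
         while Aux.Next /= null
           and then Aux.Next.Active_Priority >= Thread.Active_Priority
         loop
            Aux := Aux.Next;
         end loop;
         Thread.Next := Aux.Next;
         Aux.Next := Thread;
      end if;
   end Insert_Original;

   --  Assumed model of lowering the priority of the head thread: if the
   --  next thread now has a higher priority, move the head further back.
   procedure Lower_Priority (Thread : Thread_Id; Priority : Integer) is
      Aux : Thread_Id;
   begin
      Thread.Active_Priority := Priority;
      if First = Thread
        and then Thread.Next /= null
        and then Priority < Thread.Next.Active_Priority
      then
         First := Thread.Next;
         Aux := First;
         while Aux.Next /= null
           and then Priority < Aux.Next.Active_Priority
         loop
            Aux := Aux.Next;
         end loop;
         Thread.Next := Aux.Next;
         Aux.Next := Thread;
      end if;
   end Lower_Priority;

begin
   UART.Active_Priority := 10;       --  (1) UART task at head, priority 10
   Insert_Original (UART);
   SPI.Active_Priority := 252;       --  (3) SPI task at interrupt priority
   Insert_Original (SPI);            --  (4) inserted at head
   Lower_Priority (SPI, 15);         --  (5) stays at head
   UART.Active_Priority := 250;      --  (7)
   Insert_Original (UART);           --  (8) inserted a second time
   Lower_Priority (UART, 10);        --  (9)

   Put_Line ("head is SPI task      : " & Boolean'Image (First = SPI));
   Put_Line ("SPI_task.Next = UART  : " & Boolean'Image (SPI.Next = UART));
   Put_Line ("UART_task.Next = UART : " & Boolean'Image (UART.Next = UART));
end Queue_Cycle_Demo;
```

Under these assumptions all three lines should print TRUE, which is the state described in step (9): the UART task points at itself, so a walk of the queue never reaches the end.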

Pat

* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 22:13       ` Patrick Noffke
@ 2015-02-19 22:44         ` Patrick Noffke
  2015-02-20  8:31           ` Simon Wright
  2015-06-24 15:20           ` Patrick Noffke
  0 siblings, 2 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-02-19 22:44 UTC (permalink / raw)


On Thursday, February 19, 2015 at 4:13:51 PM UTC-6, Patrick Noffke wrote:
> I think the Insert procedure needs to be modified to look through the entire queue for the thread to be inserted.  Might be simplest to just remove it if it exists before entering the if statement.
> 

This version of the entire Insert procedure works for me:

   ------------
   -- Insert --
   ------------

   procedure Insert (Thread : Thread_Id) is
      Aux_Pointer : Thread_Id;
      CPU_Id      : constant CPU := Get_CPU (Thread);

   begin

      --  ??? This pragma is disabled because the Tasks_Activated only
      --  represents the end of activation for one package not all the
      --  packages. We have to find a better milestone for the end of
      --  tasks activation.

      --  --  A CPU can only insert alarm in its own queue, except during
      --  --  initialization.

      --  pragma Assert (CPU_Id = Current_CPU or else not Tasks_Activated);

      --  It may be the case that we try to insert a task that is already in
      --  the queue. This can only happen if the task was not runnable and its
      --  context was being used for handling an interrupt. Hence, if the task
      --  is already in the queue and we try to insert it, we need to check
      --  whether it is in the correct place.

      --  No insertion if the task is already at the head of the queue

      if First_Thread_Table (CPU_Id) = Thread then
         null;

         --  The queue is empty, so the thread becomes the head

      elsif First_Thread_Table (CPU_Id) = Null_Thread_Id then
         Thread.Next := First_Thread_Table (CPU_Id);
         First_Thread_Table (CPU_Id) := Thread;

      else
         --  Middle or tail insertion

         --  Remove the thread if it is already in the queue.  We know
         --  the first thread is not null.
         Aux_Pointer := First_Thread_Table (CPU_Id);
         while Aux_Pointer.Next /= Null_Thread_Id
           and then Aux_Pointer.Next /= Thread
         loop
            Aux_Pointer := Aux_Pointer.Next;
         end loop;

         if Aux_Pointer.Next = Thread then
            Aux_Pointer.Next := Thread.Next;
         end if;

         if
           Thread.Active_Priority > First_Thread_Table (CPU_Id).Active_Priority
         then
            Thread.Next := First_Thread_Table (CPU_Id);
            First_Thread_Table (CPU_Id) := Thread;
         else
            --  Look for the Aux_Pointer to insert the thread just after it

            Aux_Pointer := First_Thread_Table (CPU_Id);
            while Aux_Pointer.Next /= Null_Thread_Id
              and then Aux_Pointer.Next.Active_Priority >=
              Thread.Active_Priority
            loop
               Aux_Pointer := Aux_Pointer.Next;
            end loop;

            --  Insert the thread after the Aux_Pointer

            Thread.Next := Aux_Pointer.Next;
            Aux_Pointer.Next := Thread;
         end if;

      end if;
   end Insert;
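
One way to gain confidence in a change like this is to replay the failing sequence on the host.  The sketch below is only a model under stated assumptions: Insert mirrors the structure of the procedure above with First standing in for First_Thread_Table (CPU_Id), while Thread_Descriptor, Length and its Limit parameter are hypothetical helpers that are not part of the runtime.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Revised_Insert_Check is

   --  Hypothetical, stripped-down thread record (model only)
   type Thread_Descriptor;
   type Thread_Id is access all Thread_Descriptor;

   type Thread_Descriptor is record
      Active_Priority : Integer   := 0;
      Next            : Thread_Id := null;
   end record;

   UART_Task : aliased Thread_Descriptor;
   SPI_Task  : aliased Thread_Descriptor;

   UART : constant Thread_Id := UART_Task'Access;
   SPI  : constant Thread_Id := SPI_Task'Access;

   --  Stands in for First_Thread_Table (CPU_Id)
   First : Thread_Id := null;

   procedure Insert (Thread : Thread_Id) is
      Aux : Thread_Id;
   begin
      if First = Thread then
         null;
      elsif First = null then
         Thread.Next := First;
         First := Thread;
      else
         --  Unlink the thread if it is already somewhere behind the head
         Aux := First;
         while Aux.Next /= null and then Aux.Next /= Thread loop
            Aux := Aux.Next;
         end loop;
         if Aux.Next = Thread then
            Aux.Next := Thread.Next;
         end if;

         if Thread.Active_Priority > First.Active_Priority then
            Thread.Next := First;
            First := Thread;
         else
            Aux := First;
            while Aux.Next /= null
              and then Aux.Next.Active_Priority >= Thread.Active_Priority
            loop
               Aux := Aux.Next;
            end loop;
            Thread.Next := Aux.Next;
            Aux.Next := Thread;
         end if;
      end if;
   end Insert;

   --  Number of threads in the queue, or -1 if the walk would visit more
   --  than Limit threads (which indicates a cycle).
   function Length (Limit : Natural) return Integer is
      Aux   : Thread_Id := First;
      Count : Natural   := 0;
   begin
      while Aux /= null loop
         if Count = Limit then
            return -1;
         end if;
         Count := Count + 1;
         Aux := Aux.Next;
      end loop;
      return Count;
   end Length;

begin
   UART.Active_Priority := 10;     --  UART task queued first
   Insert (UART);
   SPI.Active_Priority := 252;     --  SPI task woken at interrupt priority
   Insert (SPI);
   SPI.Active_Priority := 15;      --  back to its own priority, still head
   UART.Active_Priority := 250;    --  UART task woken while already queued
   Insert (UART);

   Put_Line ("queue length :" & Integer'Image (Length (Limit => 2)));
   Put_Line ("head is UART : " & Boolean'Image (First = UART));
end Revised_Insert_Check;
```

With the unlink step in place the model should report a length of 2 with the UART task at the head; with the original elsif condition the same sequence leaves a two-element cycle and Length returns -1.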




* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 22:44         ` Patrick Noffke
@ 2015-02-20  8:31           ` Simon Wright
  2015-06-24 15:20           ` Patrick Noffke
  1 sibling, 0 replies; 20+ messages in thread
From: Simon Wright @ 2015-02-20  8:31 UTC (permalink / raw)


Patrick Noffke <patrick.noffke@gmail.com> writes:

> This version of the entire Insert procedure works for me:

Great!


* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 21:03       ` Bob Duff
@ 2015-02-20 13:05         ` Simon Wright
  0 siblings, 0 replies; 20+ messages in thread
From: Simon Wright @ 2015-02-20 13:05 UTC (permalink / raw)

Bob Duff <bobduff@theworld.com> writes:

> Patrick Noffke <patrick.noffke@gmail.com> writes:
>
>> Thank you for all this!  It helps a lot.  I didn't know about -gnatdg
>> -- very useful.
>
> Also try -gnatDG.

-gnatG ?

According to gnatmake -h,

  -gnatD    Debug expanded generated code (max line length = 72)
  -gnatDnn  Debug expanded generated code (max line length = nn)

and (GCC 4.9.1; not GNAT GPL 2014 or GCC 5.0.0) both -gnatD and -gnatDG
resulted in

 buttons.adb:1071:07: violation of restriction "No_Implicit_Heap_Allocations"
 buttons.adb:1071:07: from profile "Ravenscar" at system.ads:41

where (a) buttons.adb is nowhere like that long, and (b) I don't see
anything at that line in the output of -gnatdg or -gnatG to trigger the
error.

Mind you, there is something very odd about that restriction; I had to
restate the restriction at the start of buttons.adb to prevent GCC 4.9.1
and GNAT GPL 2014 thinking that it was
violated. https://sourceforge.net/p/stm32f4-gnat-rts/tickets/11/

* Re: Ravenscar and context switching for Cortex-M4
  2015-02-19 22:44         ` Patrick Noffke
  2015-02-20  8:31           ` Simon Wright
@ 2015-06-24 15:20           ` Patrick Noffke
  1 sibling, 0 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-06-24 15:20 UTC (permalink / raw)


On Thursday, February 19, 2015 at 4:45:00 PM UTC-6, Patrick Noffke wrote:
> This version of the entire Insert procedure works for me:
> 

FYI - This problem is not fixed in GNAT GPL 2015.  I reported it to AdaCore back in February.

* Re: Ravenscar and context switching for Cortex-M4
  2015-02-16 21:28   ` Simon Wright
  2015-02-19 20:14     ` Patrick Noffke
@ 2015-08-06 21:05     ` Patrick Noffke
  2015-08-06 21:43       ` Patrick Noffke
  1 sibling, 1 reply; 20+ messages in thread
From: Patrick Noffke @ 2015-08-06 21:05 UTC (permalink / raw)

On Monday, February 16, 2015 at 3:28:08 PM UTC-6, Simon Wright wrote:

> One thing that puzzles me is the locking/unlocking of the entry: this is
> done (in that RTS) by raising the caller task's priority to the ceiling
> priority of the task, if necessary. So what about interrupts? And when
> the handler wrapper (you can see this by compiling the package with the
> PO in with -gnatdg) locks the entry, it seems to raise the current
> task's priority, where the current task has nothing to do with the PO at
> all!
> 

There is another bug that I've discovered in the GNAT 2015 Ravenscar-full runtime (Cortex-M4) that appears to be caused by the Interrupt_Wrapper changing the task's priority.  When there is no Runnable task, Leave_Kernel will Insert the non-Runnable task back into the ready queue (with state of Delayed or Suspended).  This task is also stored in the Running_Thread_Table (only one value in the table for a single CPU).  Then an interrupt comes along and raises the priority of the Running_Thread_Table task (which is Delayed) to that of the interrupt.  The task that is woken up by the interrupt then goes later in the ready queue, since its priority is now lower than that of the Delayed task.

This ultimately causes the task that was readied by the interrupt to not run (in my case, it never runs again).  I haven't traced all the subsequent behavior to know exactly why this happens, but if I don't change priorities in the Interrupt_Wrapper, the problem seems to be gone.
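
The queue ordering described above can be seen in a small host-side model.  This is a sketch under assumptions, not the runtime's code: the record, the First variable (standing in for First_Thread_Table (CPU_Id)), the task names and the priority values 252 and 200 are all hypothetical, and Insert is a plain priority-ordered insertion.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Raised_Delayed_Demo is

   type Thread_State is (Runnable, Suspended, Delayed);

   --  Hypothetical, stripped-down thread record (model only)
   type Thread_Descriptor;
   type Thread_Id is access all Thread_Descriptor;

   type Thread_Descriptor is record
      State           : Thread_State := Suspended;
      Active_Priority : Integer      := 0;
      Next            : Thread_Id    := null;
   end record;

   Env_Task   : aliased Thread_Descriptor;
   Woken_Task : aliased Thread_Descriptor;

   Env   : constant Thread_Id := Env_Task'Access;
   Woken : constant Thread_Id := Woken_Task'Access;

   --  Stands in for First_Thread_Table (CPU_Id)
   First : Thread_Id := null;

   --  Plain priority-ordered insertion
   procedure Insert (Thread : Thread_Id) is
      Aux : Thread_Id;
   begin
      if First = null
        or else Thread.Active_Priority > First.Active_Priority
      then
         Thread.Next := First;
         First := Thread;
      else
         Aux := First;
         while Aux.Next /= null
           and then Aux.Next.Active_Priority >= Thread.Active_Priority
         loop
            Aux := Aux.Next;
         end loop;
         Thread.Next := Aux.Next;
         Aux.Next := Thread;
      end if;
   end Insert;

begin
   --  No Runnable task: the Delayed task is put back in the ready queue
   Env.State := Delayed;
   Env.Active_Priority := 0;
   Insert (Env);

   --  An interrupt arrives and the running (Delayed) task's priority is
   --  raised to the interrupt priority (252 is an assumed value)
   Env.Active_Priority := 252;

   --  The handler wakes a task with base priority 200 (assumed value)
   Woken.State := Runnable;
   Woken.Active_Priority := 200;
   Insert (Woken);

   Put_Line ("state of queue head  : " & Thread_State'Image (First.State));
   Put_Line ("woken task is second : " & Boolean'Image (Env.Next = Woken));
end Raised_Delayed_Demo;
```

In this model the head of the queue is still the Delayed task and the newly Runnable task sits behind it, which is the ordering reported above.  What happens afterwards depends on how the priority is lowered again, which the model does not attempt to cover.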

Does anyone have an idea why the task priorities are changed in the Interrupt_Wrapper?  I'm wondering if I'm introducing a problem that was supposed to be fixed by not changing the priorities.  But as Simon pointed out, the current running task may have nothing to do with the PO associated with the interrupt, so it seems odd to change the priority here.

Patrick

* Re: Ravenscar and context switching for Cortex-M4
  2015-08-06 21:05     ` Patrick Noffke
@ 2015-08-06 21:43       ` Patrick Noffke
  2015-08-07 20:34         ` Patrick Noffke
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick Noffke @ 2015-08-06 21:43 UTC (permalink / raw)


On Thursday, August 6, 2015 at 4:05:15 PM UTC-5, Patrick Noffke wrote:
> Does anyone have an idea why the task priorities are changed in the Interrupt_Wrapper?  I'm wondering if I'm introducing a problem that was supposed to be fixed by not changing the priorities.  But as Simon pointed out, the current running task may have nothing to do with the PO associated with the interrupt, so it seems odd to change the priority here.
> 

Not to mention, an interrupt handler (possibly with associated PO) may have nothing to do with *any* task, much less the current task.

A further point of confusion is why Self_Id.In_Interrupt is getting set to True while executing the user handler.  In_Interrupt is used by Current_Interrupt, which seems related to execution time stuff.  I'm not sure if this is needed (or accurate given the above).

Patrick


* Re: Ravenscar and context switching for Cortex-M4
  2015-08-06 21:43       ` Patrick Noffke
@ 2015-08-07 20:34         ` Patrick Noffke
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick Noffke @ 2015-08-07 20:34 UTC (permalink / raw)

On Thursday, August 6, 2015 at 4:43:52 PM UTC-5, Patrick Noffke wrote:
> > 
> > There is another bug that I've discovered in the GNAT 2015 Ravenscar-full runtime (Cortex-M4) that appears to be caused by the Interrupt_Wrapper changing the task's priority.  When there is no Runnable task, Leave_Kernel will Insert the non-Runnable task back into the ready queue (with state of Delayed or Suspended).  This task is also stored in the Running_Thread_Table (only one value in the table for a single CPU).  Then an interrupt comes along and raises the priority of the Running_Thread_Table task (which is Delayed) to that of the interrupt.  The task that is woken up by the interrupt then goes later in the ready queue, since its priority is now lower than that of the Delayed task.
> > 
> > This ultimately causes the task that was readied by the interrupt to not run (in my case, it never runs again).  I haven't traced all the subsequent behavior to know exactly why this happens, but if I don't change priorities in the Interrupt_Wrapper, the problem seems to be gone.
> > 

I just sent the following bug report to AdaCore:

I have discovered another bug in your GNAT 2015 Ravenscar runtime on a Cortex-M4 processor.

In s-bbthqu.adb, in the Change_Priority procedure, there is this comment:

      --  When raising the priority, it is not possible that there is another
      --  task with a higher priority (otherwise the other task would be
      --  running). Hence, there is no displacement required within the
      --  queue, because the thread is already in the first position.

That assumption is not correct when there are simultaneous interrupts before PendSV has had a chance to run.

I will detail how the bug manifests, but first I wanted to point out that the calls to Change_Priority in the Interrupt_Wrapper are not always (if ever) associated with the task that the interrupt may wake up, if there is even a task related to that interrupt.  Typically, when the interrupt fires, the current task is the Environment_Task, which is either Delayed or Suspended.  That task is assigned to Self_Id in the Interrupt_Wrapper, and in my case it has nothing to do with the interrupt.

The call to Change_Priority in Interrupt_Wrapper is part of the reason for this specific problem.

I have three tasks and two interrupts:
Main Loop (Environment_Task), priority = 0x00
PWM Task, priority = 0xC8
SPI Task, priority = 0x50

PWM ISR, priority = 0xFC
SPI ISR, priority = 0xFB

The PWM task is woken up by a PO associated with the PWM ISR, and similarly for the SPI task/ISR.

The condition for this error is that the PWM ISR fires and is immediately followed by the SPI ISR, with no PendSV in between (confirmed by an ITM trace capture).  In that case, the following sequence happens.  I will list active and base priorities for a task below as 0xAA/0xBB (0xAA = active priority, 0xBB = base priority).

1. 0x00 is the current task (i.e. in Running_Thread_Table, with Delayed state).
2. PWM ISR fires and calls Change_Priority, raising 0x00 priority to 0xFC.
3. PWM ISR causes PWM task (0xC8) to become ready, so Insert is called and ready queue order is 0xFC/0x00, 0xC8/0xC8.
4. PWM ISR calls Change_Priority to lower current task, so ready queue is 0xC8/0xC8, 0x00/0x00.
5. SPI ISR fires and raises 0x00 to 0xFB (PendSV hasn't fired, so 0x00 is still current task), but Change_Priority does not move it up in the queue (the faulty assumption in the above comment), so the ready queue is then 0xC8/0xC8, 0xFB/0x00.
6. Insert is called with the 0x50 task, so the ready queue is then 0xC8/0xC8, 0xFB/0x00, 0x50/0x50.
7. SPI ISR calls Change_Priority to lower current task, which puts 0x50 at the head (Change_Priority assigns Thread.Next to the head, and 0x00's "Next" is 0x50).  This causes 0xC8 to be thrown away, leaving the order at 0x50/0x50, 0x00/0x00.
8. 0x50 is run, followed by Extract, leaving only 0x00/0x00 on the queue.
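
The sequence above can be reproduced off-target with a small model of the ready queue.  This is only a sketch written from the behaviour described in steps 1-8, not the runtime's code: the record layout, the Dump helper and the bodies of Insert and Change_Priority are my assumptions about a priority-ordered singly linked list, and Running stands in for the single Running_Thread_Table entry.

```ada
--  ready_queue_model.adb: hypothetical host-side model, NOT s-bbthqu.adb
with Ada.Text_IO; use Ada.Text_IO;

procedure Ready_Queue_Model is

   type Thread;
   type Thread_Id is access all Thread;

   type Thread is record
      Name            : String (1 .. 4);
      Base_Priority   : Natural;
      Active_Priority : Natural;
      Next            : Thread_Id;
   end record;

   Main_T : aliased Thread := ("Main", 16#00#, 16#00#, null);
   PWM_T  : aliased Thread := ("PWM ", 16#C8#, 16#C8#, null);
   SPI_T  : aliased Thread := ("SPI ", 16#50#, 16#50#, null);

   --  Step 1: the Delayed main task is both running and alone on the queue
   First   : Thread_Id := Main_T'Access;
   Running : constant Thread_Id := Main_T'Access;

   --  Insert T behind all threads of higher or equal active priority
   procedure Insert (T : Thread_Id) is
      Aux : Thread_Id;
   begin
      if First = null or else First.Active_Priority < T.Active_Priority then
         T.Next := First;
         First  := T;
      else
         Aux := First;
         while Aux.Next /= null
           and then Aux.Next.Active_Priority >= T.Active_Priority
         loop
            Aux := Aux.Next;
         end loop;
         T.Next   := Aux.Next;
         Aux.Next := T;
      end if;
   end Insert;

   --  Assumes T is at the head of the queue: raising never moves T, and
   --  lowering makes T.Next the new head before re-linking T further down.
   procedure Change_Priority (T : Thread_Id; Priority : Natural) is
      Prev : Thread_Id;
   begin
      T.Active_Priority := Priority;
      if T.Next /= null and then Priority < T.Next.Active_Priority then
         First := T.Next;
         Prev  := First;
         while Prev.Next /= null
           and then Priority < Prev.Next.Active_Priority
         loop
            Prev := Prev.Next;
         end loop;
         T.Next    := Prev.Next;
         Prev.Next := T;
      end if;
   end Change_Priority;

   --  Print the threads reachable from the head of the queue
   procedure Dump (Label : String) is
      T : Thread_Id := First;
   begin
      Put (Label & ":");
      while T /= null loop
         Put (" " & T.Name);
         T := T.Next;
      end loop;
      New_Line;
   end Dump;

begin
   Dump ("initial       ");

   --  Steps 2-4: PWM ISR raises Running, readies PWM task, lowers Running
   Change_Priority (Running, 16#FC#);
   Insert (PWM_T'Access);
   Change_Priority (Running, Running.Base_Priority);
   Dump ("after PWM ISR ");

   --  Steps 5-6: SPI ISR fires before PendSV; Running is no longer the head
   Change_Priority (Running, 16#FB#);
   Insert (SPI_T'Access);
   Dump ("inside SPI ISR");

   --  Step 7: lowering Running makes its successor the head
   Change_Priority (Running, Running.Base_Priority);
   Dump ("after SPI ISR ");
end Ready_Queue_Model;
```

In this model the last Dump no longer reaches the PWM task from the head of the queue, which matches step 7.  Everything hinges on Change_Priority assuming its argument is the head; if the real procedure differs from my reading, the sketch will not be representative.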

Regards,
Patrick


end of thread, other threads:[~2015-08-07 20:34 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-12 20:25 Ravenscar and context switching for Cortex-M4 Patrick Noffke
2015-02-12 21:28 ` Niklas Holsti
2015-02-13 12:41   ` G.B.
2015-02-13 16:25     ` Simon Wright
2015-02-13 18:08     ` Niklas Holsti
2015-02-13 19:01       ` Simon Wright
2015-02-13 23:45       ` Georg Bauhaus
2015-02-16 16:27 ` Patrick Noffke
2015-02-16 16:34   ` Patrick Noffke
2015-02-16 21:28   ` Simon Wright
2015-02-19 20:14     ` Patrick Noffke
2015-02-19 21:03       ` Bob Duff
2015-02-20 13:05         ` Simon Wright
2015-02-19 22:13       ` Patrick Noffke
2015-02-19 22:44         ` Patrick Noffke
2015-02-20  8:31           ` Simon Wright
2015-06-24 15:20           ` Patrick Noffke
2015-08-06 21:05     ` Patrick Noffke
2015-08-06 21:43       ` Patrick Noffke
2015-08-07 20:34         ` Patrick Noffke
