High CPU in tasking

comp.lang.ada

* High CPU in tasking
@ 2004-06-24 15:43 Lutz Donnerhacke
  2004-06-24 17:00 ` Nick Roberts
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-24 15:43 UTC (permalink / raw)


To keep others from falling into the same trap: it took me several days of
debugging to find this.

I wrote a data stream decoder and output (via TCP) manifolder using Ada
tasking and a protected ringbuffer. The whole application worked fine, but the
CPU load increased linearly with input load and dramatically with the number of
output queues.

Debugging showed:
  - Every write to the ringbuffer wakes up all reader tasks.
  - As a result, each reader's buffer was filled with only the small amount
    of data just written.
  - Tasking overhead caused the CPU load.

Two solutions (both used):
  - The ringbuffer got a minimum reading length => Fewer wakeups.
  - The writer task collects a lot of data before writing
    => Fewer checks for wakeup.

The ratio of tasking events before and after the change is about 30:1.
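
The minimum reading length can be sketched roughly as follows. This is a
minimal illustration, not the original code: the type Buffer, the Reader_Id
range and the counters are hypothetical, and only byte counts are tracked
rather than the data itself. The point is the entry barrier, which stays
closed for a reader until at least Minimum_Output bytes are pending for it.

```ada
with Ada.Text_IO;

procedure Min_Read_Demo is

   type Reader_Id is range 1 .. 2;
   type Counts is array (Reader_Id) of Natural;

   --  Hypothetical, simplified buffer: it only counts unread bytes per
   --  reader and does not store any data.
   protected type Buffer (Minimum_Output : Positive) is
      procedure Put (Amount : in Positive);
      entry Get (Reader_Id) (Amount : out Natural);
      function Served_Gets return Natural;
   private
      Pending : Counts  := (others => 0);
      Served  : Natural := 0;
   end Buffer;

   protected body Buffer is

      --  Every reader sees everything that is written.
      procedure Put (Amount : in Positive) is
      begin
         for R in Reader_Id loop
            Pending (R) := Pending (R) + Amount;
         end loop;
      end Put;

      --  The barrier opens only once enough data has accumulated, so a
      --  small write no longer releases the waiting reader.
      entry Get (for R in Reader_Id) (Amount : out Natural)
        when Pending (R) >= Minimum_Output is
      begin
         Amount      := Pending (R);
         Pending (R) := 0;
         Served      := Served + 1;
      end Get;

      function Served_Gets return Natural is
      begin
         return Served;
      end Served_Gets;

   end Buffer;

   B     : Buffer (Minimum_Output => 100);
   Got   : Natural;
   Total : Natural := 0;

begin
   for I in 1 .. 30 loop
      B.Put (10);                  --  many small writes
      for R in Reader_Id loop
         select
            B.Get (R) (Got);       --  succeeds only if the barrier is open
            Total := Total + Got;
         else
            null;                  --  not enough data yet
         end select;
      end loop;
   end loop;
   Ada.Text_IO.Put_Line ("Gets served:" & Natural'Image (B.Served_Gets));
   Ada.Text_IO.Put_Line ("Bytes read: " & Natural'Image (Total));
end Min_Read_Demo;
```

The demo uses conditional entry calls from a single thread of control to stay
deterministic; in a real program each reader would be a task blocked on Get.
Setting Minimum_Output to 1 lets every Put open the barrier again, which is
the behaviour that caused the load.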

Conclusion:
  When implementing tasking synchronisation with protected objects,
  keep in mind that the standard Ada tasking model delivers near-real-time
  behaviour, which is usually not required.

HTH




* Re: High CPU in tasking
  2004-06-24 15:43 High CPU in tasking Lutz Donnerhacke
@ 2004-06-24 17:00 ` Nick Roberts
  2004-06-24 20:25   ` Lutz Donnerhacke
  2004-06-25 21:15 ` Mark Lorenzen
  2004-06-26  8:01 ` Wojtek Narczynski
  2 siblings, 1 reply; 14+ messages in thread
From: Nick Roberts @ 2004-06-24 17:00 UTC (permalink / raw)


"Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
news:slrncdltli.nr.lutz@taranis.iks-jena.de...

> In order to stop others falling into the same mistake, I debugged
> several days.
> ...

I'm obviously only guessing at the details of your application, Lutz, but
gleaning what I can, I'm surprised that you seem to be talking about one
buffer. Do you not have a separate buffer for each reader (of a single
demultiplexed stream)? Also, what were the priorities of the tasks?

-- 
Nick Roberts






* Re: High CPU in tasking
  2004-06-24 17:00 ` Nick Roberts
@ 2004-06-24 20:25   ` Lutz Donnerhacke
  2004-06-24 21:56     ` Nick Roberts
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-24 20:25 UTC (permalink / raw)

* Nick Roberts wrote:
> I'm obviously only guessing at the details of your application, Lutz, but
> gleaning what I can, I'm surprised that you seem to be talking about one
> buffer.

Yep. The goal of the job is to decode a single data source and redistribute
the gained data to an unknown number of parallel readers with different
speeds.

> Do you not have a separate buffer for each reader (of a single
> demultiplexed stream)?

Yep. I do have a ringbuffer with a single "writer" and an indeterminate
number of "readers". The data structure is a protected type with an entry
family for all readers.

> Also, what were the priorities of the tasks?

All the same. I'm currently testing with a data source of 1GB/h and eight
data sources of about half the rate. Each data source allocates a ringbuffer.

There are controlling tasks: A logger task writing to /dev/log, because
syslog(3) is not thread safe, and a supervisor task for handling the
commands on each TCP session.
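
A dedicated logger task of this kind might look roughly like the sketch
below. It is only an assumption about the shape of such a task: the entry
name Log is hypothetical, and Ada.Text_IO.Put_Line stands in for the actual
write to /dev/log. Because only one rendezvous is accepted at a time, calls
from all other tasks are serialised.

```ada
with Ada.Text_IO;

procedure Logger_Demo is

   --  Hypothetical logger: the only task that touches the log device.
   task Logger is
      entry Log (Message : in String);
   end Logger;

   task body Logger is
   begin
      loop
         select
            accept Log (Message : in String) do
               --  Stand-in for writing to /dev/log.
               Ada.Text_IO.Put_Line (Message);
            end Log;
         or
            --  Lets the task end once the main program is finished.
            terminate;
         end select;
      end loop;
   end Logger;

begin
   Logger.Log ("decoder started");
   Logger.Log ("reader 1 connected");
end Logger_Demo;
```

The task is purely event triggered: it blocks in the select statement until
some other task calls Log.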


* Re: High CPU in tasking
  2004-06-24 20:25   ` Lutz Donnerhacke
@ 2004-06-24 21:56     ` Nick Roberts
  2004-06-25  7:34       ` Lutz Donnerhacke
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Roberts @ 2004-06-24 21:56 UTC (permalink / raw)


"Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
news:slrncdme55.brv.lutz@belenus.iks-jena.de...

> > Do you not have a separate buffer for each reader (of a single
> > demultiplexed stream)?
>
> Yep. I do have a ringbuffer with a single "writer" and an indeterminate
> number of "readers". The data structure is a protected type with an entry
> family for all readers.

Would it be possible to have a separate protected object containing a buffer
for each reader?

The idea is that the writer task: accepts a data packet; determines which
reader it belongs to; writes the packet into the buffer (protected object)
for this reader. This means that (in theory) different readers can be
reading in parallel with each other (and with the writer writing into a
different buffer), and it also means (hopefully) that a reader will not be
woken up unless a packet arrives that it will certainly read (not any other
reader).

If this approach is possible, I think it might more neatly solve your
problem ...

> > Also, what were the priorities of the tasks?
>
> All the same. I'm currently testing with a data source of 1GB/h and eight
> data sources of about half the rate. Each data source allocates a
> ringbuffer.

... especially if you make the writer a higher priority than all the
readers.

> There are controlling tasks: A logger task writing to /dev/log, because
> syslog(3) is not thread safe, and a supervisor task for handling the
> commands on each TCP session.

I suggest the logger task should be made a lower priority than all the other
tasks (so that it tends to work in the gaps between reader and writer
activity). I suggest the supervisor task be made a higher priority than all
the other tasks, so as to minimise any delay in acting on incoming commands
(such as 'stop' ;-)

Obviously I could be grasping the wrong end of the stick here!

-- 
Nick Roberts






* Re: High CPU in tasking
  2004-06-24 21:56     ` Nick Roberts
@ 2004-06-25  7:34       ` Lutz Donnerhacke
  2004-06-25 17:03         ` Nick Roberts
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-25  7:34 UTC (permalink / raw)


* Nick Roberts wrote:
> "Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
>> > Do you not have a separate buffer for each reader (of a single
>> > demultiplexed stream)?
>>
>> Yep. I do have a ringbuffer with a single "writer" and an indeterminate
>> number of "readers". The data structure is a protected type with an entry
>> family for all readers.
>
> Would it be possible to have a separate protected object containing a buffer
> for each reader?

In this case, the writer has to fill an unconstrained array of ringbuffers.
This seems much more inefficient.

> The idea is that the writer task: accepts a data packet; determines which
> reader it belongs to;

All data belongs to every reader. It's a manifolder which clones the whole
data stream to many receivers.

> If this approach is possible, I think it might more neatly solve your
> problem ...

The problem is solved. I published my solution to point others to this
pitfall.

>> All the same. I'm currently testing with a data source of 1GB/h and eight
>> data sources of about half the rate. Each data source allocates a
>> ringbuffer.
>
> ... especially if you make the writer a higher priority than all the
> readers.

Why? This would cause the writer to overwrite the ringbuffer without any
reading access.

>> There are controlling tasks: A logger task writing to /dev/log, because
>> syslog(3) is not thread safe, and a supervisor task for handling the
>> commands on each TCP session.
>
> I suggest the logger task should be made a lower priority than all the other
> tasks (so that it tends to work in the gaps between reader and writer
> activity). I suggest the supervisor task be made a higher priority than all
> the other tasks, so as to minimise any delay in acting on incoming commands
> (such as 'stop' ;-)

The whole program is event triggered. There are no free running code
sequences. The only effect I can imagine will be a deadlock ;-)




* Re: High CPU in tasking
  2004-06-25  7:34       ` Lutz Donnerhacke
@ 2004-06-25 17:03         ` Nick Roberts
  2004-06-28  8:32           ` Lutz Donnerhacke
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Roberts @ 2004-06-25 17:03 UTC (permalink / raw)


"Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
news:slrncdnlc9.nv.lutz@taranis.iks-jena.de...

> > Would it be possible to have a separate protected object
> > containing a buffer for each reader?
>
> In this case, the writer has to fill an unconstrained array of
> ringbuffers.
> ...
> All data belongs to every reader. It's a manifolder which
> clones the whole data stream to many receivers.

That was one of the application details I didn't know.

> This seems much more inefficient.

Not necessarily. Again, I must make some assumptions about your application,
and the machine it is running on (specifically that it is or may be SMP),
but it is possible that the time cost of writing each packet n times rather
than once would be more than compensated for by the decrease in memory
contention of multiple readers simultaneously processing the packet.

In fact, the only way to establish which approach worked better would be to
try them both and measure their speed.

Note that the cost of contention, mainly because of the cache flushing
caused, could be very high compared to the cost of writing, which might
require just one cache flush. It's interesting to note that this is a case
where it would be useful to have compiler (or linker) support for placing
objects in memory locations that correspond to different cache 'colours'.

> > ... especially if you make the writer a higher priority than
> > all the readers.
>
> Why? This would cause the writer to overwrite the ringbuffer
> without any reading access.

Not unless your code is faulty. You must not rely upon priority or
expectations of the executional progress of tasks to replace proper
synchronisation control.

I think the priority adjustments I suggested are also invalidated by the
information you have given (above). I won't explain why I made the
suggestion, as I don't want to confuse you, but trust me that there was a
good reason.

> > I suggest the logger task should be made a lower priority than all
> > the other tasks (so that it tends to work in the gaps between
> > reader and writer activity). I suggest the supervisor task be made
> > a higher priority than all the other tasks, so as to minimise any
> > delay in acting on incoming commands (such as 'stop' ;-)
>
> The whole program is event triggered. There are no free running
> code sequences. The only effect I can imagine will be a deadlock ;-)

Not unless your code is faulty. You must not rely upon priority or
expectations of the executional progress of tasks to replace proper
synchronisation control. (Hope I don't sound like a parrot :-)

If you would be willing to show the relevant code, and describe your
application's goals and design in a bit more detail, I'd be very interested.

-- 
Nick Roberts






* Re: High CPU in tasking
  2004-06-24 15:43 High CPU in tasking Lutz Donnerhacke
  2004-06-24 17:00 ` Nick Roberts
@ 2004-06-25 21:15 ` Mark Lorenzen
  2004-06-26  8:01 ` Wojtek Narczynski
  2 siblings, 0 replies; 14+ messages in thread
From: Mark Lorenzen @ 2004-06-25 21:15 UTC (permalink / raw)

Lutz Donnerhacke <lutz@iks-jena.de> writes:

> To keep others from falling into the same trap: it took me several days of
> debugging to find this.
> 
> I wrote a data stream decoder and output (via TCP) manifolder using Ada
> tasking and a protected ringbuffer. The whole application worked fine, but the
> CPU load increased linearly with input load and dramatically with the number of
> output queues.
> 
> Debugging showed:
>   - Every write to the ringbuffer wakes up all reader tasks.
>   - As a result, each reader's buffer was filled with only the small amount
>     of data just written.
>   - Tasking overhead caused the CPU load.
> 
> Two solutions (both used):
>   - The ringbuffer got a minimum reading length => Fewer wakeups.
>   - The writer task collects a lot of data before writing
>     => Fewer checks for wakeup.
> 
> The ratio of tasking events before and after the change is about 30:1.
> 
> Conclusion:
>   When implementing tasking synchronisation with protected objects,
>   keep in mind that the standard Ada tasking model delivers near-real-time
>   behaviour, which is usually not required.
> 
> HTH

These are some very good observations regarding soft- vs hard
real-time systems. In soft real-time systems we often want high
throughput and gladly sacrifice responsiveness.

A favorite design of mine is to have one task for each interface,
which encapsulates all the horrible properties of the interface,
e.g. TCP's chopping of data and so on. When a complete message,
telemetry frame or whatever has been read, it can be put into a queue
for further treatment. Just like when your writer collects a lot of
data before writing.
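
Collecting data before handing it on can be sketched like this. The names
Batch_Size, Append and Flush are hypothetical, and Flush merely stands in for
whatever call passes a complete chunk to the queue or ringbuffer; the sketch
only shows that the number of hand-offs drops from one per byte to one per
batch.

```ada
with Ada.Text_IO;

procedure Batch_Demo is

   Batch_Size : constant := 8;   --  hypothetical threshold
   Input      : constant String := "abcdefghijklmnopqrst";

   Batch : String (1 .. Batch_Size);
   Fill  : Natural := 0;         --  number of valid characters in Batch
   Puts  : Natural := 0;         --  how often data was handed on

   --  Stand-in for the call that passes data to the shared buffer.
   procedure Flush (Data : in String) is
   begin
      Puts := Puts + 1;
      Ada.Text_IO.Put_Line ("Put: " & Data);
   end Flush;

   --  Collect one character; hand on the batch only when it is full.
   procedure Append (C : in Character) is
   begin
      Fill := Fill + 1;
      Batch (Fill) := C;
      if Fill = Batch'Last then
         Flush (Batch);
         Fill := 0;
      end if;
   end Append;

begin
   for I in Input'Range loop
      Append (Input (I));
   end loop;

   --  Hand on whatever is left at the end of the input.
   if Fill > 0 then
      Flush (Batch (1 .. Fill));
   end if;

   Ada.Text_IO.Put_Line ("Calls to Put:" & Natural'Image (Puts));
end Batch_Demo;
```

A real writer would probably also flush on a timeout, so that a slow input
does not hold back data indefinitely; that part is left out here.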

If a lot of readers are needed, you may find that one task for each
reader is not the right solution, and that one task for many readers
should be used instead. In very high performance systems, you may want
one task for each network interface card on the computer so that the
tasks do not compete for the limited resource (i.e. the NIC). One task
would then serve many TCP sessions and buffer write requests in such a
way that it can write large chunks at a time.

Do not underestimate the overhead from tasking.

Regards,
- Mark Lorenzen


* Re: High CPU in tasking
  2004-06-24 15:43 High CPU in tasking Lutz Donnerhacke
  2004-06-24 17:00 ` Nick Roberts
  2004-06-25 21:15 ` Mark Lorenzen
@ 2004-06-26  8:01 ` Wojtek Narczynski
  2004-06-28  8:17   ` Lutz Donnerhacke
  2 siblings, 1 reply; 14+ messages in thread
From: Wojtek Narczynski @ 2004-06-26  8:01 UTC (permalink / raw)


Hello,

> Debugging turned out:
>   - Every writing to the ringbuffer wakes up all reader tasks.

This sounds like the "thundering herd" problem Linux used to have with
the accept call. This has been fixed, so maybe in this case the problem
should be reported and eventually fixed, or is it required to wake em
all for some reason?

Regards,
Wojtek




* Re: High CPU in tasking
  2004-06-26  8:01 ` Wojtek Narczynski
@ 2004-06-28  8:17   ` Lutz Donnerhacke
  0 siblings, 0 replies; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-28  8:17 UTC (permalink / raw)


* Wojtek Narczynski wrote:
>> Debugging turned out:
>>   - Every writing to the ringbuffer wakes up all reader tasks.
>
> This sounds like the "thundering herd" problem Linux used to have with
> the accept call. This has been fixed, so maybe in this case the problem
> should be reported and eventually fixed, or is it required to wake em
> all for some reason?

No, there is no "thundering herd" problem. Because written data is processed
by all reader tasks, every woken task has work to do.

I am only pointing out the behaviour the ARM requires of a protected object.




* Re: High CPU in tasking
  2004-06-25 17:03         ` Nick Roberts
@ 2004-06-28  8:32           ` Lutz Donnerhacke
  2004-06-29 17:26             ` Nick Roberts
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-28  8:32 UTC (permalink / raw)


* Nick Roberts wrote:
> Not necessarily. Again, I must make some assumptions about your application,
> and the machine it is running on (specifically that it is or may be SMP),
> but it is possible that the time cost of writing each packet n times rather
> than once would be more than compensated for by the decrease in memory
> contention of multiple readers simultaneously processing the packet.

The writer task consumes about 20% of CPU in decoding, while the reader tasks
consume about 1% in writing out. The tasking overhead is about 0.7% CPU per
task.

Before introducing minimal reading amounts, I got about 80% CPU per reader
task! And my minimal environment contains nine downstreams. *Orks*

You might be right in suggesting multiple buffers, but I have to postpone
measurements on this idea to a later stage. Currently the customer is waiting.

> Note that the cost of contention, mainly because of the cache flushing
> caused, could be very high compared to the cost of writing, which might
> require just one cache flush. It's interesting to note that this is a case
> where it would be useful to have compiler (or linker) support for placing
> objects in memory locations that correspond to different cache 'colours'.

Honestly, I do not understand your paragraph, but I will try to read about
caching techniques. Thanks for the hint.

[different priorities]
>> Why? This would cause the writer to overwrite the ringbuffer
>> without any reading access.
>
> Not unless your code is faulty. You must not rely upon priority or
> expectations of the executional progress of tasks to replace proper
> synchronisation control.

Correct. I'll try this out. It might be helpful in improving my mental model
of the process itself.

> I think the priority adjustments I suggested are also invalidated by the
> information you have given (above). I won't explain why I made the
> suggestion, as I don't want to confuse you, but trust me that there was a
> good reason.

;-)

>> The whole program is event triggered. There are no free running
>> code sequences. The only effect I can imagine will be a deadlock ;-)
>
> Not unless your code is faulty.

My code is definitely broken. I get tasking errors :-(

> You must not rely upon priority or expectations of the executional
> progress of tasks to replace proper synchronisation control. (Hope I
> don't sound like a parrot :-)

At the moment I do only rely on the proper synchronisation control of a
protected type with entry families. But I'll try to improve this using your
ideas.

> If you would be willing to show the relevant code, and describe your
> application's goals and design in a bit more detail, I'd be very interested.

package Ringbuffers is
   pragma Elaborate_Body;
   
   -- Position is modular (it wraps around) so that infinite streams can be processed.
   type Position is mod 2**31;
   
   -- Entry families are limited by
   -- System.Tasking.Protected_Objects.Max_Protected_Entry (GNAT)
   -- so a level of indirection is necessary
   type Protected_Entry is range 1 .. 100;
   type Entry_Position is array(Protected_Entry) of Position;
   type Entry_Boolean  is array(Protected_Entry) of Boolean;
   pragma Pack(Entry_Boolean);
   No_Free_Entry : exception;
   
   type Main_Statistic_Type is record
      write   : Position;
      readers : Natural;      
   end record;

   type Statistic_Type(connected : Boolean) is record
      write : Position;
      case connected is
         when False => null;
         when True =>
            behind : Natural;
            missed : Natural;
            note   : Boolean;
      end case;
   end record;

   protected type Ringbuffer(Size : Positive; Minimum_Output : Natural) is
      procedure Register(index : out Protected_Entry); -- raise No_Free_Entry
      procedure Unregister(index :   Protected_Entry);
      function Statistics return Main_Statistic_Type;

      procedure Notify(index : Protected_Entry);
      function Statistics(index : Protected_Entry) return Statistic_Type;

      procedure Put(data : in String);
      entry Get(Protected_Entry)(
        notify : out Boolean;
        data   : out String;
        last   : out Natural;
        missed : out Natural
      );
   private
      write  : Position := Position'First;
      read   : Entry_Position;
      usage, notification : Entry_Boolean := (others => False);
      buffer : String(1 .. Size);
   end Ringbuffer;
end Ringbuffers;




* Re: High CPU in tasking
  2004-06-28  8:32           ` Lutz Donnerhacke
@ 2004-06-29 17:26             ` Nick Roberts
  2004-06-30 12:26               ` Lutz Donnerhacke
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Roberts @ 2004-06-29 17:26 UTC (permalink / raw)


"Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
news:slrncdvlt6.nt.lutz@taranis.iks-jena.de...

> ...
> Honestly, I do not understand your paragraph, but I will try to read
> about caching techniques. Thanks for the hint.

I think you might find this web page interesting:

   http://linux0.cs.uaf.edu/als_proceedings/papers/sears/sears_html/

> > If you would be willing to show the relevant code, and describe
> > your application's goals and design in a bit more detail, I'd be
> > very interested.
>
> package Ringbuffers is
> ...

Looking at this briefly, I can make some comments based on a guess at the
design and required functionality.

I suspect that one thing you need to do is to make Register an entry, and
allow the writer to wait for space in the buffer if necessary (because it
has got too advanced on the readers).

Do you need to explicitly unregister positions? If you registered with an
initial count (which would always be the number of readers), Notify could
decrement the count, and when the count goes down to 0, you assume the
position is unregistered.

I suspect the data itself ('buffer') doesn't need to go into the protected
object, only registrations. Then I don't think you would need to have Get or
Put (and so no need for a family of entries for Get); just have a shared
array.

If these suggestions make any sense, I'd be happy to suggest more detailed
code, if you wish!

-- 
Power to your elbow,
Nick Roberts






* Re: High CPU in tasking
  2004-06-29 17:26             ` Nick Roberts
@ 2004-06-30 12:26               ` Lutz Donnerhacke
  2004-06-30 23:39                 ` Randy Brukardt
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-06-30 12:26 UTC (permalink / raw)

* Nick Roberts wrote:
> "Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
>> Honestly, I do not understand your paragraph, but I will try to read
>> about caching techniques. Thanks for the hint.
>
> I think you might find this web page interesting:
>    http://linux0.cs.uaf.edu/als_proceedings/papers/sears/sears_html/

Thanx.

>> package Ringbuffers is
>
> I suspect that one thing you need to do is to make Register an entry, and
> allow the writer to wait for space in the buffer if necessary (because it
> has got too advanced on the readers).

Nope. "Put" is a procedure and therefore will not wait for any reader.
The readers call an entry (and are identified by the parameter of the entry
family, in order to access their current read positions) to wait for new data.
If the reader is too slow, it gets a positive "missed" value back.
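
How a "missed" value could be derived from modular positions is sketched
below. This is not the body of the posted package, which was not shown; the
procedure Catch_Up, the constant Size and the test values are hypothetical.
It assumes the writer is never a full 2**31 positions ahead of a reader, so
that the modular difference Write - Read is the true distance.

```ada
with Ada.Text_IO;

procedure Missed_Demo is

   type Position is mod 2**31;        --  as in the posted spec

   Size : constant Position := 16;    --  hypothetical buffer size

   --  Hypothetical helper: if the writer has lapped the reader, report
   --  how much was overwritten and move the reader to the oldest byte
   --  that is still in the buffer.
   procedure Catch_Up
     (Write  : in     Position;
      Read   : in out Position;
      Missed :    out Natural)
   is
      --  Modular subtraction stays correct across wrap-around.
      Behind : constant Position := Write - Read;
   begin
      if Behind > Size then
         Missed := Natural (Behind - Size);
         Read   := Write - Size;
      else
         Missed := 0;
      end if;
   end Catch_Up;

   Read   : Position := Position'Last - 3;   --  just before wrap-around
   Write  : constant Position := 20;         --  writer has already wrapped
   Missed : Natural;

begin
   Catch_Up (Write, Read, Missed);
   Ada.Text_IO.Put_Line ("missed =" & Natural'Image (Missed));
   Ada.Text_IO.Put_Line ("read   =" & Position'Image (Read));
end Missed_Demo;
```

In this example the reader is 24 positions behind a 16-byte buffer, so 8 bytes
were overwritten before it could read them, and it resumes at position 4.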

> Do you need to explicitly unregister positions?

I do register readers, because entry families are limited in the number of
distinct values possible. So the necessary level of indirection is provided
by registration.

> I suspect the data itself ('buffer') doesn't need to go into the protected
> object, only registrations.

Currently the data is protected. It's an interesting idea to keep it out and
use a protected type for the positions only. This seems to result in explicit
semaphores.

> Then I don't think you would need to have Get or Put (and so no need for
> a family of entries for Get); just have a shared array.

It's not that easy, because the write process can overwrite slow readers.


* Re: High CPU in tasking
  2004-06-30 12:26               ` Lutz Donnerhacke
@ 2004-06-30 23:39                 ` Randy Brukardt
  2004-07-01  7:02                   ` Lutz Donnerhacke
  0 siblings, 1 reply; 14+ messages in thread
From: Randy Brukardt @ 2004-06-30 23:39 UTC (permalink / raw)

"Lutz Donnerhacke" <lutz@iks-jena.de> wrote in message
news:slrnce5cbd.m1.lutz@taranis.iks-jena.de...
> * Nick Roberts wrote:
...
> > Do you need to explicitly unregister positions?
>
> I do register readers, because entry families are limited in the number of
> distinct values possible. So the necessary level of indirection is provided
> by registration.

It should be noted that the limitation on the range of protected entry
families is a weird limitation of some versions of GNAT, and not one
intended/expected by the Ada language. GNAT is the only Ada compiler of the
ones I usually test that has this sort of limitation. Janus/Ada, for
instance, treats protected entry families as an additional parameter to the
entry (which is very different than the way task entry families are
implemented). Perhaps you'd have better performance if you didn't need this
workaround?

                     Randy.


* Re: High CPU in tasking
  2004-06-30 23:39                 ` Randy Brukardt
@ 2004-07-01  7:02                   ` Lutz Donnerhacke
  0 siblings, 0 replies; 14+ messages in thread
From: Lutz Donnerhacke @ 2004-07-01  7:02 UTC (permalink / raw)


* Randy Brukardt wrote:
> It should be noted that the limitation on the range of protected entry
> families is a weird limitation of some versions of GNAT, and not one
> intended/expected by the Ada language. GNAT is the only Ada compiler of
> the ones I usually test that has this sort of limitation. Janus/Ada, for
> instance, treats protected entry families as an additional parameter to
> the entry (which is very different than the way task entry families are
> implemented). Perhaps you'd have better performance if you didn't need
> this workaround?

The registration overhead is small, so I expect only an insignificant
drop in management overhead.



