* avoiding builtin memset
From: Jere @ 2017-04-24 16:06 UTC

GNAT GPL 2016
Windows 10
Cross compiled to arm cortex m0+
Full Optimization

With a small runtime that I am modifying, I am not linking in the standard C
libraries. This means whenever I do an array initialize, gcc tries to link
in a non-existent memset call. I started working on an Ada version of
memset which I export out. The problem comes when memset tries to
recursively call itself. The compiler is too smart for me.

At the end of my memset I have a simple loop:
   while Current_Address < End_Address loop
      Convert_8.To_Pointer (Current_Address).all := Uint8_Value;
      Current_Address := Current_Address + 1;
   end loop;

It serves two purposes:
1. It finishes up any leftover bytes on an unaligned array
2. If the data set is small enough (<= 16), the function skips immediately
   to this loop and just does a byte copy on the whole thing rather
   than try and do all the extra logic for an aligned copy

However GNAT tries to convert that to another memset call, which doesn't
work well, since I am in memset.

My immediate workaround is to make the loop more complex:
   Count := Count * 2;  -- To avoid recursive memset call
   while Count > 0 loop
      Convert_8.To_Pointer (Current_Address).all := Uint8_Value;
      Current_Address := Current_Address + 1;
      Count := Count - 2;
   end loop;

But I don't really like this as it adds unnecessary overhead.

I looked around for gcc switches to inhibit calls to memset, but the only
one I found ( -fno-builtin ) only works on C files. I really don't want to
write it in C (or assembly for that matter).

I think I can also do ASM statements, but I would hate to have to do ASM if
there is an Ada solution. I also don't know if GCC would just optimize
the ASM to a memset call anyway.

Do I have any other options that would let me do it in Ada? I didn't see
pragmas that jumped out at me.

* Re: avoiding builtin memset
From: Shark8 @ 2017-04-24 16:56 UTC

I suppose you could try this:

    Procedure Memset(
       Address : System.Address;
       Value   : System.Storage_Elements.Storage_Element;
       Length  : Natural
      ) is
        Use System.Storage_Elements;
        Memory : Storage_Array(1..Storage_Offset(Length))
          with Import, Address => Address;
    Begin
        For Element of Memory loop
            Element := Value;
        end loop;
    End Memset;

* Re: avoiding builtin memset
From: Anh Vo @ 2017-04-25 1:21 UTC

On Monday, April 24, 2017 at 9:56:51 AM UTC-7, Shark8 wrote:
> I suppose you could try this:
>
>     Procedure Memset(
>        Address : System.Address;
>        Value   : System.Storage_Elements.Storage_Element;
>        Length  : Natural
>       ) is
>         Use System.Storage_Elements;
>         Memory : Storage_Array(1..Storage_Offset(Length))
>           with Import, Address => Address;
>     Begin
>         For Element of Memory loop
>             Element := Value;
>         end loop;
>     End Memset;

It is very nice. Indeed, it is compact and readable piece of code.

Anh Vo

* Re: avoiding builtin memset
From: Luke A. Guest @ 2017-04-25 2:57 UTC

Anh Vo <anhvofrcaus@gmail.com> wrote:
> On Monday, April 24, 2017 at 9:56:51 AM UTC-7, Shark8 wrote:
>> I suppose you could try this:
>>
>>     Procedure Memset(
>>        Address : System.Address;
>>        Value   : System.Storage_Elements.Storage_Element;
>>        Length  : Natural
>>       ) is
>>         Use System.Storage_Elements;
>>         Memory : Storage_Array(1..Storage_Offset(Length))
>>           with Import, Address => Address;
>>     Begin
>>         For Element of Memory loop
>>             Element := Value;
>>         end loop;
>>     End Memset;
>
> It is very nice. Indeed, it is compact and readable piece of code.
>
> Anh Vo
>

You'll need to export that as the C memset function.

* Re: avoiding builtin memset
From: Shark8 @ 2017-04-25 18:43 UTC

On Monday, April 24, 2017 at 8:57:10 PM UTC-6, Luke A. Guest wrote:
> Anh Vo wrote:
> >
> > It is very nice. Indeed, it is compact and readable piece of code.
> >
> > Anh Vo

Thank you.

> You'll need to export that as the C memset function.

Even if you're only using it in Ada functions?

Also, I suppose you could also add Pragma Inspection_Point(Element) inside
the loop [at the end] and Pragma Inspection_Point(Memory) just before
"end Memset;" to ensure that there's no optimization to a single/internal
memset call.

* Re: avoiding builtin memset
From: Luke A. Guest @ 2017-04-25 22:18 UTC

Shark8 <onewingedshark@gmail.com> wrote:

>> You'll need to export that as the C memset function.
>
> Even if you're only using it in Ada functions?

Yup, GNAT expects certain functions to be present and generates calls to
them. Check system.ads for mention of allowing assignment, that basically
calls memcpy on assignment for objects that require a memory copy rather
than a register to register copy. Others are bzero/memset.

> Also, I suppose you could also add Pragma Inspection_Point(Element)
> inside the loop [at the end] and Pragma Inspection_Point(Memory) just
> before "end Memset;" to ensure that there's no optimization to a
> single/internal memset call.
>

No idea, never used them.
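
For reference, exporting an Ada subprogram under the C symbol that the
compiler-generated calls refer to might look roughly like the sketch below.
It reuses the overlay loop from Shark8's message. The package name
Mem_Support, the parameter names, and the choice of a C-style profile
(int fill value, size_t count, destination returned) are assumptions made
for illustration, and it presumes Interfaces.C is present in the runtime
being used. Whether the optimizer leaves the loop alone depends on the
compiler version and switches, so this is a starting point rather than a
guaranteed fix.

```ada
--  mem_support.ads (hypothetical unit name)
with System;
with Interfaces.C;

package Mem_Support is

   --  Same profile as C's memset: void *memset (void *s, int c, size_t n).
   --  The Export aspect makes the linker see this function as "memset".
   function Memset
     (Dest  : System.Address;
      Value : Interfaces.C.int;
      Count : Interfaces.C.size_t) return System.Address
     with Export, Convention => C, External_Name => "memset";

end Mem_Support;

--  mem_support.adb
with System.Storage_Elements; use System.Storage_Elements;

package body Mem_Support is

   function Memset
     (Dest  : System.Address;
      Value : Interfaces.C.int;
      Count : Interfaces.C.size_t) return System.Address
   is
      --  Overlay a byte array on the caller's memory. Import suppresses
      --  any default initialization of the overlay.
      Memory : Storage_Array (1 .. Storage_Offset (Count))
        with Import, Address => Dest;

      --  C passes the fill value as int; 'Mod reduces it to one storage
      --  element without raising Constraint_Error.
      Byte : constant Storage_Element := Storage_Element'Mod (Value);
   begin
      for Element of Memory loop
         Element := Byte;
      end loop;
      return Dest;
   end Memset;

end Mem_Support;
```

With GNAT the spec and body would normally live in separate files
(mem_support.ads and mem_support.adb). The unit has to be part of the
link for the exported symbol to resolve the calls GNAT emits.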

* Re: avoiding builtin memset
From: Simon Wright @ 2017-04-26 7:35 UTC

Luke A. Guest <laguest@archeia.com> writes:

> Shark8 <onewingedshark@gmail.com> wrote:
>
>>> You'll need to export that as the C memset function.
>>
>> Even if you're only using it in Ada functions?
>
>
> Yup, GNAT expects certain functions to be present and generates calls
> to them. Check system.ads for mention of allowing assignment, that
> basically calls memcpy on assignment for objects that require a memory
> copy rather than a register to register copy.

Not quite sure this is set in system.ads: in the 6.1 sources, the only
reference to memset in the compiler itself is in exp_aggr.adb,

      --  The ultimate goal is to generate a call to a fast memset routine
      --  specifically optimized for the target.

      function Aggr_Assignment_OK_For_Backend (N : Node_Id) return Boolean is

whereas for System.Support_Composite_Assign there is in sem_ch5.adb

      --  Check for non-allowed composite assignment

      if not Support_Composite_Assign_On_Target
        and then (Is_Array_Type (T1) or else Is_Record_Type (T1))
        and then (not Has_Size_Clause (T1) or else Esize (T1) > 64)
      then
         Error_Msg_CRT ("composite assignment", N);
      end if;

(don't like the look of that 'Esize (T1) > 64'!)

* Re: avoiding builtin memset
From: Lucretia @ 2017-04-26 13:44 UTC

On Wednesday, 26 April 2017 08:35:05 UTC+1, Simon Wright wrote:
> Luke A. Guest <me@me.com> writes:
>
> > Shark8 <onewingedshark@> wrote:
> >
> >>> You'll need to export that as the C memset function.
> >>
> >> Even if you're only using it in Ada functions?
> >
> >
> > Yup, GNAT expects certain functions to be present and generates calls
> > to them. Check system.ads for mention of allowing assignment, that
> > basically calls memcpy on assignment for objects that require a memory
> > copy rather than a register to register copy.
>
> Not quite sure this is set in system.ads: in the 6.1 sources, the only
> reference to memset in the compiler itself is in exp_aggr.adb,

See Support_Composite_Assign and the comment inside targparm.ads:

   --  The assignment of composite objects other than small records and
   --  arrays whose size is 64-bits or less and is set by an explicit
   --  size clause may generate calls to memcpy, memmove, and bcopy.
   --  If versions of all these routines are available, then this flag
   --  is set to True. If any of these routines is not available, then
   --  the flag is set False, and composite assignments are not allowed.

* Re: avoiding builtin memset
From: Simon Wright @ 2017-04-26 15:22 UTC

Lucretia <laguest9000@googlemail.com> writes:

> On Wednesday, 26 April 2017 08:35:05 UTC+1, Simon Wright wrote:
>> Luke A. Guest <me@me.com> writes:
>>
>> > Shark8 <onewingedshark@> wrote:
>> >
>> >>> You'll need to export that as the C memset function.
>> >>
>> >> Even if you're only using it in Ada functions?
>> >
>> >
>> > Yup, GNAT expects certain functions to be present and generates calls
>> > to them. Check system.ads for mention of allowing assignment, that
>> > basically calls memcpy on assignment for objects that require a memory
>> > copy rather than a register to register copy.
>>
>> Not quite sure this is set in system.ads: in the 6.1 sources, the only
>> reference to memset in the compiler itself is in exp_aggr.adb,
>
> See Support_Composite_Assign and the comment inside targparm.ads:
>
>    --  The assignment of composite objects other than small records and
>    --  arrays whose size is 64-bits or less and is set by an explicit
>    --  size clause may generate calls to memcpy, memmove, and bcopy.
>    --  If versions of all these routines are available, then this flag
>    --  is set to True. If any of these routines is not available, then
>    --  the flag is set False, and composite assignments are not allowed.

Oh, I was only looking in targparm.adb.

* Re: avoiding builtin memset
From: Jere @ 2017-04-27 0:22 UTC

On Monday, April 24, 2017 at 12:56:51 PM UTC-4, Shark8 wrote:
> I suppose you could try this:
>
>     Procedure Memset(
>        Address : System.Address;
>        Value   : System.Storage_Elements.Storage_Element;
>        Length  : Natural
>       ) is
>         Use System.Storage_Elements;
>         Memory : Storage_Array(1..Storage_Offset(Length))
>           with Import, Address => Address;
>     Begin
>         For Element of Memory loop
>             Element := Value;
>         end loop;
>     End Memset;

Thanks!

I'll be honest, I wasn't expecting it to avoid memset (didn't see why it
should), but it didn't recursively call it. It did, however, do a very
interesting loop unroll:

***************************************************************
<snipped>
 208:   2a00    cmp     r2, #0
 20a:   d02d    beq.n   268 <memset+0xa8>
 20c:   7029    strb    r1, [r5, #0]
 20e:   2a01    cmp     r2, #1
 210:   d02a    beq.n   268 <memset+0xa8>
 212:   7069    strb    r1, [r5, #1]
 214:   2a02    cmp     r2, #2
 216:   d027    beq.n   268 <memset+0xa8>
 218:   70a9    strb    r1, [r5, #2]
 21a:   2a03    cmp     r2, #3
 21c:   d024    beq.n   268 <memset+0xa8>
 21e:   70e9    strb    r1, [r5, #3]
 220:   2a04    cmp     r2, #4
 222:   d021    beq.n   268 <memset+0xa8>
 224:   7129    strb    r1, [r5, #4]
 226:   2a05    cmp     r2, #5
 228:   d01e    beq.n   268 <memset+0xa8>
 22a:   7169    strb    r1, [r5, #5]
 22c:   2a06    cmp     r2, #6
 22e:   d01b    beq.n   268 <memset+0xa8>
 230:   71a9    strb    r1, [r5, #6]
 232:   2a07    cmp     r2, #7
 234:   d018    beq.n   268 <memset+0xa8>
 236:   71e9    strb    r1, [r5, #7]
 238:   2a08    cmp     r2, #8
 23a:   d015    beq.n   268 <memset+0xa8>
 23c:   7229    strb    r1, [r5, #8]
 23e:   2a09    cmp     r2, #9
 240:   d012    beq.n   268 <memset+0xa8>
 242:   7269    strb    r1, [r5, #9]
 244:   2a0a    cmp     r2, #10
 246:   d00f    beq.n   268 <memset+0xa8>
 248:   72a9    strb    r1, [r5, #10]
 24a:   2a0b    cmp     r2, #11
 24c:   d00c    beq.n   268 <memset+0xa8>
 24e:   72e9    strb    r1, [r5, #11]
 250:   2a0c    cmp     r2, #12
 252:   d009    beq.n   268 <memset+0xa8>
 254:   7329    strb    r1, [r5, #12]
 256:   2a0d    cmp     r2, #13
 258:   d006    beq.n   268 <memset+0xa8>
 25a:   7369    strb    r1, [r5, #13]
 25c:   2a0e    cmp     r2, #14
 25e:   d003    beq.n   268 <memset+0xa8>
 260:   73a9    strb    r1, [r5, #14]
 262:   2a0f    cmp     r2, #15
 264:   d000    beq.n   268 <memset+0xa8>
 266:   73e9    strb    r1, [r5, #15]
 268:   b002    add     sp, #8
 26a:   bd70    pop     {r4, r5, r6, pc}
***************************************************************

That was interesting. That's with -O3

* Re: avoiding builtin memset
From: J-P. Rosen @ 2017-04-27 4:35 UTC

On 27/04/2017 at 02:22, Jere wrote:
>>         For Element of Memory loop
>>             Element := Value;
>>         end loop;
>
> I'll be honest, I wasn't expecting it to avoid memset (didn't see why
> it should), but it didn't recursively call it. It did however, do a
> very interesting loop unroll:
> [...]

What if you replace the loop with:
   Memory := (others => Value);
?

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr

* Re: avoiding builtin memset
From: Simon Wright @ 2017-04-27 7:09 UTC

"J-P. Rosen" <rosen@adalog.fr> writes:

> On 27/04/2017 at 02:22, Jere wrote:
>>>         For Element of Memory loop
>>>             Element := Value;
>>>         end loop;
>>
>> I'll be honest, I wasn't expecting it to avoid memset (didn't see why
>> it should), but it didn't recursively call it. It did however, do a
>> very interesting loop unroll:
>> [...]
> What if you replace the loop with:
>    Memory := (others => Value);

That's when the generated code calls memset(3).

Unless, I guess, you've set System up to forbid composite assignments?

I changed my (Cortex) system.ads to

   Support_Aggregates : constant Boolean := False;

and this code

      --  Initialize BSS in SRAM
      Bss := (others => 0);

results in

   startup.adb:113:14: aggregate not supported by configuration

whereas system.ads:

   Support_Composite_Assign : constant Boolean := False;

results in

   startup.adb:113:11: composite assignment not supported by configuration
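
To compare the two forms discussed in the messages above side by side,
a small host-side test along these lines may help. It is only a sketch:
the procedure name Fill_Forms, the buffer sizes, and the fill value are
made up for illustration. Compiling it with the switches of interest and
inspecting the generated assembly (for example with gcc's -S switch)
should show which of the two statements, if either, ends up as a call to
memset with a given compiler.

```ada
with System.Storage_Elements; use System.Storage_Elements;

procedure Fill_Forms is
   --  Hypothetical buffers, used only to compare the two fill styles.
   A : Storage_Array (1 .. 32);
   B : Storage_Array (1 .. 32);

   Value : constant Storage_Element := 16#5A#;
begin
   --  Element-by-element loop (the form Jere tested).
   for Element of A loop
      Element := Value;
   end loop;

   --  Aggregate assignment (the form J-P. Rosen asked about).
   B := (others => Value);

   --  Both buffers should now hold the same bytes. The assertion is
   --  only checked when assertions are enabled (e.g. -gnata with GNAT).
   pragma Assert (A = B);
end Fill_Forms;
```

On a runtime where Support_Composite_Assign or Support_Aggregates is
False, the aggregate statement would be expected to be rejected at
compile time, as in the error messages quoted above.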

* Re: avoiding builtin memset
From: Frédéric PRACA @ 2017-05-24 15:08 UTC

On Monday, 24 April 2017 at 18:06:11 UTC+2, Jere wrote:
> GNAT GPL 2016
> Windows 10
> Cross compiled to arm cortex m0+
> Full Optimization
>
> With a small runtime that I am modifying, I am not linking in the standard C
> libraries. This means whenever I do an array initialize, gcc tries to link
> in a non-existent memset call. I started working on an Ada version of
> memset which I export out. The problem comes when memset tries to
> recursively call itself. The compiler is too smart for me.
>
> At the end of my memset I have a simple loop:
>    while Current_Address < End_Address loop
>       Convert_8.To_Pointer (Current_Address).all := Uint8_Value;
>       Current_Address := Current_Address + 1;
>    end loop;
>
> It serves two purposes:
> 1. It finishes up any leftover bytes on an unaligned array
> 2. If the data set is small enough (<= 16), the function skips immediately
>    to this loop and just does a byte copy on the whole thing rather
>    than try and do all the extra logic for an aligned copy
>
> However GNAT tries to convert that to another memset call, which doesn't
> work well, since I am in memset.
>
> My immediate workaround is to make the loop more complex:
>    Count := Count * 2;  -- To avoid recursive memset call
>    while Count > 0 loop
>       Convert_8.To_Pointer (Current_Address).all := Uint8_Value;
>       Current_Address := Current_Address + 1;
>       Count := Count - 2;
>    end loop;
>
> But I don't really like this as it adds unnecessary overhead.
>
> I looked around for gcc switches to inhibit calls to memset, but the only
> one I found ( -fno-builtin ) only works on C files. I really don't want to
> write it in C (or assembly for that matter).
>
> I think I can also do ASM statements, but I would hate to have to do ASM if
> there is an Ada solution. I also don't know if GCC would just optimize
> the ASM to a memset call anyway.
>
> Do I have any other options that would let me do it in Ada? I didn't see
> pragmas that jumped out at me.

In the past, I did the same for an x86 toy OS called Lovelace OS.
The spec:

-- Lovelace Operating System - An Unix Like Ada'Based Operating system
-- Copyright (C) 2013-2014 Xavier GRAVE, Frederic BOYER

-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.

-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.

-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
pragma Suppress (All_Checks);

with System;
with System.Storage_Elements;

procedure Oasys.Memset (Destination : in System.Address;
                        Value       : in System.Storage_Elements.Storage_Element;
                        Count       : in System.Storage_Elements.Storage_Count);
pragma Pure (Memset);
pragma Export (C, Memset, "memset");

Then the body for the x86-32 part

-- (same license header as the spec)
with System.Storage_Elements; use System.Storage_Elements;
with Oasys.Debug;

procedure Oasys.Memset (Destination : in System.Address;
                        Value       : in Storage_Element;
                        Count       : in Storage_Count) is
   pragma Suppress (All_Checks);

   -- Storage_unit zones
   Byte_Zone_Destination : Storage_Array (1 .. Count);
   for Byte_Zone_Destination'Address use Destination;

   Offset                   : Storage_Offset := 0;
   Number_Of_Bytes_To_Write : Storage_Count  := Count;

   -- CPU width (ie 32 or 64 bits) in bytes
   Full_Width : constant := System.Word_Size / System.Storage_Unit;
begin
   -- see http://www.noxeos.com/2013/08/06/code-optimisations/
   -- Algo
   -- offset = 0
   -- while destination + offset is not aligned,
   -- set the byte to value and add 1 to offset
   -- decrease count of elements to set
   -- (stop early if Count runs out before alignment is reached)
   while Number_Of_Bytes_To_Write > 0
     and then (Destination + Offset) mod Full_Width /= 0
   loop
      -- Warning ! We use offset + 1 for indexing the array
      -- because offset starts at 0 and arrays to 1
      Byte_Zone_Destination (Offset + 1) := Value;
      Offset := Offset + 1;
      Number_Of_Bytes_To_Write := Number_Of_Bytes_To_Write - 1;
   end loop;

   declare
      -- Number of word zones
      Number_Of_Zones : constant Natural :=
        Natural (Number_Of_Bytes_To_Write) / Full_Width;

      type Word is mod 2**System.Word_Size;

      -- word long zones
      type Long_Word_Zone_Array is array (1 .. Number_Of_Zones) of Word;
      Zone_Destination : Long_Word_Zone_Array;
      for Zone_Destination'Address use (Destination + Offset);

      Remaining_Bytes : constant Natural :=
        Natural (Number_Of_Bytes_To_Write) rem Full_Width;
      Long_Value : Word    := 0;
      Power      : Natural := 0;
   begin
      -- Creating the long value
      while Power < System.Word_Size loop
         Oasys.Debug.Put_String ("Long value " & Word'Image (Long_Value));
         Oasys.Debug.New_Line;
         Oasys.Debug.Put_String ("Power = " & Natural'Image (Power));
         Oasys.Debug.New_Line;
         Oasys.Debug.Put_String ("Value shifted = " &
                                   Word'Image (Word (Value) * 2**Power));
         Oasys.Debug.New_Line;
         Long_Value := Long_Value + Word (Value) * 2**Power;
         Power := Power + System.Storage_Unit;
      end loop;
      Oasys.Debug.Put_String ("Long value " & Word'Image (Long_Value));
      Oasys.Debug.New_Line;

      -- As we are aligned,
      -- find how many aligned word we have
      -- build an array from (destination + last_offset) to last aligned count
      -- This way, we use the full width of the CPU
      for Index in Zone_Destination'Range loop
         Zone_Destination (Index) := Long_Value;
         Offset := Offset + Full_Width;
      end loop;

      -- If there are still small bytes to change
      if (Remaining_Bytes /= 0) then
         -- For the same reason as above, we add one to Offset
         for Index in Offset + 1 .. Byte_Zone_Destination'Last loop
            Byte_Zone_Destination (Index) := Value;
         end loop;
      end if;
   end;
end Oasys.Memset;

For what it's worth ;)