scene.org
* forum - #coders

Topic:  fast copy of byte into doubleword
* Posted by Fillbert Tuesday 13 January 2004 - 22:47 
I have the following problem.
I have to copy the bytes from eax into ebx, ecx and edx in this way:

bl = al, bh = al, and the next byte of ebx (bits 16-23) = al as well.

In the same way, ah is loaded into ecx, and the third byte of eax (bits 16-23) into edx.

My algorithm is this:

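mov bl,al      ; bl = BB (taking eax = 00DDCCBB)
mov cl,ah      ; cl = CC
shl ebx,8      ; bh = BB, bl = 00
shl ecx,8      ; ch = CC, cl = 00
shr eax,8      ; eax = 0000DDCC, so ah = DD
mov bl,bh      ; bx = BBBB
mov cl,ch      ; cx = CCCC
mov dl,ah      ; dl = DD
mov dh,ah      ; dx = DDDD
shl edx,8      ; edx = XXDDDD00
mov dl,dh      ; edx = XXDDDDDD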

I think there are better ways to do this, but I don't know them. If you know one, please leave a message.

Thanks

Fillbert / Creative Mind

* Posted by anvil Wednesday 14 January 2004 - 15:42 
I can't say I'm very up to date on Intel processors, but as far as I know you should be careful when using byte registers, since they can cause unpleasant partial-register stalls. Have you tried doing it using only the 32-bit versions of the registers, with masking and shifting?
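
For illustration, here is a minimal sketch of what the masking-and-shifting idea could look like, written in C rather than assembly so it is easy to check. The helper names `replicate3` and `split_channels` are made up for this example, and it assumes the goal is to get each source byte into the low three bytes of its target register with the top byte zero, which is only one reading of the question.

```c
#include <stdint.h>

/* Hypothetical helper: copy one byte into the low three bytes of a dword,
   using only 32-bit masks, shifts and ORs (no byte-register writes). */
static uint32_t replicate3(uint32_t b)
{
    b &= 0xFFu;                       /* keep a single byte       */
    return b | (b << 8) | (b << 16);  /* BB -> 0x00BBBBBB         */
}

/* Hypothetical helper: split eax = 0x00DDCCBB into three replicated dwords. */
static void split_channels(uint32_t eax,
                           uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    *ebx = replicate3(eax);        /* byte 0 (al)               */
    *ecx = replicate3(eax >> 8);   /* byte 1 (ah)               */
    *edx = replicate3(eax >> 16);  /* byte 2 (bits 16-23)       */
}
```

Each line maps onto a handful of 32-bit `and`, `shl` and `or` instructions, so no 8-bit register is written and then read back as a full register.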

* Posted by stas87 Wednesday 14 January 2004 - 23:36 
Try this. I think there are no stalls,
and it is at least 6 bytes shorter:

mov ecx, eax ;; 00DDCCBB
mov bl,al ;; XXXXXXBB
rol eax,8 ;; DDCCBB00
ror ecx,8 ;; BB00DDCC
mov bh,ah ;; XXXXBBBB - OK
mov dh,ch ;; XXXXDDXX
bswap eax ;; 00BBCCDD
mov ch,ah ;; XXXXCCCC - OK
mov dl,al ;; XXXXDDDD - OK

Oh, sorry. I just took a look at your code in OllyDbg.
This version is 1 byte bigger, but I hope it is faster. At least now it's correct :)

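mov ecx, eax    ;; ecx = 00DDCCBB (eax = 00DDCCBB)
mov ebx, eax    ;; ebx = 00DDCCBB
rol ecx, 8      ;; ecx = DDCCBB00
rol ebx, 16     ;; ebx = CCBB00DD
mov edx, eax    ;; edx = 00DDCCBB
mov bh, ch      ;; ebx = CCBBBBDD
mov dl, bl      ;; edx = 00DDCCDD
mov bl, al      ;; ebx = CCBBBBBB - OK
ror eax, 8      ;; eax = BB00DDCC
mov ch, dh      ;; ecx = DDCCCC00
mov dh, ah      ;; edx = 00DDDDDD - OK
mov cl, al      ;; ecx = DDCCCCCC - OK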

regards,
S.T.A.S.

[Post edited by stas87 on Thursday 15 January 2004 - 0:23]


* Posted by a0a Thursday 15 January 2004 - 14:15 
Let's first see what you're trying to do in a normal form:

ebx = 0x(01)0101 * al
ecx = 0x(01)0101 * ah
shr eax,16
edx = 0x010101 * al

Maybe you can interleave the muls with other code, or even use the FPU to prepare this stuff in the background. SIMD extensions such as SSE have beautiful instructions for exactly this kind of thing.
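
The multiply form above can be sketched in C as follows. The function name `replicate_mul` is made up for this example, and the three-byte constant 0x010101 is assumed; use 0x0101 if only the low two bytes are wanted.

```c
#include <stdint.h>

/* Multiplying a byte by 0x010101 adds the byte to itself at bit offsets
   0, 8 and 16. The three copies never overlap, so no carries occur and
   the result is simply 0x00BBBBBB for an input byte BB. */
static uint32_t replicate_mul(uint32_t value)
{
    return (value & 0xFFu) * 0x010101u;
}
```

With eax = 00DDCCBB, the three targets would then be `replicate_mul(eax)`, `replicate_mul(eax >> 8)` and `replicate_mul(eax >> 16)`.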

* Posted by Fillbert Thursday 22 January 2004 - 9:26 
Okay, I found a solution myself.

I've found an opcode called pshufw, included in SSE 1, that does exactly what I want.

Thanks to all

Fillbert / Creative Mind
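
The thread does not show how pshufw was actually used, so the following is only a sketch of one possibility. pshufw shuffles 16-bit words inside a 64-bit MMX register: result word i is the source word selected by bits 2i and 2i+1 of the immediate. Since it works on words rather than bytes, this sketch assumes each byte is first doubled into a word (which `punpcklbw mm0, mm0` would do). Both functions below are portable C models written for this example, not the real intrinsics, and their names are made up.

```c
#include <stdint.h>

/* Portable model of PSHUFW: result word i = source word ((imm8 >> 2*i) & 3). */
static uint64_t pshufw_model(uint64_t src, unsigned imm8)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3u;           /* selector for word i */
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;   /* chosen source word  */
        dst |= word << (16 * i);
    }
    return dst;
}

/* Portable model of PUNPCKLBW with both operands equal:
   each of the low four bytes is doubled into a 16-bit word. */
static uint64_t dup_low_bytes(uint32_t x)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t b = (x >> (8 * i)) & 0xFFu;
        dst |= (b | (b << 8)) << (16 * i);
    }
    return dst;
}
```

Under these assumptions, 00DDCCBB becomes 0000DDDDCCCCBBBB after the doubling step, and the immediates 0x00, 0x55 and 0xAA then broadcast BB, CC and DD respectively across all eight bytes of the result.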
