forum - #coders

Topic: fast copy of byte into doubleword

I have the following problem.
I have to copy the bytes of eax into ebx, ecx and edx in this way:
bl = al, bh = al, and the next higher byte of ebx = al as well;
ah is replicated the same way into ecx, and the third byte of eax (bits 16-23) into edx.
My algorithm is this:
mov bl,al
mov cl,ah
shl ebx,8
shl ecx,8
shr eax,8
mov bl,bh
mov cl,ch
mov dl,ah
mov dh,ah
shl edx,8
mov dl,dh
I think there are better ways to do this, but I don't know them. If you know one, please leave a message.
Thanks,
Fillbert / Creative Mind

I can't say I'm very up to date on the Intel processors, but as far as I know you'd better be careful when using byte registers, since writing a byte register and then reading the full 32-bit register may cause some unpleasant partial-register stalls. Have you tried doing it using only the 32-bit versions of the registers, with masking and shifting?
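
The masking-and-shifting idea from the post above could look roughly like this. It is only a sketch in C rather than assembly, the names spread3 and split_bytes_shift are made up for illustration, and it assumes the goal is to fill the low three bytes of each target register. Every step maps onto a plain 32-bit and/shl/shr/or, so no byte register is ever written.

```c
#include <stdint.h>

/* Replicate the low byte of v into the low three bytes of the result,
   using only 32-bit masks, shifts and ORs. */
uint32_t spread3(uint32_t v)
{
    uint32_t b = v & 0xFFu;   /* 000000BB */
    b |= b << 8;              /* 0000BBBB */
    return b | (b << 8);      /* 00BBBBBB */
}

/* Hypothetical wrapper: byte 0 of eax goes to ebx, byte 1 to ecx,
   byte 2 to edx, each replicated three times. */
void split_bytes_shift(uint32_t eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    *ebx = spread3(eax);
    *ecx = spread3(eax >> 8);
    *edx = spread3(eax >> 16);
}
```

Whether this beats the byte-register versions on a given CPU would have to be measured; the sketch only shows that the result can be built without partial-register writes.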

Try this. I think there are no stalls, and it is
at least 6 bytes shorter:
mov ecx, eax ;; 00DDCCBB
mov bl,al ;; XXXXXXBB
rol eax,8 ;; DDCCBB00
ror ecx,8 ;; BB00DDCC
mov bh,ah ;; XXXXBBBB - OK
mov dh,ch ;; XXXXDDXX
bswap eax ;; 00BBCCDD
mov ch,ah ;; XXXXCCCC - OK
mov dl,al ;; XXXXDDDD - OK
Oh, sorry. I just took a look at your code in OllyDbg.
This one is 1 byte bigger, but I hope faster. At least now it's correct :)
mov ecx, eax
mov ebx, eax
rol ecx, 8
rol ebx, 16
mov edx, eax
mov bh, ch
mov dl, bl
mov bl, al
ror eax, 8
mov ch, dh
mov dh, ah
mov cl, al
Regards,
S.T.A.S.
[Post edited by stas87 on Thursday 15 January 2004 - 0:23]
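
For anyone who wants to check the second listing without a debugger, here is a small C emulation of it, one statement per instruction. The helper names (rol32, set_lo and so on) and the function stas_v2 are invented for this sketch. It only models register contents, not timing or stalls.

```c
#include <stdint.h>

/* Rotates; n must be between 1 and 31 here. */
static uint32_t rol32(uint32_t v, unsigned n) { return (v << n) | (v >> (32 - n)); }
static uint32_t ror32(uint32_t v, unsigned n) { return (v >> n) | (v << (32 - n)); }

/* Read or write the low byte (al, bl, ...) or second byte (ah, bh, ...). */
static uint32_t lo(uint32_t r) { return r & 0xFFu; }
static uint32_t hi(uint32_t r) { return (r >> 8) & 0xFFu; }
static uint32_t set_lo(uint32_t r, uint32_t b) { return (r & 0xFFFFFF00u) | b; }
static uint32_t set_hi(uint32_t r, uint32_t b) { return (r & 0xFFFF00FFu) | (b << 8); }

void stas_v2(uint32_t eax, uint32_t *pebx, uint32_t *pecx, uint32_t *pedx)
{
    uint32_t ebx, ecx, edx;
    ecx = eax;                   /* mov ecx, eax */
    ebx = eax;                   /* mov ebx, eax */
    ecx = rol32(ecx, 8);         /* rol ecx, 8   */
    ebx = rol32(ebx, 16);        /* rol ebx, 16  */
    edx = eax;                   /* mov edx, eax */
    ebx = set_hi(ebx, hi(ecx));  /* mov bh, ch   */
    edx = set_lo(edx, lo(ebx));  /* mov dl, bl   */
    ebx = set_lo(ebx, lo(eax));  /* mov bl, al   */
    eax = ror32(eax, 8);         /* ror eax, 8   */
    ecx = set_hi(ecx, hi(edx));  /* mov ch, dh   */
    edx = set_hi(edx, hi(eax));  /* mov dh, ah   */
    ecx = set_lo(ecx, lo(eax));  /* mov cl, al   */
    *pebx = ebx;
    *pecx = ecx;
    *pedx = edx;
}
```

With eax = 00DDCCBB the low three bytes come out as BBBBBB in ebx, CCCCCC in ecx and DDDDDD in edx. The top byte of ebx and ecx holds a leftover byte from the rotates, so mask it off if it matters.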

Let's first see what you're trying to do in a normal form:
ebx = 0x(01)0101 * al
ecx = 0x(01)0101 * ah
shr eax,16
edx = 0x 01 0101 * al
Maybe you can interleave the muls with other code, or even use the FPU to prepare this stuff in the background; SIMD and SSE have beautiful instructions for exactly this kind of thing.
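
Written out in C, the "normal form" above might look like the sketch below. The function name split_bytes_mul is hypothetical. The constant 0x010101 fills three bytes; using 0x01010101 instead would fill all four, which is presumably what the optional (01) stands for.

```c
#include <stdint.h>

/* Multiplying a value in 0..255 by 0x010101 copies it into the low
   three bytes, because the partial products never overlap or carry. */
void split_bytes_mul(uint32_t eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
{
    *ebx = (eax & 0xFFu) * 0x010101u;          /* al           -> 00BBBBBB */
    *ecx = ((eax >> 8) & 0xFFu) * 0x010101u;   /* ah           -> 00CCCCCC */
    *edx = ((eax >> 16) & 0xFFu) * 0x010101u;  /* byte 2 of eax -> 00DDDDDD */
}
```

In assembly each line would be a movzx (or mask) followed by an imul with an immediate, so the three multiplies are independent and can be interleaved with other work, as suggested.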

Okay, I found a possibility by myself.
I've found an opcode called pshufw, included in SSE 1, that can do exactly what I want.
Thanks to all,
Fillbert / Creative Mind
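
For readers who have not met pshufw: it works on the four 16-bit words of a 64-bit MMX register, and each 2-bit field of its immediate picks the source word for one destination word. Below is a portable C model of that selection rule. The function name pshufw_model is made up, and this is a model of the instruction, not an intrinsic.

```c
#include <stdint.h>

/* Model of PSHUFW: destination word i is the source word selected by
   bits [2i+1:2i] of imm8. */
uint64_t pshufw_model(uint64_t src, uint8_t imm8)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3u;           /* which source word */
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;   /* fetch it          */
        dst |= word << (16 * i);                         /* place it          */
    }
    return dst;
}
```

An immediate of 0x00 broadcasts word 0 into all four words. Since the shuffle moves words, not bytes, each byte presumably has to be doubled into a word first (for example with punpcklbw mm0, mm0) before the shuffle replicates it; the thread does not show how the original poster actually set this up.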