why do repe and repne do the same before movsb?

Tag : string
Date : December 05 2020, 12:22 PM

In the machine code, there are actually only two different prefix bytes:
- 0xF3 is called REP when used with MOVS/LODS/STOS/INS/OUTS (instructions which don't affect flags).
- 0xF3 is called REPE or REPZ when used with CMPS/SCAS.
- 0xF2 is called REPNE or REPNZ when used with CMPS/SCAS, and is not documented for other instructions.
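The naming rule above can be written down as a small lookup. This is only a sketch: the enum and the function name rep_prefix_name are made up for illustration, and it handles only the one-byte string opcodes (MOVS A4/A5, CMPS A6/A7, STOS AA/AB, LODS AC/AD, SCAS AE/AF, INS 6C/6D, OUTS 6E/6F).

```c
#include <stdint.h>

/* Hypothetical names for what a disassembler might print. */
enum rep_name { REP_NONE, REP_PLAIN, REP_E, REP_NE, REP_UNDOCUMENTED };

/* Hypothetical helper: name the prefix byte based on the opcode after it. */
static enum rep_name rep_prefix_name(uint8_t prefix, uint8_t opcode)
{
    /* CMPS and SCAS are the string instructions that set flags. */
    int sets_flags = (opcode == 0xA6 || opcode == 0xA7 ||
                      opcode == 0xAE || opcode == 0xAF);
    /* MOVS, STOS, LODS, INS, OUTS leave the flags alone. */
    int plain_string_op = (opcode == 0xA4 || opcode == 0xA5 ||
                           opcode == 0xAA || opcode == 0xAB ||
                           opcode == 0xAC || opcode == 0xAD ||
                           opcode == 0x6C || opcode == 0x6D ||
                           opcode == 0x6E || opcode == 0x6F);

    if (prefix == 0xF3 && sets_flags)      return REP_E;   /* REPE / REPZ   */
    if (prefix == 0xF3 && plain_string_op) return REP_PLAIN; /* REP         */
    if (prefix == 0xF2 && sets_flags)      return REP_NE;  /* REPNE / REPNZ */
    if (prefix == 0xF2 && plain_string_op) return REP_UNDOCUMENTED;
    return REP_NONE; /* not a rep-prefixed string instruction */
}
```

For example, rep_prefix_name(0xF3, 0xA4) gives REP_PLAIN (rep movsb), while the same prefix byte in front of 0xA6 gives REP_E (repe cmpsb).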

Are x86 assembly instructions REPE/REPZ and REPNE/REPNZ equal?

Tag : assembly
Date : March 29 2020, 07:55 AM
Yes, they are synonyms.
The Intel manual volume 2B (Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2B: Instruction Set Reference, N-Z) says:

Assembly x86 movsb

Tag : video
Date : March 29 2020, 07:55 AM
You are moving the address of foo directly into a segment register. As you probably already know, in real mode a segment register holds a 16-bit value that is shifted left by 4 bits to form a 20-bit base address, to which the offset is added. In your case the address of foo is something like 0x7E06. When you move this into the segment register and zero out the offset, you get the address (0x7E06 << 4) + 0 = 0x7E060.
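For this case you can use seg, as Michael pointed out in the comments:
mov ax, seg foo   ; segment part of foo's address
mov ds, ax
Alternatively, compute the segment yourself by shifting the address right by 4. This drops the low 4 bits, so foo must then be accessed at offset (foo & 0xF), or be aligned to 16 bytes:
mov ax, foo       ; linear address of foo, e.g. 0x7E06
shr ax, 4         ; 0x7E06 >> 4 = 0x07E0
mov ds, ax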
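The arithmetic can be checked with a few lines of C. This is a minimal sketch of real-mode address formation; the function name real_mode_linear is hypothetical, and it does not model the 20-bit wraparound of an 8086.

```c
#include <stdint.h>

/* Real-mode address: the 16-bit segment is shifted left by 4, then the
   16-bit offset is added. (8086 wraparound at 1 MiB is not modeled.) */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With the numbers from the answer, real_mode_linear(0x7E06, 0) is 0x7E060 (the wrong place), while real_mode_linear(0x7E06 >> 4, 0x7E06 & 0xF) is 0x7E06 again.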

How does CPU differentiate REP and REPE instructions?

Tag : assembly
Date : March 29 2020, 07:55 AM
REP and REPE are prefixes, not instructions, and both are encoded as the same byte (0xF3). Some instructions accept the REPE prefix; others accept REP. The ISA was designed so that no instruction accepts both, so the problem does not arise: the opcode that follows the prefix determines how it is interpreted.
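To make the difference concrete, here is a rough C model of what the same prefix byte does in front of MOVSB and in front of CMPSB. It is a sketch, not the architectural definition: the function names are invented, the direction flag is assumed to be clear (pointers increment), and ZF is modeled as an int passed by pointer.

```c
#include <stddef.h>

/* 0xF3 + MOVSB (rep movsb): copy exactly `count` bytes.
   Flags are never consulted. */
static void model_rep_movsb(unsigned char *dst, const unsigned char *src,
                            size_t count)
{
    while (count != 0) {
        *dst++ = *src++;
        count--;
    }
}

/* 0xF3 + CMPSB (repe cmpsb): compare bytes until the count runs out or a
   pair differs. Returns the remaining count; *zf is left untouched when
   count is 0 on entry, as the real instruction leaves flags unchanged. */
static size_t model_repe_cmpsb(const unsigned char *a, const unsigned char *b,
                               size_t count, int *zf)
{
    while (count != 0) {
        *zf = (*a++ == *b++);  /* the compare sets ZF */
        count--;               /* the counter is decremented either way */
        if (!*zf)
            break;             /* REPE stops as soon as ZF is clear */
    }
    return count;
}
```

The only extra ingredient in the second loop is the ZF check, which is why the REPE/REPNE distinction has no meaning for instructions that do not write flags.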

How to force GCC to produce REPNE SCAS (x86 assembly), not CMP?

Tag : gcc
Date : March 29 2020, 07:55 AM
You don't say why you expect to see REPNE SCAS, but you should not see it at any optimization level.
According to the Agner Fog tables (last updated in 2014), REPNE SCAS is always slower than a CMP followed by a JE.
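For reference, the kind of source that a compiler would typically turn into a compare-and-branch loop rather than REPNE SCAS looks like this. A minimal sketch; find_byte is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: index of the first byte equal to `needle`,
   or `len` if it does not occur. */
static size_t find_byte(const unsigned char *buf, size_t len,
                        unsigned char needle)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == needle)  /* typically a CMP plus a conditional jump */
            return i;
    }
    return len;
}
```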

Coaxing GCC to emit REPE CMPSB

Tag : c
Date : March 29 2020, 07:55 AM
rep cmps isn't fast; it's >= 2 cycles per count throughput on Haswell, for example, plus startup overhead (see http://agner.org/optimize). You can get a regular byte-at-a-time loop to go at 1 compare per clock (modern CPUs can run 2 loads per clock) even when you have to check for a match and for a 0 terminator, if you write it carefully.
InstLatx64 numbers agree: Haswell can manage 1 cycle per byte for rep cmpsb, but that's total bandwidth (i.e. 2 cycles to compare 1 byte from each string).
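A byte-at-a-time loop of the kind described, checking both for a mismatch and for the 0 terminator, might look like the following in C. This is only a sketch with an invented name (str_equal_bytewise); whether it actually reaches 1 compare per clock depends on the code the compiler generates.

```c
/* Hypothetical helper: returns 1 if the two NUL-terminated strings are
   equal, 0 otherwise. One load from each string per iteration. */
static int str_equal_bytewise(const char *a, const char *b)
{
    for (;;) {
        unsigned char ca = (unsigned char)*a++;
        unsigned char cb = (unsigned char)*b++;
        if (ca != cb)
            return 0;  /* mismatch (also covers one string ending early) */
        if (ca == 0)
            return 1;  /* both strings ended at the same position */
    }
}
```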
#include <string.h>

int string_equal(const char *s) {
    return 0 == strcmp(s, "test1");
}

GCC compiles the strcmp version to:

.LC0:
    .string "test1"
string_equal:
    mov     rsi, rdi
    mov     ecx, 6
    mov     edi, OFFSET FLAT:.LC0
    repz cmpsb
    setne   al
    movzx   eax, al
    ret

int cmp_mem(const char *s) {
    return 0 == memcmp(s, "test1", 6);
}

The memcmp version is inlined as two integer compares instead:

cmp_mem:
    cmp     DWORD PTR [rdi], 1953719668  # 0x74736574, i.e. "test"
    je      .L8
.L5:
    mov     eax, 1
    xor     eax, 1          # missed optimization here after the memcmp pattern; should just xor eax,eax
    ret
.L8:
    xor     eax, eax
    cmp     WORD PTR [rdi+4], 49     # check last 2 bytes: '1' and the terminating 0
    jne     .L5
    xor     eax, 1
    ret
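The inlined memcmp pattern above (one 4-byte compare, then one 2-byte compare) can be expressed in portable C roughly as follows. This is a sketch with an invented function name (equals_test1); like memcmp(s, "test1", 6), it assumes at least 6 readable bytes at s, and it uses memcpy for the loads so it does not depend on alignment or byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same result as 0 == memcmp(s, "test1", 6). */
static int equals_test1(const char *s)
{
    uint32_t head, want_head;
    uint16_t tail, want_tail;

    memcpy(&head, s, 4);          /* first 4 bytes of the input */
    memcpy(&want_head, "test", 4);
    memcpy(&tail, s + 4, 2);      /* last 2 bytes of the input */
    memcpy(&want_tail, "1", 2);   /* '1' followed by the terminating 0 */

    return head == want_head && tail == want_tail;
}
```

Compilers generally turn such fixed-size memcpy calls into plain loads, so the result should be close to the two cmp instructions shown above.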
