In the machine code, there are actually only two different prefix bytes. 0xF3 is called REP when used with MOVS/LODS/STOS/INS/OUTS (instructions which don't affect flags). 0xF3 is called REPE or REPZ when used with CMPS/SCAS. 0xF2 is called REPNE or REPNZ when used with CMPS/SCAS, and is not documented for other instructions.
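To make the mapping concrete, here is a minimal sketch in C of how a disassembler might pick the name for a repeat prefix. The function name rep_prefix_name is hypothetical, and it only looks at the one-byte string-instruction opcodes (INS 0x6C/0x6D, OUTS 0x6E/0x6F, MOVS 0xA4/0xA5, CMPS 0xA6/0xA7, STOS 0xAA/0xAB, LODS 0xAC/0xAD, SCAS 0xAE/0xAF). It ignores every other use of 0xF2/0xF3, so treat it as an illustration rather than a decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the conventional name of a repeat prefix
   in front of a one-byte string-instruction opcode, or NULL when the
   combination is not a documented repeat-prefixed string instruction. */
const char *rep_prefix_name(uint8_t prefix, uint8_t opcode)
{
    /* CMPS (0xA6/0xA7) and SCAS (0xAE/0xAF) set flags, so the prefix
       also tests ZF after each iteration. */
    int compares = (opcode == 0xA6 || opcode == 0xA7 ||
                    opcode == 0xAE || opcode == 0xAF);

    /* INS/OUTS (0x6C..0x6F), MOVS (0xA4/0xA5), STOS/LODS (0xAA..0xAD)
       do not affect flags, so only the count in (E/R)CX matters. */
    int moves = (opcode >= 0x6C && opcode <= 0x6F) ||
                opcode == 0xA4 || opcode == 0xA5 ||
                (opcode >= 0xAA && opcode <= 0xAD);

    if (prefix == 0xF3)
        return compares ? "REPE" : (moves ? "REP" : NULL);
    if (prefix == 0xF2)
        return compares ? "REPNE" : NULL;  /* undocumented elsewhere */
    return NULL;
}
```

For example, rep_prefix_name(0xF3, 0xA4) gives "REP" (rep movsb), while the same prefix byte in front of 0xA6 gives "REPE" (repe cmpsb).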
You are moving the address of foo directly into a segment register. As you probably already know, a segment register holds a 16-bit value, which the CPU shifts left by 4 bits to form a 20-bit base address; the offset is then added to that base. In your case the address of foo is something like 0x7E06. When you move this into the segment register and zero out the offset, you get the address (0x7E06 << 4) + 0 = 0x7E060. In this case you can use seg, as Michael pointed out in the comments:
mov ax, seg foo
mov ds, ax
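Alternatively, shift the address right by 4 to turn it into a segment value yourself. The low 4 bits are lost in the shift, so they have to be carried in the offset instead:
mov ax, foo
shr ax, 4
mov ds, ax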
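The arithmetic can be checked with a small C model of real-mode address translation. This is only a sketch: the names linear_address and segment_of are made up for illustration, and the model ignores A20 wraparound.

```c
#include <stdint.h>

/* Real-mode translation: linear = segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical helper mirroring "shr ax, 4": the segment whose base is
   at or just below a 16-bit address. The low 4 bits are dropped. */
uint16_t segment_of(uint16_t address)
{
    return (uint16_t)(address >> 4);
}
```

With foo at 0x7E06, linear_address(0x7E06, 0) is 0x7E060, which is the wrong location described above. segment_of(0x7E06) is 0x7E0, and linear_address(0x7E0, 0x6) is 0x7E06 again, so the remaining low 4 bits must go into the offset.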
How does CPU differentiate REP and REPE instructions?
REP and REPE are prefixes, not instructions. Some instructions accept the REPE prefix; others accept REP. The ISA was designed so that no instruction accepts both, so the problem does not arise.
How to force GCC to produce REPNE SCAS (x86 assembly), not CMP?
You don't say why you expect to see REPNE SCAS, but regardless of optimization level you should not see it. If you refer to Agner Fog's instruction tables (last updated in 2014), you'll find that REPNE SCAS is always slower than a CMP followed by a JE.
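If the goal is only to see the instruction in the output, for inspection or benchmarking, one option is to write it yourself with GCC extended inline asm, since the optimizer is not expected to choose it. The sketch below is an assumption about what the asker wants, not something from the question. strlen_scas is a hypothetical name, the code relies on the direction flag being clear (as the usual x86 calling conventions guarantee on function entry), and it falls back to a plain loop on non-x86 targets.

```c
#include <stddef.h>

/* Hypothetical helper: string length via REPNE SCASB on x86 with
   GCC/Clang, plain C loop everywhere else. */
size_t strlen_scas(const char *s)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    const char *p = s;
    size_t count = (size_t)-1;          /* (E/R)CX: effectively no limit */
    __asm__ volatile("repne scasb"      /* scan until a byte equals AL   */
                     : "+D"(p), "+c"(count)
                     : "a"(0)           /* AL = 0, the terminator        */
                     : "cc", "memory");
    /* (E/R)DI stops one byte past the terminator. */
    return (size_t)(p - s) - 1;
#else
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

This guarantees the instruction appears in the generated code, but per the timings cited here it should not be expected to beat the CMP/JE loop the compiler emits on its own.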
rep cmps isn't fast; it's >= 2 cycles per count throughput on Haswell, for example, plus startup overhead (http://agner.org/optimize). You can get a regular byte-at-a-time loop to run at 1 compare per clock (modern CPUs can do 2 loads per clock), even when you have to check both for a match and for a 0 terminator, if you write it carefully. The InstLatx64 numbers agree: Haswell can manage 1 cycle per byte for rep cmpsb, but that's total bandwidth (i.e. 2 cycles to compare 1 byte from each string).
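For reference, the kind of byte-at-a-time loop meant here looks roughly like this in C. It is a minimal sketch with a hypothetical name, compare_bytes. Whether a given compiler turns it into a loop that sustains one compare per clock depends on the compiler, the flags, and the CPU, so no throughput is claimed for this exact code.

```c
/* Hypothetical helper: strcmp-style comparison, one byte per iteration.
   Each iteration does two loads and checks both conditions: the bytes
   still match, and the terminator has not been reached. */
int compare_bytes(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p == *q && *p != '\0') {
        p++;
        q++;
    }
    /* Zero if equal, negative if a sorts first, positive otherwise. */
    return (int)*p - (int)*q;
}
```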