Solution to Practical Reverse Engineering I. Chapter 01, Page 11

I was playing with the Flare-On challenge last month, and I realized I was a bit too rusty with RE. So, since I recently received my copy of the book Practical Reverse Engineering, I decided to write some posts with my solutions to the exercises. This way, I can keep this blog alive and, at the same time, I’m forced to document what I do. The exercise I’ll be solving is located in chapter 1, page 11 of the book:

1. This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does.

mov     edi, [ebp+8]
mov     edx, edi
xor     eax, eax
or      ecx, 0FFFFFFFFh
repne scasb
add     ecx, 2
neg     ecx
mov     al, [ebp+0Ch]
mov     edi, edx
rep stosb
mov     eax, edx

Answering the first question is easy just by reading the book. [ebp+8] holds a pointer: the first instruction (mov edi, [ebp+8]) loads it into edi, and in line 5 (repne scasb) the scasb instruction (Scan String) reads bytes from the address in edi, one at a time, until it finds a 0 byte. This instruction works with byte strings, so [ebp+8] is a pointer to a NUL-terminated string, i.e. a char*.
As for [ebp+0Ch], in line 8 (mov al, [ebp+0Ch]) its value is loaded into the register al. al is 1 byte wide, so the argument is used as a char.
[Diagram 1: image taken from friedspace.com]

Let’s now take a look at the snippet.
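The first thing that caught my attention was the use of ebp + 8 and ebp + 0Ch. This pattern (ebp plus a positive offset) is used to access the arguments of a function. The exact layout depends on the compiler and the calling convention, but with the usual 32-bit conventions (cdecl, stdcall) and a standard prologue (push ebp; mov ebp, esp), the stack looks like this once the function has set up its frame: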

       |    ...   |  
       |----------|  |
       |    ...   |  |
       |   Args   |  |
       |    ...   |  |
EBP+8->|----------|  |
       |  Return  |  |
EBP+4->|----------|  |
       |    EBP   |  v
  EBP->|----------| Stack
       |  Loc.Var | growing
       |----------| direction.
       |          |
       |          |
       |          |

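Looking at the diagram, we can see that EBP points to the caller’s saved EBP, EBP + 4 points to the return address, and EBP + 8 onwards points to the space where the arguments are stored. The first argument is at EBP + 8, the next one at EBP + 8 plus the size of the first argument’s stack slot, and so on. On 32-bit x86 every argument pushed on the stack occupies at least 4 bytes, so the second argument sits at EBP + 0Ch even when it is a single char.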
For now, we know that the snippet we are analyzing belongs to a function that takes at least two arguments, like the following:

foo(char *buff, char c)

To illustrate the explanation, I’m going to use the string “ola ke ase” as the value of buff, and ‘x’ as the value of c.
Let’s go with the snippet. The first instruction is:

mov edi, [ebp+8]        //edi = buff
edi -> "ola ke ase"

Move the first argument to edi.

 
mov edx, edi            //edx = edi = buff

edi -> "ola ke ase"
edx -> "ola ke ase"

Save a copy of buff in edx; we’ll see why later on.

 
xor eax, eax            //eax = 0
 
edi -> "ola ke ase"
edx -> "ola ke ase"
eax = 0

xor reg, reg is a very common idiom for setting a register to 0, because XORing any value with itself always yields 0. Here it also sets al to 0, the byte value scasb will search for.

 
or ecx, 0FFFFFFFFh  //ecx = FFFFFFFFh = -1
 
edi -> "ola ke ase"
edx -> "ola ke ase"
eax = 0
ecx = -1

Set ecx to -1.
When you OR a bit with 1, the result is always 1. As ecx is 32 bits (4 bytes) wide, ecx | FFFFFFFFh = FFFFFFFFh whatever its previous value was, and FFFFFFFFh is the 32-bit two’s complement representation of -1. So ecx = -1.

 
repne scasb

There are two instructions here: repne and scasb (strictly speaking, repne is a prefix applied to scasb). This pair of instructions was the biggest headache for me in this exercise.

repne means “repeat while not equal”. It repeats scasb until one of the two following conditions is met: ecx = 0 or ZF = 1. Each time it executes the instruction, it decrements ecx by 1 and then checks whether ZF = 1. Since ecx starts at -1 (FFFFFFFFh), it would only reach 0 after 4,294,967,295 iterations, so for any real string the only condition that stops this loop is ZF = 1.

scasb compares the byte at the address in edi with the value in al (0 here), in the same way as cmp al, [edi]: it sets the flags (OF, SF, ZF, AF, PF and CF) according to the result, so ZF = 1 exactly when the two bytes are equal. Since al = 0, the loop is looking for the terminating NUL, i.e. the end of the string. After each comparison, scasb increments edi by 1 byte (assuming the direction flag is clear, as it normally is), so it points to the next character.

Let’s check some iterations:

  
1:
edi -> "ola ke ase"
edx -> "ola ke ase"
eax = 0
ecx = -1
 
2:
edi -> "la ke ase"
edx -> "ola ke ase"
eax = 0
ecx = -2
 
3:
edi -> "a ke ase"
edx -> "ola ke ase"
eax = 0
ecx = -3
 
[...]
 
N:
edi -> buff + 11 (one byte past the '\0')
edx -> "ola ke ase"
eax = 0
ecx = -12

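As we can see, edi has been modified and now points one byte past the terminating NUL of buff, because scasb also steps over the byte that matched. The register ecx has changed too, but… what is that “-12”? ecx was decremented once for each of the 11 bytes scanned (10 characters plus the NUL), so it went from -1 to -12. We will see how it is used in the next instructions.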

add ecx, 2
neg ecx

edi -> buff + 11 (one byte past the '\0')
edx -> "ola ke ase"
eax = 0
ecx = 10

I’ll explain these two instructions together because they are straightforward. First, add 2 to ecx and then negate the result: ecx + 2 = -12 + 2 = -10, and -(-10) = 10, which is the length of buff. In general, repne scasb leaves ecx = -1 - (len + 1) = -(len + 2), so -(ecx + 2) = len. ecx has been used as a counter and now contains the length of buff.

ecx = strlen(buff);

Let’s keep going…

mov al, [ebp+0Ch] //al = c

edi -> buff + 11 (one byte past the '\0')
edx -> "ola ke ase"
eax = 'x'
ecx = 10

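mov al, [ebp+0Ch] only writes the lowest byte of eax (al), so the upper 24 bits keep their previous value. As eax was set to 0 (00000000h) by xor eax, eax, eax now simply holds ‘x’ (00000078h).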

mov edi, edx //edi = buff

edi -> "ola ke ase"
edx -> "ola ke ase"
eax = 'x'
ecx = 10

edi points to the beginning of buff again. That’s why the code copied buff into edx in the second instruction: repne scasb moved edi to the end of the string, and the copy in edx lets it restore edi so stosb can start writing from the beginning.

rep stosb 

We have two instructions again: rep and stosb.
rep (repeat), like repne, repeats the instruction until a condition is met. In this case, the condition is ecx = 0. Each repetition decreases the value of ecx by 1. As the initial value is 10, it will execute stosb 10 times.
The instruction stosb (Store String) takes the value in al and stores it at the address edi points to. Then it increments edi by 1 (again assuming the direction flag is clear) so it points to the next character.

In short, rep stosb overwrites every character of buff with ‘x’. The terminating NUL is left untouched, because ecx holds the length without it. Let’s see some iterations:

rep stosb

1:
edi -> "la ke ase"
edx -> "xla ke ase"
eax = 'x'
ecx = 9

2:
edi -> "a ke ase"
edx -> "xxa ke ase"
eax = 'x'
ecx = 8

[...]

10:
edi -> ""
edx -> "xxxxxxxxxx"
eax = 'x'
ecx = 0

In each iteration, the string seen through edx changes even though no instruction writes to edx: edx still points to the start of buff, and stosb writes through edi into that same buffer.
This looks like an implementation of the C function memset(), so the pseudo-C code for this last part should be something like:

memset(buff, c, ecx); //remember we have the length of buff stored in ecx

And finally:

mov eax, edx 

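Under the usual x86 calling conventions (cdecl, stdcall), a function returns its value in eax, so we can expect the function epilogue and a ret instruction shortly after this one. The function returns the original buff pointer, which is the second reason it was saved in edx.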
And that’s it! In pseudo-c, the snippet would look like this:

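#include <string.h>

char* foo(char *buff, unsigned char c)
{
        size_t len = strlen(buff);   /* repne scasb + add ecx, 2 + neg ecx */
        memset(buff, c, len);        /* mov al, c + rep stosb */
        return buff;                 /* mov eax, edx */
}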

Or even like this, since in this snippet the length is never stored in a local variable on the stack (it only lives in ecx):

#include <string.h>

char* foo(char *buff, unsigned char c)
{
        memset(buff, c, strlen(buff));
        return buff;
}

References
[1] Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
[2] C and C++ Data Types
[3] Intel Architecture Software Developer’s Manual, Vol. 2: Instruction Set Reference
