I have held off posting this solution until the deadline on GCHQ’s website had expired, my hope is that this will help to ensure that those who are “unworthy” (in their view) do not abuse the solutions I give here. Before we jump right in my special thanks go to my awesome colleague Lionel Lemarié who was also working on this at the same time. Bouncing ideas and thoughts off such a knowledgeable chap was super useful and really does go to prove that two heads are better than one! This post is also reproduced on my personal blog.

On the 1st of December 2011 I saw the tag GCHQ trending on Twitter. Apparently they had set a challenge to British Nationals to try and crack a code they released on the website http://www.canyoucrackit.co.uk. As someone who enjoys puzzles I thought it’d be fun to have a go at trying to crack the code they put up. This blog post details my efforts in solving the code.

Stage 1:

If you visit http://www.canyoucrackit.co.uk you will be greeted with something that looks like a really bad IT system in a Hollywood Blockbuster. Most prominently there is an image with 160 bytes of hexadecimal, split into 10 rows of 16 entries. The first stage of solving this puzzle was to get this data into binary format. There was no real good way of doing this except to painstaking transcribe the bytes by hand into my favourite Hex Editor. I double checked the data and also had a friend check it to make sure there was no errors at this point. My initial feeling was perhaps it was some “cryptographic in joke”, where the data hashed to make a readable bit of plain text. I ran the data through a bunch of hashing functions, a bunch of SHA variants, CRCs, the works. Nothing! It wasn’t until I noticed the string of bytes 0xef 0xeb 0xad 0xde, that it set alarm bells ringing in my head. This was an executable payload! Once this realisation set in it only took a cursory glance at the first byte of the data, 0xeb, to confirm that it was likely to be x86. Here is the binary file I created for those who are interested, please excuse the .txt extension (WordPress flags my .bin files and even .c files as security risks so a lot of downloads from this post are .txt).

It actually turns out that the payload in the image on the site is only half the story. There is a second part to it, which is inside cypber.png itself. So download the main image from the site and fire up your favourite hex editor. You’re looking for an iTXt chunk, which is the header of some metadata.

QkJCQjIAAACR2PFtcCA6q2eaC8SR+8dmD/zNzLQC+td3tFQ4qx8O447TDeuZw5P+0SsbEcYR
78jKLw==

The presence of the == at the end was a hint that the data was actually base64 encoded. So you’ll need to decode this, I used this site. Once you have that, your payload is now complete.


001C13AE jmp main+24h (1C13B4h)
001C13B0 scas dword ptr es:[edi]
001C13B1 ret 0A3BFh
001C13B4 sub esp,100h
001C13BA xor ecx,ecx
001C13BC mov byte ptr [esp+ecx],cl
001C13BF inc cl
001C13C1 jne main+2Ch (1C13BCh)
001C13C3 xor eax,eax
001C13C5 mov edx,0DEADBEEFh
001C13CA add al,byte ptr [esp+ecx]
001C13CD add al,dl
001C13CF ror edx,8
001C13D2 mov bl,byte ptr [esp+ecx]
001C13D5 mov bh,byte ptr [esp+eax]
001C13D8 mov byte ptr [esp+eax],bl
001C13DB mov byte ptr [esp+ecx],bh
001C13DE inc cl
001C13E0 jne main+3Ah (1C13CAh)
001C13E2 jmp main+0B3h (1C1443h)
001C13E7 mov ebx,esp
001C13E9 add ebx,4
001C13EF pop esp
001C13F0 pop eax
001C13F1 cmp eax,41414141h
001C13F6 jne main+0ABh (1C143Bh)
001C13F8 pop eax
001C13F9 cmp eax,42424242h
001C13FE jne main+0ABh (1C143Bh)
001C1400 pop edx
001C1401 mov ecx,edx
001C1403 mov esi,esp
001C1405 mov edi,ebx
001C1407 sub edi,ecx
001C1409 rep movs byte ptr es:[edi],byte ptr [esi]
001C140B mov esi,ebx
001C140D mov ecx,edx
001C140F mov edi,ebx
001C1411 sub edi,ecx
001C1413 xor eax,eax
001C1415 xor ebx,ebx
001C1417 xor edx,edx
001C1419 inc al
001C141B add bl,byte ptr [esi+eax]
001C141E mov dl,byte ptr [esi+eax]
001C1421 mov dh,byte ptr [esi+ebx]
001C1424 mov byte ptr [esi+eax],dh
001C1427 mov byte ptr [esi+ebx],dl
001C142A add dl,dh
001C142C xor dh,dh
001C142E mov bl,byte ptr [esi+edx]
001C1431 mov dl,byte ptr [edi]
001C1433 xor dl,bl
001C1435 mov byte ptr [edi],dl
001C1437 inc edi
001C1438 dec ecx
001C1439 jne main+89h (1C1419h)
001C143B xor ebx,ebx
001C143D mov eax,ebx
001C143F inc al
001C1441 nop
001C1442 nop
001C1443 nop
001C1444 nop
001C1445 call main+57h (1C13E7h)
001C144A inc ecx
001C144B inc ecx
001C144C inc ecx
001C144D inc ecx
001C144E inc edx
001C144F inc edx
001C1450 inc edx
001C1451 inc edx
001C1452 xor al,byte ptr [eax]
001C1454 add byte ptr [eax],al
001C1456 xchg eax,ecx
001C1457 fdiv st,st(1)
001C1459 ins dword ptr es:[edi],dx
001C145A jo main+0ECh (1C147Ch)
001C145C cmp ch,byte ptr [ebx-3BF46599h]
001C1462 xchg eax,ecx
001C1463 sti
001C1464 db c7h
001C1465 paddb xmm1,xmm5
001C1469 int 3
001C146A mov ah,2
001C146C cli
001C146D xlat byte ptr [ebx]
001C146E ja main+94h (1C1424h)
001C1470 push esp
001C1471 cmp byte ptr [ebx-711CF1E1h],ch
001C1477 ror dword ptr ds:[93C399EBh],cl
001C147D db feh
001C147E shr dword ptr [ebx],1
001C1480 sbb edx,dword ptr [ecx]
001C1482 db c6h
001C1483 adc edi,ebp
001C1485 enter 2FCAh,0Dh

As it would be less than smart to simply execute a chunk of unknown x86 binary on my machine (regardless of its source), I disassembled the binaries to make sure they weren’t harmful prior to running them. To run the payload you can do it in two different methods. You can use a little C program, in this program you just need to dump the data as an array of unsigned chars, and then make a function pointer to the prototype typedef void(*executePayload)();. A lovely typecast from the unsigned char array to a function pointer will do the trick. However, most systems these days come with various security measures to stop malicious code chewing through your system. One such feature is DEP (Data Execution Protection). In order to execute a payload in this way you need to make sure the memory that contains the payload is mapped correctly with the correct executable flags. Here is the source file that will execute the payload. You can compile it and execute it with the following commands in your Linux terminal and then use GDB to debug it:

gcc -g -O0 payload.c -o payload
./payload

In visual studio, you can just do __asm _emit for each of the bytes after you byte swap them. You can download my source file for this here.

The first thing the payload does it jump the next 4 bytes (this will be important later on), then it allocated a buffer on the stack by moving esp and then does a bunch of ops to build strings in these buffers. It’s not really important or interesting what it’s actually doing, to get a hold of what you want, just run until the int 0x80 call, and then check out the stack frame in the memory window. It’ll contain a URL to a .js file.

Stage 2:

If we download the file we are presented with some JavaScript source code. It’s just a case of implementing the virtual machine, running it til it hits a halt instruction and checking the memory buffer for the URL.

The virtual machine as described had a handful of 8-bit registers, 8 instructions and a 16-byte segmented memory model.

Instruction Fetch & Decode

The op codes that are described in the JS file take the format shown in the image below, it is trivial to knock up a bit of C code to fetch and decode each op.


/* fetch & decode: */
const uint8_t byte = g_mem[calculateIndex(g_cpu.cs, g_cpu.ip)];
const uint8_t opCode = (byte & 0xe0) >> 5;
const uint8_t mod = (byte & 0x10) >> 4;
const uint8_t op1 = (byte & 0xf);
const uint8_t op2 = g_mem[calculateIndex(g_cpu.cs, g_cpu.ip)+1]; /* optional */

Addressing

Segmented addressing means that in order to address something you use a segment index (which in this case selects which 16-byte location of memory you are in), and then an offset from that segment’s base. The offset is not necessarily constrained to [0..15], check out the C code below. Here is the source file for my VM written in C. If that wasn’t enough Lionel has very kindly made available his JS source code for stage 2, which you can get here. Alternatively just click here if you just want to run Lionel’s VM in your browser and see his cool debug output.


int16_t calculateIndex(const uint8_t segment, const uint8_t offset) {

    const int16_t segment16 = (uint16_t)segment;
    const int16_t offset16 = (uint16_t)offset;
    const int16_t index = (segment16 << 4) + offset16;
    return index;
}

jmp

A simple unconditional jump. Just sets cs and ip to whatever the jump target is. If mod bit is not set, then the cs register is left unaffected.

movr & movm

These instructions just move data between registers and also to and from memory.

add, xor and cmp

These three instructions have similar characteristics from the mod flag perspective, and basically do exactly what it says on the tin. However, in the case of cmp there are some additional changes required to the flags register. Comparisons are usually implemented as a subtraction, if the result is 0 then the values are equal, and hence the flags register is set to 0. If the result of the subtraction yields a positive value, then the first operand is larger than the second. In the specifications for this virtual machine the flags register is required to be set to 0x1 in this case. And if the sign bit is set, (i.e.: the subtract yields a negative result) then the flags register should be set to 0xff. This is shown in the code snippet below.


const int8_t result = getRegisterValue(op1) - getRegisterValue(op2);
if(result == 0)
    setRegisterValue(REGISTER_FLAG, 0);
else if(result < 0)
    setRegisterValue(REGISTER_FLAG, 0xff);
else
    setRegisterValue(REGISTER_FLAG, 1);

jmpe

This instruction will jump when the flag register is set to 0x0, which indicates that the result of a preceding cmp instruction was 0.

if(g_cpu.fl == 0x0) {
    if(!mod) {
        g_cpu.ip = getRegisterValue(op1);
        printf("jmpe r%d\n", op1);
    } else {
        g_cpu.cs = op2;
        g_cpu.ip = getRegisterValue(op1);
        printf("jmpe #%x:r%d\n", op2, op1);
    }
}

hlt

The halt instruction (as the name suggests) simply brings the virtual machine to a halt.

Wrapping it up…

On examining the source code an astute reader maybe notice that the firmware is not used at all. I tried disassembling this stuff with the disassembler I wrote before I wrote the VM (also included in my little C file, just change the argument to cpuInit to true, it was a load of rubbish, so it didn’t make sense to run it. Initially why it was there was a mystery (but it became obvious later)!

Another thing to note is that it turns out that their specifications aren’t the entire picture. If you take a look at the CS register you will notice that is never non-zero. Seems odd to have a completely redundant register, well it turns out it *is* required. The CS register I assume stands for code-segment (matching the DS register which I guess is data-segment). This should have been my first clue.

Looks like another URL, this time to a .exe file.

Stage 3:

So after downloading the executable file my first stop was to disassemble it. My tool of choice here was IDA Freeware. IDA is an awesome tool which I would highly recommend, you can grab it here. I also used a Hex Editor, my tool of choice is Hex Workshop, which you can grab here. Once you disassemble it under IDA you need to run the exe and step it (you will need to install Cygwin for this). You will notice that it attempts to load a file called license.txt, so I created a txt file with this name and continued to step it. Eventually I hit the following bit of code. It is checking the first 4 bytes of the buffer it reads from license.txt to see if it matches some special magic number (GCHQ).


.text:00401160 mov [ebp+var_4C], 0
.text:00401167 cmp [ebp+gchqHeaderBytes], 71686367h
...
.text:0040117F mov [esp+78h+salt?], eax
.text:00401182 call crypt
.text:00401187 mov edx, eax
.text:00401189 mov eax, DEShash
.text:0040118E mov [esp+78h+var_74], eax
.text:00401192 mov [esp+78h+salt?], edx
.text:00401195 call strcmp
.text:0040119A test eax, eax

So the first 4 bytes of license.txt is easy, you just replicate this value in there. Be careful to byte swap it. The next bit of the code is a bit more tricky. It calculates a hash value for the next few bytes in license.txt and compares it the result to some other magic value (again stored in the data segment of the executable). The first 2 bytes of the result buffer from crypt() will be the salt for the hashing function (to try and guard against dictionary-based attacks). To break this password I fired up John the Ripper. This is a free program you can get from here (just make sure you get the jumbo version that supports the Markov mode). To build John the Ripper Jumbo, fire up your cygwin bash terminal and navigate to the src directory. You will then need to the following, replacing system with your target system:

make clean (system)

Where (system) is replaced with your target system, for me it was win32-cygwin-x86-any. Once it’s built, make a new file in the run directory called “password.txt” and type the following in it:

user:hqDTK7b8K2rvw

Then use the following command line to try and break it:

./john -markov:220 password.txt

Now for me 220 was enough to break the password, but other individuals I’ve spoken to about this said that they had to go up to 240. I’m not exactly sure on how the Markov mode works in John the Ripper, but potentially you might need to run a few times at 220 to break the password, or up the level. On my laptop which has an Intel Core i5 in it, it took about 5 minutes to break the password.


$ ./john -markov:220 password.txt
Loaded 1 password hash (Traditional DES [24/32 4K])
Warning: MaxLen = 12 is too large for the current hash type, reduced to 8
MKV start (lvl=220 len=8 pwd=1601142515)
cyberwin (user)
guesses: 1 time: 0:00:04:39 DONE (Sat Dec 3 17:29:12 2011) c/s: 302851 trying: cybervam - cyberwo
Use the "--show" option to display all of the cracked passwords reliably

At this point it’s probably worth mentioning that actually breaking the password for license.txt is a waste of time. You can crack the .exe without having to generate the password. Later on in the execution flow the .exe will try to set up the stage 1 and stage 2 license keys. All this involves is a copy (or two) from the license.txt buffer that was read in via the standard C file I/O stuff. From examining the code you can work out the exact offset into this buffer that the copy is performed. Once you have this, you can subtract the size in bytes of the magic number (which was 4 bytes as you will recall). Congratulations, you now know the password’s length without needing to actually crack the password itself. You can pad license.txt with whatever you like and then just hex edit the conditional jump, to make it non-conditional just after the strcmp of the output buffer from the call to crypt.

So what are the final pieces of the puzzle? It took me a while to work it out, but mentions of “Stage 1″ and “Stage 2″ in the printf are actually clues about using values from stage 1 and stage 2 (duh!). You need a single 4 byte value from stage 1 and 2×4 byte values from stage 2. Stage 2 to me was obvious, the unused firmware values from the VM (just remember to byte swap them!). Stage 1 you can get by looking at the disassembly again, the first instruction jumps the next 4 bytes. :) This data is already byte swapped, as shown by the presence of 0xdeadbeef in the data.

So append these to your license.txt file and you’re done! Here is mine. If you step the code again you will eventually see keygen.exe will try to make a BSD socket and connect to the host you passed in as the argument (as shown below).


.text:004012C4 call connect
.text:004012C9 test eax, eax
.text:004012CB jns short loc_4012EF
.text:004012CD mov eax, [ebp+arg_0]
.text:004012D0 mov [esp+148h+var_144], eax
.text:004012D4 mov [esp+148h+var_148], offset aErrorConnectSF ; "error: connect(\"%s\") failed\n"
.text:004012DB call printf
.text:004012E0 mov [ebp+var_144], 0FFFFFFFFh
.text:004012EA jmp loc_401423

I actually stepped over this stuff until you get to the call to sprintf where it builds the path to request for the HTTP server.


.text:00401315 mov [esp+148h+pathPart1], eax
.text:00401319 mov [esp+148h+var_144], offset aGetSXXXKey_txt ; "GET /%s/%x/%x/%x/key.txt HTTP/1.0\r\n\r\n"
.text:00401321 lea eax, [ebp+removeFilePath]
.text:00401327 mov [esp+148h+var_148], eax
.text:0040132A call sprintf
.text:0040132F lea eax, [ebp+removeFilePath]

The value at removeFilePath in the disasm is the file you want. Fire up your favourite browser and bash that path in. It turns out to be http://www.canyoucrackit.co.uk/hqDTK7b8K2rvw/a3bfc2af/d2ab1f05/da13f110/key.txt


The file contains a single line of text:

Pr0t3ct!on#cyber_security@12*12.2011+

And that’s that… :) I hope you enjoyed this little detour into how I spent a few hours of my life. It was good fun, maybe the games industry should do recruitment tests more like this! :) Until next time.


Post Comment

Please notice: Comments are moderated by an Admin.


Theme © 2005 - 2009 FrederikM.de
BlueMod is a modification of the blueblog_DE Theme by Oliver Wunder