This page looks best with JavaScript enabled

picoCTF Riscy Writeup

 ·  ☕ 9 min read  ·  🦉 Aidan

Riscy

Analysis and solution of picoCTF’s RE Hard question Riscy

First contact

First thing was to run file on the provided binary, which yields the following:

1
riscy: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), statically linked, stripped

As expected based on the title, RISC-V. Just a few points of interest, RVC means we can expect to see compressed instructions, which are the same but smaller. e.g. li can become c.li, using half the size for the instruction at the cost of supporting smaller values to load. It’s statically linked, so no external libraries needed, but also that means that’s not a method to attack it. Finally, it’s stripped, which isn’t the end of the world, an ELF will always give us entry.

I ran readelf -a on it next, nothing too exciting file didn’t already get.

Finally a cursory checksec shows the following:

1
2
3
4
5
Arch:     riscv64-64-little
-RELRO:    No RELRO
-Stack:    No canary found
+NX:       NX enabled
-PIE:      No PIE (0x10000)

Static analysis with Ghidra

ecall woes

A common problem I encountered was that Ghidra doesn’t play nice with ecall, which is how RISC-V does syscalls and hands control over to the OS/kernel for a second. What this means is we get a lot of code like this:

li      a7, 0x40
li      a0, 0x1
auipc   a1, 0x0
addi    a1, a1, 0x123
li      a2, 0x20
ecall

Except that Ghidra’s decompiler will only display ecall().

Going over this, we have a7 = 0x40 indicating the syscall for “write”, a0 = 0x1, indicating that we’re writing to STDOUT, a1 = pc + 0x123, which is the relative address of the string we’re printing, a2 = 0x20, a length, and finally the ecall to tie it all together. In C, this would be something like print("Some string");.

Overall Ghidra decompiled RISC-V somewhat poorly. Disassembly however was good. The analysis process was about 50/50 analyzing disassembled code vs decompiled code.

Redefine function signatures first

A lesson I learned from this is that decompiling works much better when the function signatures are more correct. I got most of the way through shuffle_and_fetch, then modified the function signature, and Ghidra automatically re-decompiled the function much more accurately.

Before changing function signature
Before: The default function signature

The new function signature:

After changing function signature
After: The new function signature

Function list

Function list in Ghidra
Function list in Ghidra

There are only 4 functions to analyze in the binary, excluding local jumps and labels.

  • entry - Where the ELF loader starts
  • init_shuffle - Makes an array of bytes based on the text the user inputs, and another value that’s hardcoded in entry.
  • shuffle_and_fetch - Mutates the array from init_shuffle and returns a single byte.
  • sys_exit - All this does is invoke the syscall to exit.

Between init_shuffle and shuffle_and_fetch, there’s so much shuffling going on, I couldn’t keep track of it all. At a very high level from the PoV of entry, the user inputs a flag guess, which builds the array from init_shuffle, and then shuffle_and_fetch mutates the array and returns a value. That value is compared against a stored chunk of bytes.

Analysis: init_shuffle

init_shuffle decompiled code
init_shuffle decompiled code

First thing’s first, this function is only called once from entry. When it’s called, there are 3 parameters passed to it, an output pointer on entry’s stack, a pointer to the user’s inputed string, and the value 8.

init_shuffle disassembly called from entry
init_shuffle disassembly called from entry

The function then builds a byte array, then as the name implies, shuffles it around. Notably, the index as it shuffles is affected by the values in the string.

Analysis: shuffle_and_fetch

shuffle_and_fetch decompiled code
shuffle_and_fetch decompiled code

Despite being shorter, this feels even shufflier. Reasonable variable labels went out the window and I just named them for the registers that hold them. For those wondering where a3 is, it does get used as a scratch register to store the index used on the return line. The basic function of this is to advance through the array and produce shuffled data to be compared with the stored bytes.

Dynamic Analysis

Getting emulation running wasn’t terrible, I mostly followed this guide, which walks you through cross-compiling Linux, and Busybox for RISC-V, and building QEMU to run RISC-V.

Docker

I produced a Dockerfile and a few shell scripts to start all that for me, and soon I could jump into a RISC-V environment very easily. I ran this on an Ubuntu host, I think you would need to do something similar because running this requires the --privileged flag.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
FROM ubuntu:22.04

RUN mkdir /riscv
WORKDIR /riscv

# Install deps and tools
RUN apt update && \
    apt install -y build-essential python3 ninja-build pkg-config libglib2.0-dev \
        libpixman-1-dev libslirp-dev flex bison bc file device-tree-compiler git \
        gcc-riscv64-linux-gnu python3.10-venv python3-pip gcc-riscv64-linux-gnu \
        libc6-dev-riscv64-cross tmux curl nano
RUN pip3 install tomli

# Pull repos
RUN git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
RUN git clone https://github.com/qemu/qemu.git
RUN git clone -b 1_36_stable https://github.com/mirror/busybox.git

# Build Linux
WORKDIR /riscv/linux
RUN mkdir build
RUN make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- O=build defconfig
RUN make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- O=build -j`nproc`

# Build QEMU
WORKDIR /riscv/qemu
RUN ./configure --enable-debug --target-list=riscv64-softmmu --enable-slirp
RUN make -j`nproc`

# Build busybox
WORKDIR /riscv
RUN CROSS_COMPILE=riscv64-linux-gnu- make -C busybox clean
RUN CROSS_COMPILE=riscv64-linux-gnu- make -C busybox defconfig
RUN CROSS_COMPILE=riscv64-linux-gnu- make -C busybox -j`nproc`

# Copy scripts
WORKDIR /riscv
COPY build_disk_image.sh .
COPY run_qemu.sh .
RUN chmod +x build_disk_image.sh
RUN chmod +x run_qemu.sh

# CMD ./build_disk_image.sh && ./run_qemu.sh
CMD /bin/bash

Because one of the scripts uses mount, this needs to be run with --privileged, like this:

1
docker run -it --rm --privileged -v ./shared_files:/shared_files $(docker build -q .)

Build the disk

The docker image should contain Busybox, built and ready to install. We also want to add our own files, and tweak the OS just a little. I mounted /proc, since that was useful when trying to get maps for dynamic analysis.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
#!/bin/bash

dd if=/dev/zero of=rootfs.img bs=1M count=256
mkfs.ext4 rootfs.img
mkdir rootfs
mount -o loop rootfs.img rootfs
if [ $? -ne 0 ]; then
    echo "Failed to mount rootfs.img, missing --privileged flag on docker run?"
    exit 1
else
    echo "Docker --privileged flag check successful"
fi
CROSS_COMPILE=riscv64-linux-gnu- LDFLAGS=--static make -C busybox install CONFIG_PREFIX=../rootfs
cd /riscv/rootfs
mkdir -p proc sys dev etc/init.d root
touch etc/fstab
echo "proc        /proc       proc    defaults    0 0" > etc/fstab
echo -e "#!/bin/sh\nmount -a" > etc/init.d/rcS
chmod +x etc/init.d/rcS
cd ..
cp -r /shared_files/* rootfs/root
umount rootfs

Run QEMU

Using the Linux kernel built in the docker build stage, and the drive that was just built, this script will run it all. -s -S allows you to attach gdb to qemu and debug the kernel, but wasn’t used in the end.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#!/usr/bin/bash -x

KERNEL=linux/build/arch/riscv/boot/Image
DRIVE=rootfs.img

./qemu/build/qemu-system-riscv64 \
    -nographic \
    -machine virt \
    -kernel $KERNEL \
    -append "root=/dev/vda rw console=ttyS0" \
    -drive file=$DRIVE,format=raw,id=hd0,if=none \
    -device virtio-blk-device,drive=hd0 \
#    -s -S \
    $*

Debugging

Using gdb-multiarch, I could attach to QEMU’s gdb stub, which would allow me to debug the kernel. Unfortunately, this did not allow me to debug userspace binaries (unless this is actually just a skill issue because I sure as heck tried). I tried building from source, but abandoned it. I’m still not sure GDB supports RISC-V.

A crude solution

So if I can’t debug it, and I really don’t want to reverse that shuffle algorithm, instead let’s brute force it. I need some way to incrementally validate guesses, and looking over the code, I believe it builds and modifies only successive bytes in the array, i.e. all offsets are positive. This means we can check that a partial solution is partually correct. I could brute force the flag one character at a time. To do this, I patched the binary to emit one of the registers in entry. To reiterate how this program works, a flag is entered by STDIN, which is then used to build an array with init_shuffle, and shuffle_and_fetch runs in a loop to build some ciphertext. One of the last things that happens in entry is a loop that compares the generated ciphertext with the stored ciphertext. Inside that loop is a register with a pointer to the ciphertext that gets incremented as more of the ciphertext is correct. I patched in 6 bytes to invoke sys_exit and use that register as the exit code. Because the entire .text section is only 528 bytes long, and the .rodata section immediately follows it, I had to clobber something. I chose to kill the print syscall that indicates failure.

Patch in ghidra
Patch in ghidra

Now it’s just a matter of brute forcing. Busybox severely limits my options for programming this, I went with sh (a little different from bash as I found out). The script is pretty simple, just try the flag we’ve built so far with a new character, and when the return code increments, we know that’s the correct letter.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/sh

binary=./riscy_emit_a5
charset="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-_=+[]{}|;:'\",.<>?/"

best_guess="picoCTF{"  # initial guess based on flag fmt
echo "$best_guess" | $binary &>/dev/null
best_retcode=$?

while true; do
    success_flag=0
    for i in $(seq 0 $((${#charset} - 1))); do
        char="${charset:$i:1}"
        guess="$best_guess$char"
        echo "$guess" | $binary &>/dev/null
        retcode=$?
        echo -e "\r($retcode) $guess"
        if [ $retcode -gt $best_retcode ]; then
            best_guess=$guess
            best_retcode=$retcode
            success_flag=1
            break
        fi
    done
    if [ $success_flag -eq 0 ]; then
        echo "No additional character was found"
        echo "Best was: $best_guess"
        break
    fi
done

Let’s run it and see what happens, this took about 50s in real time:

Brute forcing the flag
Brute forcing the flag

picoCTF{4ny0n3_g0t_r1scv_h4rdw4r3?_LGUfwl8xyMUlpgv

Well, that’s (almost) a flag. What happened at the end there, I should see a closing curly brace, right? I’m not 100% sure why, but I vaguely remember from the disassembled code that shuffle_and_fetch affects the current index and the 2 immediately after it, so that would maybe indicate the final 2 bytes are a little different. Let’s look at a few sections of the output that are interesting.

First, one of the characters (the v) incremented the ciphertext pointer twice, then successive guesses were incremented by 1 from the previous guess, as would be expected. Probably a bug in my script or the patch.

A bug
A bug

Second, we did actually find the next character (the z), but the script didn’t pick up on it because of the first issue.

A hidden char
A hidden char

So if the earlier hypothesis is correct, and we’re just missing 2 chars, then the last char should be a closing curly brace, so let’s send it:

The flag
The flag

Phew

Success
Success
Share on

Aidan M.
WRITTEN BY
Aidan
Intern