Reverse-Engineering an IP camera - Part 5

In the previous parts of this study, I demonstrated how I managed to gain root access to this unbranded IP camera, download its controller software and start reverse-engineering it. This article covers some discoveries I made during this adventure.

I wanted to decode the protocol the camera uses to communicate both with the phone app that controls it and with its home server in China. To do that, I started by analyzing the camera power-up process until I could identify all the steps it takes from initialization to being available for connections:

IP camera power-up sequence

Once the camera is powered up, the bootloader (U-Boot) starts and then launches the Linux operating system. After the OS initialization, the camera manufacturer's initialization shell scripts are launched and check for software updates that may be available on an SD card attached to the camera. If no update is available, the scripts finally start the camera's main process, IPC.

IPC starts by initializing the camera's dedicated hardware (i.e., the parts that are not directly controlled by the OS). Later it also tests the camera motors (which are responsible for changing the direction of the camera's lens) and initializes the Wi-Fi connection.

It then uses a hard-coded list of internet servers to request another list of servers. This dynamically requested list indicates the servers that are available for the camera to connect to. Once one or two of the queried internet servers reply with the list of camera servers, the camera checks which of the received servers has the fastest connection and assigns it as its default server for subsequent connections.
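The server selection step can be sketched in a few lines of Python. This is only an illustration of the idea, not the camera's code: I don't know how IPC actually measures connection speed, so the TCP connect timing, the function names and the `(host, port)` tuples are all assumptions.

```python
import socket
import time
from typing import Callable, Iterable, Optional, Tuple

Server = Tuple[str, int]  # (host, port); hypothetical representation


def tcp_connect_latency(server: Server, timeout: float = 2.0) -> Optional[float]:
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection(server, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_fastest(
    servers: Iterable[Server],
    probe: Callable[[Server], Optional[float]] = tcp_connect_latency,
) -> Optional[Server]:
    """Return the reachable server with the lowest latency, or None if all fail."""
    best = None
    best_latency = None
    for server in servers:
        latency = probe(server)
        if latency is None:
            continue  # unreachable server, skip it
        if best_latency is None or latency < best_latency:
            best, best_latency = server, latency
    return best
```

Passing the probe as a parameter makes it easy to swap in a UDP round-trip measurement, or a fake probe when testing.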

Then it enters the main network data loop, where it sends keep-alive datagrams to the server, notifying it of the camera's current IP address, so the server can relay any phone app connection directly to the camera.

Once the phone app is configured with the camera ID, it can connect to the server to receive the IP address of the camera (and other data). If both the phone and the camera are on the same local network, they start exchanging data directly. If they are on different networks, they start exchanging data via the Internet.

This is a simplified explanation. The camera sends and receives several other TCP and UDP messages during this process.

The first noticeable thing about the camera/phone connection is that the app must authorize itself before it can receive the camera image. The app does this by sending the camera the password it has stored on the phone. All camera connections use an unencrypted protocol; only the password field in the authorization datagram is encrypted before transmission, and that encryption is deterministic. In other words, the same byte sequence is transmitted every single time the password is used, so anyone with access to the same local network could capture this data and decode the password.
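To make the problem with deterministic encryption concrete, here is a toy Python comparison. It is not the camera's algorithm (which I'm not reproducing here): the key, the SHA-256 based keystream and the function names are assumptions made only for illustration, and this toy cipher should not be used to protect anything.

```python
import hashlib
import os

KEY = b"hypothetical-key"  # assumption: a fixed key shared by app and camera


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_deterministic(password: bytes) -> bytes:
    """Same password in, same bytes out: every login looks identical on the wire."""
    return xor_bytes(password, keystream(KEY, b"", len(password)))


def encrypt_with_nonce(password: bytes) -> bytes:
    """A fresh random nonce makes every transmission of the same password differ."""
    nonce = os.urandom(16)
    return nonce + xor_bytes(password, keystream(KEY, nonce, len(password)))
```

Calling `encrypt_deterministic(b"hunter2")` twice gives the same bytes, which is the behavior observed in the captured traffic, while two calls to `encrypt_with_nonce(b"hunter2")` differ.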

Other vulnerabilities are also easy to find when analyzing the other packets being transmitted. For example, the heart-beat signal the camera sends to the server every 30-40 seconds contains the camera ID in plain text. It has some integrity checks, like a random value hash and a time-stamp, but those can be easily bypassed, since the protocol is simple. Also, the server responds to the heart-beat signal by sending back exactly the same UDP datagram, except that the last bytes are cleared. These characteristics would allow someone to easily detect which cameras are available on the same network, or even to perform a DoS attack by impersonating the same camera ID on a different IP address, or by simply sending the camera a spoofed heart-beat response that interrupts its communication with the server (there's a single bit in the protocol that the server uses to notify the camera that it was disabled and must stop talking to it).

The fact that the camera ID is just a 32-bit integer makes things a lot worse, because an attacker could send a bunch of heart-beat datagrams to a server in order to impersonate a range of camera IDs at the same time, blocking any phone app from connecting to them.
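When going through a capture, a small helper like the following can pair heart-beats with their replies. The field layout here is hypothetical: I'm assuming a 32-bit camera ID at offset 0, followed by a 32-bit time-stamp and a 16-byte check value, and the number of cleared trailing bytes is left as a parameter. The real offsets and sizes have to come from the capture itself.

```python
import struct

# Hypothetical layout: 32-bit camera ID, 32-bit time-stamp, 16-byte check value.
HEARTBEAT = struct.Struct("!II16s")


def build_heartbeat(camera_id: int, timestamp: int, check: bytes) -> bytes:
    """Pack a heart-beat datagram using the assumed layout."""
    return HEARTBEAT.pack(camera_id, timestamp, check)


def read_camera_id(datagram: bytes) -> int:
    """The ID is not encrypted, so it can be read straight from the datagram."""
    return struct.unpack_from("!I", datagram, 0)[0]


def is_echo_reply(request: bytes, reply: bytes, cleared_tail: int) -> bool:
    """True if `reply` equals `request` with its last `cleared_tail` bytes zeroed."""
    if len(request) != len(reply) or not 0 < cleared_tail <= len(request):
        return False
    split = len(request) - cleared_tail
    return reply[:split] == request[:split] and reply[split:] == bytes(cleared_tail)
```

With the assumed layout a datagram is 24 bytes long, and a 32-bit ID field allows for 2^32 (about 4.3 billion) possible camera IDs.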

All this information was decoded by analyzing the network data with Wireshark and the IPC code in Ghidra. The camera developers made this process a lot easier by leaving many debug messages printed on the camera terminal. The next images illustrate this. Ghidra includes a nice decompiler, but it cannot recover variable names, since this information is discarded during compilation and does not end up in the IPC binary. So, when analyzing the decompiled code, we have to deal with a lot of variables with generic names like "uVar4" and "uVar7", as depicted in the next image:

terminal debug 1

However, the developer left the variable names in the debug code (the terminal line at the top of the previous image), which allowed me to rename those variables in Ghidra and improve the code's readability:

[Image: Ghidra decompiler output after renaming the variables]

So I kept using the decompiler to understand how the IPC code works, and after a few (a lot of!) hours, I stumbled upon a piece of code in the software update function that seemed prone to a buffer overflow:

[Image: decompiled software update function, with the memcpy call at the bottom]

This is the code responsible for parsing the JSON file the camera receives when checking online for a new software update. The memcpy at the bottom of the image copies the string value read from the JSON key "error_code" into a 32-byte memory buffer, which is later converted to an integer value. As we can see in the next image, "error_code" is expected to hold a small string value of one or two characters, so 32 bytes should be enough.

[Image: example JSON update response with a short "error_code" value]

But, since the code has no protection against larger strings, what would happen if the received JSON data were compromised with a larger string? Well, we can test that by setting up a fake web server that serves the camera an invalid JSON file containing an "error_code" almost 800 characters long. This video shows what happens when the camera receives this data:

In this case the camera just reboots due to the memory access error. This error could be used for another DoS attack, by forcing the camera to continuously reboot, but it could also be applied in a more destructive attack, where this buffer overflow serves as a vector for code injection and remote code execution on the camera.
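The missing length check is easy to model. The Python sketch below is not the IPC code (that is C, and I'm only showing it as decompiled images): it builds a test JSON body with an oversized "error_code" and shows the kind of bounds check that would have rejected it before the copy. The function names are mine, the 800-character default is approximate, and the real update response probably contains more keys than the single one assumed here.

```python
import json

BUFFER_SIZE = 32  # size of the destination buffer seen in the decompiled code


def build_test_response(length: int = 800) -> str:
    """Build a JSON body whose "error_code" string is `length` characters long."""
    return json.dumps({"error_code": "9" * length})


def parse_error_code(json_text: str, buffer_size: int = BUFFER_SIZE) -> int:
    """Parse "error_code", rejecting values that would not fit in the buffer."""
    value = json.loads(json_text)["error_code"]
    raw = value.encode("ascii")
    # Leave one byte for the terminating NUL that a C string needs.
    if len(raw) > buffer_size - 1:
        raise ValueError(
            f"error_code is {len(raw)} bytes, limit is {buffer_size - 1}"
        )
    return int(value)
```

A well-formed response such as `{"error_code": "0"}` parses normally, while the output of `build_test_response()` is rejected with a `ValueError` instead of overrunning the buffer.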

Since we're talking about vulnerabilities, Ghidra also helped me identify some outdated libraries the camera uses, which could also serve as a vector for attacks, like this one:

openssl

There are about 90 CVE reports for this version, including one that is specific to the hardware platform used by the camera:

[Image: CVE report specific to the camera's hardware platform]

So, now that some security vulnerabilities have been found, it's time to exploit some of them to try to replace the camera software with my own version of IPC. I'll cover that in part 6 of this series.