Using Generative AI to Understand How an Obfuscated Script Works

Tackling Code Obfuscation

When facing a new technical challenge, I often feel "in over my head," and I tackle that feeling through research and preparation. Today, I'm delving into code obfuscation, a frequent challenge in malware analysis. I'll also demonstrate how a solo intelligence analyst can navigate such a challenge using generative AI.

On Windows, batch files are scripts that can automate the placement, execution, and deletion of malware. To conceal these capabilities, malware authors obfuscate their scripts. There are numerous methods to hide code, making it difficult for both humans and machines to understand its purpose. One obfuscation tool recently caught my attention due to the perplexing nature of its output.

This signals that there's more for me to learn, so let's dive in.

The obfuscator:

When I run my benign script through BatchObfuscator, it transforms from readable to unreadable.

(image #1: original script)

(image #2: obfuscating the script)

(image #3: result of obfuscating the script)

Interestingly, the obfuscated script still executes correctly when run.

(image #4: successful obfuscated script execution)

(image #5: the code that applies obfuscation to a batch file)

Upon examining the obfuscation code, I must confess that I don't fully grasp how it functions. It utilizes certutil.exe, a tool I'm familiar with for its Base64 encode/decode capabilities, but this alone doesn't fully explain what we're observing. There are additional elements in this code that warrant further exploration.

Let's ask GPT-4 and see if it can provide clarity. I've been using a generative AI framework called Fabric for various tasks, including code analysis.

Learn more about Fabric here:

We can download the code from GitHub using wget and then use one of Fabric’s built-in system prompts (explain_code) to gain insights from GPT-4.

Generative AI Provided the Following Code Explanation:

This script is designed to run on Windows and performs a series of checks and operations related to batch files (.bat or .cmd) and the use of `certutil.exe` for decoding purposes. Here's a step-by-step explanation:

1. `@echo off` - This command turns off the display of commands in the script, making the output cleaner.

2. `if "%~1"=="" exit /b` - Checks if the script was called without any arguments. If so, it exits the script without executing further. `%~1` represents the first argument passed to the script.

3. `if /i "%~x1" neq ".bat" if /i "%~x1" neq ".cmd" exit /b` - Checks whether the extension of the first argument is neither `.bat` nor `.cmd` (case-insensitive due to `/i`). If it is neither, the script exits. This ensures that the script only processes files with these specific extensions.

4. The `for /f %%i in ("certutil.exe") do if not exist "%%~$path:i" (...)` block checks if `certutil.exe` is available in the system's PATH. If not, it prints "CertUtil.exe not found.", pauses (waiting for user input), and then exits.

5. `>"temp.~b64" echo(//4mY2xzDQo=` - This command creates a temporary file named `temp.~b64` and writes the Base64 encoded string `//4mY2xzDQo=` into it. The `echo(` form is a defensive way of calling `echo` that prints its argument reliably; a trailing line break is still written, and the decoding step tolerates it.

6. `certutil.exe -f -decode "temp.~b64" "%~n1o%~x1"` - Uses `certutil.exe` to forcefully (`-f`) decode the content of `temp.~b64` and output the result into a new file. The output file name is derived from the original input file name (`%~n1`, which extracts the name without extension) with an added "o" before the extension (`%~x1`).

7. `del "temp.~b64"` - Deletes the temporary Base64 file after decoding.

8. `copy "%~n1o%~x1" /b + "%~1" /b` - Concatenates the decoded content (from step 6) with the original input file (`%~1`) in binary mode (`/b`). This effectively appends the original script content to whatever was decoded from the Base64 string.

This script appears to be a form of self-modifying or appending code mechanism, where it decodes additional commands or data from a Base64 string and appends it to a specified batch file, provided it meets certain conditions (existence of `certutil.exe`, correct file extension).
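The eight steps above can be approximated in a few lines of Python. This is only a sketch for study purposes, not the obfuscator's actual code: the function name `obfuscate` is hypothetical, `base64.b64decode` stands in for the `certutil.exe` round trip through `temp.~b64`, and the output is written next to the input file (the batch version, which uses only the file name, writes to the current directory).

```python
import base64
from pathlib import Path
from typing import Optional

# Base64 string taken from the obfuscator script (step 5).
PREFIX_B64 = "//4mY2xzDQo="


def obfuscate(script_path: str) -> Optional[Path]:
    """Approximate steps 2-8; return the output path, or None if skipped."""
    if not script_path:                              # step 2: no argument given
        return None
    src = Path(script_path)
    if src.suffix.lower() not in (".bat", ".cmd"):   # step 3: batch files only
        return None
    # Steps 5-7: decode the prefix in memory instead of calling certutil.exe,
    # so the PATH check from step 4 has no equivalent here.
    prefix = base64.b64decode(PREFIX_B64)
    # Step 6 naming: "<name>o<ext>", e.g. hello.bat -> helloo.bat
    # (assumption: written beside the input rather than in the current directory).
    dst = src.with_name(src.stem + "o" + src.suffix)
    # Step 8: binary concatenation, decoded prefix first, original script second.
    dst.write_bytes(prefix + src.read_bytes())
    return dst
```

Calling `obfuscate("hello.bat")` on a hypothetical `hello.bat` would produce `helloo.bat`, whose contents are the decoded bytes followed by the untouched original script.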

The generative AI output explained the elements I didn’t initially understand. The obfuscation code decodes the Base64 string //4mY2xzDQo= and writes the result to the top of the new file, ahead of the original script, which affects how the file's characters are displayed.
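To see exactly what lands at the top of the file, the Base64 string can be decoded and inspected byte by byte. This is a minimal Python sketch rather than part of the original tooling; the variable names are illustrative.

```python
import base64

PREFIX_B64 = "//4mY2xzDQo="  # the string embedded in the obfuscator script

prefix = base64.b64decode(PREFIX_B64)

# Raw bytes in hex, separated by spaces (bytes.hex with a separator needs Python 3.8+).
print(prefix.hex(" "))   # ff fe 26 63 6c 73 0d 0a

# The first two bytes are not printable ASCII; the rest is plain text.
print(prefix[:2])        # b'\xff\xfe'
print(prefix[2:])        # b'&cls\r\n'
```

The hex view avoids relying on how a particular terminal chooses to render the first two bytes.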

I tested decoding on both Ubuntu and Windows.

(image #7: Ubuntu base64 decoding of the appended string)

(image #8: Windows decoding of Base64 appended string)

Ubuntu isn’t properly displaying the first two bytes of the decoded string, and Windows isn’t properly displaying the remaining characters.

The batch file is prefixed with two non-alphanumeric bytes, followed by a clear-screen command (&cls) to cover tracks. Text can be encoded in many different ways, and since the obfuscated output looks like Chinese, those two bytes likely indicate that the document uses a different character set. They are 0xFF 0xFE, the byte order mark for UTF-16 little-endian, so an editor that honors the mark reads the script's single-byte characters two at a time and shows them as unrelated characters, while the script itself still runs.

This was later verified when I noticed the difference in the character set displayed in Notepad.
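The display difference can be reproduced in Python by decoding the same bytes two ways. This is a sketch under two assumptions: the leading bytes 0xFF 0xFE are treated by the viewer as a UTF-16 little-endian byte order mark, and `script` is a hypothetical stand-in for the original batch file, not the script from the screenshots.

```python
import base64

prefix = base64.b64decode("//4mY2xzDQo=")       # FF FE & c l s CR LF
script = b"@echo off\r\necho Hello!\r\n"        # hypothetical stand-in script
data = prefix + script

# Byte-per-character view, close to how the commands are actually laid out.
as_bytes = data.decode("latin-1")

# BOM-aware view: FF FE selects UTF-16 LE, so every two ASCII bytes
# are fused into one 16-bit code unit.
as_utf16 = data.decode("utf-16", errors="replace")

print(repr(as_bytes))
print([f"U+{ord(ch):04X}" for ch in as_utf16[:4]])
# ['U+6326', 'U+736C', 'U+0A0D', 'U+6540']
```

The code points U+6326, U+736C, and U+6540 fall in the CJK Unified Ideographs range, which is consistent with the Chinese-looking text an editor shows for the obfuscated file.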

(image #9a: comparison of unmodified and obfuscated scripts, unmodified script)

(image #9b: comparison of unmodified and obfuscated scripts, obfuscated script)

Now that we have a better understanding, let’s attempt to reproduce the script’s function by hand. First, we’ll place the Base64 string into a temporary file, then use CertUtil to decode it into a new file called obfuscated_script.bat. Second, we’ll append our original script to obfuscated_script.bat, so that the decoded Base64 bytes sit at the top of the file.

(image #10: concatenating the decoded Base64 to the top of our script)

Now we can try reading the contents of the script and see that it has been successfully obfuscated.

(image #11: verifying obfuscated_script.bat contents)
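Besides eyeballing the file, the result can also be checked programmatically by comparing its first bytes with the decoded Base64 string. The helper below is a hypothetical sketch (the name `has_obfuscation_prefix` is my own), and it only recognizes this one specific prefix, not obfuscation in general.

```python
import base64

# Bytes the obfuscator places at the top of the file.
EXPECTED_PREFIX = base64.b64decode("//4mY2xzDQo=")


def has_obfuscation_prefix(path: str) -> bool:
    """Return True if the file starts with the decoded Base64 bytes."""
    with open(path, "rb") as fh:
        return fh.read(len(EXPECTED_PREFIX)) == EXPECTED_PREFIX
```

For example, `has_obfuscation_prefix("obfuscated_script.bat")` should be true for the file built above and false for the unmodified script.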

And if we run our obfuscated_script.bat, it still functions smoothly.

(image #12: successful demonstration of obfuscated_script.bat’s functionality)

This experience demonstrates how one type of code obfuscation works, but there are many other methods out there. By leveraging generative AI and testing outcomes, threat intelligence analysts can continue to advance and learn new techniques, even when they feel "in over their heads."
