[ Beneath the Waves ]

TDR: Practice Using OVERLAYS

article and software by Ben Lincoln

 

Table of contents

  1. Introduction
  2. Decompilation Walkthrough - OVERLAYS.EXE - Primary Binary
  3. Decompilation Walkthrough - OVERLAYS.EXE - Overlays

Introduction

This is a basic walkthrough of decompiling a very simple PlayStation PSX-EXE binary (OVERLAYS.EXE, which you can download at the bottom of this page) using Ghidra and This Dust Remembers What It Once Was (TDR). This binary makes use of the PsyQ implementation of memory overlays, which games like the PlayStation versions of Diablo and Biohazard 2 use to swap code in and out of RAM because it won't all fit at once. This can make reverse-engineering trickier, which is why I created a greatly simplified tutorial to introduce them in a controlled setting.

Note that some (maybe many) PlayStation games use other methods of swapping code in and out of RAM. You can see when this has occurred because there will be functions defined in the SYM file, but with no corresponding code (or all zeroes) when the binary is loaded into Ghidra along with the symbol data. You can probably handle these in a similar way.
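If you want to check for the all-zeroes case outside of Ghidra, something like the following minimal sketch would do it. It assumes you've already read the binary's contents into a buffer, and that you know the offset and length of the function in question from the symbol data. region_is_all_zero is a hypothetical helper name, not something provided by TDR or Ghidra.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of TDR or Ghidra).
 * Returns 1 if every byte in image[offset .. offset + length - 1] is zero,
 * 0 if any byte is non-zero, and -1 if the range does not fit in the image.
 * An empty range (length 0) counts as all zero.
 */
int region_is_all_zero(const uint8_t *image, size_t image_size,
                       size_t offset, size_t length)
{
    size_t i;

    assert(image != NULL);

    /* Reject ranges that would run past the end of the buffer. */
    if (offset > image_size || length > image_size - offset) {
        return -1;
    }

    for (i = 0; i < length; i++) {
        if (image[offset + i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

A result of 1 for a function that the SYM file says should exist is a good hint that its code is loaded from somewhere else at runtime.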

The practice binaries were compiled using the same PsyQ toolchain as many real PlayStation titles. The source code and PsyQ build instructions are included for reference/reproducibility. They were used to develop the overlay-handling features of TDR version 0.8, as earlier versions did not have that capability.

Decompilation Walkthrough - OVERLAYS.EXE - Primary Binary

This first section is very similar to the process used for games without memory overlays. However, I've included a lot of extra detail and side-notes in this one, so I'd recommend reading through all of it if you want to learn more about using TDR for games I've not documented.

  1. Pick a working directory. For example, C:\TDR\OVERLAYS.
  2. Create a subdirectory of your working directory for the primary binary for this project. For example, C:\TDR\OVERLAYS\Primary.
  3. Copy OVERLAYS.EXE, OVERLAY1.BIN, OVERLAY2.BIN, OVERLAY3.BIN, and OVERLAYS.SYM to that subdirectory.
  4. Open a command prompt or PowerShell prompt and change directory to that same subdirectory.
  5. Perform an initial conversion of OVERLAYS.EXE to ELF format by running the following command:

    PlayStationELFConverter.exe --exe2elf OVERLAYS.EXE OVERLAYS.ELF > PlayStationELFConverter_Log.txt 2>&1

  6. Generate the JSON version of the debug symbols by running the following command:

    SymDumpTE.exe --debug --ignore-duplicate-definitions --rename-for-compatibility --auto-rename-fakes --json OVERLAYS.SYM OVERLAYS.json > SymDumpTE_Log.txt 2>&1

    If you examine the end of SymDumpTE_Log.txt, you'll see this message: Warning: the debug symbols for this project reference 3 PsyQ overlays. Reverse-engineering this type of project requires additional manual effort. Please consult the TDR documentation - for example, the walkthrough of decompiling the OVERLAYS example binary.

    The following overlays were found in this set of debug symbols:

    Overlay ID: 0x04 (decimal: 4), address 0x8004099C, length 0x00000100 (decimal: 256)

    Overlay ID: 0x05 (decimal: 5), address 0x8004099C, length 0x00000248 (decimal: 584)

    Overlay ID: 0x06 (decimal: 6), address 0x8004099C, length 0x00000248 (decimal: 584)

    In this extremely simple example code, there are three memory overlays. Any one of the three can be loaded into the block of memory starting at 0x8004099C, but only one can be loaded at a time (since all three share that same block of memory). When reverse-engineering this type of project using TDR, you'll generally want to perform a decompilation of just the primary binary, and then start handling the overlays after that. This is partly because in most cases it won't be obvious where the overlay data is stored, and having a decompiled version of the main binary will help you locate it. Additionally, if your goal is to generate C code that can be recompiled successfully, you'll want to have clean delineations between the sets of code, for reasons that will become obvious as you progress through this tutorial if they're not already.
  7. Generate the header/stub files, Ghidra scripts, and an updated/extended/mapped version of the JSON debug symbol file by running the following command:

    CreateSkeleton.exe --create-playstation-memory --assume-sn-gp-base --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1

    Examine the log file (CreateSkeleton_Log.txt) and make sure it doesn't end with a Did not find an __SN_GP_BASE value in the debug symbol data error. It won't for OVERLAYS, but it might for other games, like Diablo and Biohazard 2, so it's good to get in the habit. If you do see that error, you'll need to run this command instead for now, then do another pass later once you know the correct value for the global pointer:

    CreateSkeleton.exe --create-playstation-memory --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1

    You always want to include the global pointer information if possible, because it makes Ghidra's decompilation of the code much more accurate.
  8. Examine the contents of README-OVERLAYS-CreateSkeleton-Manual_Changes_Required.txt in the Output directory. If you want to follow up on any of the recommendations in it, do so now, then re-run the previous CreateSkeleton.exe command.
  9. Launch Ghidra, and create a new project named OVERLAYS. For the base directory, use the Output directory created by TDR.
  10. Import the ELF file you generated earlier. Ghidra will default to 64-bit MIPS, which is wrong. Click the ... button next to the Language field. Scroll up in the list and choose MIPS/default/32/little/default processor architecture, which will show up as MIPS:LE:32:default:default in the import file window. Click OK to begin the import.
  11. Close the import summary dialogue.
  12. Double-click on the ELF in the project list.
  13. An Analyze prompt will appear. Click No, because you don't want that to happen until the debug symbols have been imported.
  14. From the Edit menu, choose Tool Options.
  15. Expand Decompiler, and select Analysis. Uncheck Eliminate unreachable code. Click OK.
  16. From the File menu, choose the Parse C Source option. Click the green plus sign button. Open the OVERLAYS.H file in the Output directory. Click Parse to Program.
  17. Click Continue. Click Continue?.
  18. After a moment, you should receive a message indicating that the header has been parsed successfully. If you don't, make sure you resolved any naming conflicts in the JSON file, re-run the CreateSkeleton.exe command above, and then re-import the OVERLAYS.H file. Otherwise, click OK, then click Dismiss.
  19. Copy the OVERLAYSTDRAggressiveArrayIdentification.java, OVERLAYSTDRDecompile.java, OVERLAYSTDRDefineFunctions.java, OVERLAYSTDRExportData.java, and OVERLAYSTDRMapMemoryAndCreateLabels.java scripts from the Output/ghidra_files/ directory into your own Ghidra scripts directory (probably something like C:\Users\yourname\ghidra_scripts). Note: these files are dynamically generated, so you will need to re-copy them (overwriting the existing copies if necessary) every time they change, or when working on multiple projects.
  20. In Ghidra, from the Window menu, choose the Script Manager option.
  21. In the Script Manager window, click on the OVERLAYSTDRMapMemoryAndCreateLabels.java entry, then click the green-and-white play button in the upper-right corner of the window. This script creates any necessary PlayStation memory segments and applies labels found in the debug symbols.
  22. After a noticeable delay, you should see an OVERLAYSTDRMapMemoryAndCreateLabels.java> Finished! message in the console at the bottom of the main Ghidra window.
  23. Use the Script Manager to execute the OVERLAYSTDRDefineFunctions.java entry. This script imports function definitions and a few other things from the debug symbols.
  24. From the Analysis menu, choose Auto Analyze 'OVERLAYS.ELF'. Check the Decompiler Parameter ID box if it's not already checked. Switch to the MIPS Constant Reference Analyzer section. Uncheck Recover global GP register writes if it's checked. Optionally, check Attempt to recover switch tables. Click Analyze.
  25. Wait for the analysis to complete (progress is shown in the lower-right corner of the main Ghidra window).
  26. Optional, but highly recommended: use the Script Manager to execute the OVERLAYSTDRAggressiveArrayIdentification.java entry. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases. This script attempts to detect cases where a global variable exists with embedded data in the PlayStation binary, but Ghidra has only identified the first element of the entire array. It will generally do a very good job, but some manual cleanup work may be necessary later.
  27. Side-note: this is a perfect opportunity to practice figuring out what the global pointer value is for a game if it doesn't have an __SN_GP_BASE label. This same general process should work for most (if not all) PsyQ-based games, and maybe most (or all) PlayStation games in general.
    1. In Ghidra's Symbol Tree, expand Functions, then locate entry. This is the entrypoint for the binary, where the CPU will start executing code when it's run on a PlayStation (or emulator, etc.). The address of the entrypoint is different for each binary, but all binaries have an entrypoint, and Ghidra is good at marking it as such. If you ever run into a situation where it's unclear, look in the output of PlayStationELFConverter.exe for text like this:

      ProgramCounter: 0x800406F4

      ...or in the JSON version of the debug symbols for text like this:

      "program_counter": 2147747572

      That address is the entrypoint.
    2. If you look about 32 lines into the disassembly (not the decompilation) of entry, you should find a pair of instructions that look like this:

      .text:80040770 04 80 1c 3c lui gp,0x8004

      .text:80040774 88 09 9c 27 addiu gp,gp,0x988

      In MIPS assembly language, lui is the "load upper immediate" instruction[1] (where "upper" is the most significant 16 bits of a 32-bit value). The first line can be read as "set the two most significant bytes of the gp register to 0x8004, and set the two least-significant bytes to 0x0000", or "set the gp register to 0x80040000".
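      addiu is the "add immediate unsigned" instruction. The second line can be read as "add 0x0988 to the value stored in the gp register, and store the result in the gp register." Note that despite the name, addiu sign-extends its 16-bit immediate, so if the lower half of the final address is 0x8000 or greater, the value loaded by the preceding lui will be one higher than the upper half of that address.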
      In other words, the combination of these two lines is "set the gp register to 0x80040988".
      MIPS instructions like these can't load an entire arbitrary 32-bit value into a register in a single operation, so you'll frequently see these two (or equivalents) paired together to achieve that effect.
      If you look in the JSON version of the symbol data for OVERLAYS and search for __SN_GP_BASE, you'll see that its value is 2147748232 in decimal, or 0x80040988 in hex.
      You should see a similar pattern in the entrypoint of most (if not all) PlayStation games. Once you've determined the global pointer value in this way, you can pass it to CreateSkeleton.exe using the --use-gp-base option. For example, if TDR didn't already automatically detect the value for OVERLAYS, you could add --use-gp-base 0x80040988 to your CreateSkeleton.exe command instead of using --assume-sn-gp-base.
  28. If this were a real game, you'd probably need to do a lot of additional work in Ghidra at this point, but this one is simple enough that there's only one thing you need to correct in Ghidra.
  29. In Ghidra's Symbol Tree, expand Functions, then double-click on PrintMessage, which probably looks like this in the decompiled code view:

    void PrintMessage(char *message)

    {

    printf((char *)&PTR_DAT_80040988,message);

    return;

    }

  30. Double-click on PTR_DAT_80040988 in the decompiled code view to jump to that offset.
  31. Ghidra has miscategorized the string "%s\n" as a pointer. To fix this, right-click on the hexadecimal 25 73 0a 00, choose Data, then Choose Data Type. In the popup, enter string, then click OK.
  32. Ghidra also likes to assign automatic names to things that contain characters which are not valid in C identifiers (like s_%s_80040988), so save yourself some trouble later by right-clicking on the hex code again, and choosing Add Label. In the dialogue, enter something like s_string_placeholder_with_newline, then click OK.
  33. Go back to the Script Manager window, and run the OVERLAYSTDRDecompile.java script. Click OK in the popup - the location of the output file is preset by TDR, and you shouldn't change it in normal use.
  34. Wait for the decompilation to happen. This will be very fast for the practice binary. You should see an OVERLAYSTDRDecompile.java> Finished! message in the console at the bottom of the main Ghidra window when it's complete.
  35. In the Script Manager window, run the OVERLAYSTDRExportData.java script and wait for it to finish. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases. This script will create a file named XPRTDATA.C in your output directory which contains C code that should create any embedded data from the game binary which is referenced by the decompiled code (global variables, etc.).
  36. Back in the command prompt, create another set of C source code files which contain the decompiled functions and global variable data output by Ghidra by running the following command:

    PopulateSkeleton.exe --name OVERLAYS --input-json OVERLAYS-Mapped.json --input-source Output\OVERLAYS.C --input-data Output\XPRTDATA.C --output Output > PopulateSkeleton_Log.txt 2>&1

  37. Examine the contents of Output/PRIMARY/source-decompiled, which should contain TDR's best attempt at reconstructing the original source code in all of the separate files that were originally used. Anything not matched to one of those files will be placed in THISDUST.C or THISDUST.H instead.
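The lui/addiu pattern described in the global pointer side-note (step 27) can also be decoded programmatically. This is a rough sketch, assuming the two instruction words are adjacent and target the same register. le32 and decode_lui_addiu are made-up helper names, not part of TDR. The opcode values (0x0F for lui, 0x09 for addiu) and the sign-extension of the addiu immediate are standard MIPS I encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
uint32_t le32(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/*
 * Hypothetical helper: if `first` is "lui rt,hi" and `second` is
 * "addiu rt,rt,lo" for the same register, store the register number and the
 * combined 32-bit value and return 1. Otherwise return 0.
 */
int decode_lui_addiu(uint32_t first, uint32_t second,
                     unsigned *reg_out, uint32_t *value_out)
{
    unsigned op1 = first >> 26;
    unsigned rs1 = (first >> 21) & 0x1F;
    unsigned rt1 = (first >> 16) & 0x1F;
    unsigned op2 = second >> 26;
    unsigned rs2 = (second >> 21) & 0x1F;
    unsigned rt2 = (second >> 16) & 0x1F;
    uint32_t upper;
    int32_t lower;

    assert(reg_out != NULL && value_out != NULL);

    /* lui: opcode 0x0F with the rs field set to zero. */
    if (op1 != 0x0F || rs1 != 0) {
        return 0;
    }
    /* addiu: opcode 0x09, reading and writing the register lui loaded. */
    if (op2 != 0x09 || rs2 != rt1 || rt2 != rt1) {
        return 0;
    }

    upper = (first & 0xFFFFu) << 16;

    /* addiu sign-extends its 16-bit immediate before adding. */
    lower = (int32_t)(second & 0xFFFFu);
    if (lower & 0x8000) {
        lower -= 0x10000;
    }

    *reg_out = rt1;
    *value_out = upper + (uint32_t)lower;
    return 1;
}
```

Feeding it the bytes 04 80 1c 3c and 88 09 9c 27 from the listing in step 27 yields register 28 (gp) and the value 0x80040988, which matches the __SN_GP_BASE value in the symbol data.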

Comparing the source code to the decompiled output, you can see that the combination of Ghidra and TDR has done a pretty good job of recovering something like the original source code, with the exception of having a spurious function named OverlayAddress() in THISDUST.C and THISDUST.H in addition to the global variable of the same name.[2]

This is what MAIN.C looks like in the original source code:

#include

extern char *OverlayAddress;

extern void overlay1_function_1 (void);

extern void overlay1_function_2 (void);

extern void overlay2_function_1 (char *);

extern void overlay2_function_2 (void);

extern void overlay3_function_1 (char *);

extern void overlay3_function_2 (void);

int GlobalNumber;

void PrintMessage(char *message)

{

printf("%s\n", message);

}

void PrintCurrentGlobalNumber()

{

printf("Current global number value is %i\n", GlobalNumber);

}

static void loadOverlay(char *fileName)

{

int fileHandle;

int fileLength;

fileHandle = PCopen(fileName, 0, 0);

fileLength= PClseek(fileHandle, 0, 2);

PClseek(fileHandle, 0, 0);

PCread(fileHandle, OverlayAddress, fileLength);

PCclose(fileHandle);

FlushCache();

}

int main()

{

PrintMessage("PsyQ Overlay Example");

GlobalNumber = 0;

PrintMessage("Loading OVERLAY1.BIN");

loadOverlay("OVERLAY1.BIN");

PrintMessage("Calling overlay1_function_1()");

overlay1_function_1();

PrintMessage("Calling overlay1_function_2()");

overlay1_function_2();

PrintMessage("Loading OVERLAY2.BIN");

loadOverlay("OVERLAY2.BIN");

PrintMessage("Calling overlay2_function_1(\"Sent to overlay 2\")");

overlay2_function_1("Sent to overlay 2");

PrintMessage("Calling overlay2_function_2()");

overlay2_function_2();

PrintMessage("Loading OVERLAY3.BIN");

loadOverlay("OVERLAY3.BIN");

PrintMessage("Calling overlay3_function_1(\"Sent to overlay 3\")");

overlay3_function_1("Sent to overlay 3");

PrintMessage("Calling overlay3_function_2()");

overlay3_function_2();

PrintMessage("Trying to access an overlay which is no longer loaded should cause unexpected behaviour.");

PrintMessage("Calling overlay1_function_1() even though overlay 1 has been overwritten with overlay 3");

overlay1_function_1();

PrintMessage("Calling overlay1_function_2() even though overlay 1 has been overwritten with overlay 3");

overlay1_function_2();

PrintMessage("Calling overlay2_function_1(\"Sent to overlay 2\") even though overlay 2 has been overwritten with overlay 3");

overlay2_function_1("Sent to overlay 2");

PrintMessage("Calling overlay2_function_2() even though overlay 2 has been overwritten with overlay 3");

overlay2_function_2();

PrintMessage("Done!");

return 0;

}

This is the decompiled version:

#include "THISDUST.H"

#include "MAIN.H"

void PrintMessage(char *message)

{

printf(s_string_placeholder_with_newline,message);

return;

}

void PrintCurrentGlobalNumber(void)

{

printf(s_Current_global_number_value_is___80040000,GlobalNumber);

return;

}

void loadOverlay(char *fileName)

{

undefined4 uVar1;

undefined4 uVar2;

uVar1 = PCopen(fileName,0,0);

uVar2 = PClseek(uVar1,0,2);

PClseek(uVar1,0,0);

PCread(uVar1,OverlayAddress,uVar2);

PCclose(uVar1);

FlushCache();

return;

}

int main(void)

{

__main();

PrintMessage(s_PsyQ_Overlay_Example_80040024);

GlobalNumber = 0;

PrintMessage(s_Loading_OVERLAY1_BIN_8004003c);

loadOverlay(s_OVERLAY1_BIN_80040054);

PrintMessage(s_Calling_overlay1_function_1___80040064);

FUN_80040a24();

PrintMessage(s_Calling_overlay1_function_2___80040084);

FUN_80040a5c();

PrintMessage(s_Loading_OVERLAY2_BIN_800400a4);

loadOverlay(s_OVERLAY2_BIN_800400bc);

PrintMessage(s_Calling_overlay2_function_1__Sen_800400cc);

FUN_80040b28(s_Sent_to_overlay_2_80040100);

PrintMessage(s_Calling_overlay2_function_2___80040114);

FUN_80040b70();

PrintMessage(s_Loading_OVERLAY3_BIN_80040134);

loadOverlay(s_OVERLAY3_BIN_8004014c);

PrintMessage(s_Calling_overlay3_function_1__Sen_8004015c);

FUN_80040b28(s_Sent_to_overlay_3_80040190);

PrintMessage(s_Calling_overlay3_function_2___800401a4);

FUN_80040b70();

PrintMessage(s_Trying_to_access_an_overlay_whic_800401c4);

PrintMessage(s_Calling_overlay1_function_1___ev_80040220);

FUN_80040a24();

PrintMessage(s_Calling_overlay1_function_2___ev_80040278);

FUN_80040a5c();

PrintMessage(s_Calling_overlay2_function_1__Sen_800402d0);

FUN_80040b28(s_Sent_to_overlay_2_80040100);

PrintMessage(s_Calling_overlay2_function_2___ev_8004033c);

FUN_80040b70();

PrintMessage(s_Done__8004098c);

return 0;

}
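Notice that the unresolved FUN_ addresses in the decompiled main() all fall inside the shared overlay block. A quick way to see which overlays could own a given address is sketched below, using the overlay table from the SymDumpTE log. The struct and function names are illustrative only and are not part of TDR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the overlay list printed by SymDumpTE. */
struct overlay_info {
    uint8_t  id;
    uint32_t address;
    uint32_t length;
};

/* Values copied from the SymDumpTE log for OVERLAYS. */
const struct overlay_info OVERLAY_TABLE[3] = {
    { 0x04, 0x8004099Cu, 0x00000100u },
    { 0x05, 0x8004099Cu, 0x00000248u },
    { 0x06, 0x8004099Cu, 0x00000248u },
};

/*
 * Hypothetical helper: writes the ID of every overlay whose address range
 * contains `address` into ids_out (which must have room for `count` entries)
 * and returns how many were found.
 */
size_t overlays_containing(uint32_t address,
                           const struct overlay_info *table, size_t count,
                           uint8_t *ids_out)
{
    size_t i;
    size_t found = 0;

    assert(table != NULL && ids_out != NULL);

    for (i = 0; i < count; i++) {
        if (address >= table[i].address &&
            address - table[i].address < table[i].length) {
            ids_out[found++] = table[i].id;
        }
    }
    return found;
}
```

0x80040A24 falls within the range of all three overlays, while 0x80040B28 is past the end of overlay 0x04 and so can only belong to 0x05 or 0x06. The address alone can't tell you which one was intended.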

Some notes:

Decompilation Walkthrough - OVERLAYS.EXE - Overlays

So now you know that the binary references code in overlays. How do you go about reverse-engineering those?

When reverse-engineering a real game, you'd first need to figure out where the overlay binaries were located. If you're lucky, they're all in individual .BIN files whose sizes correspond exactly with the list that's output by TDR. That's the default for PsyQ. If you're unlucky, they're concatenated together into a blob, or stored in a larger archive. You'll need to examine the primary game binary itself to figure out for sure.
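If the overlays are stored as individual files, comparing file sizes against the lengths reported by TDR is an easy first check. A minimal sketch follows. The table values are from the SymDumpTE log for OVERLAYS, and the helper names are hypothetical. A size match is only a hint: two of the three overlays here have the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One row of the overlay list printed by SymDumpTE. */
struct overlay_info {
    uint8_t  id;
    uint32_t address;
    uint32_t length;
};

/* Values copied from the SymDumpTE log for OVERLAYS. */
const struct overlay_info KNOWN_OVERLAYS[3] = {
    { 0x04, 0x8004099Cu, 0x00000100u },
    { 0x05, 0x8004099Cu, 0x00000248u },
    { 0x06, 0x8004099Cu, 0x00000248u },
};

/* Returns the size of a file in bytes, or -1 if it can't be determined. */
long file_size_of(const char *path)
{
    FILE *f;
    long size;

    assert(path != NULL);

    f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}

/*
 * Writes the ID of every overlay whose length equals `size` into ids_out
 * (which must have room for `count` entries) and returns how many matched.
 */
size_t overlays_matching_length(long size,
                                const struct overlay_info *table, size_t count,
                                uint8_t *ids_out)
{
    size_t i;
    size_t found = 0;

    assert(table != NULL && ids_out != NULL);

    for (i = 0; i < count; i++) {
        if (size >= 0 && (unsigned long)size == table[i].length) {
            ids_out[found++] = table[i].id;
        }
    }
    return found;
}
```

Passing the result of file_size_of for each candidate file to overlays_matching_length narrows things down quickly. A 256-byte file can only be overlay 0x04, but a 584-byte file could be either 0x05 or 0x06, so you'd still need to look at the file's contents or at how the primary binary loads it.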

By following the code from MAIN.C to THISDUST.C (or cheating and looking in the original source), you can see that OVERLAYS does the most basic thing possible and references three separate binaries: OVERLAY1.BIN, OVERLAY2.BIN, and OVERLAY3.BIN. Because all three of these overlays occupy the same block of memory, you'll need to follow the steps below separately for each of the three. That is, the steps below will walk you through reverse-engineering OVERLAY1.BIN, but you'll need to do the exact same steps for OVERLAY2.BIN and OVERLAY3.BIN.

  1. Create a new subdirectory of your working directory for the overlay. For example, C:\TDR\OVERLAYS\OVERLAY1.
  2. Copy OVERLAYS.EXE, OVERLAY1.BIN, OVERLAY2.BIN, OVERLAY3.BIN, and OVERLAYS.SYM to that subdirectory.
  3. Open a command prompt or PowerShell prompt and change directory to that same subdirectory.
  4. Recall the list of overlays obtained when converting the primary binary:

    Overlay ID: 0x04 (decimal: 4), address 0x8004099C, length 0x00000100 (decimal: 256)

    Overlay ID: 0x05 (decimal: 5), address 0x8004099C, length 0x00000248 (decimal: 584)

    Overlay ID: 0x06 (decimal: 6), address 0x8004099C, length 0x00000248 (decimal: 584)

    You'll need to know the overlay ID (0x04 for the first overlay, for example), the address in memory to load it into, and the length of the overlay for the next step.
  5. Perform a conversion of OVERLAYS.EXE with the first overlay loaded by running the following command:

    PlayStationELFConverter.exe --exe2elf --overlay "0x04,0x8004099C,0x00000100,OVERLAY1.BIN" OVERLAYS.EXE OVERLAYS.ELF > PlayStationELFConverter_Log.txt 2>&1

  6. Make a second copy of the resulting ELF file for Ghidra to use by executing the following command:

    COPY OVERLAYS.ELF OVERLAYS-OVERLAY1.ELF

  7. Generate the JSON version of the debug symbols by running the following command:

    SymDumpTE.exe --debug --ignore-duplicate-definitions --rename-for-compatibility --auto-rename-fakes --json OVERLAYS.SYM OVERLAYS.json > SymDumpTE_Log.txt 2>&1

  8. Generate the header/stub files, Ghidra scripts, and an updated/extended/mapped version of the JSON debug symbol file by running the following command:
    IMPORTANT: note the additional --overlay option in this command.

    CreateSkeleton.exe --create-playstation-memory --overlay 0x04 --assume-sn-gp-base --map-sld-functions --name OVERLAYS --externs-to-labels --output-updated-json OVERLAYS-Mapped.json --output Output OVERLAYS.json > CreateSkeleton_Log.txt 2>&1

  9. Examine the contents of README-OVERLAYS-CreateSkeleton-Manual_Changes_Required.txt in the Output directory. If you want to follow up on any of the recommendations in it, do so now, then re-run the previous CreateSkeleton.exe command.
  10. Back in Ghidra, close OVERLAYS.ELF, but leave the project open. Import the OVERLAYS-OVERLAY1.ELF file into the same project as before. Ghidra will default to 64-bit MIPS, which is wrong. Click the ... button next to the Language field. Scroll up in the list and choose MIPS/default/32/little/default processor architecture, which will show up as MIPS:LE:32:default:default in the import file window. Click OK to begin the import.
  11. Close the import summary dialogue.
  12. Double-click on OVERLAYS-OVERLAY1.ELF in the project list.
  13. An Analyze prompt will appear. Click No, because you don't want that to happen until the debug symbols have been imported.
  14. IMPORTANT: remember that most of the scripts and files below will have identical names to the ones used for the primary binary, but will be in the subdirectory for the overlay (C:\TDR\OVERLAYS\OVERLAY1 if you've been following along). It is vital that you use these other versions, as they contain additional data not present in the other files.
  15. From the File menu, choose the Parse C Source option. Click the green plus sign button. Open the OVERLAYS.H file in the Output directory (which will be C:\TDR\OVERLAYS\OVERLAY1\Output if you've been following along). Click Parse to Program.
  16. Click Continue. Click Continue?.
  17. After a moment, you should receive a message indicating that the header has been parsed successfully. Click OK, then click Dismiss.
  18. Copy the OVERLAYSTDRAggressiveArrayIdentification.java, OVERLAYSTDRDecompile.java, OVERLAYSTDRDefineFunctions.java, OVERLAYSTDRExportData.java, and OVERLAYSTDRMapMemoryAndCreateLabels.java scripts from the Output/ghidra_files/ directory into your own Ghidra scripts directory (probably something like C:\Users\yourname\ghidra_scripts). Note: these files are dynamically generated, so you will need to re-copy them (overwriting the existing copies if necessary) every time they change, or when working on multiple projects.
  19. In Ghidra, from the Window menu, choose the Script Manager option.
  20. In the Script Manager window, click on the OVERLAYSTDRMapMemoryAndCreateLabels.java entry, then click the green-and-white play button in the upper-right corner of the window. This script creates any necessary PlayStation memory segments and applies labels found in the debug symbols.
  21. After a noticeable delay, you should see an OVERLAYSTDRMapMemoryAndCreateLabels.java> Finished! message in the console at the bottom of the main Ghidra window.
  22. Use the Script Manager to execute the OVERLAYSTDRDefineFunctions.java entry. This script imports function definitions and a few other things from the debug symbols.
  23. From the Analysis menu, choose Auto Analyze 'OVERLAYS-OVERLAY1.ELF'. Check the Decompiler Parameter ID box if it's not already checked. Switch to the MIPS Constant Reference Analyzer section. Uncheck Recover global GP register writes if it's checked. Optionally, check Attempt to recover switch tables. Click Analyze.
  24. Wait for the analysis to complete (progress is shown in the lower-right corner of the main Ghidra window).
  25. Optional, but highly recommended: use the Script Manager to execute the OVERLAYSTDRAggressiveArrayIdentification.java entry. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases.
  26. Go back to the Script Manager window, and run the OVERLAYSTDRDecompile.java script. Click OK in the popup - the location of the output file is preset by TDR, and you shouldn't change it in normal use.
  27. Wait for the decompilation to happen. This will be very fast for the practice binary. You should see an OVERLAYSTDRDecompile.java> Finished! message in the console at the bottom of the main Ghidra window when it's complete.
  28. In the Script Manager window, run the OVERLAYSTDRExportData.java script and wait for it to finish. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases. This script will create a file named XPRTDATA.C in your output directory which contains C code that should create any embedded data from the game binary which is referenced by the decompiled code (global variables, etc.).
  29. Back in the command prompt, create another set of C source code files which contain the decompiled functions and global variable data output by Ghidra by running the following command:
    IMPORTANT: note the additional --overlay option in this command.

    PopulateSkeleton.exe --name OVERLAYS --overlay 0x04 --input-json OVERLAYS-Mapped.json --input-source Output\OVERLAYS.C --input-data Output\XPRTDATA.C --output Output > PopulateSkeleton_Log.txt 2>&1

  30. Examine the contents of Output/PRIMARY/source-decompiled and Output/over0x04/source-decompiled, which should contain TDR's best attempt at reconstructing the original source code in all of the separate files that were originally used. Anything not matched to one of those files will be placed in THISDUST.C or THISDUST.H instead.
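Since you'll be repeating these steps for each overlay, it can help to generate the value for PlayStationELFConverter.exe's --overlay option (step 5) rather than typing it by hand. The sketch below simply reproduces the "ID,address,length,file name" layout shown in that step. format_overlay_argument is a hypothetical helper, and the rejection of commas in the file name is a defensive assumption, not documented behaviour of the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds a string in the same layout as the --overlay
 * value used in step 5, e.g. "0x04,0x8004099C,0x00000100,OVERLAY1.BIN".
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the name is unusable or the buffer is too small.
 */
int format_overlay_argument(char *out, size_t out_size, unsigned id,
                            uint32_t address, uint32_t length,
                            const char *file_name)
{
    int written;

    assert(out != NULL && file_name != NULL);

    /* Assumption: an empty name or one containing a comma would be
       ambiguous in a comma-separated value, so refuse both. */
    if (file_name[0] == '\0' || strchr(file_name, ',') != NULL) {
        return -1;
    }

    written = snprintf(out, out_size, "0x%02X,0x%08lX,0x%08lX,%s",
                       id, (unsigned long)address, (unsigned long)length,
                       file_name);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return written;
}
```

For the second overlay, you would pass the ID, address, and length from the second row of the overlay list along with OVERLAY2.BIN, assuming the IDs are in the same order as the file names.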

Some key elements of this second phase:

Output/over0x04/source-decompiled/C/OVERLAYS/OVERLAY1.C has been created based on the additional overlay data incorporated into this step:

#include "THISDUST.H"

#include "OVERLAY1.H"

void overlay1_function_1(void)

{

PrintMessage(s_This_is_a_message_from_overlay1__800409a0);

return;

}

void overlay1_function_2(void)

{

PrintMessage(s_I_m_overlay_1__and_I_m_going_to_c_800409cc);

PrintCurrentGlobalNumber();

return;

}

...it's a pretty reasonable facsimile of the original:

#include "globals.h"

void overlay1_function_1()

{

PrintMessage("This is a message from overlay1_function_1.");

}

void overlay1_function_2()

{

PrintMessage("I'm overlay 1, and I'm going to call PrintCurrentGlobalNumber(), which is in main.c.");

PrintCurrentGlobalNumber();

}

Output/PRIMARY/source-decompiled/C/OVERLAYS/MAIN.C is nearly identical to the corresponding file from the first phase, but has had four of the unknown function calls replaced with the actual functions:

int main(void)

{

__main();

PrintMessage(s_PsyQ_Overlay_Example_80040024);

GlobalNumber = 0;

PrintMessage(s_Loading_OVERLAY1_BIN_8004003c);

loadOverlay(s_OVERLAY1_BIN_80040054);

PrintMessage(s_Calling_overlay1_function_1___80040064);

overlay1_function_1();

PrintMessage(s_Calling_overlay1_function_2___80040084);

overlay1_function_2();

PrintMessage(s_Loading_OVERLAY2_BIN_800400a4);

loadOverlay(s_OVERLAY2_BIN_800400bc);

PrintMessage(s_Calling_overlay2_function_1__Sen_800400cc);

FUN_80040b28(s_Sent_to_overlay_2_80040100);

PrintMessage(s_Calling_overlay2_function_2___80040114);

FUN_80040b70();

PrintMessage(s_Loading_OVERLAY3_BIN_80040134);

loadOverlay(s_OVERLAY3_BIN_8004014c);

PrintMessage(s_Calling_overlay3_function_1__Sen_8004015c);

FUN_80040b28(s_Sent_to_overlay_3_80040190);

PrintMessage(s_Calling_overlay3_function_2___800401a4);

FUN_80040b70();

PrintMessage(s_Trying_to_access_an_overlay_whic_800401c4);

PrintMessage(s_Calling_overlay1_function_1___ev_80040220);

overlay1_function_1();

PrintMessage(s_Calling_overlay1_function_2___ev_80040278);

overlay1_function_2();

PrintMessage(s_Calling_overlay2_function_1__Sen_800402d0);

FUN_80040b28(s_Sent_to_overlay_2_80040100);

PrintMessage(s_Calling_overlay2_function_2___ev_8004033c);

FUN_80040b70();

PrintMessage(s_Done__8004098c);

return 0;

}

If you repeat this phase with the other two overlays, you can build up enough information to recover the original source, but you'll need to manually piece MAIN.C back together using the information from all four phases.
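To keep track of things while piecing MAIN.C back together, a small lookup table of the functions recovered from each overlay can be handy. The sketch below is illustrative only. The overlay 1 entries come from the phase shown above, while the overlay 2 and overlay 3 entries are inferred by comparing the call order in the original MAIN.C with the FUN_ calls in the decompiled version, so confirm them against your own output. It also assumes overlay IDs 0x05 and 0x06 correspond to OVERLAY2.BIN and OVERLAY3.BIN respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A function recovered from one overlay's decompilation pass. */
struct overlay_function {
    uint8_t     overlay_id;
    uint32_t    address;
    const char *name;
};

/* Overlay 2 and 3 rows are inferred, not taken from TDR output. */
const struct overlay_function OVERLAY_FUNCTIONS[6] = {
    { 0x04, 0x80040A24u, "overlay1_function_1" },
    { 0x04, 0x80040A5Cu, "overlay1_function_2" },
    { 0x05, 0x80040B28u, "overlay2_function_1" },
    { 0x05, 0x80040B70u, "overlay2_function_2" },
    { 0x06, 0x80040B28u, "overlay3_function_1" },
    { 0x06, 0x80040B70u, "overlay3_function_2" },
};

/*
 * Hypothetical helper: collects the name of every known overlay function
 * located at `address`, up to max_names entries, and returns the count.
 * More than one result means the address alone is ambiguous.
 */
size_t candidate_names(uint32_t address, const char **names_out,
                       size_t max_names)
{
    size_t i;
    size_t found = 0;
    size_t total = sizeof OVERLAY_FUNCTIONS / sizeof OVERLAY_FUNCTIONS[0];

    assert(names_out != NULL);

    for (i = 0; i < total; i++) {
        if (OVERLAY_FUNCTIONS[i].address == address && found < max_names) {
            names_out[found++] = OVERLAY_FUNCTIONS[i].name;
        }
    }
    return found;
}
```

Looking up 0x80040B28 returns both overlay2_function_1 and overlay3_function_1, which is exactly why the final merge has to be done by hand: you need context, such as the message printed just before the call, to decide which name belongs at each call site.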

 
Download
File: OVERLAYS Toy PlayStation PSX-EXE With Source
Size: 13 KiB
Version: 1.0
Release Date: 2019-09-09
Author: Ben Lincoln
Description: A small, custom PsyQ PSX-EXE (with source code included for reference) that can be used to practice with TDR. This one makes use of PsyQ's memory overlay feature, which requires special handling when reverse-engineering.
 
Footnotes
1. See http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html.
2. This extra function definition is an artifact of the --map-sld-functions flag in CreateSkeleton.exe. For some reason, when OVERLAYS is compiled, PsyQ creates a label for the OverlayAddress global variable, but doesn't include it in the list of externs. This means that as far as the current version of TDR is concerned, it may as well be a function, because there is SLD data that points to it, and only a label defined at that address, which is how library functions appear in SYM data. A future version should do a better job of filtering out edge-case bad data like this.
3. Yes, it's theoretically possible to run the code in some sort of emulator, determine which overlay is loaded at a given point in the program execution, and use that information to map to the original function. If you want to write an extension for TDR which does that, be my guest :). I, unfortunately, don't have anywhere near enough time to do that.
 