TDR: Practice Using EDGECASE
Table of contents
- Decompilation Walkthrough
- Discussion of Results
This is a basic walkthrough of decompiling a very simple PlayStation PSX-EXE binary (EDGECASE.EXE, which you can download at the bottom of this page) using Ghidra and This Dust Remembers What It Once Was (TDR). It's intended to introduce the general TDR process using a binary that requires no manual workarounds and that, for the most part, decompiles easily in Ghidra to something very close to its original form.
The EDGECASE.EXE binary was compiled using the same PsyQ toolchain as many real PlayStation titles. The source code and PsyQ build instructions are included for reference/reproducibility. It was originally written to debug some problems with SymDump/SymDumpTE.
Decompilation Walkthrough
- Copy EDGECASE.EXE and EDGECASE.SYM to your working directory.
- Open a command prompt or PowerShell prompt and change directory to your working directory.
- Convert EDGECASE.EXE to ELF format by running the following command:
PlayStationELFConverter.exe --exe2elf EDGECASE.EXE EDGECASE.ELF > PlayStationELFConverter_Log.txt 2>&1
- Generate the JSON version of the debug symbols by running the following command:
SymDumpTE.exe --debug --ignore-duplicate-definitions --rename-for-compatibility --auto-rename-fakes --json EDGECASE.SYM EDGECASE.json > SymDumpTE_Log.txt 2>&1
- Generate the header/stub files, Ghidra scripts, and an updated/extended/mapped version of the JSON debug symbol file by running the following command:
CreateSkeleton.exe --create-playstation-memory --assume-sn-gp-base --map-sld-functions --name EDGECASE --externs-to-labels --output-updated-json EDGECASE-Mapped.json --output Output EDGECASE.json > CreateSkeleton_Log.txt 2>&1
Examine the log file (CreateSkeleton_Log.txt) and make sure it doesn't end with a "Did not find an __SN_GP_BASE value in the debug symbol data" error. If it does (the only game I know of that has this problem right now is Diablo), you'll need to run this command instead for now, then do another pass later once you know the correct value for the global pointer:
CreateSkeleton.exe --create-playstation-memory --map-sld-functions --name EDGECASE --externs-to-labels --output-updated-json EDGECASE-Mapped.json --output Output EDGECASE.json > CreateSkeleton_Log.txt 2>&1
- Examine the contents of README-EDGECASE-CreateSkeleton-Manual_Changes_Required.txt in the Output directory. If you want to follow up on any of the recommendations in it, do so now, then re-run the previous CreateSkeleton.exe command.
- Launch Ghidra, and create a new project. For the base directory, use the Output directory created by TDR.
- Import the ELF file you generated earlier. Ghidra will default to 64-bit MIPS, which is wrong. Click the ... button next to the Language field. Scroll up in the list and choose MIPS/default/32/little/default processor architecture, which will show up as MIPS:LE:32:default:default in the import file window. Click OK to begin the import.
- Close the import summary dialogue.
- Double-click on the ELF in the project list.
- An Analyze prompt will appear. Click No, because you don't want that to happen until the debug symbols have been imported.
- From the Edit menu, choose Tool Options.
- Expand Decompiler, and select Analysis. Uncheck Eliminate unreachable code. Click OK.
- From the File menu, choose the Parse C Source option. Click the green plus sign button. Open the EDGECASE.H file in the Output directory.
- Click Parse to Program. Click Continue. Click Continue?.
- After a moment, you should receive a message indicating that the header has been parsed successfully. If you don't, make sure you resolved any naming conflicts in the JSON file, re-run the CreateSkeleton.exe command above, and then re-import the EDGECASE.H file. Otherwise, click OK, then click Dismiss.
- Copy the EDGECASETDRAggressiveArrayIdentification.java, EDGECASETDRDecompile.java, EDGECASETDRDefineFunctions.java, EDGECASETDRExportData.java, and EDGECASETDRMapMemoryAndCreateLabels.java scripts from the Output/ghidra_files/ directory into your own Ghidra scripts directory (probably something like C:\Users\yourname\ghidra_scripts). Note: these files are dynamically generated, so you will need to re-copy them (overwriting the existing copies if necessary) every time they change, or when working on multiple projects.
- In Ghidra, from the Window menu, choose the Script Manager option.
- In the Script Manager window, click on the EDGECASETDRMapMemoryAndCreateLabels.java entry, then click the green-and-white play button in the upper-right corner of the window. This script creates any necessary PlayStation memory segments and applies labels found in the debug symbols.
- After a noticeable delay, you should see an "EDGECASETDRMapMemoryAndCreateLabels.java> Finished!" message in the console at the bottom of the main Ghidra window.
- Use the Script Manager to execute the EDGECASETDRDefineFunctions.java entry. This script imports function definitions and a few other things from the debug symbols.
- From the Analysis menu, choose Auto Analyze 'EDGECASE.ELF'. Check the Decompiler Parameter ID box if it's not already checked. Switch to the MIPS Constant Reference Analyzer section. Uncheck Recover global GP register writes if it's checked. Optionally, check Attempt to recover switch tables. Click Analyze.
- Wait for the analysis to complete (progress is shown in the lower-right corner of the main Ghidra window).
- Optional, but highly recommended: use the Script Manager to execute the EDGECASETDRAggressiveArrayIdentification.java entry. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases. This script attempts to detect cases where a global variable exists with embedded data in the PlayStation binary, but Ghidra has only identified the first element of the entire array. It will generally do a very good job, but some manual cleanup work may be necessary later.
- For a more complex binary, you'd need to do some additional manual work in Ghidra at this point, but this one is straightforward enough that you don't really need to. Go back to the Script Manager window, and run the EDGECASETDRDecompile.java script. Click OK in the popup - the location of the output file is preset by TDR, and you shouldn't change it in normal use.
- Wait for the decompilation to finish. This will be very fast for the practice binary. You should see an "EDGECASETDRDecompile.java> Finished!" message in the console at the bottom of the main Ghidra window when it's complete.
- In the Script Manager window, run the EDGECASETDRExportData.java script and wait for it to finish. The options in the script popup are preset by TDR - you shouldn't need to change them in most cases. This script will create a file named XPRTDATA.C in your output directory which contains C code that should create any embedded data from the game binary which is referenced by the decompiled code (global variables, etc.).
- Back in the command prompt, create another set of C source code files which contain the decompiled functions and global variable data output by Ghidra by running the following command:
PopulateSkeleton.exe --name EDGECASE --input-json EDGECASE-Mapped.json --input-source Output\EDGECASE.C --input-data Output\XPRTDATA.C --output Output > PopulateSkeleton_Log.txt 2>&1
- Examine the contents of Output/PRIMARY/source-decompiled, which should contain TDR's best attempt at reconstructing the original source code in all of the separate files that were originally used. Anything not matched to one of those files will be placed in THISDUST.C or THISDUST.H instead.
- Compare the resulting decompiled code with the original source code.
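The three command-line steps that come before Ghidra, plus the log check for the missing __SN_GP_BASE value, could be wrapped in a small script. What follows is a minimal sketch and is not part of TDR. It assumes the TDR executables are on your PATH, that it runs from the working directory, and that the error appears on the last non-blank line of the log. The helper names (build_commands, log_reports_missing_gp_base, run_logged, run_pre_ghidra_steps) are hypothetical; the commands and the error text are copied from the steps above.

```python
import subprocess
from pathlib import Path

# Error text quoted in the walkthrough. If the CreateSkeleton log ends with
# it, the fallback command (without --assume-sn-gp-base) is needed instead.
GP_BASE_ERROR = "Did not find an __SN_GP_BASE value in the debug symbol data"


def build_commands(name, assume_gp_base=True):
    """Return (argv, log file name) pairs for the three pre-Ghidra steps."""
    skeleton = ["CreateSkeleton.exe", "--create-playstation-memory"]
    if assume_gp_base:
        skeleton.append("--assume-sn-gp-base")
    skeleton += [
        "--map-sld-functions", "--name", name, "--externs-to-labels",
        "--output-updated-json", f"{name}-Mapped.json",
        "--output", "Output", f"{name}.json",
    ]
    return [
        (["PlayStationELFConverter.exe", "--exe2elf",
          f"{name}.EXE", f"{name}.ELF"],
         "PlayStationELFConverter_Log.txt"),
        (["SymDumpTE.exe", "--debug", "--ignore-duplicate-definitions",
          "--rename-for-compatibility", "--auto-rename-fakes", "--json",
          f"{name}.SYM", f"{name}.json"],
         "SymDumpTE_Log.txt"),
        (skeleton, "CreateSkeleton_Log.txt"),
    ]


def log_reports_missing_gp_base(log_text):
    """True if the last non-blank log line contains the __SN_GP_BASE error.

    Assumption: the error, when present, is the final non-blank line.
    """
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and GP_BASE_ERROR in lines[-1]


def run_logged(argv, log_name):
    """Run one command, sending stdout and stderr to the log ('> log 2>&1')."""
    with open(log_name, "w") as log:
        subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return Path(log_name).read_text(errors="replace")


def run_pre_ghidra_steps(name="EDGECASE"):
    """Run all three steps, retrying CreateSkeleton without the GP-base flag."""
    for argv, log_name in build_commands(name):
        log_text = run_logged(argv, log_name)
        is_skeleton = argv[0] == "CreateSkeleton.exe"
        if is_skeleton and log_reports_missing_gp_base(log_text):
            fallback = build_commands(name, assume_gp_base=False)[-1]
            run_logged(*fallback)
```

Calling run_pre_ghidra_steps() from the working directory would run the converter, SymDumpTE, and CreateSkeleton in order, each with its own log file. The Ghidra steps and the final PopulateSkeleton.exe command are left manual here, since they depend on work done inside Ghidra.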
Discussion of Results
This section (coming soon!) will analyze the results generated using the process described above, to help evaluate the effectiveness of the TDR toolchain and to explain some of the differences in the output. For the most part, Ghidra does a phenomenal job of reconstructing the original source code. However, compilation to native code is a lossy process, so while nearly all of the results are functionally identical to the original source code, some of them are less "readable" than others, or are obviously machine-generated rather than written the way a human would write them. Compiler optimizations and other factors can magnify this effect.
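Until that analysis is available, one rough way to see where the decompiled output departs from the original is a plain text diff. The sketch below uses Python's standard difflib module; the function names diff_sources and diff_files are hypothetical, and the file paths are left to the caller because the file names inside Output/PRIMARY/source-decompiled depend on the debug symbols. A text diff shows only textual differences, so it will flag functionally identical code that merely looks different.

```python
import difflib
from pathlib import Path


def diff_sources(original_text, decompiled_text,
                 original_label="original", decompiled_label="decompiled"):
    """Return a unified diff of two C source texts as a list of lines."""
    return list(difflib.unified_diff(
        original_text.splitlines(),
        decompiled_text.splitlines(),
        fromfile=original_label,
        tofile=decompiled_label,
        lineterm="",
    ))


def diff_files(original_path, decompiled_path):
    """Read two source files and diff them; both paths come from the caller."""
    original = Path(original_path).read_text(errors="replace")
    decompiled = Path(decompiled_path).read_text(errors="replace")
    return diff_sources(original, decompiled,
                        str(original_path), str(decompiled_path))
```

For example, print("\n".join(diff_files(original_file, decompiled_file))) would print the differences for one pair of files, where original_file and decompiled_file are paths you supply. An empty result means the two files are textually identical.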