[ Beneath the Waves ]

This Dust Remembers What It Once Was

article and software by Ben Lincoln

Table of contents

  1. Introduction
  2. Components
  3. Basic Walkthrough
  4. Results
  5. Downloads

Introduction

This Dust Remembers What It Once Was ("TDR") is a reverse-engineering toolkit I wrote for use with the NSA's amazing tool Ghidra. Ghidra is a completely free, open-source binary reverse-engineering toolkit that includes not only a disassembler, but also a decompiler that must have been written using black magic. I can't thank its authors and the NSA enough for releasing it last year.

I wanted to use Ghidra to help reverse engineer Soul Reaver, my favourite game of all time, but at least when I started, there were a couple of obstacles in my way: Ghidra doesn't support the proprietary PSX-EXE format used for PlayStation binaries, and it also doesn't support the PsyQ .SYM debug symbol format.

I originally started writing TDR specifically for that one project, but I've tried to generalize it enough to work with any PlayStation title that has PsyQ debug symbols available. The PSX-EXE-to-ELF converter means that any PlayStation binary should be importable into Ghidra, even if it wasn't written using PsyQ at all.
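
To give a rough idea of what a PSX-EXE-to-ELF conversion has to read first, here is a minimal Python sketch that pulls the main fields out of a PSX-EXE header. The field offsets follow commonly circulated community documentation of the format, not TDR's source code, so treat them as an assumption. The names (parse_psx_exe_header, PsxExeHeader) are made up for this example and have nothing to do with how PlayStationELFConverter.exe is actually implemented.

```python
import struct
from typing import NamedTuple

PSX_EXE_MAGIC = b"PS-X EXE"
PSX_EXE_HEADER_SIZE = 0x800  # assumed: program data starts after a 2 KiB header


class PsxExeHeader(NamedTuple):
    initial_pc: int    # entry point address
    initial_gp: int    # initial global pointer value
    load_address: int  # RAM address the program data is copied to
    payload_size: int  # size of the program data, excluding the header
    stack_base: int    # initial stack pointer base
    stack_offset: int  # offset added to the stack pointer base


def parse_psx_exe_header(data: bytes) -> PsxExeHeader:
    """Read the main little-endian fields from a PSX-EXE header."""
    if len(data) < PSX_EXE_HEADER_SIZE:
        raise ValueError("file is smaller than a PSX-EXE header")
    if data[:8] != PSX_EXE_MAGIC:
        raise ValueError("missing 'PS-X EXE' magic")
    # Assumed layout: four 32-bit words at 0x10, two more at 0x30.
    pc, gp, load_address, size = struct.unpack_from("<4I", data, 0x10)
    stack_base, stack_offset = struct.unpack_from("<2I", data, 0x30)
    return PsxExeHeader(pc, gp, load_address, size, stack_base, stack_offset)
```

If the layout assumption holds, passing the contents of KAIN2.EXE (read in binary mode) to parse_psx_exe_header should give the load address and entry point that an ELF program header would need to carry.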

I have some additional components in mind for later that will extend it to other gaming platforms, but I'm not sure when I'll have time to get around to that.

Be warned, the current version of TDR should be considered an alpha release, in the traditional sense: it's feature-complete, but it's probably full of bugs. I don't know how frequently I'll be able to work on it, so I wanted to get it out there in case it was useful to someone even in its current state.

TDR is a highly-specialized reverse-engineering tool. The documentation below is pretty barebones at the moment, and assumes extensive pre-existing knowledge. I'd like to expand it in the future.

TDR itself is open-source, licensed under the GPLv3. Warning: you may regret looking at some of the code. This is a project that grew organically over about six months. It involved lots of on-the-fly design changes because I was learning about some of the low-level details as I went.

Components

The current version of TDR is made up of four tools:

CreateSkeleton.exe does the bulk of the work in the current version of TDR. From the input data, it generates the following:

Basic Walkthrough

TDR was developed and tested with debug builds of Soul Reaver, so this walkthrough will assume you have one of those. I used the one with a build date of 1999-06-01, but any prototype version should work as long as it includes the .SYM file. Important: some of the European retail versions of Soul Reaver include .SYM files, but I don't think they actually match up with the game binary, as they're much older. Those probably won't work.

You'll probably want to copy all of the TDR binaries and DLLs to C:\Windows for convenience, or at least add their location to your PATH environment variable.

  1. In NTSC builds of Soul Reaver, the game binary is named SLUS_007.08, so the first step is to copy that to a working directory and rename it to KAIN2.EXE.
  2. Debug builds of Soul Reaver have a DEBUG directory on the disc. For NTSC debug builds, the symbols will be in the NTSC subdirectory of that. You really only need the KAIN2.SYM file, but I usually copy all of them into my working directory.
  3. Open a command prompt or PowerShell prompt and change directory to your working directory.
  4. Convert KAIN2.EXE to ELF format by running the following command:

    PlayStationELFConverter.exe --exe2elf KAIN2.EXE KAIN2.ELF > PlayStationELFConverter_Log.txt 2>&1

  5. Generate the JSON version of the debug symbols by running the following command:

    SymDumpTE.exe --json KAIN2.SYM KAIN2.json > SymDumpTE_Log.txt 2>&1

  6. Generate the monolithic header and Ghidra Java script by running the following command:

    CreateSkeleton.exe --name KAIN2 --ignore-labels --externs-to-labels --output Output KAIN2.json > CreateSkeleton_Log.txt 2>&1


    Note: If you're doing extensive manual reverse-engineering in Ghidra, or your debug symbols only contain labels, you can leave out the --ignore-labels flag. I've included it here because if one is only doing the basic walkthrough, a bunch of the labels will cause spurious nonexistent functions to appear in Ghidra, because some of the labels point to areas that superficially look like code.
  7. Examine the contents of README-KAIN2-CreateSkeleton-Manual_Changes_Required.txt in the Output directory. There will most likely be a lot of manual cleanup suggested. For purposes of this walkthrough, there's only one part that's strictly necessary, and it's detailed below. However, the more you correct the JSON file and re-run CreateSkeleton, the better Ghidra will do with the data. Once you've finished making any changes you want to perform, re-run the previous CreateSkeleton.exe command to regenerate the output files.
  8. Launch Ghidra, and create a new project. For the base directory, use the Output directory created by TDR. For the project name, use KAIN2.
  9. Import the ELF file you generated earlier. Ghidra will default to 64-bit MIPS, which is wrong. Click the ... button next to the Language field. Scroll up in the list and choose the MIPS/default/32/little/default processor architecture, which will show up as MIPS:LE:32:default:default in the import file window. Click OK to begin the import.
  10. Close the import summary dialogue.
  11. Double-click on KAIN2.ELF in the project list.
  12. An Analyze prompt will appear. Click No, because you don't want that to happen until the debug symbols have been imported.
  13. From the Edit menu, choose Tool Options.
  14. Expand Decompiler, and select Analysis. Uncheck Eliminate unreachable code. Click OK.
  15. From the File menu, choose the Parse C Source option. Click the green plus sign button. Open the KAIN2.H file in the Output/source-stubs directory.
  16. Click Parse to Program. Click Continue. Click Continue?.
  17. After a moment, you should receive a message indicating that the header has been parsed successfully. If you don't, make sure you resolved any naming conflicts in the JSON file, re-run the CreateSkeleton.exe command above, and then re-import the KAIN2.H file. Otherwise, click OK, then click Dismiss.
  18. Copy the KAIN2DefineFunctions.java script from the Output/ghidra_files/ directory into your own Ghidra scripts directory (probably something like C:\Users\yourname\ghidra_scripts). Note: this file is dynamically generated, so you will need to re-copy it (overwriting the existing copy if necessary) every time it changes, or when working on multiple projects.
  19. In Ghidra, from the Window menu, choose the Script Manager option.
  20. In the Script Manager window, click on the KAIN2DefineFunctions.java entry, then click the green-and-white play button in the upper-right corner of the window.
  21. After a noticeable delay, you should see a KAIN2DefineFunctions.java> Finished! message in the console at the bottom of the main Ghidra window.
  22. From the Analysis menu, choose Auto Analyze 'KAIN2.ELF'. Check the Decompiler Parameter ID box if it's not already checked. Click Analyze.
  23. Wait for the analysis to complete (progress is shown in the lower-right corner of the main Ghidra window).
  24. This should be enough for a basic demonstration of the toolchain. However, if you're really trying to fully reverse-engineer the game, at this point, you would do all of that work in Ghidra. That will take a while, and is outside the scope of this walkthrough.
  25. When you're ready to proceed with generating source code, go to the File menu and choose Export Program.
  26. Choose C/C++ for the Format.
  27. Name the output file KAIN2.C, and place it in your working directory.
  28. Click Options...
  29. Check Create Header File.
  30. Click OK in both windows.
  31. Wait for the decompilation to happen. Click OK in the results window when it appears.
  32. Back in the command prompt, create another set of C source code files which contain the decompiled functions output by Ghidra by running the following command:

    PopulateSkeleton.exe --name KAIN2 --input-json KAIN2.json --input-source KAIN2.C --output Output > PopulateSkeleton_Log.txt 2>&1

  33. Examine the contents of Output/source-decompiled, which should contain TDR's best attempt at reconstructing the original source code in all of the separate files that were originally used. Anything not matched to one of those files will be placed in Unmatched_Decompiled_Functions.C instead.
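
The three command-line steps before the Ghidra work (steps 4 to 6) lend themselves to scripting, since they need to be repeated whenever the JSON file is regenerated. The following is a minimal Python sketch of a wrapper, not part of TDR. It uses only the tool names and flags shown in the walkthrough. The function names are hypothetical, and it assumes that the tools are on your PATH and that they report failure through a non-zero exit code.

```python
import subprocess
from pathlib import Path


def build_commands(name, output_dir="Output"):
    """Return (argv, log file name) pairs for walkthrough steps 4 to 6."""
    return [
        (["PlayStationELFConverter.exe", "--exe2elf",
          f"{name}.EXE", f"{name}.ELF"],
         "PlayStationELFConverter_Log.txt"),
        (["SymDumpTE.exe", "--json", f"{name}.SYM", f"{name}.json"],
         "SymDumpTE_Log.txt"),
        (["CreateSkeleton.exe", "--name", name, "--ignore-labels",
          "--externs-to-labels", "--output", output_dir, f"{name}.json"],
         "CreateSkeleton_Log.txt"),
    ]


def run_walkthrough_commands(name, work_dir="."):
    """Run each command in work_dir, logging its output to a file."""
    for argv, log_name in build_commands(name):
        with open(Path(work_dir) / log_name, "w") as log:
            # Send stdout and stderr to the same log, like "> log 2>&1".
            # check=True assumes a non-zero exit code means failure.
            subprocess.run(argv, cwd=work_dir, stdout=log,
                           stderr=subprocess.STDOUT, check=True)
```

Calling run_walkthrough_commands("KAIN2") from the working directory would then run the same three commands as above. The PopulateSkeleton.exe command in step 32 could be wrapped the same way once the Ghidra export is done.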

The one required change for the Soul Reaver files (mentioned above):

As discussed above, there are lots of changes that would be good to make, but one is absolutely required for Soul Reaver: if you don't make it, Ghidra won't be able to parse the C header file.

Open KAIN2.json and search for "name": "_walbossAttributes". You should find a section that looks like this:

            "UsedByFunctions": [],
            "struct_member_signature": "struct .253fake attackDeltas[0]; // size=0, offset=24",
            "class_type": "struct_member",
            "c_type": "struct .253fake[0]",
            "type_name": "struct .253fake[0]",
            "size": 0,
            "offset": 24,
            "parent_name": "struct _walbossAttributes",
            "parent_hashcode": 0,
            "name": "attackDeltas",
            "source_file": null,
            "hashcode": 0
        }
    ],
    "name": "_walbossAttributes",
    "source_file": null,
    "hashcode": -1378913783

There should be two instances of the text struct .253fake, or something similar depending on which build of Soul Reaver you're looking at (struct .255fake, etc.). Change both instances of the struct name so that they read struct _wba253fake. There's nothing special about this replacement name; it just needs to be unique, because there's another struct or union with the same name elsewhere in the debug symbols.
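
Name collisions like this one can be hunted down ahead of time. The following is a minimal Python sketch, not part of TDR, that walks a parsed JSON file and reports every name shared by more than one object that has a "members" list. It relies only on the "name" and "members" keys visible in the excerpts here. The overall layout of the file is not documented in this article, so the walk is fully recursive, and the function names are hypothetical.

```python
from collections import Counter


def iter_dicts(node):
    """Yield every JSON object found anywhere inside node."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def find_duplicate_type_names(symbols):
    """Return the names used by more than one object with a members list."""
    counts = Counter(
        obj["name"]
        for obj in iter_dicts(symbols)
        if isinstance(obj.get("members"), list)
        and isinstance(obj.get("name"), str)
    )
    return sorted(name for name, count in counts.items() if count > 1)
```

To use it, load KAIN2.json with json.load from the standard library and pass the result to find_duplicate_type_names. Any name it returns is a candidate for the kind of manual rename described here.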

When you're done, the section should look something like this:

            "UsedByFunctions": [],
            "struct_member_signature": "struct _wba253fake attackDeltas[0]; // size=0, offset=24",
            "class_type": "struct_member",
            "c_type": "struct _wba253fake[0]",
            "type_name": "struct _wba253fake[0]",
            "size": 0,
            "offset": 24,
            "parent_name": "struct _walbossAttributes",
            "parent_hashcode": 0,
            "name": "attackDeltas",
            "source_file": null,
            "hashcode": 0
        }
    ],
    "name": "_walbossAttributes",
    "source_file": null,
    "hashcode": -1378913783

Scroll up just past the beginning of the _walbossAttributes struct definition, and you should find the definition of the .253fake struct that it references. It should look something like this:

    {
        "UsedByFunctions": [],
        "members": [
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short plusDelta; // size=0, offset=0",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 0,
                "parent_name": "struct .253fake",
                "parent_hashcode": 0,
                "name": "plusDelta",
                "source_file": null,
                "hashcode": 0
            },
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short minusDelta; // size=0, offset=2",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 2,
                "parent_name": "struct .253fake",
                "parent_hashcode": 0,
                "name": "minusDelta",
                "source_file": null,
                "hashcode": 0
            },
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short validAtHitPoint; // size=0, offset=4",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 4,
                "parent_name": "struct .253fake",
                "parent_hashcode": 0,
                "name": "validAtHitPoint",
                "source_file": null,
                "hashcode": 0
            }
        ],
        "name": ".253fake",
        "source_file": null,
        "hashcode": -1080400537
    },

Replace all the occurrences of .253fake (or whatever it's called in the build you're looking at) with your replacement name, so that it looks like this:

    {
        "UsedByFunctions": [],
        "members": [
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short plusDelta; // size=0, offset=0",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 0,
                "parent_name": "struct _wba253fake",
                "parent_hashcode": 0,
                "name": "plusDelta",
                "source_file": null,
                "hashcode": 0
            },
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short minusDelta; // size=0, offset=2",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 2,
                "parent_name": "struct _wba253fake",
                "parent_hashcode": 0,
                "name": "minusDelta",
                "source_file": null,
                "hashcode": 0
            },
            {
                "UsedByFunctions": [],
                "struct_member_signature": "short validAtHitPoint; // size=0, offset=4",
                "class_type": "struct_member",
                "c_type": "short",
                "type_name": "short",
                "size": 2,
                "offset": 4,
                "parent_name": "struct _wba253fake",
                "parent_hashcode": 0,
                "name": "validAtHitPoint",
                "source_file": null,
                "hashcode": 0
            }
        ],
        "name": "_wba253fake",
        "source_file": null,
        "hashcode": -1080400537
    },
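
The manual edit described in this section can also be sketched in code. This is a minimal Python example, not part of TDR, and it rests on two assumptions that are labelled in the comments: that the conflicting struct definition sits in the same JSON array as the struct that uses it, and that it is the nearest earlier entry with the old name (matching the "scroll up just past the beginning" instruction). A blanket search-and-replace would not work here, because the other struct or union with the same name must keep it. The function names are hypothetical.

```python
def iter_lists(node):
    """Yield every JSON array found anywhere inside node."""
    if isinstance(node, list):
        yield node
        for item in node:
            yield from iter_lists(item)
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_lists(value)


def rename_conflicting_struct(symbols, parent_name, old_name, new_name):
    """Rename, in place, the old_name struct referenced by parent_name.

    Returns True if parent_name was found, False otherwise.
    """
    old_ref, new_ref = f"struct {old_name}", f"struct {new_name}"
    for siblings in iter_lists(symbols):
        for index, entry in enumerate(siblings):
            if not (isinstance(entry, dict)
                    and entry.get("name") == parent_name
                    and isinstance(entry.get("members"), list)):
                continue
            # Update the parent's members that mention the old struct name.
            for member in entry["members"]:
                if not isinstance(member, dict):
                    continue
                for field in ("struct_member_signature", "c_type", "type_name"):
                    if isinstance(member.get(field), str):
                        member[field] = member[field].replace(old_ref, new_ref)
            # Assumption: the definition to rename is the nearest earlier
            # entry in the same array that carries the old name.
            for candidate in reversed(siblings[:index]):
                if isinstance(candidate, dict) and candidate.get("name") == old_name:
                    candidate["name"] = new_name
                    for member in candidate.get("members", []):
                        if isinstance(member, dict) and member.get("parent_name") == old_ref:
                            member["parent_name"] = new_ref
                    break
            return True
    return False
```

For the build used in this walkthrough, the call would be rename_conflicting_struct(symbols, "_walbossAttributes", ".253fake", "_wba253fake"), where symbols is the result of json.load on KAIN2.json. The result would then need to be written back out with json.dump before re-running CreateSkeleton.exe. The hashcode values are left untouched, as in the manual edit above.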

Results

This section will be greatly expanded in the future.

TDR works pretty well with all of the debug builds of Soul Reaver I've tested it against.

It also does a solid job against the 1997-10-30 beta build of Biohazard 2. I didn't even need to manually edit the JSON file to do a basic decompilation of that one.

It does not do so well with the 1996-08-05 prototype version of Wipeout XL, because the .SYM file for that game only includes labels, not other types of symbols.
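
A quick way to judge in advance whether a symbol file is likely to fall into this category is to tally the kinds of symbols in the JSON that SymDumpTE.exe produces. The following is a minimal Python sketch, not part of TDR, that counts the values of the "class_type" key seen in the JSON excerpts earlier. Only the "struct_member" value appears in this article, so which other values exist (including whatever is used for labels) is something to read off the output rather than something assumed here.

```python
from collections import Counter


def count_class_types(symbols):
    """Count every string value stored under a "class_type" key."""
    counts = Counter()
    stack = [symbols]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            class_type = node.get("class_type")
            if isinstance(class_type, str):
                counts[class_type] += 1
            # Keep walking: nested objects may carry their own class_type.
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return counts
```

Passing the parsed JSON (from json.load) to count_class_types gives a tally per symbol kind. A file whose tally is dominated by a single kind is probably going to behave more like the Wipeout XL case than the Soul Reaver one.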

Downloads

 
File                                                Size     Version  Release Date  Author
This Dust Remembers What It Once Was                558 KiB  0.2      2019-08-06    Ben Lincoln
    This is the .NET executable version of the TDR suite. If you want to use the tool, this is probably the file you want to download.
This Dust Remembers What It Once Was (Source Code)  1 MiB    0.2      2019-08-06    Ben Lincoln
    This is the .NET source code for the TDR suite.
This Dust Remembers What It Once Was                557 KiB  0.1      2019-08-06    Ben Lincoln
    This is the .NET executable version of the TDR suite. If you want to use the tool, this is probably the file you want to download.
This Dust Remembers What It Once Was (Source Code)  1 MiB    0.1      2019-08-06    Ben Lincoln
    This is the .NET source code for the TDR suite.