Hacking Java Bytecode for Programmers (Part1) - The Birds and the Bees of Hex Editing
Index
- Hacking Java Bytecode for Programmers (Part1) - The Birds and the Bees of Hex Editing
- Hacking Java Bytecode for Programmers (Part2) - Lions, and Tigers, and OP Codes, OH MY!
- Hacking Java Bytecode for Programmers (Part3) - Yes, disassemble with Javap ALL OVER THE PLACE!
- Hacking Java Bytecode for Programmers (Part4) - Krakatau And The Case Of The Integer Overflow
Tools & References
- Ubuntu 12.10
- Java 1.7.0_15
- Python 2.7
- xxd
- Bless Hex Editor
- http://en.wikipedia.org/wiki/Java_bytecode
- http://en.wikipedia.org/wiki/Hexadecimal
- http://linuxcommand.org/man_pages/xxd1.html
Audience
- Required: You should be comfortable in Linux ( 1+ years )
- Required: You should be comfortable writing scripts ( 1+ years )
- Desired: You have written web, desktop, or mobile applications ( 1+ years )
- Desired: You have programmed in Java and Python ( 6 months )
What is Hexadecimal?
Computers execute binary code. But neck beards got fairly frustrated when editing a binary file meant parsing and editing huge blocks of ones and zeros. The internet tells me that IBM came along and popularized the hexadecimal notation we use today (0-9 and A-F) in the 1960s to pacify their geeks.
Wikipedia puts it nicely:
The primary use of hexadecimal notation is a human-friendly representation of binary-coded values.
Essentially, hexadecimal makes it much easier to read and edit binary data.
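To make that concrete, here is a minimal sketch of why hex is so convenient: each hex digit stands for exactly four bits, so one byte is always two hex digits. The variable names are just for illustration, and the value 0xCA is the first byte of the class file we dump later in this post.

```python
# Each hexadecimal digit stands for exactly four bits (a "nibble"),
# so one byte (eight bits) is always written as two hex digits.
value = 0xCA

binary_text = format(value, '08b')  # the byte as eight binary digits
hex_text = format(value, '02x')     # the same byte as two hex digits

# Split the bits into two nibbles and convert each one on its own.
high, low = binary_text[:4], binary_text[4:]
nibble_digits = format(int(high, 2), 'x') + format(int(low, 2), 'x')

print("%s -> %s" % (binary_text, hex_text))      # 11001010 -> ca
print("%s %s -> %s" % (high, low, nibble_digits))  # 1100 1010 -> ca
```

Converting nibble by nibble gives the same two digits as converting the whole byte, which is exactly why reading hex is so much less painful than reading raw ones and zeros.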
What is a Hex Editor?
A hex editor is a handy program that makes viewing and editing binary data easier. We will start with xxd, a command-line tool that dumps a file as hex. Later we will use Bless, which is a GUI hex editing program.
What is Java Bytecode?
Bytecode is compiled Java code that the JVM (Java Virtual Machine) executes.
That simplistic summary doesn’t really tell you much, so let’s actually show you what is up.
Say you have a User.java source file.
public class User {
    protected int status = 0;

    public boolean setStatusTrue() {
        return this.status == 1;
    }

    public static void main(String[] args) {
        System.out.println("Hacking Java Bytecode!");
    }
}
Now let’s compile it with the Java compiler javac.
$ javac User.java
This will create a compiled User.class file containing the bytecode.
$ ls
User.class User.java
$
Now let’s run the program.
$ java User
Hacking Java Bytecode!
$
But let’s actually take a look at the User.class file by dumping it with cat.
$ cat User.class
����3$
StackMapTablemain([Ljava/lang/String;)VrTable
SourceFile User.java
Hacking Java Bytecode!!
"#Userjava/lang/Objectjava/lang/SystemoutLjava/io/PrintStream;java/io/PrintStreamprintln(Ljava/lang/String;)V!
&
*�*��
1*����
@
% ���
$
Well that sucks. What we need to do instead is dump the User.class file with the command-line hex dump utility xxd.
$ xxd User.class
0000000: cafe babe 0000 0033 0016 0a00 0400 1209 .......3........
0000010: 0003 0013 0700 1407 0015 0100 0673 7461 .............sta
0000020: 7475 7301 0001 4901 0006 3c69 6e69 743e tus...I...<init>
0000030: 0100 0328 2956 0100 0443 6f64 6501 000f ...()V...Code...
0000040: 4c69 6e65 4e75 6d62 6572 5461 626c 6501 LineNumberTable.
0000050: 000d 7365 7453 7461 7475 7354 7275 6501 ..setStatusTrue.
0000060: 0003 2829 5a01 000d 5374 6163 6b4d 6170 ..()Z...StackMap
0000070: 5461 626c 6501 0004 6d61 696e 0100 1628 Table...main...(
0000080: 5b4c 6a61 7661 2f6c 616e 672f 5374 7269 [Ljava/lang/Stri
0000090: 6e67 3b29 5601 000a 536f 7572 6365 4669 ng;)V...SourceFi
00000a0: 6c65 0100 0955 7365 722e 6a61 7661 0c00 le...User.java..
00000b0: 0700 080c 0005 0006 0100 0455 7365 7201 ...........User.
00000c0: 0010 6a61 7661 2f6c 616e 672f 4f62 6a65 ..java/lang/Obje
00000d0: 6374 0021 0003 0004 0000 0001 0004 0005 ct.!............
00000e0: 0006 0000 0003 0001 0007 0008 0001 0009 ................
00000f0: 0000 0026 0002 0001 0000 000a 2ab7 0001 ...&........*...
0000100: 2a03 b500 02b1 0000 0001 000a 0000 000a *...............
0000110: 0002 0000 0001 0004 0003 0001 000b 000c ................
0000120: 0001 0009 0000 0031 0002 0001 0000 000e .......1........
0000130: 2ab4 0002 04a0 0007 04a7 0004 03ac 0000 *...............
0000140: 0002 000a 0000 0006 0001 0000 0006 000d ................
0000150: 0000 0005 0002 0c40 0100 0900 0e00 0f00 .......@........
0000160: 0100 0900 0000 1900 0000 0100 0000 01b1 ................
0000170: 0000 0001 000a 0000 0006 0001 0000 000b ................
0000180: 0001 0010 0000 0002 0011 ..........
$
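Voila! Much better. A hex dump containing bytecode. Right?! RIGHT?!?
Well, kinda. One heads-up first: the dump above evidently came from a build of User.java with an empty main (there is no "Hacking Java Bytecode!" string anywhere in the ASCII column), so your own dump will differ in places. Also, if you are a newb (don’t worry, everyone is, we are all just faking it) then note that each line of the dump has three distinct logical groupings to account for: the offset, the hex bytes, and the ASCII view.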
Hexcode Viewer Breakdown
The offset is on the left. It is not part of the file; the hex editing/dumping application derives it. It is the position, counted in hexadecimal, of the first byte on that line. You can think of the offset as the line number of a hex dump, kind of like the line numbers you see when you edit source code.
In the middle is the bytecode represented using hexadecimal, two hex digits per byte. This is the actual data of the file we are editing. Eventually this is the portion we will manipulate to change our application’s behavior.
Finally there is the ASCII viewer on the right. The viewer does its best to show each byte as an ASCII character. When a byte maps to a printable character, you will find it displayed. When it does not, you’ll see a dot “.” as a placeholder.
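The three groupings are easy to reproduce yourself. The following is a rough sketch of an xxd-style dump in Python, not xxd's actual implementation: hex_dump is a name I made up, the sample bytes are the first 32 bytes of the dump above, and unlike xxd it does not pad a short final line.

```python
import binascii

def hex_dump(data, columns=16):
    """Return xxd-style lines: offset, hex bytes in pairs, ASCII column."""
    data = bytearray(data)
    lines = []
    for offset in range(0, len(data), columns):
        row = data[offset:offset + columns]
        hex_digits = ''.join('%02x' % b for b in row)
        # Group the hex digits two bytes (four digits) at a time, like xxd.
        pairs = ' '.join(hex_digits[i:i + 4]
                         for i in range(0, len(hex_digits), 4))
        # Printable ASCII is shown as-is, everything else becomes a dot.
        ascii_col = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row)
        lines.append('%07x: %s  %s' % (offset, pairs, ascii_col))
    return lines

# First 32 bytes of the User.class dump shown above.
sample = binascii.unhexlify('cafebabe0000003300160a0004001209'
                            '00030013070014070015010006737461')

for line in hex_dump(sample):
    print(line)
```

Notice that the offset and the ASCII column are both computed from the data. Only the middle column is the file itself.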
Hexcode Viewer Gotcha
Unlike source code, where line number “1” is always going to be line number “1”, the offsets shown by your hex editor depend on how many bytes it puts on each line. Reading the offset correctly is critical. In my opinion, a problem with some GUI hex editors is that when you resize the editor window, most of them will automagically change the number of columns, and with it the offset shown at the start of each line.
To illustrate this, here is the default xxd output of our User.class file using 16 columns as well as the output of xxd using 10 columns. This should prove that nothing has fundamentally changed about the bytecode data stored in our file. The only thing that has changed is the way the data is being displayed.
$ xxd User.class
0000000: cafe babe 0000 0033 0016 0a00 0400 1209 .......3........
0000010: 0003 0013 0700 1407 0015 0100 0673 7461 .............sta
0000020: 7475 7301 0001 4901 0006 3c69 6e69 743e tus...I...<init>
0000030: 0100 0328 2956 0100 0443 6f64 6501 000f ...()V...Code...
0000040: 4c69 6e65 4e75 6d62 6572 5461 626c 6501 LineNumberTable.
0000050: 000d 7365 7453 7461 7475 7354 7275 6501 ..setStatusTrue.
0000060: 0003 2829 5a01 000d 5374 6163 6b4d 6170 ..()Z...StackMap
0000070: 5461 626c 6501 0004 6d61 696e 0100 1628 Table...main...(
0000080: 5b4c 6a61 7661 2f6c 616e 672f 5374 7269 [Ljava/lang/Stri
0000090: 6e67 3b29 5601 000a 536f 7572 6365 4669 ng;)V...SourceFi
00000a0: 6c65 0100 0955 7365 722e 6a61 7661 0c00 le...User.java..
00000b0: 0700 080c 0005 0006 0100 0455 7365 7201 ...........User.
00000c0: 0010 6a61 7661 2f6c 616e 672f 4f62 6a65 ..java/lang/Obje
00000d0: 6374 0021 0003 0004 0000 0001 0004 0005 ct.!............
00000e0: 0006 0000 0003 0001 0007 0008 0001 0009 ................
00000f0: 0000 0026 0002 0001 0000 000a 2ab7 0001 ...&........*...
0000100: 2a03 b500 02b1 0000 0001 000a 0000 000a *...............
0000110: 0002 0000 0001 0004 0003 0001 000b 000c ................
0000120: 0001 0009 0000 0031 0002 0001 0000 000e .......1........
0000130: 2ab4 0002 04a0 0007 04a7 0004 03ac 0000 *...............
0000140: 0002 000a 0000 0006 0001 0000 0006 000d ................
0000150: 0000 0005 0002 0c40 0100 0900 0e00 0f00 .......@........
0000160: 0100 0900 0000 1900 0000 0100 0000 01b1 ................
0000170: 0000 0001 000a 0000 0006 0001 0000 000b ................
0000180: 0001 0010 0000 0002 0011 ..........
$
$ xxd -c 10 User.class
0000000: cafe babe 0000 0033 0016 .......3..
000000a: 0a00 0400 1209 0003 0013 ..........
0000014: 0700 1407 0015 0100 0673 .........s
000001e: 7461 7475 7301 0001 4901 tatus...I.
0000028: 0006 3c69 6e69 743e 0100 ..<init>..
0000032: 0328 2956 0100 0443 6f64 .()V...Cod
000003c: 6501 000f 4c69 6e65 4e75 e...LineNu
0000046: 6d62 6572 5461 626c 6501 mberTable.
0000050: 000d 7365 7453 7461 7475 ..setStatu
000005a: 7354 7275 6501 0003 2829 sTrue...()
0000064: 5a01 000d 5374 6163 6b4d Z...StackM
000006e: 6170 5461 626c 6501 0004 apTable...
0000078: 6d61 696e 0100 1628 5b4c main...([L
0000082: 6a61 7661 2f6c 616e 672f java/lang/
000008c: 5374 7269 6e67 3b29 5601 String;)V.
0000096: 000a 536f 7572 6365 4669 ..SourceFi
00000a0: 6c65 0100 0955 7365 722e le...User.
00000aa: 6a61 7661 0c00 0700 080c java......
00000b4: 0005 0006 0100 0455 7365 .......Use
00000be: 7201 0010 6a61 7661 2f6c r...java/l
00000c8: 616e 672f 4f62 6a65 6374 ang/Object
00000d2: 0021 0003 0004 0000 0001 .!........
00000dc: 0004 0005 0006 0000 0003 ..........
00000e6: 0001 0007 0008 0001 0009 ..........
00000f0: 0000 0026 0002 0001 0000 ...&......
00000fa: 000a 2ab7 0001 2a03 b500 ..*...*...
0000104: 02b1 0000 0001 000a 0000 ..........
000010e: 000a 0002 0000 0001 0004 ..........
0000118: 0003 0001 000b 000c 0001 ..........
0000122: 0009 0000 0031 0002 0001 .....1....
000012c: 0000 000e 2ab4 0002 04a0 ....*.....
0000136: 0007 04a7 0004 03ac 0000 ..........
0000140: 0002 000a 0000 0006 0001 ..........
000014a: 0000 0006 000d 0000 0005 ..........
0000154: 0002 0c40 0100 0900 0e00 ...@......
000015e: 0f00 0100 0900 0000 1900 ..........
0000168: 0000 0100 0000 01b1 0000 ..........
0000172: 0001 000a 0000 0006 0001 ..........
000017c: 0000 000b 0001 0010 0000 ..........
0000186: 0002 0011 ....
$
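The arithmetic behind the two layouts is just division with remainder. As a small sketch (locate and absolute are hypothetical helper names, not part of xxd), here is how one byte, the "s" (0x73) that starts "status", lands on different lines in the two dumps while keeping the same absolute position:

```python
def locate(index, columns):
    """Map an absolute byte index to (line offset, column) for a dump width."""
    row, column = divmod(index, columns)
    return row * columns, column

def absolute(line_offset, column):
    """Go back from (line offset, column) to the absolute byte index."""
    return line_offset + column

index = 0x1d  # absolute position of the 's' (0x73) in "status"

wide = locate(index, 16)    # where it shows up in the 16 column dump
narrow = locate(index, 10)  # where it shows up in the 10 column dump

print("16 columns: line %07x, column %d" % wide)    # line 0000010, column 13
print("10 columns: line %07x, column %d" % narrow)  # line 0000014, column 9
```

You can check both answers against the dumps above: 73 is the fourteenth byte on line 0000010 of the first dump and the last byte on line 0000014 of the second.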
How I (a programmer) think about Bytecode
I usually think about bytecode in one of two ways.
If I’m operating in a hex editing application I usually just think of the bytecode as a multidimensional array, one row per line of the dump. Let’s take a look at the first two lines of our User.class dumped using 16 column formatting.
0000000: cafe babe 0000 0033 0016 0a00 0400 1209 .......3........
0000010: 0003 0013 0700 1407 0015 0100 0673 7461 .............sta
Then let’s strip out the offset and the ASCII text.
cafe babe 0000 0033 0016 0a00 0400 1209
0003 0013 0700 1407 0015 0100 0673 7461
And finally we take those bytes and create a multidimensional array using Python to illustrate this.
bytecode_multi_array = [
    ['ca','fe','ba','be','00','00','00','33','00','16','0a','00','04','00','12','09'],
    ['00','03','00','13','07','00','14','07','00','15','01','00','06','73','74','61']
]
# print array 0 which is actually the first line of our bytecode dump
# (the parentheses keep this working on Python 3 as well as 2.7)
print(bytecode_multi_array[0])
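You don't have to type that array out by hand. Here is a minimal sketch that builds the same rows from raw bytes by chunking them. The to_rows helper is my own illustrative name, and the sample is again the first 32 bytes of the dump.

```python
import binascii

def to_rows(data, columns=16):
    """Split raw bytes into rows of two-digit hex strings."""
    hex_bytes = ['%02x' % b for b in bytearray(data)]
    return [hex_bytes[i:i + columns]
            for i in range(0, len(hex_bytes), columns)]

# First 32 bytes of the User.class dump.
sample = binascii.unhexlify('cafebabe0000003300160a0004001209'
                            '00030013070014070015010006737461')

rows = to_rows(sample)

# rows[line][column] addresses a single byte, just like in the hex editor.
print(rows[0][0:4])  # ['ca', 'fe', 'ba', 'be']
print(rows[1][13])   # 73, the 's' in "status"
```

Indexing with rows[line][column] is the same mental move as finding a byte by its line and position in the hex editor window.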
The other way is a bit more hardcore. I simply just visualize the bytecode stream.
Take this bytecode_stream.py script I wrote.
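import binascii
import os

_directory = './'
_file = 'User.class'
_path = os.path.join(_directory, _file)

# check for the file itself, not just the directory
if os.path.exists(_path):
    # the with block closes the file for us, no f.close() needed
    with open(_path, "rb") as f:
        print("read file: %s" % _file)
        stream = f.read()
    print("print the bytecode stream")
    # hexlify works on Python 2.7 and 3, unlike str.encode('hex')
    print(binascii.hexlify(stream).decode('ascii'))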
If you run it from the same directory that our User.class file is stored in you’ll get the following.
$ python bytecode_stream.py
read file: User.class
print the bytecode stream
cafebabe0000003300160a000400120900030013070014070015010006737461747573010001490100063c696e69743e010003282956010004436f646501000f4c696e654e756d6265725461626c6501000d7365745374617475735472756501000328295a01000d537461636b4d61705461626c650100046d61696e010016285b4c6a6176612f6c616e672f537472696e673b295601000a536f7572636546696c65010009557365722e6a6176610c000700080c00050006010004557365720100106a6176612f6c616e672f4f626a65637400210003000400000001000400050006000000030001000700080001000900000026000200010000000a2ab700012a03b50002b100000001000a0000000a000200000001000400030001000b000c0001000900000031000200010000000e2ab4000204a0000704a7000403ac00000002000a00000006000100000006000d0000000500020c40010009000e000f00010009000000190000000100000001b100000001000a0000000600010000000b00010010000000020011
$
Essentially the bytecode is just one long string.
cafebabe0000003300160a000400120900030013070014070015010006737461747573010001490100063c696e69743e010003282956010004436f646501000f4c696e654e756d6265725461626c6501000d7365745374617475735472756501000328295a01000d537461636b4d61705461626c650100046d61696e010016285b4c6a6176612f6c616e672f537472696e673b295601000a536f7572636546696c65010009557365722e6a6176610c000700080c00050006010004557365720100106a6176612f6c616e672f4f626a65637400210003000400000001000400050006000000030001000700080001000900000026000200010000000a2ab700012a03b50002b100000001000a0000000a000200000001000400030001000b000c0001000900000031000200010000000e2ab4000204a0000704a7000403ac00000002000a00000006000100000006000d0000000500020c40010009000e000f00010009000000190000000100000001b100000001000a0000000600010000000b00010010000000020011
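Thinking of the file as a stream pays off as soon as you want to pull fields out of it. As a hedged sketch, and assuming the standard class file layout (a 4-byte magic number, then 2-byte minor version, major version, and constant pool count, all big-endian), the first ten bytes of the stream can be unpacked like this:

```python
import binascii
import struct

# The first 16 bytes of the stream printed above.
stream = binascii.unhexlify('cafebabe0000003300160a0004001209')

# '>' means big-endian; I is an unsigned 4-byte int, H an unsigned 2-byte int.
magic, minor, major, pool_count = struct.unpack('>IHHH', stream[:10])

print("magic: %08x" % magic)                # magic: cafebabe
print("version: %d.%d" % (major, minor))    # version: 51.0
print("constant pool count: %d" % pool_count)  # constant pool count: 22
```

Major version 51 is what a Java 7 compiler emits, which lines up with the Java 1.7.0_15 listed under Tools & References.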
Conclusion
Hopefully you have a decent understanding of how to view compiled Java bytecode using the hex dumping program xxd.
You should also understand how we used Python to open the compiled Java bytecode file and dump it to the screen in hex format. We will be using Python to do some hacking in the future.
In Part 2 we will take a look at Java opcodes and actually manipulate the compiled bytecode by hand using a hex editor.