Reverse-engineering the network protocol used by a Java desktop application
I wanted to automate stock trading with my broker which has a stock trading application written in Java 6.
But I changed. Now I think financial economy is just a big Ponzi scheme full of rent-seeking, full of people who want to get a slice of the pie of real economy without growing the pie of real economy, full of people who just want to extract value from society without creating value, people who only take but don’t give, people who only shuffle money around.
- I want to reverse-engineer a Java 6 stock trading application.
- The app’s license agreement doesn’t mention anything about reverse engineering.
- I only want the network protocol. I don’t need to reverse the GUI.
- Tools
- The tool java-callgraph is mentioned often, such as in SO 29382231 and SO 4951517.
- AspectJ Aspect Oriented Programming, bytecode instrumentation, bytecode manipulation
- IntelliJ IDEA, an IDE that comes with a class file decompiler
- Helios decompiler: “All-in-one Java reverse engineering tool”
- Tracing runtime method calls using AspectJ
- Download the latest stable release of AspectJ (1.9.1 as of 2018-04-20) from the official download page.
It’s an installer.
- Install it using
java -jar aspectj-1.9.1.jar
(change the filename if required).- Read the
README-AspectJ.html
whose exact location is displayed at the end of installation.
- Read the
- Read The AspectJ™ development environment guide, Chapter 5, “Load-time-weaving”.
- Install it using
- The plan
- Print something everytime a method is called.
- Insert bytecode at the beginning of every method.
- Exclude
java.*
andjavax.*
classes, but includejava.net.*
andjava.nio.*
. Also include everything else.
- What is the minimum knowledge necessary to make sense of AspectJ?
Where is the crash course?
- What can it do that we need?
- Using AspectJ, we can insert tracing code at the beginning of every method at class load time.
- “Aspect weaving” is bytecode manipulation.
- short read, baeldung.com: Intro to AspectJ
- long read, The AspectJ™ Programming Guide
- Downloads: Do we have to do this? Can we use Maven instead?
- FAQ
- What can it do that we need?
- Sample codes
- https://mathewjhall.wordpress.com/2011/03/31/tracing-java-method-execution-with-aspectj/
- Task-oriented AspectJ cookbook: 4.1. Capturing a Method Call
- https://www.yegor256.com/2014/06/01/aop-aspectj-java-method-logging.html
- Tracing method calls in Java with JDB
- 2007, article, Five ways for tracing Java execution
- https://stackoverflow.com/questions/19850695/does-java-have-any-mechanism-for-a-vm-to-trace-method-calls-on-itself-without-u
- Use Javassist?
- https://stackoverflow.com/questions/49159666/how-to-intercept-each-method-call-within-given-method-using-spring-aop-or-aspect
- https://www.rhyous.com/2012/05/26/aop-logging-all-method-calls-and-executions-in-java-with-aspectj/
- Eclipse Trace.aj sample file, github
- https://blog.csanchez.org/2005/03/27/tracing-an-application-using-aspec/
- Download the latest stable release of AspectJ (1.9.1 as of 2018-04-20) from the official download page.
It’s an installer.
- Increase log4j verbosity
- Trading login
2018-07-27T15:11:09.132Z main void esmart.trading.engine.TradingComm.a(i, g) -> void esmart.trading.engine.TradingComm.m() 2018-07-27T15:12:06.743Z Thread-37 void esmart.trading.engine.TradingComm.a(String, String, d) -> void esmart.trading.engine.TradingComm.m()
esmart.trading.engine.TradingComm:a(i,g)
starts two threads:- The thread
o
(of typea.a.a.b
). - The thread
p
(of typea.a.a.c
).
- The thread
- Inferences
- The netty major version is 3.
esmart.feed.engine.network.d:a
determines our public IP address.esmart.feed.engine.ui.qo:n
Socket.connects to somewhere.- Initializing netty
a.a.d.e:h()
initializes netty client.a.a.d.e:a(String,String)
connects.- It uses LengthFieldBasedFrameDecoder
- http://netty.io/3.6/api/org/jboss/netty/handler/codec/frame/LengthFieldBasedFrameDecoder.html
var1.addLast("frameDecoder", new LengthFieldBasedFrameDecoder(2147483647, 0, 4, 0, 4, true));
INFO 2018-07-25 22:14:40,830 [New I/O worker #1] a.a.d.e Connected to : esmartbnis.com/202.129.186.235:62229
- server IP addresses
- 202.129.186.235
- 180.178.108.230
- server IP addresses
a.a.d.e
extendsIdleStateAwareChannelHandler
.- After the frame is decoded, the frame payload (without the length) is passed to the handler.
a.a.d.e:a(byte[])
anda.a.d.h:a
are supposed to be inverses of each other.a.a.d.h:a()
encrypts a serialized MsgPack array usinga.a.c.c:a(byte[])
, a polyalphabetic substitution cipher that is its own inverse: applying the cipher twice (with the same key) gives the original message.- Application uses callback.
Control is inverted.
Netty calls the application.
- http://netty.io/3.6/api/org/jboss/netty/handler/timeout/IdleStateAwareChannelHandler.html
- Servers (open ports) at esmartbnis.com
- News server uses port 843. Input is XML. Output is JSON.
- Update server uses FTP (port 21 for control).
- Feed server uses port 62229.
- Traffic to server is compressed and encrypted.
- Traffic from server is compressed but not encrypted.
- Encryption
- Client generates symmetric key S (in practice the String representation of a random UUID).
- Client constructs and encodes a STOMP frame CONNECT with login, passcode, and S.
- Client encrypts the bytes using server RSA 1024-bit public key hardcoded in client, in ECB mode, with PKCS#1 (v1.5) padding.
- Client encrypts the following frames with S using
a.a.c.c:a(byte[])
.
- Trading server uses port 63339.
a.a.d.e:channelConnected
callsa.a.d.e:d()
which does login by constructing the following message:h var1 = new h(this, "CONNECT"); var1.a("login", this.l); var1.a("passcode", this.m); this.a(var1);
a.a.d.h
is message frame/format/model/struct/POJO/bean, used by both client and server.a.a.d.h:a()
serializes, returning a ChannelBuffer.a.a.d.e.a(byte[])
deserializes server response, returning ana.a.d.h
.
- The protocol seems to be close to STOMP as implemented by ActiveMQ Apollo,
- Similarity: SUBSCRIBE has a
selector
header. - Difference: SEND has a misspelled
replyto
header. Apollo hasreply-to
.
- Similarity: SUBSCRIBE has a
- The protocol (as decoded by
a.a.d.e:a(byte[])
):- Make a STOMP frame/message.
- Encode the STOMP command as the first element in a MsgPack array.
- Encode the STOMP header values in an application-predefined order as the rest of the elements of the MsgPack array. Header names aren’t sent over the network. Header order is hard-coded in the application.
- Serialize the MsgPack array into byte array X.
- ZIP the byte array X into Y.
- Let the length of Y be 32-bit big-endian integer N.
- The byte array NY is what is sent over the network.
- Some data such as login/passcode are encrypted.
a.a.c.d:a(byte[],PublicKey)
encrypts the byte array usingRSA/ECB/PKCS1Padding
cipher.- Java libpcap wrapper
- When the client receives a server response:
a.a.d.e:messageReceived
gets executed.a.a.d.e:b(byte[])
reacts to server response. Probably changes some fields.a.a.d.e:a(byte[])
decodes server response bytes into a STOMP message. It’s a custom binary encoding? Are the authors overlaying STOMP on a legacy protocol?- The class
a.a.c.e
does ZIP compression/decompression. - Package
a.a.b
is an unknown compressor/decompressor. - Field
a.a.c.e:B
is compression method. 1 means compression ina.a.b
package. 2 means ZIP. - Ubuntu 14.04:
zlib-flate
CLI fromqpdf
package, compress/decompress stdin to stdout
- The class
- Then probably a STOMP SEND/MESSAGE frame encapsulates a FIX (Financial Information Exchange) message.
- Then
a.a.d.e:a(a.a.d.h)
sends it to the server.
a.a.c.a
seems to be a base64 decoder copied from the Internet.- 2003-07-22: http://www.cs.huji.ac.il/~dbi/ex2/Base64Coder.java
- https://github.com/chdh
- http://www.source-code.biz/base64coder/java/
- https://github.com/chdh
- 2012-05-24: SO 469695 answer 10736154.
- 2013-11-09: https://github.com/mixpanel/mixpanel-android/blob/master/src/main/java/com/mixpanel/android/util/Base64Coder.java
- copyright year goes back to 2003
- 2003-07-22: http://www.cs.huji.ac.il/~dbi/ex2/Base64Coder.java
- Schedule problems
- The server is down every 00:00 – 01:00 UTC+7.
- The traffic is only interesting in the trading hours.
- We can’t just reverse-engineer the program at any convenient time.
- Capture the traffic.
- Analyze the traffic later when there is time.
- We can’t just reverse-engineer the program at any convenient time.
- The server is also accessible via http://esmartbnis.com. It uses HTTP, JavaScript, WebSocket. It should use the same protocol as the Java client.
- Some strings to find?
loading news failed, your connection / server not ready
- The string
63339
is inTradingComm
.
esmart.feed.engine.network.d
gets the client IP address. It’s copied from the Internet.- created on 2013-12-06 https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java/20418809#20418809
- created on 2008-03-04 https://issues.apache.org/jira/browse/JCS-40
- created on 2013-12-06 https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java/20418809#20418809
- How do we write our own ClassLoader using Javassist, and use it as the boot classloader?
- The use cases we are interested in
- Check application updates.
- Check news.
- Log in.
- Get stock data?
- Place order?
- Producing static call graph Using
java-callgraph
- In hindsight, this step is unnecessary. We should have begun with runtime method tracing using AspectJ and bytecode decompiling using IntelliJ IDEA.
- Building, assuming JDK 8 and Maven 3 are installed:
git clone git@github.com:gousiosg/java-callgraph.git cd java-callgraph mvn package
- Producing static call graph of classes in JAR files:
java -jar target/javacg-0.1-SNAPSHOT-static.jar JAR1 JAR2 ...
- The graph representation is adjacency list.
One line of text represents one edge in the call graph.
The format is documented at its GitHub page.
A line
X Y
means the method X calls the method Y. - It only works with JARs. It doesn’t work with directories.
- The graph representation is adjacency list.
One line of text represents one edge in the call graph.
The format is documented at its GitHub page.
A line
- We can also produce a class dependency graph using the
jdeps
tool that comes with the JDK. - Discerning the static call graph
egrep -i 'java.net|netty|crypt|msgpack|buffer' static | egrep -v 'java.lang|java.awt|javax.swing' | sort | less
- Which methods have anything to do with encryption?
awk '$2 ~ /crypt/' static | sort
- The class
a.a.c.d
is the only class that usesjavax.crypto.*
.
- Which methods use the class
a.a.c.d
?awk '$2 ~ /a.a.c.d/' static | sort
- What other methods are interesting?
java.net.Socket.get(Input|Output)Stream
netty
- After reading the static call graph, I think this might be how the app sends a message:
- Construct plaintext message using msgpack.
- Encrypt the plaintext, producing ciphertext, probably a ByteBuffer.
- Send the ciphertext with java.net or netty.
- The plan:
- Do we need a call graph or call trace?
Will a trace suffice?
- A trace is a line that consists of time, thread id, qualified class name, and method signature.
- Use AOP (aspect-oriented programming) that uses bytecode manipulation such as AspectJ.
- Compute static call graph.
- Find patterns in the call graph.
Find calls from non-network class to network class.
- A network class is a class whose qualified name matches any of these patterns:
java.net.*
java.nio.*
*.netty.*
- A network class is a class whose qualified name matches any of these patterns:
- Don’t exclude any class. Filter it later instead.
- Find patterns in the call graph.
Find calls from non-network class to network class.
- https://stackoverflow.com/questions/29451704/using-javassist-to-log-method-calls-and-argument-values-how-to-make-a-logger-cl
- AspectJ load-time weaving (LTW)
- https://stackoverflow.com/questions/13781372/java-aspect-oriented-programming-runtime-aspect-weaving-and-class-loading-time
- even more intrusive
- 2008, article, “Aspect Weaving in Standard Java Class Libraries”, pdf
- Find tools.
- 2008, article, “The Rigi Reverse Engineering Environment”, pdf
- https://en.wikipedia.org/wiki/Rigi_(software)
- requires source code
- 2001, article, “Shimba—An environment for reverse engineering Java software systems”, abstract
- The abstract sounds promising, but where is the paper? Where is the source code? It’s not on GitHub. That article is locked up somewhere and the source code is nowhere in sight. That research practically doesn’t exist; it practically never happened. What a waste.
- 2018, article, “Scientists should be solving problems, not struggling to access journals”, html
- 2002, article, “Dynamic Analysis For Reverse Engineering and Program Understanding”, pdf
- Generating Java dynamic call graph - no license, course assignment https://github.com/EmilyShepherd/Dynamic-Call-Graph
- https://github.com/dkarv/jdcallgraph
- https://stackoverflow.com/questions/6540423/simple-dynamic-call-graphs-in-java
- https://erik.doernenburg.com/2008/09/call-graph-visualisation-with-aspectj-and-dot/
- 2008, article, “The Rigi Reverse Engineering Environment”, pdf
- Find some information about how to debug Java bytecode.
- 2018-07-22 decision: Write a Java program that uses the JDI (Java Debug Interface).
- 2018-07-23 hindsight: That was a bad decision. We should have computed the call graph first.
- References we will often use
- Details:
- Things that come with JDK 7:
- JPDA (Java Platform Debug Architecture)
- JPDA examples that come with JDK
- JPDA = JVMTI + JDWP + JDI
- JVMTI = Java Virtual Machine Tool Interface
- JDWP = Java Debug Wire Protocol
- JDI = Java Debug Interface
- “We recommend the JDI layer for all debugger development.” (source)
- This enables us to write programs that debug programs.
- JPDA (Java Platform Debug Architecture)
- Sample code
- There doesn’t seem to be any official tutorials.
- dzone.com: Generating a minable event stream with JDI
- Java: Using JPDA to write a debugger
- JDK 7 jdb source code.
- Things that come with JDK 7:
- 2018-07-22 decision: Write a Java program that uses the JDI (Java Debug Interface).
- Use JDB to debug the program.
- Or perhaps Eclipse? crowdstrike.com: Native Java Bytecode Debugging without Source Code
- Find out where to focus.
- Map network-related call graph.
- Put a breakpoint at Socket.getOutputStream.
- It uses netty and thread pools, encumbering the recovery of the interesting stack frames.
- It uses a serialization library
msgpack
. Perhaps put breakpoints there?
- Map network-related call graph.
- Write a program to dump the call graph of every call to netty class from non-netty class.
- Do we need a call graph or call trace?
Will a trace suffice?
- Obstacles and woes:
- There is no
chdir
in Java standard library, and it seems that there will never be. See WONTFIX: JDK-4045688: “Add chdir or equivalent notion of changing working directory”. - Method entry event is abysmally slow because it forces bytecode interpretation (SO 751105). Others experience this problem too: NetBeans 47759, Eclipse 90870, SO 48114042.
- JDI implementation oversight
- Breakpointing an abstract method throws a NullPointerException with no message, because
m.location()
returnsnull
which can’t be passed tocreateBreakpointRequest
which doesn’t check for nulls.
- Breakpointing an abstract method throws a NullPointerException with no message, because
- There is no
- Do I have to do this time-consuming stuff? What are the alternatives?
- I can buy data.
- I can ask the company to provide some documentation about their protocols.
- Whom should I talk to?
- I can ask my sales representative whether he knows the people who made or is maintaining the application.
- Someone in a local Java User Group might know something.
- https://groups.google.com/forum/#!forum/jugi
- The mailing list has devolved into a job board.
- Is there a chat? IRC?
- https://www.facebook.com/groups/ForumJavaIndonesia/
- https://groups.google.com/forum/#!forum/jugi
- People who have made a similar application
- 2008 PT Trimegah Securities Tbk https://www.linkedin.com/in/syahreza-pahlevi-ginting-0041b52/
- 2016 BCA Sekuritas https://www.linkedin.com/in/kusnandartoni/
- 2010 https://www.linkedin.com/in/elvino-a07277b/
- unclear linkedin profile
- 2013-2016 Bareksa, 2012 Limas https://www.linkedin.com/in/arisman-26672a52/
- Whom should I talk to?
- I can try to convince the company to open-source this platform.
- Somewhat unlikely in this country that is behind the US by 10 years or perhaps even more.
- I can switch to the company’s other product that is a web app that uses websockets.
- Which one is easier to reverse-engineer?
- These are possible, but I would rather not do these:
- Get a job in a securities company, and then get the required information.
- The plan:
- Try existing deobfuscators, supercompilers, disassemblers, debuggers, and other tools.
- Can we reverse-engineer by supercompilation or partial evaluation?
- search keywords to try
- java bytecode optimizer
- java deobfuscator
- metacompilation, supercompilation, partial evaluation, program specialization, etc. https://everipedia.org/wiki/Metacompilation/
- https://www.reddit.com/r/REMath/
- https://security.stackexchange.com/questions/29866/reverse-engineering-and-java
- Transpile Java bytecode to JavaScript, and use jsnice?
- supercompile Java programs
- “Ongoing work on Supercompilation of Java code (or supercompilation in general)?” http://lambda-the-ultimate.org/node/2739
- “A Java Supercompiler and Its Application to Verification of Cache-Coherence Protocols” https://link.springer.com/chapter/10.1007/978-3-642-11486-1_16
- 2002, “Supercompiling Java Programs” http://goertzel.org/papers/SupercompilingJavaMay2002.htm
- What are the differences between supercompilation and partial evaluation?
- Reverse engineering Java
- 2012, article, “A Patterns Based Reverse Engineering Approach for Java Source Code”, description
- Reverse engineering Java using Eclipse
- Java agents
- https://blog.takipi.com/double-agent-java-vs-native-agents/
- Related JEPs (Java Enhancement Proposals)
- 2018-07-23: last updated in 2015, created in 2011, JEP 159: Enhanced Class Redefinition, JDK-8046149
- https://en.wikipedia.org/wiki/List_of_electronic_trading_protocols