I wanted to automate stock trading with my broker, whose stock trading application is written in Java 6.

But I changed my mind. Now I think the financial economy is just a big Ponzi scheme full of rent-seeking, full of people who want a slice of the pie of the real economy without growing that pie, full of people who just want to extract value from society without creating value, people who only take but don’t give, people who only shuffle money around.

  • I want to reverse-engineer a Java 6 stock trading application.
    • The app’s license agreement doesn’t mention anything about reverse engineering.
    • I only want the network protocol. I don’t need to reverse the GUI.
  • Tools
    • The tool java-callgraph is mentioned often, such as in SO 29382231 and SO 4951517.
    • AspectJ: aspect-oriented programming, bytecode instrumentation, bytecode manipulation
    • IntelliJ IDEA, an IDE that comes with a class file decompiler
    • Helios decompiler: “All-in-one Java reverse engineering tool”
  • Tracing runtime method calls using AspectJ
    • Download the latest stable release of AspectJ (1.9.1 as of 2018-04-20) from the official download page. It’s an installer.
    • The plan
      • Print something every time a method is called.
      • Insert bytecode at the beginning of every method.
      • Exclude java.* and javax.* classes, but include java.net.* and java.nio.*. Also include everything else.
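The include/exclude rule above can be sketched as a plain Java predicate. This is only an illustration of the rule, not AspectJ configuration; the class name TraceFilter and the method name shouldTrace are made up for this sketch.

```java
public final class TraceFilter {
    private TraceFilter() {}

    /** Returns true if calls into the named class should be logged. */
    public static boolean shouldTrace(String className) {
        // java.net.* and java.nio.* are explicitly included, so check them first.
        if (className.startsWith("java.net.") || className.startsWith("java.nio.")) {
            return true;
        }
        // The rest of java.* and javax.* is excluded.
        if (className.startsWith("java.") || className.startsWith("javax.")) {
            return false;
        }
        // Everything else (application classes, third-party libraries) is included.
        return true;
    }

    public static void main(String[] args) {
        String[] names = {
            "java.lang.String",
            "java.net.Socket",
            "javax.swing.JFrame",
            "esmart.trading.engine.TradingComm",
        };
        for (String name : names) {
            System.out.println(name + " traced: " + shouldTrace(name));
        }
    }
}
```

The order of the checks matters: the java.net/java.nio test has to come before the general java.* exclusion.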
    • What is the minimum knowledge necessary to make sense of AspectJ? Where is the crash course?
    • Sample code
      • https://mathewjhall.wordpress.com/2011/03/31/tracing-java-method-execution-with-aspectj/
      • Task-oriented AspectJ cookbook: 4.1. Capturing a Method Call
      • https://www.yegor256.com/2014/06/01/aop-aspectj-java-method-logging.html
      • Tracing method calls in Java with JDB
      • 2007, article, Five ways for tracing Java execution
      • https://stackoverflow.com/questions/19850695/does-java-have-any-mechanism-for-a-vm-to-trace-method-calls-on-itself-without-u
        • Use Javassist?
      • https://stackoverflow.com/questions/49159666/how-to-intercept-each-method-call-within-given-method-using-spring-aop-or-aspect
      • https://www.rhyous.com/2012/05/26/aop-logging-all-method-calls-and-executions-in-java-with-aspectj/
      • Eclipse Trace.aj sample file, github
      • https://blog.csanchez.org/2005/03/27/tracing-an-application-using-aspec/
  • Increase log4j verbosity
  • Trading login
    2018-07-27T15:11:09.132Z	main	void esmart.trading.engine.TradingComm.a(i, g) -> void esmart.trading.engine.TradingComm.m()
    2018-07-27T15:12:06.743Z	Thread-37	void esmart.trading.engine.TradingComm.a(String, String, d) -> void esmart.trading.engine.TradingComm.m()
    
    • esmart.trading.engine.TradingComm:a(i,g) starts two threads:
      • The thread o (of type a.a.a.b).
      • The thread p (of type a.a.a.c).
  • Inferences
    • The netty major version is 3.
    • esmart.feed.engine.network.d:a determines our public IP address.
    • esmart.feed.engine.ui.qo:n calls Socket.connect on an endpoint I have not identified yet.
    • Initializing netty
      • a.a.d.e:h() initializes netty client.
      • a.a.d.e:a(String,String) connects.
      • It uses LengthFieldBasedFrameDecoder
        • http://netty.io/3.6/api/org/jboss/netty/handler/codec/frame/LengthFieldBasedFrameDecoder.html
        • var1.addLast("frameDecoder", new LengthFieldBasedFrameDecoder(2147483647, 0, 4, 0, 4, true));
        • INFO 2018-07-25 22:14:40,830 [New I/O worker #1] a.a.d.e Connected to : esmartbnis.com/202.129.186.235:62229
          • server IP addresses
            • 202.129.186.235
            • 180.178.108.230
      • a.a.d.e extends IdleStateAwareChannelHandler.
      • After the frame is decoded, the frame payload (without the length) is passed to the handler.
      • a.a.d.e:a(byte[]) and a.a.d.h:a are supposed to be inverses of each other.
      • a.a.d.h:a() encrypts a serialized MsgPack array using a.a.c.c:a(byte[]), a polyalphabetic substitution cipher that is its own inverse: applying the cipher twice (with the same key) gives the original message.
      • The application uses callbacks, so control is inverted: netty calls the application.
        • http://netty.io/3.6/api/org/jboss/netty/handler/timeout/IdleStateAwareChannelHandler.html
      • Servers (open ports) at esmartbnis.com
        • News server uses port 843. Input is XML. Output is JSON.
        • Update server uses FTP (port 21 for control).
        • Feed server uses port 62229.
          • Traffic to server is compressed and encrypted.
          • Traffic from server is compressed but not encrypted.
          • Encryption
            • Client generates symmetric key S (in practice the String representation of a random UUID).
            • Client constructs and encodes a STOMP frame CONNECT with login, passcode, and S.
            • Client encrypts the bytes using server RSA 1024-bit public key hardcoded in client, in ECB mode, with PKCS#1 (v1.5) padding.
            • Client encrypts the following frames with S using a.a.c.c:a(byte[]).
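To make the "its own inverse" property concrete, here is a minimal sketch of one cipher with that property: a repeating-key XOR, keyed with the string form of a random UUID. This is an assumption for illustration only. These notes do not record the actual algorithm inside a.a.c.c:a(byte[]), and the class name SelfInverseCipher is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

public final class SelfInverseCipher {
    private SelfInverseCipher() {}

    /**
     * XORs each data byte with the key byte at the same position (key repeated).
     * Applying it twice with the same key returns the original bytes.
     * Assumption: a stand-in for a.a.c.c:a(byte[]), not its real algorithm.
     */
    public static byte[] apply(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        // The symmetric key S: the String representation of a random UUID.
        byte[] key = UUID.randomUUID().toString().getBytes(StandardCharsets.US_ASCII);
        byte[] message = "SUBSCRIBE".getBytes(StandardCharsets.US_ASCII);

        byte[] encrypted = apply(message, key);
        byte[] decrypted = apply(encrypted, key);
        System.out.println("round trip ok: " + Arrays.equals(message, decrypted));
    }
}
```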
        • Trading server uses port 63339.
      • a.a.d.e:channelConnected calls a.a.d.e:d() which does login by constructing the following message:
        h var1 = new h(this, "CONNECT");
        var1.a("login", this.l);
        var1.a("passcode", this.m);
        this.a(var1);
        
        • a.a.d.h is message frame/format/model/struct/POJO/bean, used by both client and server.
          • a.a.d.h:a() serializes, returning a ChannelBuffer.
          • a.a.d.e.a(byte[]) deserializes server response, returning an a.a.d.h.
        • The protocol seems to be close to STOMP as implemented by ActiveMQ Apollo:
          • Similarity: SUBSCRIBE has a selector header.
          • Difference: SEND has a misspelled replyto header. Apollo has reply-to.
      • The protocol (as decoded by a.a.d.e:a(byte[])):
        • Make a STOMP frame/message.
        • Encode the STOMP command as the first element in a MsgPack array.
        • Encode the STOMP header values in an application-predefined order as the rest of the elements of the MsgPack array. Header names aren’t sent over the network. Header order is hard-coded in the application.
        • Serialize the MsgPack array into byte array X.
        • ZIP the byte array X into Y.
        • Let N be the length of Y, encoded as a 32-bit big-endian integer.
        • The concatenation of N and Y is what is sent over the network.
        • Some data such as login/passcode are encrypted.
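The compress-and-length-prefix steps above can be sketched with the JDK alone. Assumptions: "ZIP" means zlib/DEFLATE as produced by java.util.zip.Deflater (I have not confirmed the exact container), the MsgPack bytes X are treated as an opaque byte array, and encryption is left out. The class name FrameCodec is made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public final class FrameCodec {
    private FrameCodec() {}

    /** X -> N||Y, where Y = deflate(X) and N = length of Y as a 32-bit big-endian int. */
    public static byte[] encode(byte[] x) {
        Deflater deflater = new Deflater();
        deflater.setInput(x);
        deflater.finish();
        ByteArrayOutputStream y = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            y.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        // ByteBuffer is big-endian by default, matching the length field.
        return ByteBuffer.allocate(4 + y.size()).putInt(y.size()).put(y.toByteArray()).array();
    }

    /** N||Y -> X. Rejects frames whose length field does not fit the data. */
    public static byte[] decode(byte[] frame) {
        ByteBuffer in = ByteBuffer.wrap(frame);
        int n = in.getInt();
        if (n < 0 || n > in.remaining()) {
            throw new IllegalArgumentException("bad length field: " + n);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(frame, 4, n);
        ByteArrayOutputStream x = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buf);
                x.write(buf, 0, count);
                if (count == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated compressed payload");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("payload is not zlib data", e);
        } finally {
            inflater.end();
        }
        return x.toByteArray();
    }
}
```

The length field covers only Y, not itself, which is consistent with the LengthFieldBasedFrameDecoder(2147483647, 0, 4, 0, 4, true) arguments seen in the decompiled code: a 4-byte length at offset 0, no adjustment, and 4 bytes stripped before the handler sees the payload.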
      • a.a.c.d:a(byte[],PublicKey) encrypts the byte array using RSA/ECB/PKCS1Padding cipher.
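A minimal sketch of what such a method presumably does, using the standard javax.crypto API with the same transformation string. The server's real public key is not reproduced here; the sketch generates a throwaway 1024-bit key pair so the round trip can be checked. The class name RsaSketch and its method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;

public final class RsaSketch {
    private RsaSketch() {}

    /** Encrypts with RSA/ECB/PKCS1Padding, the transformation named in the decompiled code. */
    public static byte[] encrypt(byte[] plain, PublicKey key) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Inverse operation; only the server can do this in the real protocol. */
    public static byte[] decrypt(byte[] cipherText, PrivateKey key) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.DECRYPT_MODE, key);
            return cipher.doFinal(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Throwaway key pair standing in for the hardcoded server key (assumption). */
    public static KeyPair newTestKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(1024);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newTestKeyPair();
        byte[] plain = "CONNECT".getBytes(StandardCharsets.US_ASCII);
        byte[] cipherText = encrypt(plain, pair.getPublic());
        System.out.println("ciphertext bytes: " + cipherText.length);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypt(cipherText, pair.getPrivate())));
    }
}
```

With a 1024-bit key and PKCS#1 v1.5 padding, a single block holds at most 117 bytes of plaintext, so the encrypted CONNECT data has to fit in that budget or be split.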
      • Java libpcap wrapper
        • pcap4j, github
        • jnetpcap
        • pcap4j looks best, so we pick it.
          • https://github.com/java-native-access/jna/issues/281
            • -Djna.nosys=true
      • When the client receives a server response:
        • a.a.d.e:messageReceived gets executed.
          • a.a.d.e:b(byte[]) reacts to server response. Probably changes some fields.
            • a.a.d.e:a(byte[]) decodes server response bytes into a STOMP message. It’s a custom binary encoding? Are the authors overlaying STOMP on a legacy protocol?
              • The class a.a.c.e does ZIP compression/decompression.
              • Package a.a.b is an unknown compressor/decompressor.
              • Field a.a.c.e:B is compression method. 1 means compression in a.a.b package. 2 means ZIP.
              • Ubuntu 14.04: zlib-flate CLI from qpdf package, compress/decompress stdin to stdout
        • Then probably a STOMP SEND/MESSAGE frame encapsulates a FIX (Financial Information Exchange) message.
      • Then a.a.d.e:a(a.a.d.h) sends it to the server.
    • a.a.c.a seems to be a base64 decoder copied from the Internet.
      • 2003-07-22: http://www.cs.huji.ac.il/~dbi/ex2/Base64Coder.java
        • https://github.com/chdh
          • http://www.source-code.biz/base64coder/java/
      • 2012-05-24: SO 469695 answer 10736154.
      • 2013-11-09: https://github.com/mixpanel/mixpanel-android/blob/master/src/main/java/com/mixpanel/android/util/Base64Coder.java
        • copyright year goes back to 2003
    • Schedule problems
      • The server is down every day from 00:00 to 01:00 UTC+7.
      • The traffic is only interesting during trading hours.
        • We can’t just reverse-engineer the program at any convenient time.
          • Capture the traffic.
          • Analyze the traffic later when there is time.
    • The server is also accessible via http://esmartbnis.com. It uses HTTP, JavaScript, WebSocket. It should use the same protocol as the Java client.
    • Some strings to find?
      • loading news failed, your connection / server not ready
      • The string 63339 is in TradingComm.
    • esmart.feed.engine.network.d gets the client IP address. It’s copied from the Internet.
      • created on 2013-12-06 https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java/20418809#20418809
        • created on 2008-03-04 https://issues.apache.org/jira/browse/JCS-40
  • How do we write our own ClassLoader using Javassist, and use it as the boot classloader?
  • The use cases we are interested in
    • Check application updates.
    • Check news.
    • Log in.
    • Get stock data?
    • Place order?
  • Producing a static call graph using java-callgraph
    • In hindsight, this step is unnecessary. We should have begun with runtime method tracing using AspectJ and bytecode decompiling using IntelliJ IDEA.
    • Building, assuming JDK 8 and Maven 3 are installed:
      git clone git@github.com:gousiosg/java-callgraph.git
      cd java-callgraph
      mvn package
      
    • Producing static call graph of classes in JAR files:
      java -jar target/javacg-0.1-SNAPSHOT-static.jar JAR1 JAR2 ...
      
      • The graph is represented as an edge list: one line of text represents one edge in the call graph. The format is documented on its GitHub page. A line X Y means the method X calls the method Y.
      • It only works with JARs. It doesn’t work with directories.
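For anything beyond grep and awk, the output can be loaded into a map from caller to callees. A sketch, assuming the method-call lines look like `M:class:method(args) (T)class:method(args)` as I recall from the java-callgraph README, with `C:` lines for class-level edges that are skipped here. The class name CallGraphParser and the sample lines in main are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class CallGraphParser {
    private CallGraphParser() {}

    /** Builds caller -> list of callees from method-call lines; other lines are ignored. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.startsWith("M:")) {
                continue; // e.g. "C:" class-level lines
            }
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // malformed line
            }
            String caller = line.substring(2, space);
            String callee = line.substring(space + 1).trim();
            // Drop the leading call-type marker such as "(M)" or "(S)".
            int close = callee.indexOf(')');
            if (callee.startsWith("(") && close > 0) {
                callee = callee.substring(close + 1);
            }
            graph.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
        }
        return graph;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not copied from real output.
        List<String> lines = Arrays.asList(
            "M:a.a.d.h:a() (M)a.a.c.c:a(byte[])",
            "C:a.a.d.h a.a.c.c",
            "M:a.a.d.h:a() (S)example.Codec:write(byte[])");
        parse(lines).forEach((caller, callees) -> System.out.println(caller + " calls " + callees));
    }
}
```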
    • We can also produce a class dependency graph using the jdeps tool that comes with the JDK.
    • Discerning the static call graph
      • egrep -i 'java.net|netty|crypt|msgpack|buffer' static | egrep -v 'java.lang|java.awt|javax.swing' | sort | less
      • Which methods have anything to do with encryption?
        • awk '$2 ~ /crypt/' static | sort
        • The class a.a.c.d is the only class that uses javax.crypto.*.
      • Which methods use the class a.a.c.d?
        • awk '$2 ~ /a\.a\.c\.d/' static | sort
      • What other methods are interesting?
        • java.net.Socket.get(Input|Output)Stream
        • netty
      • After reading the static call graph, I think this might be how the app sends a message:
        • Construct plaintext message using msgpack.
        • Encrypt the plaintext, producing ciphertext, probably a ByteBuffer.
        • Send the ciphertext with java.net or netty.
  • The plan:
    • Do we need a call graph or call trace? Will a trace suffice?
      • A trace is a line that consists of time, thread id, qualified class name, and method signature.
      • Use AOP (aspect-oriented programming) that uses bytecode manipulation such as AspectJ.
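Once traces are on disk they need to be parsed for filtering. A sketch of a parser, assuming the layout of the trace lines recorded under "Trading login": an ISO-8601 timestamp, the thread name, and "caller -> callee", separated by tab characters. The class name TraceLine is made up.

```java
import java.time.Instant;

public final class TraceLine {
    public final Instant time;
    public final String thread;
    public final String caller;
    public final String callee;

    private TraceLine(Instant time, String thread, String caller, String callee) {
        this.time = time;
        this.thread = thread;
        this.caller = caller;
        this.callee = callee;
    }

    /** Parses "timestamp TAB thread TAB caller -> callee" (assumed layout). */
    public static TraceLine parse(String line) {
        String[] fields = line.split("\t", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        // Method signatures contain ", " but never " -> ", so this split is unambiguous.
        String[] call = fields[2].split(" -> ", 2);
        if (call.length != 2) {
            throw new IllegalArgumentException("expected 'caller -> callee': " + fields[2]);
        }
        return new TraceLine(Instant.parse(fields[0]), fields[1], call[0], call[1]);
    }

    public static void main(String[] args) {
        TraceLine t = parse("2018-07-27T15:11:09.132Z\tmain\t"
            + "void esmart.trading.engine.TradingComm.a(i, g) -> "
            + "void esmart.trading.engine.TradingComm.m()");
        System.out.println(t.thread + ": " + t.caller + " calls " + t.callee);
    }
}
```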
    • Compute static call graph.
      • Find patterns in the call graph. Find calls from non-network class to network class.
        • A network class is a class whose qualified name matches any of these patterns:
          • java.net.*
          • java.nio.*
          • *.netty.*
      • Don’t exclude any class. Filter it later instead.
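The "filter it later" step could look like the following sketch: classify a class name against the three patterns above, then keep only the edges that cross from a non-network class into a network class. The class name NetworkBoundary and its method names are made up.

```java
public final class NetworkBoundary {
    private NetworkBoundary() {}

    /** True if the qualified class name matches java.net.*, java.nio.*, or *.netty.* */
    public static boolean isNetworkClass(String className) {
        return className.startsWith("java.net.")
            || className.startsWith("java.nio.")
            || className.contains(".netty.");
    }

    /** True for a call from a non-network class into a network class. */
    public static boolean isBoundaryCall(String callerClass, String calleeClass) {
        return !isNetworkClass(callerClass) && isNetworkClass(calleeClass);
    }

    public static void main(String[] args) {
        // Application code calling into netty: interesting.
        System.out.println(isBoundaryCall("a.a.d.e", "org.jboss.netty.channel.Channel"));
        // netty calling netty: internal plumbing, not interesting.
        System.out.println(isBoundaryCall("org.jboss.netty.channel.Channels", "org.jboss.netty.channel.Channel"));
    }
}
```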
    • https://stackoverflow.com/questions/29451704/using-javassist-to-log-method-calls-and-argument-values-how-to-make-a-logger-cl
    • AspectJ load-time weaving (LTW)
      • https://stackoverflow.com/questions/13781372/java-aspect-oriented-programming-runtime-aspect-weaving-and-class-loading-time
      • even more intrusive
        • 2008, article, “Aspect Weaving in Standard Java Class Libraries”, pdf
    • Find tools.
      • 2008, article, “The Rigi Reverse Engineering Environment”, pdf
        • https://en.wikipedia.org/wiki/Rigi_(software)
        • requires source code
      • 2001, article, “Shimba—An environment for reverse engineering Java software systems”, abstract
        • The abstract sounds promising, but where is the paper? Where is the source code? It’s not on GitHub. That article is locked up somewhere and the source code is nowhere in sight. That research practically doesn’t exist; it practically never happened. What a waste.
      • 2018, article, “Scientists should be solving problems, not struggling to access journals”, html
      • 2002, article, “Dynamic Analysis For Reverse Engineering and Program Understanding”, pdf
      • Generating Java dynamic call graph - no license, course assignment https://github.com/EmilyShepherd/Dynamic-Call-Graph
        • https://github.com/dkarv/jdcallgraph
        • https://stackoverflow.com/questions/6540423/simple-dynamic-call-graphs-in-java
      • https://erik.doernenburg.com/2008/09/call-graph-visualisation-with-aspectj-and-dot/
    • Find some information about how to debug Java bytecode.
    • Use JDB to debug the program.
    • Find out where to focus.
      • Map network-related call graph.
        • Put a breakpoint at Socket.getOutputStream.
        • It uses netty and thread pools, encumbering the recovery of the interesting stack frames.
        • It uses a serialization library msgpack. Perhaps put breakpoints there?
    • Write a program to dump the call graph of every call to netty class from non-netty class.
  • Obstacles and woes:
  • Do I have to do this time-consuming stuff? What are the alternatives?
    • I can buy data.
    • I can ask the company to provide some documentation about their protocols.
      • Whom should I talk to?
        • I can ask my sales representative whether he knows the people who made or maintain the application.
        • Someone in a local Java User Group might know something.
          • https://groups.google.com/forum/#!forum/jugi
            • The mailing list has devolved into a job board.
          • Is there a chat? IRC?
          • https://www.facebook.com/groups/ForumJavaIndonesia/
        • People who have made a similar application
          • 2008 PT Trimegah Securities Tbk https://www.linkedin.com/in/syahreza-pahlevi-ginting-0041b52/
          • 2016 BCA Sekuritas https://www.linkedin.com/in/kusnandartoni/
          • 2010 https://www.linkedin.com/in/elvino-a07277b/
          • unclear linkedin profile
            • 2013-2016 Bareksa, 2012 Limas https://www.linkedin.com/in/arisman-26672a52/
    • I can try to convince the company to open-source this platform.
      • Somewhat unlikely in this country that is behind the US by 10 years or perhaps even more.
    • I can switch to the company’s other product that is a web app that uses websockets.
      • Which one is easier to reverse-engineer?
    • These are possible, but I would rather not do these:
      • Get a job in a securities company, and then get the required information.
  • The plan:
    • Try existing deobfuscators, supercompilers, disassemblers, debuggers, and other tools.
  • Can we reverse-engineer by supercompilation or partial evaluation?
  • search keywords to try
    • java bytecode optimizer
    • java deobfuscator
    • metacompilation, supercompilation, partial evaluation, program specialization, etc. https://everipedia.org/wiki/Metacompilation/
  • https://www.reddit.com/r/REMath/
  • https://security.stackexchange.com/questions/29866/reverse-engineering-and-java
  • Transpile Java bytecode to JavaScript, and use jsnice?
  • supercompile Java programs
    • “Ongoing work on Supercompilation of Java code (or supercompilation in general)?” http://lambda-the-ultimate.org/node/2739
    • “A Java Supercompiler and Its Application to Verification of Cache-Coherence Protocols” https://link.springer.com/chapter/10.1007/978-3-642-11486-1_16
    • 2002, “Supercompiling Java Programs” http://goertzel.org/papers/SupercompilingJavaMay2002.htm
      • What are the differences between supercompilation and partial evaluation?
  • Reverse engineering Java
  • https://en.wikipedia.org/wiki/List_of_electronic_trading_protocols