Mining crypto API

The whole process of mining cryptographic API usage is driven by cli_process.py, which is in turn parametrized by config.yml. The configuration is exhaustively described in the configuration file itself, so here we only recapitulate the most important steps of the process:

  1. Download dataset or load it from disk

  2. Decompile the APKs

  3. Detect all third-party libraries

  4. Collect crypto API usage

  5. Assign sample to malware family/category with Euphony

The output of cli_process.py is record.json, which summarizes the findings extracted from the dataset. This file is then processed and cleaned further by data preparation.
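Before handing record.json over to data preparation, it can help to confirm that the file was written and parses as JSON. The following is a minimal sketch that makes no assumption about the internal structure of the record; the helper name summarize_record is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def summarize_record(path):
    """Return the top-level JSON type of the record file and its entry count."""
    with Path(path).open(encoding="utf-8") as handle:
        record = json.load(handle)
    # Only dicts and lists have a meaningful number of top-level entries.
    size = len(record) if isinstance(record, (dict, list)) else None
    return type(record).__name__, size


if __name__ == "__main__":
    # "record.json" is the output file name mentioned above; adjust the path as needed.
    if Path("record.json").exists():
        print(summarize_record("record.json"))
```

A parse error here usually means the run was interrupted before the file was completely written.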

We comment on some problematic parts of the setup below.

Download dataset or load it from disk

This step is thoroughly described in APK Dataset.

Decompile the APKs

This step should work seamlessly, provided that you have resolved the Jadx dependency and installed the patched androguard from requirements.txt.

Detect all third-party libraries

Provided that you have resolved the LiteRadar dependency, you only have to specify the path to the LiteRadar script in the experiment config using the following parameter:

literadar_path: '/path/to/LiteRadar/LiteRadar/literadar.py' # Path to the LiteRadar python2 script
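A wrong path or a missing Python 2 interpreter tends to surface only once processing has started, so a quick check up front can save time. The sketch below is an illustration only: check_literadar_setup is a hypothetical helper, it expects the configuration as an already loaded dictionary, and it assumes that the Python 2 interpreter is exposed as python2 on PATH.

```python
import os
import shutil


def check_literadar_setup(config):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    path = config.get("literadar_path")
    if not path:
        problems.append("literadar_path is not set")
    elif not os.path.isfile(path):
        problems.append(f"literadar_path does not point to a file: {path}")
    # The config comment describes literadar.py as a python2 script.
    # Assumption: the interpreter is available under the name "python2".
    if shutil.which("python2") is None:
        problems.append("no python2 interpreter found on PATH")
    return problems


if __name__ == "__main__":
    example = {"literadar_path": "/path/to/LiteRadar/LiteRadar/literadar.py"}
    for problem in check_literadar_setup(example):
        print(problem)
```

How cli_process.py itself invokes LiteRadar is not covered here; the check only verifies the prerequisites named in the config comment.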

Assign sample to malware family/category with Euphony

Optionally, you can also let Euphony label your malware samples with family names and types. Note that this functionality will probably fail when applied outside of the Androzoo dataset (more precisely, it will result in None labels). Within the Androzoo dataset, it only works for samples from 2012-2017; all other samples are assigned None labels.

To integrate Euphony with cli_process.py, you have to download the labels from the Androzoo website and point cli_process.py to these files using the following configuration keys:

euphony_names_path: '/path/to/euphony/names_proposed.json'
euphony_types_path: '/path/to/euphony/types_proposed.json'
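To illustrate where the None labels come from, here is a minimal sketch of a label lookup. It rests on an assumption that is not confirmed by this page: that each of the two files is a JSON object mapping a sample identifier (for example its hash) to a label. The helpers load_labels and label_sample are hypothetical and do not reflect how cli_process.py is implemented.

```python
import json


def load_labels(path):
    """Load one Euphony label file (assumed to be a JSON object keyed by sample id)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def label_sample(sample_id, names, types):
    """Return (family name, malware type) for a sample, or None where no label exists."""
    # dict.get returns None for samples that are not covered by the label files,
    # which mirrors the None labels described above.
    return names.get(sample_id), types.get(sample_id)


if __name__ == "__main__":
    # Made-up in-memory data standing in for the two downloaded files.
    names = {"sample-1": "family-a"}
    types = {"sample-1": "type-x"}
    print(label_sample("sample-1", names, types))
    print(label_sample("sample-2", names, types))
```

If the downloaded files turn out to use a different layout, only load_labels needs to change so that it still returns a dictionary keyed by sample identifier.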