Improve Your Forensic Analyses with hashlookup

Alexandre Dulaunoy a@foo.be

Introduction

For several decades, forensic analyses in cybersecurity have relied on sources of known software hashes. These sources are few. Most investigators and security researchers use sources like the National Software Reference Library (NSRL) and its Reference Data Set (RDS) to distinguish known files from unknown ones. Over several years at CIRCL, it became evident that sorting files with hash databases like NSRL during investigations of compromised systems was increasingly difficult. The reasons are multiple:

To overcome these difficulties, hashlookup was developed as a free-access service, available to the community with a series of open-source tools to facilitate investigators’ work. This article covers the creation of the service, the included software catalogs, and how to use it to improve and facilitate forensic investigations on compromised systems.

From Supply Chain Attacks to Forensic Analyses

Adversaries manipulate software distribution sources, as seen in several recent attacks (e.g., SolarWinds). This attack technique, referenced in the ATT&CK model as “Supply Chain Compromise (T1195)”, comes with a countermeasure to detect it: verifying files against a hash database. But is this realistic? Can we easily find a file published by a software vendor in known file databases? The situation is not simple; there are a few historical databases, but they are not maintained and cover only a negligible part of all software releases.

During cyber incidents, acquiring technical evidence (disks, RAM, etc.) is a crucial step. The evidence from a single machine can contain more than 100,000 files related to installed programs, system files, user files, or hidden files. For an analyst, all these elements are unknowns and can become evidence during a technical investigation. Contextualizing these files helps to eliminate doubts and to better understand the relationships between certain files and software installations. Adversaries also increasingly rely on existing tools. It is not uncommon to see standard tools such as netcat, socat, or sshd used during the infection of a Linux system, and the same applies to Windows compromises during lateral movement.

Limitations of Existing Solutions - History

For over twenty years, some reference databases have existed, like the National Software Reference Library (NSRL) or KFFs (“Known File Filters”) integrated into some proprietary forensic solutions. These solutions no longer meet the forensic needs of incident response teams. There are several reasons for this:

CIRCL hashlookup

The shortcomings of known file databases had become a recurring question within the CIRCL team. It seemed logical to try to solve this problem and help the community. Hashlookup consists of two parts: a collection part that builds the database and a publicly accessible API to search by cryptographic hash. The collection is performed continuously from multiple sources such as:

The import includes hashing (MD5, SHA-1, SHA-256, ssdeep, TLSH) of all files contained in each publicly available distribution or package. Several billion hashes are added to hashlookup per month and stored in a key/value (k/v) database using RocksDB. The database structure is built to support numerous queries with fast response times.
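The hashing step of such an import can be sketched in a few lines of Python. This is only an illustration, not the code used by hashlookup: the function names `hash_file` and `hash_tree` are hypothetical, and the fuzzy hashes (ssdeep, TLSH) are left out because they need third-party libraries. The sketch reads each file once and feeds every chunk to all cryptographic digests.

```python
import hashlib
from pathlib import Path


def hash_file(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return upper-case hex digests of one file, reading it only once."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest().upper() for name, digest in digests.items()}


def hash_tree(root):
    """Yield (relative path, digests) for every regular file under root.

    Hypothetical helper: an unpacked package or distribution image would be
    the root directory here.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            yield str(path.relative_to(root)), hash_file(path)
```

Calling `hash_tree` on an unpacked package directory would produce one record per file, which is the kind of data a key/value store can then index by hash.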

How to Use hashlookup

The database of known hashes is accessible via a ReSTful API. The interface is documented in OpenAPI, and all API endpoints are accessible at https://hashlookup.circl.lu.

If you want to verify a SHA-1 hash, a simple request with curl suffices:

curl https://hashlookup.circl.lu/lookup/sha1/732458574c63c3790cad093a36eadfb990d11ee6 | jq .
{
  "FileName": "snap-hashlookup-import/bin/ls",
  "FileSize": "142144",
  "MD5": "E7793F15C2FF7E747B4BC7079F5CD4F7",
  "RDS:package_id": "294806",
  "SHA-1": "732458574C63C3790CAD093A36EADFB990D11EE6",
  "SHA-256": "1E39354A6E481DAC48375BFEBB126FD96AED4E23BAB3C53ED6ECF1C5E4D5736D",
  "SHA-512": "233382698C722F0AF209865F7E998BC5A0A957CA8389E8A84BA4172F2413BEA1889DD79B12607D9577FD2FC17F300C8E7F223C2179F66786E5A11E28F4D68E53",
  "SSDEEP": "1536:BgfDyKo9d0mLrTpjQ2xioEbuGMC0kDLmLUFqpfgBLO+qDutbxHFb65RRnSULS0pF:BADnGd0mxst7DLmg0OBLIupbn0pJqN",
  "TLSH": "T178D32C07F15308BCC5D1C071865B9262BA31BC599332263F3A8CF6791F66F795B7AA20",
  "insert-timestamp": "1712941499.8515584",
  "mimetype": "application/x-sharedlib",
  "source": "snap:AUhqNxroxCLKaqLTwtZGKUMbBpAe5EU4_221",
  "hashlookup:parent-total": 69,
  "parents": [
    {
      "SHA-1": "00363CBD7E44AA37137E8A6E797507704EF111AC",
      "snap-authority": "canonical",
      "snap-filename": "BC52ksa3GpCgET5MpLjg1WtmtpKvwI6c_11.snap",
      "snap-id": "BC52ksa3GpCgET5MpLjg1WtmtpKvwI6c_11",
      "snap-name": "qt5-core20",
      "snap-publisher-id": "ccpcJpODSdWMi621YDqnMi9Q8UO6hb8L",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2022-02-17T20:28:04.914700Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/BC52ksa3GpCgET5MpLjg1WtmtpKvwI6c_11.snap"
    },
    {
      "SHA-1": "059BACD854F610F6FBF9E47CF49BA7CD8308F23C",
      "snap-authority": "canonical",
      "snap-filename": "H7gdMTiQzGYKTPAyHd34pZS0FBlyENrO_113.snap",
      "snap-id": "H7gdMTiQzGYKTPAyHd34pZS0FBlyENrO_113",
      "snap-name": "auto-cpufreq",
      "snap-publisher-id": "b3wvcwNu3SrCLcZS2ANMrEorRl9z7e6j",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2021-12-15T19:19:49.317528Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/H7gdMTiQzGYKTPAyHd34pZS0FBlyENrO_113.snap"
    },
    {
      "SHA-1": "0844D3CB657F353AB2CE1DB164CE6BDFFD2BB6FD",
      "snap-authority": "canonical",
      "snap-filename": "8BtI009xODljWTvzy37M55T8ZQiOiVft_3.snap",
      "snap-id": "8BtI009xODljWTvzy37M55T8ZQiOiVft_3",
      "snap-name": "osreport",
      "snap-publisher-id": "Yrin91Qs2D8dW9QVSQgQg9VxaGkpfQsr",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2021-05-11T18:56:58.598072Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/8BtI009xODljWTvzy37M55T8ZQiOiVft_3.snap"
    },
    {
      "SHA-1": "09FD28A9B2B6C1D7AFA0F35D63CB90E19607DD73",
      "snap-authority": "canonical",
      "snap-filename": "DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_1778.snap",
      "snap-id": "DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_1778",
      "snap-name": "core20",
      "snap-publisher-id": "canonical",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2019-05-29T16:03:15.848435Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_1778.snap"
    },
    {
      "SHA-1": "0B1FF89DAAE9D4932E5A09A3FC6B014C43219B8C",
      "snap-authority": "canonical",
      "snap-filename": "3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_492.snap",
      "snap-id": "3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_492",
      "snap-name": "bashtop",
      "snap-publisher-id": "jyL6NPmmwE6knQhm89MUOgpM4FSKEUJa",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2020-07-03T20:19:52.131066Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_492.snap"
    },
    {
      "SHA-1": "0EE1130462493787F486BF66B8DE49F6AC1F98CF",
      "snap-authority": "canonical",
      "snap-filename": "DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_2105.snap",
      "snap-id": "DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_2105",
      "snap-name": "core20",
      "snap-publisher-id": "canonical",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2019-05-29T16:03:15.848435Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_2105.snap"
    },
    {
      "SHA-1": "1A092638422762239916983CBB72DE7DDA4AC55C",
      "snap-authority": "canonical",
      "snap-filename": "YLuShGmTbSKFis3tecfrbi8x3VhtxAQu_9.snap",
      "snap-id": "YLuShGmTbSKFis3tecfrbi8x3VhtxAQu_9",
      "snap-name": "xsos",
      "snap-publisher-id": "wsytObaH0PmCvRj7IuRcloFzbtXUu6rK",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2018-05-18T10:26:21.757359Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/YLuShGmTbSKFis3tecfrbi8x3VhtxAQu_9.snap"
    },
    {
      "FileSize": "1249368",
      "MD5": "E8E201B6D1B7F39776DA07F6713E1675",
      "PackageDescription": "GNU core utilities\n This package contains the basic file, shell and text manipulation\n utilities which are expected to exist on every operating system.\n .\n Specifically, this package includes:\n arch base64 basename cat chcon chgrp chmod chown chroot cksum comm cp\n csplit cut date dd df dir dircolors dirname du echo env expand expr\n factor false flock fmt fold groups head hostid id install join link ln\n logname ls md5sum mkdir mkfifo mknod mktemp mv nice nl nohup nproc numfmt\n od paste pathchk pinky pr printenv printf ptx pwd readlink realpath rm\n rmdir runcon sha*sum seq shred sleep sort split stat stty sum sync tac\n tail tee test timeout touch tr true truncate tsort tty uname unexpand\n uniq unlink users vdir wc who whoami yes",
      "PackageMaintainer": "Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>",
      "PackageName": "coreutils",
      "PackageSection": "utils",
      "PackageVersion": "8.30-3ubuntu2",
      "SHA-1": "1D4AB60C729A361D46A90D92DEFACA518B2918D2",
      "SHA-256": "99AA50AF84DE1737735F2F51E570D60F5842AA1D4A3129527906E7FFDA368853"
    },
    {
      "SHA-1": "1E10EA9987C122605DBE27813C264D123CD7F06D",
      "snap-authority": "canonical",
      "snap-filename": "3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_435.snap",
      "snap-id": "3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_435",
      "snap-name": "bashtop",
      "snap-publisher-id": "jyL6NPmmwE6knQhm89MUOgpM4FSKEUJa",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2020-07-03T20:19:52.131066Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/3Ng7sRVkFDVIFzOMQmiHK1pdKWHbkOfW_435.snap"
    },
    {
      "SHA-1": "1EAE139BC814D30FD0A35EA65DE7B900D8F9B32E",
      "snap-authority": "canonical",
      "snap-filename": "n7gtTdzEszxnF6S3DCTG8BqIqQWflGcn_70.snap",
      "snap-id": "n7gtTdzEszxnF6S3DCTG8BqIqQWflGcn_70",
      "snap-name": "kdf",
      "snap-publisher-id": "2rsYZu6kqYVFsSejExu4YENdXQEO40Xb",
      "snap-signkey": "BWDEoaqyr25nF5SNCvEv2v7QnM9QsfCc0PBMYD_i2NGSQ32EF2d4D0hqUel3m8ul",
      "snap-timestamp": "2019-10-17T20:16:37.917856Z",
      "source-url": "https://api.snapcraft.io/api/v1/snaps/download/n7gtTdzEszxnF6S3DCTG8BqIqQWflGcn_70.snap"
    }
  ],
  "hashlookup:trust": 100
}

If the value is present in hashlookup, a JSON object is returned with the existing metadata for that hash. All known hashes are included, but for certain sources like NSRL, only the MD5 or SHA-1 value will be present. If new hashes are encountered during the hashlookup collection process, they are added to the existing metadata. In addition to the file’s metadata, the result can include “parents” or “children” lists containing the SHA-1 values of related files. These values help determine the origin of a file, such as its package or the original archive.
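The same lookup can be done from Python with only the standard library. The following is a minimal sketch, not an official client: the function names `lookup_sha1` and `parent_origins` are hypothetical, and the field names (`parents`, `snap-name`, `PackageName`, `PackageVersion`) are taken from the example output above. Other sources may use different keys, which is why the helper falls back to "unknown".

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://hashlookup.circl.lu"


def lookup_sha1(sha1, timeout=10):
    """Return the metadata dict for a SHA-1, or None when the API answers 404."""
    url = f"{BASE_URL}/lookup/sha1/{sha1}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def parent_origins(record):
    """Collect a readable name for each parent listed in a lookup result."""
    origins = []
    for parent in record.get("parents", []):
        # Field names as seen in the example output; other sources may differ.
        name = parent.get("snap-name") or parent.get("PackageName") or "unknown"
        version = parent.get("PackageVersion")
        origins.append(f"{name} {version}" if version else name)
    return origins
```

Applied to the result shown above, `parent_origins` would list entries such as `qt5-core20` and `coreutils 8.30-3ubuntu2`, which gives a quick view of where a binary comes from.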

To perform a quick analysis of a directory, the command:

sha1sum * | cut -f1 -d" " | parallel 'curl -s https://hashlookup.circl.lu/lookup/sha1/{}' | jq .

can do the job if you don’t have much time or simply have a list of suspicious files to sort.

ReSTful API Return Codes

The API returns HTTP codes depending on the success or failure of the result.

HTTP Return Code    Description and Interpretation
200                 The searched hash is present in at least one of the databases
404                 The searched hash is not present in any of the databases
400                 The input used for the hash is in an incorrect format
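A client can turn these return codes into a simple classification. The sketch below is an assumption about how one might do this: the labels ("known", "unknown", "invalid-hash", "error") and the function names are hypothetical and not part of the API. It also checks the hash format locally, which avoids sending a request that would only produce a 400.

```python
import re
from http import HTTPStatus

# Expected hex lengths for the hash types mentioned in this article.
HEX_LENGTHS = {"md5": 32, "sha1": 40}


def is_well_formed(algorithm, value):
    """Check locally that a hash is hexadecimal and has the expected length."""
    length = HEX_LENGTHS.get(algorithm)
    if length is None:
        return False
    return re.fullmatch(r"[0-9A-Fa-f]{%d}" % length, value) is not None


def interpret_status(status_code):
    """Map an HTTP return code to a label (labels are illustrative only)."""
    if status_code == HTTPStatus.OK:
        return "known"
    if status_code == HTTPStatus.NOT_FOUND:
        return "unknown"
    if status_code == HTTPStatus.BAD_REQUEST:
        return "invalid-hash"
    return "error"
```

Note that "unknown" only means the hash is absent from the databases; it says nothing about whether the file is malicious.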

Improving Query Speed

Hashlookup supports a bulk feature that makes a single request with a list of SHA-1 or MD5 values.

curl -X 'POST' 'https://hashlookup.circl.lu/bulk/sha1' -H "Content-Type: application/json" -d "{\"hashes\": [\"FFFFFDAC1B1B4C513896C805C2C698D9688BE69F\", \"FFFFFF4DB8282D002893A9BAF00E9E9D4BA45E65\", \"FFFFFE4C92E3F7282C7502F1734B243FA52326FB\"]}" | jq .

This approach performs better because it avoids issuing one request per hash to test.
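For larger lists, the bulk request can be built programmatically. The sketch below mirrors the curl command above using the standard library. The helper names and the batch size of 100 are assumptions chosen for illustration, and the parsed response of each batch is returned as-is because its exact shape should be checked against the OpenAPI documentation.

```python
import json
import urllib.request

BULK_URL = "https://hashlookup.circl.lu/bulk/sha1"


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_request(hashes):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"hashes": list(hashes)}).encode("utf-8")
    return urllib.request.Request(
        BULK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_lookup(hashes, batch_size=100, timeout=30):
    """Send the hashes in batches; return one parsed JSON response per batch.

    batch_size is an arbitrary illustrative value, not a documented limit.
    """
    responses = []
    for batch in chunked(list(hashes), batch_size):
        request = build_bulk_request(batch)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            responses.append(json.load(response))
    return responses
```

Separating `build_bulk_request` from the network call keeps the payload easy to inspect and test offline.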

Nevertheless, the fastest approach is to use the Bloom filter provided by Hashlookup, which can be downloaded at the following location https://cra.circl.lu/hashlookup/hashlookup-full.bloom.

The Bloom filter is a file of around 1 GB that can be used locally to perform lookups. This suits investigators who do not want to share their queries or who prefer to avoid online requests.
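To see why a Bloom filter allows fast, private lookups, consider the toy implementation below. It is a conceptual sketch only: it does not read the hashlookup-full.bloom file, whose on-disk format and hash functions are different, and the class name, sizes, and use of BLAKE2 are assumptions made for illustration. A Bloom filter can report false positives but never false negatives, so a "not present" answer is definitive while a "present" answer is probabilistic.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter for hex hash strings (illustrative, not hashlookup's format)."""

    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Normalise case so that lookups are case-insensitive.
        data = item.upper().encode("ascii")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size_bits

    def add(self, item):
        """Set the bit at every position derived from the item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        """True if all derived bits are set (possible false positive)."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A membership test only touches a handful of bits in a local file, which is why no query leaves the analyst's machine and why the check is much faster than an HTTP request.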

The hashlookup-analyser supports the Bloom filter natively.

python3 bin/hashlookup-analyser.py --bloomfilters /home/adulau/hashlookup/hashlookup-full.bloom --include-stats -d /bin

Tools and Integration of hashlookup

To automate and facilitate the use of hashlookup, tools like hashlookup-forensic-analyser exist. This tool generates CSV files listing known and unknown files. For example, during a forensic investigation on a Linux server, the command:

python3 hashlookup-analyser.py --cache -d /sbin/ --include-stats --print-unknown

will list the unknown files in the /sbin directory. The --cache option avoids making multiple requests for the same hash.

hashlookup_result,filename,sha-1,size
stats,Analysed directory /sbin/ on maurer running Linux-5.11.0-38-generic-x86_64-with-glibc2.29 at 2021-11-15 09:17:39.486575+00:00 - Found 472 on hashlookup.circl.lu - Unknown files 0 - Excluded files 0

The result on /sbin allows us to conclude that the 472 discovered files are known to hashlookup. This facilitates the elimination of files to analyze and ensures that the present files come from known sources.
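The benefit of caching, as with the --cache option, can be pictured as simple memoization: identical files often appear many times on a system, and each distinct hash only needs one remote query. The sketch below is an in-memory illustration under that assumption. It does not reproduce how hashlookup-forensic-analyser stores its cache, and the class name `CachedLookup` is hypothetical.

```python
class CachedLookup:
    """Wrap a lookup function so each distinct hash is queried only once."""

    def __init__(self, lookup):
        self._lookup = lookup      # any callable taking a hash string
        self._cache = {}
        self.remote_calls = 0      # number of calls actually forwarded

    def __call__(self, sha1):
        key = sha1.upper()         # treat upper and lower case as the same hash
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]
```

Wrapping any lookup callable this way means that a directory containing many copies of the same binary triggers a single request for that hash.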

There are several integrations with IT security tools to facilitate hash searches. For example, MISP has a hashlookup expansion module. MISP is a threat intelligence platform and can contain a significant number of IoCs (Indicators of Compromise). The hashlookup module can be used to better contextualize these indicators and verify if the origin of a file is already known.

Figure: Result of a hash query with the results from Hashlookup.

Figure: Import of Hashlookup results into MISP with the corresponding relationships.

Florian Roth’s tool munin also integrates hashlookup support in addition to other sources like CAPE, VirusTotal, HybridAnalysis. Munin operates in three modes:

This provides an overview of known files as well as potential malware.

Hashlookup is also integrated into other tools, such as FlowIntel and misp-modules, to facilitate contextualization.

Conclusion

Hashlookup is a relatively young project, yet it already helps to quickly classify file indicators or technical fingerprints in digital forensic cases. Improvements are planned, such as adding metadata on signatures and function exports, as well as the possibility of performing approximate searches based on SSDEEP.

The hashlookup export format is being standardized at the IETF with the publication of a first Internet-Draft. Do not hesitate to propose new sources for metadata, integrate your tools with the hashlookup API, or suggest improvements to the API.