
CVE Quick Search: Implementing our own vulnerability database

8. November 2021

Knowing which vulnerabilities exist for a given software product is valuable not only for penetration testing. IT teams, too, benefit from being able to quickly look up information about a product version they have deployed. Several databases already exist for such queries, e.g., https://nvd.nist.gov/vuln/search, https://cvedetails.com or https://snyk.io/vuln.

However, over the past years we have identified several issues with these databases:

  • Many databases only index vulnerabilities for certain product groups (e.g., Snyk: Web Technologies)
  • Many databases match keywords against full-text descriptions, which makes searches for specific product versions imprecise.
  • Many databases are outdated or list incorrect information

Figure: Incorrect vulnerability results for Windows 10

Figure: Keyword search returns a different product than the one originally searched for

This is why we decided to implement our own solution. We considered the following key points:

  • Products and version numbers can be searched using unique identifiers. This allows a more precise search query.
  • The system performs a daily import of the latest vulnerability data from the National Institute of Standards and Technology (NIST). Vulnerabilities are thus kept up to date and have a verified CVE entry.
  • The system is based on the Elastic Stack (https://www.elastic.co/de/elastic-stack/), which lets us query and visualize the data in real time.

Technical Implementation: NIST NVD & Elastic Stack

When security researchers discover vulnerabilities in products, they commonly register one CVE entry per vulnerability. Each CVE entry receives a unique identifier, detailed vulnerability information and a general description.

CVE entries can be registered at https://cve.mitre.org and are indexed in the National Vulnerability Database (NVD) in real time (https://cve.mitre.org/about/cve_and_nvd_relationship.html). NIST publishes these data sets, which contain all registered vulnerabilities, publicly and free of charge. We use this data stream as the basis for our own database.

The technical details of the data import and subsequent provisioning are illustrated as follows:

4Figure: Overview of the technical components of the vulnerability database

1. Daily import of vulnerability data from the NIST NVD

The data sets are organized by year and refreshed daily by NIST. Every night, we download the latest files onto our file server.
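The post does not show the download job itself. A minimal Python sketch of the nightly step could look like the following. It assumes the year-based JSON 1.1 feed URLs (`nvdcve-1.1-<year>.json.gz`) that NVD offered when this post was written; NVD has changed its feed offerings since then, so verify the current URLs before relying on this pattern. The helper names `feed_urls` and `download_feeds` are illustrative.

```python
import gzip
import shutil
import urllib.request
from datetime import date
from pathlib import Path

# ASSUMPTION: year-based JSON 1.1 feed URL pattern as offered by NVD in 2021.
# Check the current NVD data feeds page before using it.
FEED_URL = "https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-{year}.json.gz"
FIRST_YEAR = 2002  # ASSUMPTION: the earliest yearly feed file


def feed_urls(first_year=FIRST_YEAR, last_year=None):
    """Return one feed URL per year, from first_year to last_year inclusive."""
    last_year = last_year or date.today().year
    return [FEED_URL.format(year=y) for y in range(first_year, last_year + 1)]


def download_feeds(dest_dir, urls):
    """Download each .json.gz feed and store the decompressed .json beside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        gz_path = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, gz_path)
        json_path = gz_path.with_suffix("")  # "x.json.gz" -> "x.json"
        with gzip.open(gz_path, "rb") as src, open(json_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        paths.append(json_path)
    return paths
```

Scheduled nightly (for example via cron), a call such as `download_feeds("/srv/nvd", feed_urls())` would refresh all yearly files; the target directory here is hypothetical.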

2. Pre-Processing of vulnerability data

Afterwards, the files are pre-processed to make them compatible with the Elastic Stack parser. One step performed here is the expansion of all JSON files: the downloaded files contain JSON objects, but these are often nested, which makes it harder for the parser to identify individual objects. We read the JSON and write every object separator onto its own line. This way, we can use a regular expression (^{) to determine precisely when a new object begins.
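The post does not name the tool used for this step, so the following is only a minimal Python sketch (`expand_objects` is an illustrative name) of why the expansion matters. In a compact serialization the `^{` pattern finds no object boundaries, while an indented serialization places the opening brace of each top-level object at the start of a line and indents all nested braces.

```python
import json
import re

# Start-of-object marker from the text: "{" in the first column of a line.
OBJECT_START = re.compile(r"^\{", re.MULTILINE)


def expand_objects(objects):
    """Serialize each object with indentation. Only the opening brace of a
    top-level object lands in column 0; nested braces are indented, and
    json.dumps escapes newlines inside strings, so they cannot create matches."""
    return "\n".join(json.dumps(obj, indent=2) for obj in objects) + "\n"


# Made-up sample entries with nested objects.
items = [
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"}}},
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"}}},
]

compact = json.dumps(items)       # everything on a single line
expanded = expand_objects(items)  # one top-level "{" per line start

print(len(OBJECT_START.findall(compact)))   # 0: the only line starts with "["
print(len(OBJECT_START.findall(expanded)))  # 2: one match per entry
```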


Furthermore, we strip all unneeded metadata (e.g., author, version information, etc.) from the file, which leaves only the CVE entries as sequential JSON objects.
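A minimal Python sketch of this stripping step, assuming the NVD JSON 1.1 feed layout in which feed-level fields such as `CVE_data_version` and `CVE_data_timestamp` sit next to a `CVE_Items` array holding the actual entries. The function names are illustrative, not the authors' implementation.

```python
import json


def strip_metadata(feed):
    """Keep only the CVE entries and drop feed-level metadata such as
    CVE_data_version or CVE_data_timestamp.
    ASSUMPTION: NVD JSON 1.1 feed layout, entries stored under "CVE_Items"."""
    return feed["CVE_Items"]


def to_sequential(items):
    """Serialize the entries as sequential, indented JSON objects, so each
    entry starts with '{' in the first column of a line."""
    return "".join(json.dumps(item, indent=2) + "\n" for item in items)


def preprocess(in_path, out_path):
    """Illustrative file-to-file step: full NVD feed in, bare CVE objects out."""
    with open(in_path, encoding="utf-8") as src:
        feed = json.load(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(to_sequential(strip_metadata(feed)))
```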

3. Reading in the pre-processed vulnerability data using Logstash

After the pre-processing, our Logstash parser is able to read the individual lines of the files using the Multiline Codec (https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html). Every time a complete JSON object is read in, Logstash forwards this CVE object to our Elasticsearch instance.
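The authors' Logstash configuration is not shown. The Python sketch below only emulates the grouping behaviour a multiline codec would have if configured with `pattern` set to `^{`, `negate` set to true and `what` set to "previous" (an assumed configuration): a line matching the pattern starts a new event, and every other line is appended to the current one. The function names are illustrative. Unlike this sketch, the real codec may hold back the last event until more input arrives or its `auto_flush_interval` expires.

```python
import json
import re


def multiline_events(lines, pattern=r"^\{"):
    """Group lines into events like a multiline codec with negate => true and
    what => "previous": a line matching `pattern` starts a new event, every
    other line is appended to the event currently being built."""
    start = re.compile(pattern)
    buffer = []
    for line in lines:
        if start.match(line) and buffer:
            yield "\n".join(buffer)  # the previous object is complete
            buffer = []
        buffer.append(line)
    if buffer:
        yield "\n".join(buffer)  # flush the last object at end of input


def read_cve_objects(text):
    """Parse every grouped event as one JSON object (one CVE entry)."""
    return [json.loads(event) for event in multiline_events(text.splitlines())]
```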

The CVE Quick Search – Data formats and vulnerability queries

Once all CVE entries have been read and stored in the Elasticsearch database, we need to understand the format of these entries and how to search them for specific products and their vulnerabilities. Our final result is shown in the following screenshot: using unique identifiers, we can return exact vulnerability reports for the queried product version.

Figure: Preview of our vulnerability query frontend

1. Format of product versions

The general format of product identifiers is defined in the NIST Common Platform Enumeration (CPE) 2.3 Naming Specification (NISTIR 7695). Section 5.3.3 gives a short overview (https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir7695.pdf):


cpe:2.3:part:vendor:product_name:version:update:edition:language:sw_edition:target_sw:target_hw:other

  • part: either ‘a’ (application), ‘o’ (operating system) or ‘h’ (hardware)
  • vendor: unique identifier of the product vendor
  • product_name: a unique name identifier of the product
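  • version: the version number of the product
  • update: the update, service pack or point release of the product
  • edition: deprecated (retained only for backward compatibility with CPE 2.2)
  • language: supported (user interface) language
  • sw_edition: identifies how the product is tailored to a particular market or class of end users
  • target_sw: software environment the product is used with/in
  • target_hw: hardware environment the product is used with/in
  • other: other annotations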

A colon is used as a separating character. Asterisk (*) is used as a wildcard symbol.

In the identifier from our screenshot, “cpe:2.3:o:juniper:junos:17.4r3:*:*:*:*:*:*:*”, we can see that version 17.4r3 of the operating system JunOS by the vendor Juniper is affected by a vulnerability.
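To make the field layout concrete, here is a small Python sketch that splits a CPE 2.3 formatted string into named attributes. It follows the attribute order of the formatted-string binding in NISTIR 7695, in which language comes directly after edition. Colons inside a value are escaped with a backslash, so the splitter only breaks on unescaped colons and keeps the escapes intact. The function names are illustrative.

```python
# Attribute order of the CPE 2.3 formatted string binding (NISTIR 7695).
CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]


def split_cpe(cpe):
    """Split on colons that are not escaped with a backslash."""
    parts, current, escaped = [], [], False
    for ch in cpe:
        if escaped:
            current.append(ch)   # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            current.append(ch)   # keep the escape so values round-trip
            escaped = True
        elif ch == ":":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_cpe(cpe):
    """Return a dict of the eleven CPE 2.3 attributes."""
    parts = split_cpe(cpe)
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 2 + len(CPE_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))
```

For the screenshot example, `parse_cpe("cpe:2.3:o:juniper:junos:17.4r3:*:*:*:*:*:*:*")` yields part "o", vendor "juniper", product "junos" and version "17.4r3", with every remaining attribute set to "*".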

Looking at the JSON files, it becomes apparent that two formats are used to store the vulnerable version numbers:

  • Format 1: The attributes “versionStartIncluding/versionStartExcluding” and “versionEndIncluding/versionEndExcluding” define a range of vulnerable versions.
  • Format 2: A single vulnerable software version is stored in “cpe23Uri”.
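A Python sketch of how a queried version could be checked against both formats. It assumes NVD JSON 1.1 `cpe_match` entries, where range entries (Format 1) carry a wildcard in the version field of `cpe23Uri`. The version comparison is a deliberate simplification (numeric runs compared as integers, letter runs as lowercase strings); real version schemes such as vendor-specific suffixes may need product-specific rules. The function names are illustrative.

```python
import re


def version_key(version):
    """Simplified ordering: numeric runs compare as integers and letter runs
    as lowercase strings, e.g. "17.4r3" -> [(0, 17), (0, 4), (1, 'r'), (0, 3)]."""
    return [(0, int(tok)) if tok.isdigit() else (1, tok.lower())
            for tok in re.findall(r"\d+|[A-Za-z]+", version)]


def cpe_version(cpe_uri):
    """The version is the sixth field; split only on unescaped colons."""
    return re.split(r"(?<!\\):", cpe_uri)[5]


def is_vulnerable(match, version):
    """Check a queried version against one cpe_match entry."""
    uri_version = cpe_version(match["cpe23Uri"])
    if uri_version != "*":
        # Format 2: a single vulnerable version stored in cpe23Uri
        return uri_version == version
    # Format 1: optional lower and upper bounds define a range
    key = version_key(version)
    if "versionStartIncluding" in match and key < version_key(match["versionStartIncluding"]):
        return False
    if "versionStartExcluding" in match and key <= version_key(match["versionStartExcluding"]):
        return False
    if "versionEndIncluding" in match and key > version_key(match["versionEndIncluding"]):
        return False
    if "versionEndExcluding" in match and key >= version_key(match["versionEndExcluding"]):
        return False
    return True  # inside all given bounds (no bounds: every version)
```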

2. Querying the database

To query the database for specific products, users need an easy way to find the correct product identifiers. We implemented this component as a JavaScript auto-complete field that dynamically displays products and their associated CPE identifiers:

Figure: Autocomplete mechanism of the query frontend
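The frontend code is not part of the post. The sketch below only illustrates matching logic such a field could use; it is written in Python to keep all examples in one language and works on an in-memory list of CPE URIs, whereas the real component draws its candidates from Elasticsearch. The helper names and the "every typed word must appear" rule are assumptions.

```python
def build_index(cpe_uris):
    """Pair a human-readable label ("vendor product version") with each URI."""
    index = []
    for uri in sorted(set(cpe_uris)):
        fields = uri.split(":")  # simplification: ignores escaped colons
        vendor, product, version = fields[3], fields[4], fields[5]
        label = f"{vendor} {product} {version}".replace("_", " ").lower()
        index.append((label, uri))
    return index


def suggest(index, typed, limit=10):
    """Return up to `limit` CPE URIs whose label contains every typed word."""
    words = typed.lower().split()
    hits = [uri for label, uri in index if all(w in label for w in words)]
    return hits[:limit]
```

Typing "junos 17" would then narrow the list to JunOS versions containing "17", and the selected URI is what gets passed on to the vulnerability query.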

Once a product has been selected, the vulnerabilities matching its product identifier can be queried.
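The post does not show the query itself. One possible shape, sketched in Python, assumes a hypothetical index named `cve` on `localhost:9200` and a field path that mirrors the NVD JSON 1.1 layout (`configurations.nodes.cpe_match.cpe23Uri`) with the `.keyword` sub-field that Elasticsearch's default dynamic mapping creates for strings. Because range entries (Format 1) store a wildcard in the version field, the query fetches every entry for the product with a `prefix` query, and the exact version check is done afterwards on the client. Entries nested under `children` in the NVD configuration tree would need an additional field path.

```python
import json
import urllib.request

# HYPOTHETICAL index URL and field path (NVD JSON 1.1 layout, default
# dynamic mapping with a ".keyword" sub-field for strings).
ES_URL = "http://localhost:9200/cve/_search"
CPE_FIELD = "configurations.nodes.cpe_match.cpe23Uri.keyword"


def product_prefix(cpe):
    """'cpe:2.3:o:juniper:junos:17.4r3:...' -> 'cpe:2.3:o:juniper:junos:'
    The trailing colon keeps e.g. 'junos_space' from matching 'junos'."""
    return ":".join(cpe.split(":")[:5]) + ":"


def build_query(cpe, size=100):
    """Fetch all entries for the product regardless of the version field;
    exact versions and version ranges are then checked client-side."""
    return {"size": size, "query": {"prefix": {CPE_FIELD: product_prefix(cpe)}}}


def search(cpe):
    """POST the query to Elasticsearch and return the raw hits."""
    req = urllib.request.Request(
        ES_URL,
        data=json.dumps(build_query(cpe)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["hits"]
```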

Outlook: Kibana – Visualising vulnerabilities and trends

A major advantage of storing the vulnerability data in Elasticsearch is its direct integration with Kibana, which queries Elasticsearch on its own to generate visualizations. Below, we show a selection of visualizations of the vulnerability data:

Figure: Number of registered vulnerabilities per year

Figure: Fractions of the respective risk severity groups per year
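Kibana builds charts like these from Elasticsearch aggregations. The following Python sketch shows an aggregation request that would produce the data for both figures in one query, plus a helper that turns the response into per-year fractions. The field names (`publishedDate`, `impact.baseMetricV3.cvssV3.baseSeverity`) follow the NVD JSON 1.1 layout and are assumptions about the mapping: `publishedDate` must be mapped as a date, the `.keyword` suffix relies on the default dynamic mapping, and `calendar_interval` requires Elasticsearch 7.2 or later. Entries without a CVSS v3 score (common for older CVEs) fall into no severity bucket, so a year's fractions may sum to less than 1.

```python
# HYPOTHETICAL field names following the NVD JSON 1.1 layout.
DATE_FIELD = "publishedDate"
SEVERITY_FIELD = "impact.baseMetricV3.cvssV3.baseSeverity.keyword"


def build_trend_query():
    """CVE count per year, with a severity breakdown inside each year."""
    return {
        "size": 0,  # aggregations only, no individual hits
        "aggs": {
            "per_year": {
                "date_histogram": {"field": DATE_FIELD, "calendar_interval": "year"},
                "aggs": {"severity": {"terms": {"field": SEVERITY_FIELD}}},
            }
        },
    }


def severity_fractions(response):
    """Turn the aggregation response into {year: {severity: fraction}}."""
    result = {}
    for bucket in response["aggregations"]["per_year"]["buckets"]:
        year = bucket["key_as_string"][:4]  # e.g. "2021-01-01T00:00:00.000Z"
        total = bucket["doc_count"]
        if total == 0:
            result[year] = {}
            continue
        result[year] = {
            sev["key"]: sev["doc_count"] / total
            for sev in bucket["severity"]["buckets"]
        }
    return result
```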

We see great potential in using this data for live statistics on our homepage that show vulnerability trends and are updated daily.

Outlook – Threat Intelligence and automation

Another item on our CVE database roadmap is a system that automatically notifies customers of new vulnerabilities as soon as they are published for a given CPE identifier. Elasticsearch offers an extensive REST API that allows us to realize this with our existing ELK stack.
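Since this system is still on the roadmap, the following is only a Python sketch of its core matching step, under several assumptions: entries use the NVD JSON 1.1 layout (including nested `children` nodes and `publishedDate` timestamps such as "2021-11-08T15:15Z"), and each customer subscribes to a CPE product prefix. The subscription model and all function names are hypothetical. In practice the `items` would come from an Elasticsearch query for entries newer than the last run; here they are passed in directly.

```python
from datetime import datetime


def iter_cpes(nodes):
    """Yield every cpe23Uri in an NVD configuration tree (nodes may nest)."""
    for node in nodes:
        for match in node.get("cpe_match", []):
            yield match["cpe23Uri"]
        yield from iter_cpes(node.get("children", []))


def parse_nvd_time(value):
    """ASSUMPTION: NVD JSON 1.1 timestamps like "2021-11-08T15:15Z"."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%MZ")


def new_alerts(items, subscriptions, since):
    """Return (customer, CVE ID) pairs for entries published after `since`
    that mention a subscribed product. `subscriptions` maps a customer name
    to a CPE product prefix such as "cpe:2.3:o:juniper:junos:" (hypothetical)."""
    alerts = []
    for item in items:
        if parse_nvd_time(item["publishedDate"]) <= since:
            continue  # already reported in an earlier run
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        cpes = list(iter_cpes(item["configurations"]["nodes"]))
        for customer, prefix in subscriptions.items():
            if any(cpe.startswith(prefix) for cpe in cpes):
                alerts.append((customer, cve_id))
    return alerts
```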

We are currently working on live statistics for our homepage. As soon as this milestone is complete, we will continue with the topic of “Threat Intelligence”. As you can see, at Pentest Factory GmbH we do not focus solely on penetration testing: we are also keen to research cybersecurity topics and to extend both our understanding and our range of services.