Creating a Mapping
Mapping Introduction
Security Tool Type
This walkthrough is targeted at converting security scans to OHDF. If your security tool produces exports that are not security scans, or exposes its data through an API, contact the SAF team for further instruction on how to develop those services into an OHDF mapper.
We begin the creation of every OHDF mapper by first planning out how we want to correlate the fields between OHDF and the security tool's data format. This mapping is important for the following reasons:
- It gives us a good framework for guiding how we actually implement the *-to-OHDF mapper.
- It notifies us of potential knowledge gaps we may have of a security tool's data format. We need to close these gaps as best as we can in order to ensure that we can create field correlations that are as accurate/applicable as possible.
- It gives us a chance to review our mapping in a format that is still human readable.
- It helps us reduce the number of corrections and rewrites we have to perform on the actual mapper implementation due to potential mapping errors.
As we develop the mappings in the following examples, note that we will break down and analyze the fields according to the OHDF schema structures we previously learned about (profiles, controls, and results). This is not necessarily the only or the best way to do this and you are encouraged to develop your own methods for correlating fields that work best for you.
Use of Exported Security Tool Data
The following examples will use generated instances of exported security tool data that do not reflect the entirety of the schemas they are based on. Ideally, you will want to use the full schema when creating a mapping; however, the full schema is not always readily available. To reflect this, we will be using generated export data and will fill in missing schema data when possible.
Mapping Demo - GoSec
1. Breaking Down the Data
Let's first start with our GoSec sample, which you should already be familiar with:
GoSec Source Data
{
"Golang errors": {},
"Issues": [
{
"severity": "MEDIUM",
"confidence": "HIGH",
"cwe": {
"id": "22",
"url": "https://cwe.mitre.org/data/definitions/22.html"
},
"rule_id": "G304",
"details": "Potential file inclusion via variable",
"file": "C:\\Users\\AGILLUM\\OneDrive - The MITRE Corporation\\Documents\\Code\\grype-0.34.4\\internal\\file\\tar.go",
"code": "82: \t\tcase tar.TypeReg:\n83: \t\t\tf, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))\n84: \t\t\tif err != nil {\n",
"line": "83",
"column": "14",
"nosec": false,
"suppressions": null
}
],
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
},
"GosecVersion": "dev"
}
We can begin our mapping process by identifying the purpose of each field in the source data. This is relatively simple in this case, as we have a formal schema for GoSec that defines many of its fields. While we do this, it's also important that we try to tie back each identified purpose to the three general OHDF schema structures that we learned about earlier; that is, in addition to finding the purpose of each field, we should also ask ourselves the following:
- Is this field recording metadata?
- Is this field recording requirements?
- Is this field recording requirement testing?
Let's apply this to our GoSec source data. Try to answer these questions for each field yourself, then refer back to our annotated solution to check your work.
GoSec Annotated Source Data
{
// Purpose: Go compilation errors
// Recording: Metadata - Not specifically related to the requirements and will be already recorded as a security issue in 'Issues' if critical
"Golang errors": {},
// Purpose: Container for identified security issues
// Recording: Requirements - This entity records all identified security issues in a Go source code
"Issues": [
{
// Purpose: The severity of the identified issue
// Recording: Requirements - This is specifically related to the severity level of the requirement
"severity": "MEDIUM",
// Purpose: How confident GoSec is that the identified issue is applicable to this source code
// Recording: Requirements testing - This field gives the applicability of the issue after source code testing
"confidence": "HIGH",
// Purpose: The associated CWE for the security issue
// Recording: Requirements - This gives the associated CWE for the security issue
"cwe": {
"id": "22",
"url": "https://cwe.mitre.org/data/definitions/22.html"
},
// Purpose: The internal GoSec ID for the security issue
// Recording: Requirements - This gives an ID for the security issue
"rule_id": "G304",
// Purpose: Explanation of the security issue
// Recording: Requirements - This explains the security issue
"details": "Potential file inclusion via variable",
// Purpose: The offending file
// Recording: Requirement testing - This specifically notes which file fails the requirement after source code testing
"file": "C:\\Users\\AGILLUM\\OneDrive - The MITRE Corporation\\Documents\\Code\\grype-0.34.4\\internal\\file\\tar.go",
// Purpose: The offending code
// Recording: Requirement testing - This specifies the code that fails the requirement after source code testing
"code": "82: \t\tcase tar.TypeReg:\n83: \t\t\tf, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))\n84: \t\t\tif err != nil {\n",
// Purpose: The line number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"line": "83",
// Purpose: The column number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"column": "14",
// Purpose: Whether this security issue should be ignored
// Recording: Requirements - Specifies whether this security issue should be ignored
"nosec": false,
// Purpose: The suppression level for info on the security issue
// Recording: Requirements - Specifies the info suppression level of the security issue
"suppressions": null
}
],
// Purpose: The statistics of the GoSec scan on the source code
// Recording: Metadata - Info on the scan itself
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
},
// Purpose: The version of the GoSec instance currently running
// Recording: Metadata - Info on the scan itself
"GosecVersion": "dev"
}
Note that this process is subjective and your answers may vary from ours. Don't be too quick to backtrack on your own answers, as your solutions may be just as valid as ours. There is no single objective answer here; what's important is that you can identify and break down the source data into useful pieces that we can correlate back to the OHDF schema.
The purpose of this annotation is to understand the intention of the security tool that produced the source data. By breaking down the source data field by field, we can gain a better understanding of how the security tool structures its export data and intends for it to be read by a user. GoSec, for example, organizes its data by security issues because GoSec is intended for code correction and hardening. This manifests in the real world through the user reading the GoSec report issue by issue and solving them as they go along. We can use this knowledge to help guide us in both correlating the schemas and reconfiguring the structure of the source data to fit OHDF.
2. Correlating the Schemas
Now that we're familiar with the specifics of the GoSec schema, we can begin to map these fields to the OHDF schema. We'll do this by taking a field in the source data and correlating it to the most applicable OHDF schema field. While performing these correlations, it is important to account for the intention and structure of the source data schema.
This initial mapping is informal and does not need to adhere to any programming syntax since you're probably annotating the documents by hand. The mapping process entails taking the full field path from the source data and placing it as the corresponding value for the OHDF schema field. Proper implementation of these mappings will be done later in the course. An example of such mapping is as follows:
{
profiles: [
{
controls: [
{
title: Issues.details,
},
],
},
];
}
This shows us a mapping of the field Issues.details from the source data to the field profiles.controls.title in OHDF.
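To make the notation concrete, the dotted path on the right-hand side denotes a lookup into the parsed source data. Here is a minimal sketch of such a lookup in TypeScript; the helper name `resolvePath` is ours for illustration, not part of the OHDF Converters API, and the real mapper resolves array segments like `Issues` per element rather than taking only the first.

```typescript
// Hypothetical helper illustrating what a dotted source-data path denotes.
// For simplicity, array segments are resolved against the first element only.
function resolvePath(data: unknown, path: string): unknown {
  return path.split('.').reduce((current: any, segment) => {
    if (current === undefined || current === null) {
      return undefined;
    }
    const value = Array.isArray(current) ? current[0] : current;
    return value?.[segment];
  }, data);
}

const gosecSample = {
  Issues: [{rule_id: 'G304', details: 'Potential file inclusion via variable'}]
};

// profiles.controls.title <- Issues.details
const title = resolvePath(gosecSample, 'Issues.details');
```

With the sample above, `resolvePath(gosecSample, 'Issues.rule_id')` would similarly produce the value destined for `profiles.controls.id`.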
Note that sometimes every relevant field within the OHDF schema is addressed, but the source data still contains unmapped data. In such an event, it is good practice to ensure that such data is still passed through and recorded somewhere within the OHDF schema for data preservation. In most cases, you should place such data in the passthrough field.
Let's first start with an empty OHDF schema as follows:
Empty OHDF Schema
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version,
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id,
title,
desc,
descriptions,
impact,
refs,
tags,
code,
source_location,
results: [
{
status,
code_desc,
message,
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}
Mapping Version 1
We can begin our schema correlation by directly porting over fields from the GoSec source data that are obvious: GosecVersion, Issues.rule_id, Issues.details, and Issues.cwe. Try to map these fields yourself as you follow along and refer to our solution to check your work.
Correlated Fields
| OHDF | Source Data |
|---|---|
profiles.version | GosecVersion |
profiles.controls.id | Issues.rule_id |
profiles.controls.title | Issues.details |
profiles.controls.tags | Issues.cwe |
OHDF Schema Mapping Ver. 1
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version: GosecVersion, // Version of GoSec instance
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc,
descriptions,
impact,
refs,
tags: {
Issues.cwe // Associated CWE for the requirement
},
code,
source_location,
results: [
{
status,
code_desc,
message,
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}
Remaining GoSec Source Data
{
// Purpose: Go compilation errors
// Recording: Metadata - Not specifically related to the requirements and will be already recorded as a security issue in 'Issues' if critical
"Golang errors": {},
// Purpose: Container for identified security issues
// Recording: Requirements - This entity records all identified security issues in a Go source code
"Issues": [
{
// Purpose: The severity of the identified issue
// Recording: Requirements - This is specifically related to the severity level of the requirement
"severity": "MEDIUM",
// Purpose: How confident GoSec is that the identified issue is applicable to this source code
// Recording: Requirements testing - This field gives the applicability of the issue after source code testing
"confidence": "HIGH",
// Purpose: The offending file
// Recording: Requirement testing - This specifically notes which file fails the requirement after source code testing
"file": "C:\\Users\\AGILLUM\\OneDrive - The MITRE Corporation\\Documents\\Code\\grype-0.34.4\\internal\\file\\tar.go",
// Purpose: The offending code
// Recording: Requirement testing - This specifies the code that fails the requirement after source code testing
"code": "82: \t\tcase tar.TypeReg:\n83: \t\t\tf, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))\n84: \t\t\tif err != nil {\n",
// Purpose: The line number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"line": "83",
// Purpose: The column number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"column": "14",
// Purpose: Whether this security issue should be ignored
// Recording: Requirements - Specifies whether this security issue should be ignored
"nosec": false,
// Purpose: The suppression level for info on the security issue
// Recording: Requirements - Specifies the info suppression level of the security issue
"suppressions": null
}
],
// Purpose: The statistics of the GoSec scan on the source code
// Recording: Metadata - Info on the scan itself
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
}
}
Reasoning
- `version: GosecVersion`: We want to specifically place this in `version` in `profiles` because this is metadata on the actual security tool that generated these results. This is different from `version` on the top level, since that refers to the platform, or the tool creating the actual OHDF file itself.
- `id: Issues.rule_id`: `rule_id` gives us an ID used by GoSec to identify each failing requirement in the source code.
- `title: Issues.details`: The source data doesn't provide a proper title that goes along with the failing requirement. In lieu of this, we can use `Issues.details`, which does give us a human-readable description of the requirement, and repurpose it as a title for our control.
- `tags: {Issues.cwe}`: `Issues.cwe` provides an association of our GoSec security issue with a known vulnerability database. This can be used in `tags` to act as metadata which can be used to filter or organize controls created from these GoSec security issues.
Mapping Version 2
Next, let's look at Issues.code, which can be a little bit tricky.
Correlated Fields
| OHDF | Source Data |
|---|---|
profiles.controls.results.code_desc | Issues.code |
OHDF Schema Mapping Ver. 2
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version: GosecVersion, // Version of GoSec instance
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc,
descriptions,
impact,
refs,
tags: {
Issues.cwe // Associated CWE for the requirement
},
code,
source_location,
results: [
{
status,
code_desc: Issues.code, // The code failing the requirement test
message,
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}
Remaining GoSec Source Data
{
// Purpose: Go compilation errors
// Recording: Metadata - Not specifically related to the requirements and will be already recorded as a security issue in 'Issues' if critical
"Golang errors": {},
// Purpose: Container for identified security issues
// Recording: Requirements - This entity records all identified security issues in a Go source code
"Issues": [
{
// Purpose: The severity of the identified issue
// Recording: Requirements - This is specifically related to the severity level of the requirement
"severity": "MEDIUM",
// Purpose: How confident GoSec is that the identified issue is applicable to this source code
// Recording: Requirements testing - This field gives the applicability of the issue after source code testing
"confidence": "HIGH",
// Purpose: The offending file
// Recording: Requirement testing - This specifically notes which file fails the requirement after source code testing
"file": "C:\\Users\\AGILLUM\\OneDrive - The MITRE Corporation\\Documents\\Code\\grype-0.34.4\\internal\\file\\tar.go",
// Purpose: The line number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"line": "83",
// Purpose: The column number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"column": "14",
// Purpose: Whether this security issue should be ignored
// Recording: Requirements - Specifies whether this security issue should be ignored
"nosec": false,
// Purpose: The suppression level for info on the security issue
// Recording: Requirements - Specifies the info suppression level of the security issue
"suppressions": null
}
],
// Purpose: The statistics of the GoSec scan on the source code
// Recording: Metadata - Info on the scan itself
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
}
}
Reasoning
- `code_desc: Issues.code`: `Issues.code` refers to the source code that is explicitly failing the GoSec requirement. This does not align with our definition for `code_desc`, but remember that we always aim to seek parity with the original security tool. GoSec focuses heavily on identifying a security issue and directly pinpointing where that issue occurs for immediate resolution. We want to mirror that behavior by also pinpointing where the security issue occurs instead of providing a code description of what test went wrong.
Mapping Version 3
Next, let's look at the fields Issues.severity, Issues.confidence, Issues.nosec, and Issues.suppressions.
Correlated Fields
| OHDF | Source Data |
|---|---|
profiles.controls.tags | Issues.severity |
profiles.controls.tags | Issues.confidence |
profiles.controls.tags | Issues.nosec |
profiles.controls.tags | Issues.suppressions |
OHDF Schema Mapping Ver. 3
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version: GosecVersion, // Version of GoSec instance
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc,
descriptions,
impact,
refs,
tags: {
Issues.cwe // Associated CWE for the requirement
Issues.severity, // Severity of the requirement
Issues.confidence, // Applicability of the requirement
Issues.nosec, // Whether to ignore the requirement
Issues.suppressions // Info suppression level of the requirement
},
code,
source_location,
results: [
{
status,
code_desc: Issues.code, // The code for the requirement test
message,
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}
Remaining GoSec Source Data
{
// Purpose: Go compilation errors
// Recording: Metadata - Not specifically related to the requirements and will be already recorded as a security issue in 'Issues' if critical
"Golang errors": {},
// Purpose: Container for identified security issues
// Recording: Requirements - This entity records all identified security issues in a Go source code
"Issues": [
{
// Purpose: The offending file
// Recording: Requirement testing - This specifically notes which file fails the requirement after source code testing
"file": "C:\\Users\\AGILLUM\\OneDrive - The MITRE Corporation\\Documents\\Code\\grype-0.34.4\\internal\\file\\tar.go",
// Purpose: The line number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"line": "83",
// Purpose: The column number of the offending code
// Recording: Requirement testing - This field specifies the location of the failing code
"column": "14"
}
],
// Purpose: The statistics of the GoSec scan on the source code
// Recording: Metadata - Info on the scan itself
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
}
}
Reasoning
We map these fields to tags in the OHDF schema because each of them acts as a descriptive trait of the control, which we can use to categorize and sort controls.
It may be tempting to map Issues.severity to impact; however, it is important to note that severity is not the same as impact. These two fields are related but distinct metrics, and while conversion is possible, it is not performed in this case due to uncertainty on how to translate the two metrics properly.
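If a team does later establish a defensible translation between the two metrics, a simple lookup table is one common approach. The sketch below is purely illustrative: the numeric values are our assumptions, not an official GoSec-to-OHDF conversion, and this walkthrough deliberately does not perform such a conversion.

```typescript
// Illustrative severity-to-impact lookup. The numeric values here are
// assumptions chosen for demonstration, not a sanctioned conversion.
const SEVERITY_TO_IMPACT: Record<string, number> = {
  HIGH: 0.7,
  MEDIUM: 0.5,
  LOW: 0.3
};

// Fall back to the 0.5 default used later in this walkthrough when the
// severity value is missing or unrecognized.
function impactFromSeverity(severity?: string): number {
  return SEVERITY_TO_IMPACT[severity?.toUpperCase() ?? ''] ?? 0.5;
}
```

Until the team agrees on such a table, leaving impact at a default and preserving `Issues.severity` in `tags` (as done here) keeps the original metric intact for the reader.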
Mapping Version 4
Next, let's look at the fields Issues.file, Issues.line, and Issues.column.
Correlated Fields
| OHDF | Source Data |
|---|---|
profiles.controls.results.message | Issues.file |
profiles.controls.results.message | Issues.line |
profiles.controls.results.message | Issues.column |
OHDF Schema Mapping Ver. 4
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version: GosecVersion, // Version of GoSec instance
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc,
descriptions,
impact,
refs,
tags: {
Issues.cwe // Associated CWE for the requirement
Issues.severity, // Severity of the requirement
Issues.confidence, // Applicability of the requirement
Issues.nosec, // Whether to ignore the requirement
Issues.suppressions // Info suppression level of the requirement
},
code,
source_location,
results: [
{
status,
code_desc: Issues.code, // The code for the requirement test
message: Issues.file + Issues.line + Issues.column, // All materials describing where the issue occurred
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}
Remaining GoSec Source Data
{
// Purpose: Go compilation errors
// Recording: Metadata - Not specifically related to the requirements and will be already recorded as a security issue in 'Issues' if critical
"Golang errors": {},
// Purpose: The statistics of the GoSec scan on the source code
// Recording: Metadata - Info on the scan itself
"Stats": {
"files": 199,
"lines": 12401,
"nosec": 0,
"found": 7
}
}
Reasoning
This is a situation similar to the one Issues.code is in. Issues.file, Issues.line, and Issues.column do not fit the definition given for message (i.e., they are not explanations for the test status); however, they do elaborate upon the location of the offending code snippet identified by GoSec, which we want to seek parity with. Thus, in lieu of any suitable substitutes, we instead use the locational information of the failing source code snippet as a description for the control result.
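Joining the three locational fields into a single result message might look like the following sketch; the label format is our own illustrative choice, not a fixed OHDF convention.

```typescript
// Combine the locational fields of a GoSec issue into one human-readable
// result message. The label wording is an illustrative formatting choice.
interface IssueLocation {
  file: string;
  line: string;
  column: string;
}

function buildMessage(issue: IssueLocation): string {
  return `File: ${issue.file}, line: ${issue.line}, column: ${issue.column}`;
}

const message = buildMessage({
  file: 'internal/file/tar.go',
  line: '83',
  column: '14'
});
```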
Mapping Version 5
Next, let's look at the fields Golang errors and Stats.
Correlated Fields
| OHDF | Source Data |
|---|---|
passthrough.auxiliary_data | Golang errors |
passthrough.auxiliary_data | Stats |
OHDF Schema Mapping Ver. 5
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version: GosecVersion, // Version of GoSec instance
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc,
descriptions,
impact,
refs,
tags: {
Issues.cwe // Associated CWE for the requirement
Issues.severity, // Severity of the requirement
Issues.confidence, // Applicability of the requirement
Issues.nosec, // Whether to ignore the requirement
Issues.suppressions // Info suppression level of the requirement
},
code,
source_location,
results: [
{
status,
code_desc: Issues.code, // The code for the requirement test
message: Issues.file + Issues.line + Issues.column, // All materials describing where the issue occurred
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{ // Go source data compilation errors; Stats on GoSec scan
name: 'Gosec',
data: Golang errors, Stats
}
],
raw
}
}
Reasoning
For these fields, we can place them in passthrough as there are no reasonably applicable fields in the OHDF schema to place them in.
Golang errors is not used anywhere in controls, as it is a record of compilation errors which, if critical, would already be recorded as security issues in Issues. Thus, to avoid duplication, Golang errors is omitted from the main OHDF structures.
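Collecting the leftover fields into passthrough might look like this sketch. The field names follow the mapping above, but wrapping everything in a single named `auxiliary_data` entry is our illustrative structuring choice.

```typescript
// Gather the remaining unmapped GoSec fields under passthrough so no
// source data is lost. Wrapping them in one named entry is one
// illustrative way to populate auxiliary_data.
interface AuxiliaryDataEntry {
  name: string;
  data: Record<string, unknown>;
}

function buildPassthrough(
  source: Record<string, unknown>
): {auxiliary_data: AuxiliaryDataEntry[]} {
  return {
    auxiliary_data: [
      {
        name: 'Gosec',
        data: {
          'Golang errors': source['Golang errors'],
          Stats: source['Stats']
        }
      }
    ]
  };
}
```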
Mapping Version 6
Finally, let's fill the rest of the empty fields using what available information we can use, prioritizing required fields and omitting any applicable unrequired fields.
Correlated Fields
| OHDF | Source Data |
|---|---|
platform.name | 'Heimdall Tools' |
platform.release | HeimdallToolsVersion |
version | HeimdallToolsVersion |
profiles.name | 'Gosec scanner' |
profiles.title | 'gosec' |
profiles.controls.impact | 0.5 |
profiles.controls.results.status | 'failed' |
profiles.status | 'loaded' |
OHDF Schema Mapping Ver. 6
{
platform: { // We fill in Heimdall for the platform as it handles the generation of this OHDF file
name: 'Heimdall Tools',
release: HeimdallToolsVersion
},
version: HeimdallToolsVersion, // See 'platform' reasoning
statistics: {}, // Not enough info to fill
profiles: [
{
name: 'Gosec scanner', // We know that this report is generated from GoSec
version: GosecVersion, // Version of GoSec instance
sha256: '', // Leave it empty as OHDF Converters will generate one for us
title: 'gosec', // We know that this report is generated from GoSec
supports: [], // Not enough info to fill
attributes: [], // Not enough info to fill
groups: [], // Not enough info to fill
controls: [
{
id: Issues.rule_id, // ID of the requirement
title: Issues.details, // Human readable title for the requirement
desc: '', // Not enough info to fill
impact: 0.5, // Have no solid information on impact of security issue, so we default to 0.5
refs: [], // Not enough info to fill
tags: {
Issues.cwe // Associated CWE for the requirement
Issues.severity, // Severity of the requirement
Issues.confidence, // Applicability of the requirement
Issues.nosec, // Whether to ignore the requirement
Issues.suppressions // Info suppression level of the requirement
},
source_location: {}, // Not enough info to fill
results: [
{
status: 'failed', // The security scan only reports failed requirements, so all findings we receive get fail statuses
code_desc: Issues.code, // The code failing the requirement test
message: Issues.file + Issues.line + Issues.column, // All materials describing where the issue occurred
start_time // Not enough info to fill
}
]
},
],
status: 'loaded' // Give loaded status to denote that profile is loaded by OHDF Converters
},
],
passthrough: {
auxiliary_data: [
{ // Go source data compilation errors; Stats on GoSec scan
name: 'Gosec',
data: Golang errors, Stats
}
],
raw
}
}
Reasoning
Unfilled fields are the result of a lack of sufficient information in the source data to reasonably associate with them.
The reasoning for the filled fields is as follows:
- `platform: {name: 'Heimdall Tools'}`: `platform` refers to the instance generating the OHDF file, which in this case is Heimdall2 or Heimdall Tools.
- `platform: {release: HeimdallToolsVersion}`: Similar reasoning as above.
- `version: HeimdallToolsVersion`: Similar reasoning as the `platform` fields.
- `name: 'Gosec scanner'`: `profiles` refers to the security tool that generated the original source data. In this case, it is the GoSec security scanning tool.
- `title: 'gosec'`: We can give a succinct title for the profile here.
- `impact: 0.5`: We use a default value of `0.5` since we do not have a direct mapping set up to convert the `Issues.severity` field.
- `status: 'failed'`: GoSec only reports security vulnerabilities if it finds that the scanned source code has such security issues present. As such, every reported issue in a GoSec scan will always be the result of a failed security test.
- `status: 'loaded'`: The status of a profile is typically `loaded` since it has been ingested and converted to OHDF.
Now we have a finalized mapping for GoSec to OHDF.
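The finalized mapping can be sketched as a single conversion function over one GoSec issue. This is an illustration of the correlations above, not the actual OHDF Converters implementation; the interface and function names are ours.

```typescript
// Illustrative sketch of the finalized GoSec-to-OHDF control mapping.
// Names and shapes are ours; the real mapper in OHDF Converters differs.
interface GosecIssue {
  severity: string;
  confidence: string;
  cwe: {id: string; url: string};
  rule_id: string;
  details: string;
  file: string;
  code: string;
  line: string;
  column: string;
  nosec: boolean;
  suppressions: unknown;
}

function issueToControl(issue: GosecIssue) {
  return {
    id: issue.rule_id,
    title: issue.details,
    desc: '',
    impact: 0.5, // Default, since no severity-to-impact conversion is defined
    refs: [],
    tags: {
      cwe: issue.cwe,
      severity: issue.severity,
      confidence: issue.confidence,
      nosec: issue.nosec,
      suppressions: issue.suppressions
    },
    source_location: {},
    results: [
      {
        status: 'failed', // GoSec only reports failing requirements
        code_desc: issue.code,
        message: `${issue.file}, line ${issue.line}, column ${issue.column}`
      }
    ]
  };
}
```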
Mapping Demo - DbProtect
It is also possible to convert XML-based result formats. The OHDF library relies on fast-xml-parser to convert these XML files into a JSON-like format for compatibility with the base converter. Let's take a look at one such example: the DbProtect mapper.
1. Breaking Down the Data
Let's first start with our DbProtect sample.
DbProtect Source Data
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<metadata>
<item name="Organization" type="xs:string" length="202"/>
<item name="Task" type="xs:string" length="802"/>
<item name="Asset Type" type="xs:string" length="66"/>
<item name="Asset" type="xs:string" length="1026"/>
<item name="IP Address, Port, Instance" type="xs:string" length="532"/>
<item name="Job Name" type="xs:string" length="258"/>
<item name="Policy" type="xs:string" length="130"/>
<item name="Result Status" type="xs:string" length="802"/>
<item name="Check Category" type="xs:string" length="802"/>
<item name="Risk DV" type="xs:string" length="802"/>
<item name="Check ID" type="xs:int" precision="1"/>
<item name="Check" type="xs:string" length="1026"/>
<item name="Details" type="xs:string" length="8002"/>
<item name="Date" type="xs:string" length="54"/>
</metadata>
<data>
<row>
<value>TEST ORGANIZATION (Local DBP server)</value>
<value>Audit</value>
<value>Microsoft SQL Server</value>
<value>CONDS181</value>
<value>10.0.10.204, 1433, MSSQLSERVER</value>
<value>Heimdall Test scan report generation</value>
<value>DISA-STIG SQL Server 2016 V2R1-1 Audit (Built-In)</value>
<value>Fact</value>
<value>Improper Access Controls</value>
<value>Medium</value>
<value>2986</value>
<value>Schema ownership</value>
<value>Schema name=DatabaseMailUserRole;Database=msdb;Owner name=DatabaseMailUserRole</value>
<value>Feb 18 2021 15:57</value>
</row>
<row>
<value>TEST ORGANIZATION (Local DBP server)</value>
<value>Audit</value>
<value>Microsoft SQL Server</value>
<value>CONDS181</value>
<value>10.0.10.204, 1433, MSSQLSERVER</value>
<value>Heimdall Test scan report generation</value>
<value>DISA-STIG SQL Server 2016 V2R1-1 Audit (Built-In)</value>
<value>Fact</value>
<value>Improper Access Controls</value>
<value>Medium</value>
<value>2986</value>
<value>Schema ownership</value>
<value>Schema name=db_accessadmin;Database=AppDetective;Owner name=db_accessadmin</value>
<value>Feb 18 2021 15:57</value>
</row>
...
</data>
</dataset>
You may notice that the metadata field maps corresponding keys to each of the values in a row of the data field. Consider ways you might try to restructure this data, but focus mainly on the purpose of each of the values for now. We will handle data restructuring in a later portion.
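One way to sketch that restructuring: pair each metadata item name with the value at the same index in a data row, yielding a keyed record. Plain TypeScript is shown here with hand-extracted arrays; the real mapper first parses the XML with fast-xml-parser before any such pairing happens.

```typescript
// Pair each <item name="..."> from the metadata section with the <value>
// at the same position in a data row, producing a keyed record.
function rowToRecord(
  itemNames: string[],
  rowValues: string[]
): Record<string, string> {
  const record: Record<string, string> = {};
  itemNames.forEach((name, index) => {
    record[name] = rowValues[index];
  });
  return record;
}

// Truncated sample from the DbProtect export above.
const itemNames = ['Organization', 'Task', 'Asset Type'];
const rowValues = [
  'TEST ORGANIZATION (Local DBP server)',
  'Audit',
  'Microsoft SQL Server'
];
const record = rowToRecord(itemNames, rowValues);
```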
Unlike GoSec, DbProtect does not have a published schema, so we must reason out the mapping empirically from our sample data. Because we are reasoning solely from sample data, this process is even more subjective than in the previous example, and your mapping may vary from ours. In place of a schema, however, we can use the metadata items at the top of the sample to inform ourselves about the purpose of each field.
Knowledge Check
What questions do we ask ourselves when analyzing security data fields?
What is the purpose of this field?
What is this field recording (metadata, requirements, requirement testing)?
Let us now apply this philosophy to our DbProtect source data. Try to find the answers to these two questions for each field yourself and then refer back to our annotated solution to check your work.
DbProtect Annotated Source Data
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!-- Purpose: Assigning keys to the value fields in the data subsection -->
<!-- Recording: Metadata - Labels of the values (might be useful when parsing) -->
<metadata>
<item name="Organization" type="xs:string" length="202"/>
<item name="Task" type="xs:string" length="802"/>
<item name="Asset Type" type="xs:string" length="66"/>
<item name="Asset" type="xs:string" length="1026"/>
<item name="IP Address, Port, Instance" type="xs:string" length="532"/>
<item name="Job Name" type="xs:string" length="258"/>
<item name="Policy" type="xs:string" length="130"/>
<item name="Result Status" type="xs:string" length="802"/>
<item name="Check Category" type="xs:string" length="802"/>
<item name="Risk DV" type="xs:string" length="802"/>
<item name="Check ID" type="xs:int" precision="1"/>
<item name="Check" type="xs:string" length="1026"/>
<item name="Details" type="xs:string" length="8002"/>
<item name="Date" type="xs:string" length="54"/>
</metadata>
<data>
<row>
<!-- Purpose: Customer's organization name -->
<!-- Recording: Metadata. This just tells us about the scan itself, not the compliance-related items -->
<value>TEST ORGANIZATION (Local DBP server)</value>
<!-- Purpose: Vendor's term for the type of scan, in this case an "Audit" -->
<!-- Recording: Metadata. Again, not telling us about any compliance-related items -->
<value>Audit</value>
<!-- Purpose: Type of database scanned by the DBProtect tool -->
<!-- Recording: Metadata. This one is up for debate. The type of database is not directly related to controls, but we could include it in the results if we want. -->
<value>Microsoft SQL Server</value>
<!-- Purpose: Name of the Server hosting the database -->
<!-- Recording: Metadata, for similar reasons as the database type. -->
<value>CONDS181</value>
<!-- Purpose: Database's IP, Port, Instance -->
<!-- Recording: Metadata. See above. -->
<value>10.0.10.204, 1433, MSSQLSERVER</value>
<!-- Purpose: Customer's added name for the report -->
<!-- Recording: Metadata. See above. -->
<value>Heimdall Test scan report generation</value>
<!-- Purpose: The baseline guidance that the scan tested against -->
<!-- Recording: Requirement. This is more general information, but still compliance-related so we can consider it a requirement. -->
<value>DISA-STIG SQL Server 2016 V2R1-1 Audit (Built-In)</value>
<!-- Purpose: Result of the test -->
<!-- Recording: Requirement testing. This tells us whether the control was fulfilled or not. -->
<value>Fact</value>
<!-- Purpose: Type of requirement -->
<!-- Recording: Requirements. This tells us the type of control being tested. -->
<value>Improper Access Controls</value>
<!-- Purpose: Severity of the requirement -->
<!-- Recording: Requirements. Tells us more about the control being tested. -->
<value>Medium</value>
<!-- Purpose: Vendor's unique ID for the type of test -->
<!-- Recording: Requirements. This is a unique identifier for each test we are running. We may want to collapse results of the same ID when making our mapper. -->
<value>2986</value>
<!-- Purpose: Check type -->
<!-- Recording: Requirements. This tells us about the type of control being tested as well. -->
<value>Schema ownership</value>
<!-- Purpose: Result details -->
<!-- Recording: Requirements. This tells us about the specific areas related to the control. -->
<value>Schema name=DatabaseMailUserRole;Database=msdb;Owner name=DatabaseMailUserRole</value>
<!-- Purpose: Date of the scan/tests -->
<!-- Recording: Requirement testing. This tells us when the scan was run. -->
<value>Feb 18 2021 15:57</value>
</row>
...
</data>
</dataset>

Note that this process is subjective and your answers may vary from ours. Don't be too quick to backtrack on your own answers; they may be just as valid as ours. There is no single objective answer here. What matters is that you can identify and break down the source data into useful pieces that we can correlate back to the OHDF schema.
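Before moving on, consider how the metadata and data sections fit together in practice. A minimal sketch of the restructuring hinted at earlier, assuming the XML has already been parsed into two string arrays (the function name here is hypothetical, not part of any real mapper):

```typescript
// Pair each metadata item name with the value at the same position in a <row>,
// producing a keyed record that is much easier to map from.
function zipRowToRecord(keys: string[], values: string[]): Record<string, string> {
  const record: Record<string, string> = {};
  keys.forEach((key, index) => {
    record[key] = values[index] ?? '';
  });
  return record;
}

// Illustrative subset of the DbProtect sample above.
const keys = ['Organization', 'Task', 'Asset Type'];
const values = [
  'TEST ORGANIZATION (Local DBP server)',
  'Audit',
  'Microsoft SQL Server',
];
const row = zipRowToRecord(keys, values);
// row['Task'] is now 'Audit'
```

Once each row is a keyed record, the field correlations in the following sections become simple lookups by name rather than positional indexing.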
2. Correlating the Schemas
Now that we're familiar with the specifics of the DbProtect report structure, we can begin to map these fields to the OHDF schema.
Note that since we are working from a DbProtect export rather than the full schema, we will inevitably miss some fields. We can rectify this by adding them to our mapping post hoc whenever needed; in the meantime, the passthrough field lets us preserve unmapped data without manual intervention.
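To make the role of passthrough concrete, here is a minimal sketch of how unmapped source data might be preserved. The interface and function names are illustrative assumptions, not the real mapper's API:

```typescript
// Sketch of the passthrough section of an OHDF file: any source fields we did
// not correlate to a dedicated OHDF field can still be carried along verbatim.
interface Passthrough {
  auxiliary_data: {name: string; data: unknown}[];
  raw: string;
}

function buildPassthrough(
  rawExport: string,
  unmapped: Record<string, string>
): Passthrough {
  return {
    // Structured leftovers, labeled with the tool they came from.
    auxiliary_data: [{name: 'DbProtect', data: unmapped}],
    // The full original export, so no information is ever lost.
    raw: rawExport,
  };
}

const passthrough = buildPassthrough('<dataset>...</dataset>', {
  Organization: 'TEST ORGANIZATION (Local DBP server)',
});
```

This way, even fields we fail to anticipate during mapping remain recoverable from the converted file.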
Let's first start with an empty OHDF schema as follows:
Empty OHDF Schema
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version,
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id,
title,
desc,
descriptions,
impact,
refs,
tags,
code,
source_location,
results: [
{
status,
code_desc,
message,
run_time,
start_time
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}

Mapping Version 1
We can begin our schema correlation by directly porting over fields from the DbProtect source data that are obvious: Check ID, Result Status, and Date. Try to map these fields yourself, and refer to our solution below to check your work.
As in the previous example, note that these mappings are informal and do not need to adhere to any programming syntax. Proper implementation of these mappings will be done later in the course.
Correlated Fields
| OHDF | Source Data |
|---|---|
| profiles.controls.id | Check ID |
| profiles.controls.results.status | Result Status |
| profiles.controls.results.start_time | Date |
OHDF Schema Mapping Ver. 1
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version,
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: 'Check ID', // ID of the requirement
title,
desc,
descriptions,
impact,
refs,
tags: {
nist,
cci
},
code,
source_location,
results: [
{
status: 'Result Status', // The result of the scan for that particular control
code_desc,
message,
run_time,
start_time: 'Date' // Some indication of when the scan was run
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}

Remaining DbProtect Source Data
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!-- Purpose: Assigning keys to the value fields in the data subsection -->
<!-- Recording: Metadata - Labels of the values (might be useful when parsing) -->
<metadata>
...
</metadata>
<data>
<row>
<!-- Purpose: Organization: Customer's organization name -->
<!-- Recording: Metadata. This just tells us about the scan itself, not the compliance-related items -->
<value>TEST ORGANIZATION (Local DBP server)</value>
<!-- Purpose: Task: Vendor's term for the type of scan, in this case an "Audit" -->
<!-- Recording: Metadata. Again, not telling us about any compliance-related items -->
<value>Audit</value>
<!-- Purpose: Asset Type: Type of database scanned by the DBProtect tool -->
<!-- Recording: Metadata. This one is up for debate. The type of database is not directly related to controls, but we could include it in the results if we want. -->
<value>Microsoft SQL Server</value>
<!-- Purpose: Asset: Name of the Server hosting the database -->
<!-- Recording: Metadata, for similar reasons as the database type. -->
<value>CONDS181</value>
<!-- Purpose: IP Address, Port, Instance: Database's IP, Port, Instance -->
<!-- Recording: Metadata. See above. -->
<value>10.0.10.204, 1433, MSSQLSERVER</value>
<!-- Purpose: Job Name: Customer's added name for the report -->
<!-- Recording: Metadata. See above. -->
<value>Heimdall Test scan report generation</value>
<!-- Purpose: Policy: The baseline guidance that the scan tested against -->
<!-- Recording: Requirement. This is more general information, but still compliance-related so we can consider it a requirement. -->
<value>DISA-STIG SQL Server 2016 V2R1-1 Audit (Built-In)</value>
<!-- Purpose: Check Category: Type of requirement -->
<!-- Recording: Requirements. This tells us the type of control being tested. -->
<value>Improper Access Controls</value>
<!-- Purpose: Risk DV: Severity of the requirement -->
<!-- Recording: Requirements. Tells us more about the control being tested. -->
<value>Medium</value>
<!-- Purpose: Check: Check type -->
<!-- Recording: Requirements. This tells us about the type of control being tested as well. -->
<value>Schema ownership</value>
<!-- Purpose: Details: Result details -->
<!-- Recording: Requirements. This tells us about the specific areas related to the control. -->
<value>Schema name=DatabaseMailUserRole;Database=msdb;Owner name=DatabaseMailUserRole</value>
</row>
...
</data>
</dataset>

Reasoning
- `id: 'Check ID'`: `Check ID` gives us an ID used by DbProtect to identify each failing requirement.
- `status: 'Result Status'`: `Result Status` gives us information about the result of the scan, whether it passed, failed, or was skipped.
- `start_time: 'Date'`: `Date` provides information about when the scan was run.
:::
Mapping Version 2
Now let us handle some of the more detailed information about the specific controls, namely Task, Check Category, Risk DV, Check, and Details.
Correlated Fields
| OHDF | Source Data |
|---|---|
| profiles.controls.desc | Task, Check Category |
| profiles.controls.impact | Risk DV |
| profiles.controls.title | Check |
| profiles.controls.results.code_desc | Details |
OHDF Schema Mapping Ver. 2
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name,
version,
sha256,
title,
maintainer,
summary,
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: 'Check ID', // ID of the requirement
title: 'Check',
desc: ['Task', 'Check Category'],
descriptions,
impact: 'Risk DV',
refs,
tags: {
nist,
cci
},
code,
source_location,
results: [
{
status: 'Result Status', // The result of the scan for that particular control
code_desc: 'Details',
message,
run_time,
start_time: 'Date' // Some indication of when the scan was run
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}

Remaining DbProtect Source Data
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!-- Purpose: Assigning keys to the value fields in the data subsection -->
<!-- Recording: Metadata - Labels of the values (might be useful when parsing) -->
<metadata>
...
</metadata>
<data>
<row>
<!-- Purpose: Organization: Customer's organization name -->
<!-- Recording: Metadata. This just tells us about the scan itself, not the compliance-related items -->
<value>TEST ORGANIZATION (Local DBP server)</value>
<!-- Purpose: Asset Type: Type of database scanned by the DBProtect tool -->
<!-- Recording: Metadata. This one is up for debate. The type of database is not directly related to controls, but we could include it in the results if we want. -->
<value>Microsoft SQL Server</value>
<!-- Purpose: Asset: Name of the Server hosting the database -->
<!-- Recording: Metadata, for similar reasons as the database type. -->
<value>CONDS181</value>
<!-- Purpose: IP Address, Port, Instance: Database's IP, Port, Instance -->
<!-- Recording: Metadata. See above. -->
<value>10.0.10.204, 1433, MSSQLSERVER</value>
<!-- Purpose: Job Name: Customer's added name for the report -->
<!-- Recording: Metadata. See above. -->
<value>Heimdall Test scan report generation</value>
<!-- Purpose: Policy: The baseline guidance that the scan tested against -->
<!-- Recording: Requirement. This is more general information, but still compliance-related so we can consider it a requirement. -->
<value>DISA-STIG SQL Server 2016 V2R1-1 Audit (Built-In)</value>
</row>
...
</data>
</dataset>

Reasoning
For the control description, we want to give information about the type of control being run. Thus, it makes sense to store the category of the check, and the particular task it was for (whether it was for an Audit, or for another reason).
In this case, we can directly map impact to Risk DV because there exists an intuitive mapping: High, Medium, Low, and Informational can map to 0.7, 0.5, 0.3, and 0, respectively.
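That severity-to-impact conversion can be sketched as a simple lookup table. The constant and function names below are illustrative, not the real mapper's:

```typescript
// Map DbProtect's Risk DV severity labels to OHDF impact scores,
// following the correspondence described above.
const IMPACT_BY_RISK_DV: Record<string, number> = {
  High: 0.7,
  Medium: 0.5,
  Low: 0.3,
  Informational: 0,
};

function riskDvToImpact(riskDv: string): number {
  // Assumption for illustration: fall back to medium impact when the
  // severity label is unrecognized, rather than dropping the control.
  return IMPACT_BY_RISK_DV[riskDv] ?? 0.5;
}

const impact = riskDvToImpact('Medium');
// impact is 0.5
```

Keeping the table in one place makes it easy to adjust the score thresholds later without touching the rest of the mapper.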
Note that Check appears to correspond to the Check ID and is a brief description of the check being run. Thus, it makes sense to map it to the title field of a control.
Finally, for the details, recall that the code_desc is looking for the test expectations as defined by the particular control. Details is telling us about the particular result's associated parameters.
Mapping Version 3
Finally, we are left with the more general, report-level information from our DbProtect sample file: Organization, Asset Type, Asset, IP Address, Port, Instance, Job Name, and Policy. Notice that OHDF does not have fields corresponding to some of this additional information, so consider ways to condense this data and encode it into other fields.
Correlated Fields
| OHDF | Source Data |
|---|---|
| profiles.summary | Organization, Asset, Asset Type, IP Address, Port, Instance |
| profiles.title | Job Name |
| profiles.name | Policy |
OHDF Schema Mapping Ver. 3
{
platform: {
name,
release,
target_id
},
version,
statistics: {
duration
},
profiles: [
{
name: 'Policy',
version,
sha256,
title: 'Job Name',
maintainer,
summary: ['Organization', 'Asset', 'Asset Type', 'IP Address, Port, Instance'],
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: 'Check ID', // ID of the requirement
title: 'Check',
desc: ['Task', 'Check Category'],
descriptions,
impact: 'Risk DV',
refs,
tags: {
nist,
cci
},
code,
source_location,
results: [
{
status: 'Result Status', // The result of the scan for that particular control
code_desc: 'Details',
message,
run_time,
start_time: 'Date' // Some indication of when the scan was run
}
]
},
],
status
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}

Reasoning
As mentioned previously, OHDF does not have a specific place to put information like the Organization, Asset, or Asset Type. However, we do have a summary field for profiles. Recall our goal of preserving as much data as possible from the source. With this in mind, it would be reasonable to map all these fields to the summary for the sake of preservation.
The profile name and title fields are a little tricky. Note that the Policy field refers to the baseline guidance that the scan tested against. This aligns more with the profile name than the Job Name, which is just a name the customer added for the report. Thus, we map the Policy field to the profiles.name and the Job Name field to the profiles.title instead.
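Condensing several report-level fields into the single summary field might look like the following sketch. The function name and the separator choice are assumptions for illustration:

```typescript
// Collapse the report-level fields that have no dedicated OHDF field
// into one profile summary string, preserving their labels.
function buildSummary(fields: Record<string, string>): string {
  const keep = ['Organization', 'Asset', 'Asset Type', 'IP Address, Port, Instance'];
  return keep
    .filter((key) => fields[key] !== undefined)
    .map((key) => `${key}: ${fields[key]}`)
    .join('; ');
}

const summary = buildSummary({
  Organization: 'TEST ORGANIZATION (Local DBP server)',
  Asset: 'CONDS181',
  'Asset Type': 'Microsoft SQL Server',
});
// 'Organization: TEST ORGANIZATION (Local DBP server); Asset: CONDS181; Asset Type: Microsoft SQL Server'
```

Keeping the original labels in the summary string means a reader of the converted file can still tell which value came from which source field.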
Mapping Version 4
Finally, let's fill in the remaining empty fields using whatever information is available, prioritizing required fields and omitting unrequired fields where applicable.
Correlated Fields
| OHDF | Source Data |
|---|---|
| platform.name | 'Heimdall Tools' |
| platform.release | HeimdallToolsVersion |
| version | HeimdallToolsVersion |
| profiles.controls.tags.nist | DEFAULT_STATIC_CODE_ANALYSIS_NIST_TAGS |
| profiles.controls.tags.cci | DEFAULT_STATIC_CODE_ANALYSIS_CCI_TAGS |
| profiles.status | 'loaded' |
OHDF Schema Mapping Ver. 4
{
platform: {
name: 'Heimdall Tools',
release: HeimdallToolsVersion,
target_id
},
version: HeimdallToolsVersion,
statistics: {
duration
},
profiles: [
{
name: 'Policy',
version,
sha256,
title: 'Job Name',
maintainer,
summary: ['Organization', 'Asset', 'Asset Type', 'IP Address, Port, Instance'],
license,
copyright,
copyright_email,
supports,
attributes,
groups,
controls: [
{
id: 'Check ID', // ID of the requirement
title: 'Check',
desc: ['Task', 'Check Category'],
descriptions,
impact: 'Risk DV',
refs,
tags: {
nist: DEFAULT_STATIC_CODE_ANALYSIS_NIST_TAGS,
cci: DEFAULT_STATIC_CODE_ANALYSIS_CCI_TAGS
},
code,
source_location,
results: [
{
status: 'Result Status', // The result of the scan for that particular control
code_desc: 'Details',
message,
run_time,
start_time: 'Date' // Some indication of when the scan was run
}
]
},
],
status: 'loaded'
},
],
passthrough: {
auxiliary_data: [
{
name,
data
},
],
raw
}
}

Reasoning
Fields left unfilled reflect a lack of sufficient information in the source data to reasonably associate with them.
The reasoning for the filled fields is as follows:
- `platform: {name: 'Heimdall Tools'}`: `platform` refers to the instance generating the OHDF file, which in this case is Heimdall2 or Heimdall Tools.
- `platform: {release: HeimdallToolsVersion}`: Similar reasoning as above.
- `version: HeimdallToolsVersion`: Similar reasoning as the `platform` fields.
- `profiles.controls.tags.nist: DEFAULT_STATIC_CODE_ANALYSIS_NIST_TAGS`: The DbProtect sample does not have any associated NIST tags within it. Thus, we use the default tags for a static code analysis.
- `profiles.controls.tags.cci: DEFAULT_STATIC_CODE_ANALYSIS_CCI_TAGS`: Similar reasoning as above.
- `status: 'loaded'`: The status of a profile is typically `loaded` since it has been ingested and converted to OHDF.
:::
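The default-tag fallback described above can be sketched as follows. The constant name mirrors the one used in the mapping, but the tag values shown here are placeholders, not the real defaults:

```typescript
// Placeholder values for illustration only; the real constant in the SAF
// libraries carries the actual default NIST tags for static code analysis.
const DEFAULT_STATIC_CODE_ANALYSIS_NIST_TAGS = ['SI-2', 'RA-5'];

// Use the source's NIST tags when present; otherwise fall back to defaults,
// as we must for DbProtect, which carries no NIST tag information.
function tagsWithDefaults(sourceNist: string[] | undefined): string[] {
  return sourceNist && sourceNist.length > 0
    ? sourceNist
    : DEFAULT_STATIC_CODE_ANALYSIS_NIST_TAGS;
}

const tags = tagsWithDefaults(undefined);
// tags falls back to the default array
```

The same fallback pattern applies to the CCI tags.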
Now we have a finalized mapping for DbProtect to OHDF.