5.2. Hi-C dataset format

An Hi-C dataset is a folder containing the heatmaps and a metadata.json file.

5.2.1. Metadata file

As its name suggests, the JSON file holds basic metadata about the dataset. Its format is the following:

{
  "Binsize": 5000,
  "Assembly": "droYak2",
  "Species": "Drosophila yakuba",
  "Comment": "nm_none - No NaN",
  "Date": "",
  "Dataset": "Dyak_c",
  "Dims": {
    "2R|2R": [1234, 1234],
    "4|X": [345, 2345],
    ...
  }
  "MapFiles": {
    "2R|2R": "2R_2R.tsv.gz",
    "4|X": "4_X.tsv.gz",
    ...
  }
}

Note

The MapFiles maps to the key-value store of the heatmaps files. The files can be named in any way, but not the key. They must be of the form X|Y.

5.2.2. Heatmap files

There are one file per matrix. The matrices are written as tabulated-separated files, optionally gzipped. There is no header row, and 3 columns:

  • row position (an integer),
  • column position (an integer),
  • value (a floating-point number).

In order to save space, only the non-empty boxes from the matrices are written (the matrices are sparse).