FAQ

The ground truth data corresponding to the training images contains, for each sequence,

  • Depth maps & visualization

    The depth maps contain the ground truth depth value (in meters) at each pixel. Additionally, we include a visualization showing the inverse depth; a small sketch of such a visualization follows this list.

  • Camera data

    We provide the intrinsic and extrinsic matrices of the camera.

  • SDK

    The SDK includes example scripts to read/write the depth and camera data, and documentation on the data format.

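For illustration, here is a minimal sketch of such an inverse-depth visualization, assuming the depth map has already been loaded as a NumPy array of per-pixel depth in meters (e.g. with the read scripts from the SDK); the normalization used by the shipped visualizations may differ.

# Sketch: display a depth map as normalized inverse depth.
# `depth` is assumed to be a float NumPy array of depth in meters.
import numpy as np
import matplotlib.pyplot as plt

def show_inverse_depth(depth):
    inv = 1.0 / np.maximum(depth, 1e-6)                         # guard against division by zero
    inv = (inv - inv.min()) / (inv.max() - inv.min() + 1e-12)   # normalize to [0, 1] for display
    plt.imshow(inv, cmap="gray")
    plt.axis("off")
    plt.show()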

The test set consists of 12 sequences with withheld ground truth, rendered in both the clean and final passes.

The submission format is analogous to the one used in KITTI.

In this format, the depth of each frame is stored as an unsigned 16-bit, single-channel PNG, in increments of 1/256 meter.
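
As a sketch of this encoding and its inverse (the use of imageio here is our assumption; any library that reads and writes 16-bit single-channel PNGs losslessly works equally well):

# Sketch: KITTI-style 16-bit PNG depth encoding, 1/256 m per increment.
import imageio.v2 as imageio
import numpy as np

def write_depth_png(path, depth_m):
    # Quantize meters to 1/256 m steps and clip to the representable range.
    # (In KITTI, a stored value of 0 marks an invalid pixel.)
    quantized = np.clip(np.round(depth_m * 256.0), 0, 65535).astype(np.uint16)
    imageio.imwrite(path, quantized)

def read_depth_png(path):
    quantized = imageio.imread(path).astype(np.float64)
    return quantized / 256.0  # back to meters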

The submission file is simply a .zip-file containing the following structure:
/
/clean/
/clean/ambush_1/
/clean/ambush_1/frame_0001.png
/clean/ambush_1/...
/clean/ambush_1/frame_0023.png
/clean/...
/final/
/final/ambush_1/
/final/ambush_1/frame_0001.png
/final/ambush_1/...
/final/ambush_1/frame_0023.png
/final/... 
Please make sure that your archive contains only the depth images. It should contain 1128 files.
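
A minimal sketch of packing such an archive, assuming your predictions already sit on disk in the layout above under a (hypothetical) predictions/ root:

# Sketch: zip up predicted depth PNGs and sanity-check the file count.
import os
import zipfile

def make_submission(pred_root="predictions", out_path="submission.zip"):
    count = 0
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for render_pass in ("clean", "final"):
            pass_dir = os.path.join(pred_root, render_pass)
            for seq in sorted(os.listdir(pass_dir)):
                seq_dir = os.path.join(pass_dir, seq)
                for fname in sorted(os.listdir(seq_dir)):
                    if fname.endswith(".png"):
                        zf.write(os.path.join(seq_dir, fname),
                                 arcname=f"{render_pass}/{seq}/{fname}")
                        count += 1
    assert count == 1128, f"expected 1128 depth images, found {count}"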

To evaluate the data, we use the same standard metrics as in the KITTI depth estimation benchmark. In particular, these are (a sketch of one common formulation follows the list):
  • SILog, the scale-invariant logarithmic error, given in log(m)

  • sqErrorRel, the relative squared error in percent

  • absErrorRel, the relative absolute error in percent

  • iRMSE, the root mean squared error of the inverse depth, given in 1/m
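
For reference, here is a sketch of these metrics in one common formulation following the public KITTI conventions; the evaluation server's exact implementation (e.g. whether SILog is additionally scaled by 100) may differ in details.

# Sketch: the four metrics over valid pixels (gt > 0, gt <= 256 m; pred > 0).
import numpy as np

def depth_metrics(pred, gt):
    d = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)       # log(m)
    sq_rel = np.mean(((gt - pred) / gt) ** 2) * 100.0        # percent
    abs_rel = np.mean(np.abs(gt - pred) / gt) * 100.0        # percent
    irmse = np.sqrt(np.mean((1.0 / gt - 1.0 / pred) ** 2))   # 1/m
    return silog, sq_rel, abs_rel, irmse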

Note that the 16-bit PNG format can only encode depths up to 65535/256 ≈ 256 meters; pixels with a ground truth depth larger than 256 meters are therefore not included in the evaluation.

Currently, we only support submissions for single-image depth estimation algorithms. In the near future, we plan to add the option to declare whether your algorithm estimates depth from single frames or from multiple frames.

The depth values are obtained from Blender via an additional Z-buffer pass, similar to the optical flow data.

The extrinsic camera matrix is given by Blender as the world matrix of the camera object.
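
Since a world matrix maps camera coordinates to world coordinates, projecting points into the image requires its inverse. Below is a minimal sketch of that inversion; whether the shipped files store the matrix in this direction, and how Blender's convention of the camera looking down its negative z-axis is handled, should be checked against the SDK documentation.

# Sketch: invert a 4x4 rigid camera-to-world matrix to obtain
# a world-to-camera transform.
import numpy as np

def world_to_camera(cam_to_world):
    R = cam_to_world[:3, :3]
    t = cam_to_world[:3, 3]
    inv = np.eye(4)
    inv[:3, :3] = R.T        # inverse of a rotation is its transpose
    inv[:3, 3] = -R.T @ t
    return inv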

The intrinsic camera matrix is computed using the focal length as reported by Blender, a hard-coded pixel density of 32 px/mm, zero pixel skew, and the principal point at x = 511.5 and y = 217.5.
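
With zero skew, this yields the standard pinhole form; a sketch (the parameter name focal_mm, the focal length in millimeters reported by Blender, is ours):

# Sketch: build the 3x3 intrinsic matrix from the stated parameters.
import numpy as np

def intrinsic_matrix(focal_mm, px_per_mm=32.0, cx=511.5, cy=217.5):
    f_px = focal_mm * px_per_mm   # focal length converted to pixels
    return np.array([[f_px, 0.0,  cx],
                     [0.0,  f_px, cy],
                     [0.0,  0.0,  1.0]])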

If you have any questions or problems regarding this dataset, please do not hesitate to contact us.