FAQ

The ground truth data corresponding to the training images contains, for each sequence,

  • Depth maps & visualization

    The depth maps contain the ground truth depth value (in meters) at each pixel. Additionally, we include a visualization showing the inverse depth; a small sketch of such a visualization follows this list.

  • Camera data

    We provide the intrinsic and extrinsic matrices of the camera.

  • SDK

    The SDK includes example scripts to read/write the depth and camera data, and documentation on the data format.

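For illustration, here is a minimal sketch of such an inverse-depth visualization, assuming the depth map has already been loaded as a NumPy array of per-pixel depth in meters (e.g. with the read scripts from the SDK); the normalization used by the shipped visualizations may differ.

# Sketch: display a depth map as normalized inverse depth.
# `depth` is assumed to be a float NumPy array of depth in meters.
import numpy as np
import matplotlib.pyplot as plt

def show_inverse_depth(depth):
    inv = 1.0 / np.maximum(depth, 1e-6)                         # guard against division by zero
    inv = (inv - inv.min()) / (inv.max() - inv.min() + 1e-12)   # normalize to [0, 1] for display
    plt.imshow(inv, cmap="gray")
    plt.axis("off")
    plt.show()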

The test set consists of 12 sequences with withheld ground truth, rendered in both the clean and final passes.

The submission format is analogous to the one used in KITTI.

In this format, the depth of each frame is stored as an unsigned 16-bit, single-channel PNG, in increments of 1/256 meter.
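
As a sketch of this encoding and its inverse (the use of imageio here is our assumption; any library that reads and writes 16-bit single-channel PNGs losslessly works equally well):

# Sketch: KITTI-style 16-bit PNG depth encoding, 1/256 m per increment.
import imageio.v2 as imageio
import numpy as np

def write_depth_png(path, depth_m):
    # Quantize meters to 1/256 m steps and clip to the representable range.
    # (In KITTI, a stored value of 0 marks an invalid pixel.)
    quantized = np.clip(np.round(depth_m * 256.0), 0, 65535).astype(np.uint16)
    imageio.imwrite(path, quantized)

def read_depth_png(path):
    quantized = imageio.imread(path).astype(np.float64)
    return quantized / 256.0  # back to meters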

The submission file is simply a .zip-file containing the following structure:
/
/clean/
/clean/ambush_1/
/clean/ambush_1/frame_0001.png
/clean/ambush_1/...
/clean/ambush_1/frame_0023.png
/clean/...
/final/
/final/ambush_1/
/final/ambush_1/frame_0001.png
/final/ambush_1/...
/final/ambush_1/frame_0023.png
/final/... 
Please make sure that your archive contains only the depth images. It should contain 1128 files.
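
A minimal sketch of packing such an archive, assuming your predictions already sit on disk in the layout above under a (hypothetical) predictions/ root:

# Sketch: zip up predicted depth PNGs and sanity-check the file count.
import os
import zipfile

def make_submission(pred_root="predictions", out_path="submission.zip"):
    count = 0
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for render_pass in ("clean", "final"):
            pass_dir = os.path.join(pred_root, render_pass)
            for seq in sorted(os.listdir(pass_dir)):
                seq_dir = os.path.join(pass_dir, seq)
                for fname in sorted(os.listdir(seq_dir)):
                    if fname.endswith(".png"):
                        zf.write(os.path.join(seq_dir, fname),
                                 arcname=f"{render_pass}/{seq}/{fname}")
                        count += 1
    assert count == 1128, f"expected 1128 depth images, found {count}"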

To evaluate the data, we use the same standard metrics as in the KITTI depth estimation benchmark. In particular, these are (a sketch of one common formulation follows the list):
  • SILog, the scale-invariant logarithmic error, given in log(m)

  • sqErrorRel, the relative squared error in percent

  • absErrorRel, the relative absolute error in percent

  • iRMSE, the root mean squared error of the inverse depth, given in 1/m
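
For reference, here is a sketch of these metrics in one common formulation following the public KITTI conventions; the evaluation server's exact implementation (e.g. whether SILog is additionally scaled by 100) may differ in details.

# Sketch: the four metrics over valid pixels (gt > 0, gt <= 256 m; pred > 0).
import numpy as np

def depth_metrics(pred, gt):
    d = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)       # log(m)
    sq_rel = np.mean(((gt - pred) / gt) ** 2) * 100.0        # percent
    abs_rel = np.mean(np.abs(gt - pred) / gt) * 100.0        # percent
    irmse = np.sqrt(np.mean((1.0 / gt - 1.0 / pred) ** 2))   # 1/m
    return silog, sq_rel, abs_rel, irmse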

Note that the 16-bit PNG format can only encode depths up to 65535/256 ≈ 256 meters; pixels with a ground truth depth larger than 256 meters are therefore not included in the evaluation.

Currently, we only support submissions for single-image depth estimation algorithms. In the near future, we plan to add the option to declare whether your algorithm estimates depth from single frames or from multiple frames.

The depth values are obtained from Blender via an additional Z-buffer pass, similar to the optical flow data.

The extrinsic camera matrix is given by Blender as the world matrix of the camera object.
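
Since a world matrix maps camera coordinates to world coordinates, projecting points into the image requires its inverse. Below is a minimal sketch of that inversion; whether the shipped files store the matrix in this direction, and how Blender's convention of the camera looking down its negative z-axis is handled, should be checked against the SDK documentation.

# Sketch: invert a 4x4 rigid camera-to-world matrix to obtain
# a world-to-camera transform.
import numpy as np

def world_to_camera(cam_to_world):
    R = cam_to_world[:3, :3]
    t = cam_to_world[:3, 3]
    inv = np.eye(4)
    inv[:3, :3] = R.T        # inverse of a rotation is its transpose
    inv[:3, 3] = -R.T @ t
    return inv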

The intrinsic camera matrix is computed using the focal length as reported by Blender, a hard-coded pixel density of 32 px/mm, zero pixel skew, and the principal point at x = 511.5 and y = 217.5.
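
With zero skew, this yields the standard pinhole form; a sketch (the parameter name focal_mm, the focal length in millimeters reported by Blender, is ours):

# Sketch: build the 3x3 intrinsic matrix from the stated parameters.
import numpy as np

def intrinsic_matrix(focal_mm, px_per_mm=32.0, cx=511.5, cy=217.5):
    f_px = focal_mm * px_per_mm   # focal length converted to pixels
    return np.array([[f_px, 0.0,  cx],
                     [0.0,  f_px, cy],
                     [0.0,  0.0,  1.0]])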

If you have any questions or problems regarding this dataset, please do not hesitate to contact us.