Data Description - Surgical Tool Localization in endoscopic videos

Training Data:

The dataset consists of video clips taken from surgical training exercises using the da Vinci robotic system.
During these exercises the surgical trainees perform standard activities such as dissecting tissue, suturing, etc.
There are 24,695 video clips, each 30 seconds long and captured at 60 fps with a resolution of 720p (1280 x 720) from one channel of the endoscope (see example screenshots below).
For the duration of each clip, three robotic surgical tools are installed and within the surgical field, although in some clips tools may be obscured or otherwise temporarily not visible.
Each clip can contain three of 14 possible tools.
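As a rough sense of scale, the figures above imply the following totals. This is only a back-of-the-envelope sketch; the variable names are illustrative and are not part of the dataset.

```python
# Figures stated in the description above.
NUM_CLIPS = 24_695      # number of training clips
CLIP_SECONDS = 30       # length of each clip in seconds
FPS = 60                # capture rate in frames per second

frames_per_clip = CLIP_SECONDS * FPS
total_frames = NUM_CLIPS * frames_per_clip
total_hours = NUM_CLIPS * CLIP_SECONDS / 3600

print(frames_per_clip)
print(total_frames)
print(round(total_hours, 1))
```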

Training labels:

For each 30-second clip within the training set we provide the corresponding "tool presence" labels (indicating which robotic tools are installed).
The same robotic tools are installed for the entire duration of each clip, hence there is one label per video clip.
Labels are a list of 4 values corresponding to the tools present in the 4 robotic arms [USM1, USM2, USM3, USM4], e.g. ['cadiere forceps', 'nan', 'force bipolar', 'grasping retractor'], where the arm with the camera installed is labeled "nan".
The four arms are usually (though not necessarily) installed from left to right (i.e. USM1 is the leftmost arm, and USM4 is the rightmost arm).
Please note that there are cases where the label indicates 3 tools present but only 2 or fewer tools can be seen in the video. This happens when the surgeon has moved a tool out of view even though it is installed. This noise in the training labels is introduced because the tool information is extracted directly from robot system data.
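The label format described above could be parsed along the following lines. This is a minimal sketch that assumes each label is stored in the csv as the string form of a Python list with "nan" as a quoted string, as in the example; the function name parse_tool_label is hypothetical, and the actual file layout may differ.

```python
import ast

# Arm names in the order given in the description.
ARMS = ["USM1", "USM2", "USM3", "USM4"]

def parse_tool_label(raw):
    """Return (arm, tool) pairs for installed tools, skipping the camera arm.

    Assumes `raw` looks like "['cadiere forceps', 'nan', ...]" with 'nan'
    marking the arm that holds the camera.
    """
    tools = ast.literal_eval(raw)
    if len(tools) != len(ARMS):
        raise ValueError(f"expected {len(ARMS)} entries, got {len(tools)}")
    return [(arm, tool) for arm, tool in zip(ARMS, tools) if tool != "nan"]

example = "['cadiere forceps', 'nan', 'force bipolar', 'grasping retractor']"
installed = parse_tool_label(example)
print(installed)
```

Because of the label noise noted above, the parsed list describes which tools are installed, not necessarily which tools are visible in any given frame.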
A snapshot of the labels csv file is also shown below:

Testing Data:

The testing dataset will also consist of video clips taken from surgical training exercises (similar to the training set) using the da Vinci robotic system.
The length of each video clip will be variable.
The videos will be sampled at 1 Hz (1 fps).
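Since the training clips are captured at 60 fps while the test videos are sampled at 1 Hz, it may be useful to subsample training clips to a comparable rate. The sketch below picks one frame index per second; it assumes the first frame of each second is kept, which is an illustrative choice and not a statement of how the test videos were actually sampled. The function name is hypothetical.

```python
def frame_indices_at_1hz(num_frames, fps=60):
    """Return the indices of one frame per second (the first frame of each second)."""
    step = int(round(fps))  # number of frames that make up one second
    return list(range(0, num_frames, step))

# A 30-second training clip at 60 fps has 1800 frames.
indices = frame_indices_at_1hz(30 * 60)
print(len(indices), indices[:3], indices[-1])
```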

Testing labels:

The test set will have tool presence labels (same as those for the training set), but will also be annotated with bounding boxes around the robotic tools. These annotations are generated by an experienced crowd of annotators. A few examples of bounding box annotations from the test set are shown below:

It is important to note that the clevis of each robotic surgical tool is considered the ground truth for most tools (as shown above). However, there are the following exceptions to this rule:

If a tool's clevis is not well defined, the bounding box includes the surgical tip as well, e.g. monopolar curved scissors, as shown in the left image above (purple bounding box).
The tool is very large and the clevis does not always show in the field of view (e.g. tip-up fenestrated grasper, suction irrigator, stapler, grasping retractor).

Moreover, using the information available in the UI to make predictions is not allowed. To enforce this, the UI will be blurred in the test set, so any model relying on that information will not perform well. An example image taken from the test set with a blurred UI is given below:

Additional bounding box examples:
