Humans setting the table

A large part of the knowledge that robots need to accomplish human-scale manipulation tasks can be acquired by watching humans perform these tasks and interpreting the observed activity to build activity models at multiple levels of abstraction. An excellent source for such research is the Kitchen Data Set.

The dataset is provided to foster research in the areas of markerless human motion capture, motion segmentation, and human activity recognition. The recorded activities were selected with the intention of providing realistic and seemingly natural motions, and consist of everyday manipulation activities in a natural kitchen environment. The dataset should aid researchers in these fields by providing a comprehensive collection of sensory input data for developing and verifying their algorithms. It is also meant to serve as a benchmark for comparative studies, given the manually annotated “ground truth” labels of the underlying actions.




The Kitchen Data Set contains observations of several subjects setting a table in different ways. Some perform the activity the way a robot would, transporting the items one by one; other subjects behave more naturally and grasp as many objects as they can at once. In addition, there are two episodes in which the subjects repetitively performed reaching and grasping actions. Applications of the data are mainly in the areas of human motion tracking, motion segmentation, and activity recognition.

Description of the data

To provide sufficient information for recognizing and characterizing the observed activities, we recorded the following multi-modal sensor data:

  • Video data from four fixed, overhead cameras (384×288 pixels RGB color or 780×582 pixels raw Bayer pattern, at 25 Hz)
  • Motion capture data (*.bvh file format) extracted from the videos using our markerless full-body MeMoMan tracker
  • RFID tag readings from three fixed readers embedded in the environment (sample rate 2 Hz)
  • Magnetic (reed) sensors detecting when a door or drawer is opened (sample rate 10 Hz)
  • Action labels (the data is labeled separately for the left hand, the right hand, and the trunk of the person)
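The motion capture data is distributed in the standard BVH format, whose MOTION section stores the frame count, the frame time, and one line of channel values per frame. As a rough illustration, the sketch below reads that section from a BVH file; it assumes a standard-conformant layout, and the dataset's actual files may differ in joint naming or channel ordering.

```python
def read_bvh_motion(path):
    """Return (num_frames, frame_time_s, frames) from a BVH file.

    Minimal sketch for standard BVH: the MOTION section starts with
    'Frames:' and 'Frame Time:' lines, followed by one whitespace-
    separated line of channel values per frame.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    start = lines.index("MOTION")
    num_frames = int(lines[start + 1].split(":")[1])
    frame_time = float(lines[start + 2].split(":")[1])
    frames = [
        [float(v) for v in line.split()]
        for line in lines[start + 3 : start + 3 + num_frames]
    ]
    return num_frames, frame_time, frames
```

Since the tracker output is derived from the 25 Hz video streams, a frame time of 0.04 s would be the expected value for data aligned with the cameras.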


Partly supported by DFG
