Accessing landmarks, tracking multiple hands, and enabling depth on desktop


I found out about MediaPipe after seeing Google's blog post on hand tracking. I am currently using MediaPipe to build a cross-platform interface that uses gestures to control multiple systems. I am using the Desktop CPU example as a base, and I have successfully retrieved the hand landmarks. I just want to make sure that I am retrieving them in the most efficient and proper way.

The process I use is as follows:

  1. Create a listener of class OutputStreamPoller which listens for the hand_landmarks output stream in the HandLandmark subgraph.
  2. If there is an available packet, load the packet into a variable of class mediapipe::Packet using the .Next() method of the OutputStreamPoller class.
  3. Use the .Get<T>() method of the Packet class to load the packet's contents into another variable called hand_landmarks.
  4. Loop through the variable and retrieve the x, y, and z coordinates and place them into a vector for processing.

Is this process correct or is there a better way to go about retrieving the coordinates of the hand landmarks?

I have additional questions, but I am unsure if I should place them in a separate issue. I will ask them here but please let me know if I should open a separate issue.

  1. In the hand tracking examples, only a single hand is detected. How would I alter the build so that it can detect multiple hands (specifically 2)?
  2. How would I enable the desktop implementations of hand tracking such that they can capture depth (similar to how the android/ios 3D builds can output z coordinates)?

Answer by JECBello

@Sara533 To access the hand landmarks once they have been read from the packet, you need to iterate over them with a for loop. Here is a snippet of the code I used to extract the landmarks, which were stored in a variable I had named hand_landmarks, and print them to the console.

for (const auto &landmark : hand_landmarks) {
    // Log each coordinate of the current landmark.
    LOG(INFO) << "x coordinate: " << landmark.x();
    LOG(INFO) << "y coordinate: " << landmark.y();
    LOG(INFO) << "z coordinate: " << landmark.z();
}
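
For step 4 of the process described in the question (placing the coordinates into a vector for processing), the same loop can fill a flat vector instead of logging. This is only a minimal sketch: the Landmark struct is a hypothetical stand-in that mimics the x(), y(), and z() accessors used above so the example compiles without MediaPipe, and FlattenLandmarks is an illustrative name, not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the landmark type used in the loop above.
// It only mirrors the x(), y(), z() accessors; it is not a MediaPipe class.
struct Landmark {
  float x_, y_, z_;
  float x() const { return x_; }
  float y() const { return y_; }
  float z() const { return z_; }
};

// Copies the coordinates into one flat vector laid out as
// [x0, y0, z0, x1, y1, z1, ...]. Works with any range whose elements
// expose x(), y(), z() and that has a size() method.
template <typename LandmarkRange>
std::vector<float> FlattenLandmarks(const LandmarkRange& hand_landmarks) {
  std::vector<float> coords;
  // Reserve once up front so push_back never reallocates inside the loop.
  coords.reserve(static_cast<std::size_t>(hand_landmarks.size()) * 3);
  for (const auto& landmark : hand_landmarks) {
    coords.push_back(landmark.x());
    coords.push_back(landmark.y());
    coords.push_back(landmark.z());
  }
  return coords;
}
```

Calling FlattenLandmarks(hand_landmarks) on a std::vector of landmarks returns three floats per landmark. Reserving the capacity before the loop is the only optimization here; whether it matters in practice depends on how often the loop runs per frame.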

If anyone knows a more efficient way of accessing the landmark coordinates, please feel free to correct me!
