Accessing landmarks, tracking multiple hands, and enabling depth on desktop

Hello,

I found out about MediaPipe after seeing Google's blog post on hand tracking. I am currently using MediaPipe to build a cross-platform interface that uses gestures to control multiple systems. I am using the Desktop CPU example as a base, and I have successfully retrieved the hand landmarks. I just want to make sure I am retrieving them in the most efficient and proper way.

The process I use is as follows:

  1. Create an OutputStreamPoller that listens to the hand_landmarks output stream of the HandLandmark subgraph.
  2. When a packet is available, load it into a mediapipe::Packet variable using the poller's .Next() method.
  3. Call the packet's .Get<T>() method and store the result in a variable named hand_landmarks.
  4. Loop through that variable, retrieve the x, y, and z coordinates, and place them into a vector for processing.
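Step 4 above can be sketched as follows. This is a minimal, self-contained model, not MediaPipe code: the Landmark struct is a hypothetical stand-in for MediaPipe's NormalizedLandmark proto (which exposes x(), y(), z() accessors), and FlattenLandmarks is an illustrative name, not part of the MediaPipe API.

```cpp
#include <vector>

// Hypothetical stand-in for mediapipe::NormalizedLandmark; the real
// proto exposes x(), y(), z() accessors, this struct just models the data.
struct Landmark {
  float x;
  float y;
  float z;
};

// Walk the landmark list once and pack the coordinates into a flat
// vector (x0, y0, z0, x1, y1, z1, ...) for downstream gesture processing.
std::vector<float> FlattenLandmarks(const std::vector<Landmark>& landmarks) {
  std::vector<float> coords;
  coords.reserve(landmarks.size() * 3);  // one x/y/z triple per landmark
  for (const Landmark& lm : landmarks) {
    coords.push_back(lm.x);
    coords.push_back(lm.y);
    coords.push_back(lm.z);
  }
  return coords;
}
```

Reserving the vector up front avoids repeated reallocation, which is about the only efficiency concern in this step; the per-frame landmark count is small (21 per hand), so the loop itself is cheap.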

Is this process correct or is there a better way to go about retrieving the coordinates of the hand landmarks?

I have additional questions; I will ask them here, but please let me know if they should go in a separate issue.

  1. In the hand tracking examples, only a single hand is detected. How would I alter the build so that it can detect multiple hands (specifically two)?
  2. How would I enable the desktop implementations of hand tracking to capture depth (similar to how the Android/iOS 3D builds output z coordinates)?

Answer from JECBello

@Sara533 In order to access the hand landmarks from the GPU buffer, you need to use a for loop. Here is a snippet of the code I used to extract the landmarks (stored in a variable I named hand_landmarks) and print them to the console.

// Iterate over the landmark list and log each normalized coordinate.
for (const auto& landmark : hand_landmarks) {
  LOG(INFO) << "x coordinate: " << landmark.x();
  LOG(INFO) << "y coordinate: " << landmark.y();
  LOG(INFO) << "z coordinate: " << landmark.z();
}

If anyone knows a more efficient way of accessing the landmark coordinates, please feel free to correct me!
