Shubhambindal2017/Instance-Segmentation-using-Mask-RCNN-with-different-backbones-on-Crack-Data

Instance segmentation using Mask R-CNN with different backbones (ResNet-101 & MobileNet) on crack data, with a REST API built using Flask.

Shubhambindal2017/Unoffical-Implementation-of-GRAD-CAM-with-Sanity-Checks

Implementation of Grad-CAM with the sanity checks proposed in 'Sanity Checks for Saliency Maps'; found that Grad-CAM passes the checks.

Shubhambindal2017/ai-platform

Open source platform for machine learning tasks

Shubhambindal2017/Algorithms

Efficient Algorithms

Shubhambindal2017/Background-Matting

Background Matting: The World is Your Green Screen

Shubhambindal2017/Calculator

A simple calculator made with Tkinter.

Shubhambindal2017/Content-Based-Image-Retrival-using-Encoder-Decoder

Content-based image retrieval using an encoder-decoder: the program accepts an input image and returns N images from the provided dataset that are similar to the input image.

🤗 Fast, efficient, open-access datasets and evaluation metrics in PyTorch, TensorFlow, NumPy and Pandas

Shubhambindal2017/datasets-tagging

A Streamlit app to add structured tags to the datasets

Shubhambindal2017/deep-learning

Repo for the Deep Learning Nanodegree Foundations program.

issue comment: google/mediapipe

Face orientation angles - pitch/yaw/roll from face geometry

@kostyaby Hi, thanks for the reply and suggestions.

**(a)** `['h'][2]` was just a way to get the Pose Transform Matrix as an array; `getPackedDataList()` can also be used to get it:
`pt_matrix = results.multiFaceGeometry[0].getPoseTransformMatrix().getPackedDataList()`

**(b)** Yeah, you were right about the matrix being column-major, thanks for the correction. I have re-tried it with that fixed, but there is still not much difference in the angle values (except that the signs of the angles flip).

**(c)** I have now also tried a library (Three.js, https://threejs.org/build/three.js) instead of my own code to get the Euler angles, but the output angles are still not as expected.

**Example** -
When I kept my face straight and turned it to the right, about 90° to the webcam (with negligible vertical tilt), I got:

**pt_matrix** = [0.5567780137062073, 0.034023914486169815, 0.8299639821052551, 0, -0.011918997392058372, 0.9993847608566284, -0.03297339752316475, 0, -0.8305754661560059, 0.008466490544378757, 0.5568410754203796, 0, -1.418548345565796, 6.790719509124756, -39.25355529785156, 1]

**pt_matrix_three_js_format.elements** is identical to the array above.

Values I got:

- Order 'XYZ': euler_angles['x'] = -0.015203329261613606, euler_angles['y'] = -0.9801402166372382, euler_angles['z'] = 0.021403821484576348; pitch = -0.8710866012382058, yaw = -56.15789774435194, roll = 1.2263486365176608
- Order 'ZYX': euler_angles['x'] = -0.05914602971372085, euler_angles['y'] = -0.9790431118901572, euler_angles['z'] = 0.06103268613630561; pitch = -3.388817877551565, yaw = -56.09503827266044, roll = 3.4969153279569225

- The pitch and yaw values make directional sense, but isn't the yaw value unexpected? Ideally it should be 90° or close to it. @kostyaby, what are your thoughts on this?

**Code that I used for above example**

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/face_mesh.js" crossorigin="anonymous"></script>
  <script src="https://threejs.org/build/three.js" crossorigin="anonymous"></script>
</head>
<body>
  <div class="container">
    <video class="input_video"></video>
    <canvas class="output_canvas" width="1280px" height="720px"></canvas>
  </div>
</body>
</html>
<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');

function onResults(results) {
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
  if (results.multiFaceLandmarks) {
    for (const landmarks of results.multiFaceLandmarks) {
      drawConnectors(canvasCtx, landmarks, FACEMESH_TESSELATION, {color: '#C0C0C070', lineWidth: 1});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_EYE, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_EYEBROW, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_RIGHT_IRIS, {color: '#FF3030'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_EYE, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_EYEBROW, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LEFT_IRIS, {color: '#30FF30'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_FACE_OVAL, {color: '#E0E0E0'});
      drawConnectors(canvasCtx, landmarks, FACEMESH_LIPS, {color: '#E0E0E0'});
    }
  }
  if (results.multiFaceGeometry) {
    for (const facegeometry of results.multiFaceGeometry) {
      // Pose transform matrix as a flat array of 16 floats.
      const pt_matrix = facegeometry.getPoseTransformMatrix().getPackedDataList();
      const pt_matrix_three_js_format = new THREE.Matrix4().fromArray(pt_matrix);
      const euler_angles = new THREE.Euler().setFromRotationMatrix(pt_matrix_three_js_format, 'XYZ');
      const pitch = THREE.MathUtils.radToDeg(euler_angles['x']);
      const yaw = THREE.MathUtils.radToDeg(euler_angles['y']);
      const roll = THREE.MathUtils.radToDeg(euler_angles['z']);
      console.log('pitch:', pitch, 'yaw:', yaw, 'roll:', roll);
    }
  }
  canvasCtx.restore();
}

const faceMesh = new FaceMesh({locateFile: (file) => {
  return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
}});
faceMesh.setOptions({
  maxNumFaces: 1,
  enableFaceGeometry: true,
  refineLandmarks: false,
  minDetectionConfidence: 0.48,
  minTrackingConfidence: 0.5
});
faceMesh.onResults(onResults);

const camera = new Camera(videoElement, {
  onFrame: async () => {
    await faceMesh.send({image: videoElement});
  },
  width: 1280,
  height: 720
});
camera.start();
</script>
```
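As a quick offline sanity check (a minimal sketch in plain JS, no Three.js dependency; the formulas mirror what `THREE.Euler.setFromRotationMatrix` computes for the 'XYZ' order), decomposing a pure 90° yaw rotation should report yaw ≈ 90°, which suggests the extraction itself is not where the ~56° comes from:

```javascript
// Sanity check in plain JS (no Three.js): decompose a pure 90-degree yaw
// rotation with the same 'XYZ'-order formulas THREE.Euler uses, to confirm
// the extraction itself reports ~90 degrees for a true 90-degree turn.
const deg = (rad) => rad * 180 / Math.PI;

// Row-major 3x3 rotation about the Y axis by `theta` radians.
const rotY = (theta) => [
   Math.cos(theta), 0, Math.sin(theta),
   0,               1, 0,
  -Math.sin(theta), 0, Math.cos(theta),
];

// 'XYZ'-order Euler extraction from a row-major 3x3 rotation matrix.
const eulerXYZ = (m) => {
  const [r00, r01, r02, r10, r11, r12, r20, r21, r22] = m;
  const y = Math.asin(Math.max(-1, Math.min(1, r02)));
  let x, z;
  if (Math.abs(r02) < 0.9999999) {
    x = Math.atan2(-r12, r22);
    z = Math.atan2(-r01, r00);
  } else { // gimbal lock: r02 is +/-1
    x = Math.atan2(r21, r11);
    z = 0;
  }
  return { x, y, z };
};

const a = eulerXYZ(rotY(Math.PI / 2));
console.log(deg(a.x), deg(a.y), deg(a.z)); // approximately 0 90 0
```

Note also that asin(-0.8305754661560059) ≈ -0.98 rad ≈ -56.2°, matching the euler_angles['y'] reported above: the ~56° yaw is already encoded in the pose transform matrix itself rather than being introduced by the Euler conversion.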

comment created time in 6 days

pull request comment: vladmandic/human

Actually, I started this thread to understand why the angles are multiplied by 2; by definition they should not be. But at the same time it is also true that removing it will halve all the angles. Is there any evaluation strategy that you use to evaluate the yaw/pitch/roll performance?

comment created time in 7 days

pull request comment: vladmandic/human

@vladmandic but removing the 2x will also halve the yaw/pitch/roll relative to the current 'human' output, right? (Please confirm whether I am right or not, based on your test results.) If yes, will those yaw/pitch/roll values be correct? For example, I think that after removing the 2x, if we turn the face left/right by 90°, the yaw will only be close to ±45°, not reaching the 90° it ideally should.

comment created time in 8 days

pull request comment: vladmandic/human

Thanks @vladmandic, but I suspect that removing the 2x will also cause the angles (yaw/pitch/roll) to be half of what the current 'human' gives as output; will wait for your test results.

Btw @ButzYung, if possible can you please share the code snippet of the other method (matrix => quaternions => eulers) that you used to get the angles from the rotation matrix?

comment created time in 9 days

issue opened: google/mediapipe

Face orientation angles - pitch/yaw/roll from face geometry

Hi, I'm having some issues retrieving the actual orientation angles, i.e. pitch/yaw/roll, from the face geometry. I need them for my application. I'm probably missing some obvious detail. Can you give me some hints? Thanks.

I tried a solution based on the approach suggested by @kostyaby here: https://github.com/google/mediapipe/issues/1561#issuecomment-771027016

Here is the code snippet that I am using.

```ts
// Function based on https://www.geometrictools.com/Documentation/EulerAngles.pdf
const rotationMatrixToEulerAngle = (r) => {
  const [r00, r01, r02, r10, r11, r12, r20, r21, r22] = r;
  let thetaX: number;
  let thetaY: number;
  let thetaZ: number;
  if (r10 < 1) { // YZX calculation
    if (r10 > -1) {
      thetaZ = Math.asin(r10);
      thetaY = Math.atan2(-r20, r00);
      thetaX = Math.atan2(-r12, r11);
    } else {
      thetaZ = -Math.PI / 2;
      thetaY = -Math.atan2(r21, r22);
      thetaX = 0;
    }
  } else {
    thetaZ = Math.PI / 2;
    thetaY = Math.atan2(r21, r22);
    thetaX = 0;
  }
  if (isNaN(thetaX)) thetaX = 0;
  if (isNaN(thetaY)) thetaY = 0;
  if (isNaN(thetaZ)) thetaZ = 0;
  return { pitch: -thetaX, yaw: -thetaY, roll: -thetaZ };
};

// Assume `results` is the result object of FaceMesh.
const pt_matrix = results.multiFaceGeometry[0].getPoseTransformMatrix()['h'][2];
const rotation_matrix = [pt_matrix[0], pt_matrix[1], pt_matrix[2],
                         pt_matrix[4], pt_matrix[5], pt_matrix[6],
                         pt_matrix[8], pt_matrix[9], pt_matrix[10]];
const angles = rotationMatrixToEulerAngle(rotation_matrix);
```
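For reference, the 16-element pose transform array is packed in column-major order (as the maintainer points out in a reply in this thread), so slicing rows directly off it actually picks up columns. A minimal sketch of building the row-major 3x3 rotation block would transpose the indices (`rotationRowMajor` is a hypothetical helper name):

```javascript
// The pose transform matrix arrives as 16 floats in column-major order,
// i.e. element(row i, col j) = pt[j * 4 + i]. To feed a function that
// expects a row-major 3x3 rotation, transpose the upper-left 3x3 block.
// (`rotationRowMajor` is a hypothetical helper name.)
const rotationRowMajor = (pt) => [
  pt[0], pt[4], pt[8],   // r00 r01 r02
  pt[1], pt[5], pt[9],   // r10 r11 r12
  pt[2], pt[6], pt[10],  // r20 r21 r22
];

// With a column-major array holding 0..15, the rows become the old columns.
console.log(rotationRowMajor([...Array(16).keys()]));
```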

But it seems the pitch/yaw/roll values I am getting are not very accurate; can anyone please help? (I am using FaceMesh in JS.)

created time in 9 days

pull request comment: vladmandic/human

@ButzYung @vladmandic Thanks for this awesome library. Just a question: why do we need to multiply each of -thetaX, -thetaY, and -thetaZ by 2 in order to get pitch, yaw, and roll respectively? Aren't -thetaX, -thetaY, and -thetaZ themselves the pitch, yaw, and roll (Euler angles)?

`return { pitch: 2 * -thetaX, yaw: 2 * -thetaY, roll: 2 * -thetaZ };`

https://github.com/vladmandic/human/blob/main/src/face/angles.ts#L81

I am asking this based on these references:

- https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html
- https://github.com/natanielruiz/deep-head-pose/blob/master/code/datasets.py#L540:L544
- https://github.com/google/mediapipe/issues/1561#issuecomment-771027016
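For what it's worth, one place a factor of 2 does legitimately appear is the quaternion half-angle relation; this is only a guess at the origin of the 2x, not something confirmed from the linked code:

```javascript
// Hedged illustration (a guess at where a 2x could come from, not confirmed
// from the linked code): a rotation by theta about an axis has quaternion
// scalar part w = cos(theta / 2), so acos(w) returns theta / 2 and needs
// doubling to recover the full rotation angle.
const deg = (rad) => rad * 180 / Math.PI;

// Unit quaternion for a rotation of `theta` radians about the Y axis.
const quatY = (theta) => ({ w: Math.cos(theta / 2), x: 0, y: Math.sin(theta / 2), z: 0 });

const q = quatY(Math.PI / 2);     // true yaw: 90 degrees
const half = Math.acos(q.w);      // theta / 2
console.log(deg(half));           // approximately 45
console.log(deg(2 * half));       // approximately 90
```

A proper Euler extraction from the rotation matrix (asin/atan2, as in the references above) should not need this doubling, which is why the 2x looks suspicious here.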

comment created time in 9 days

issue opened: tensorflow/models

SSD-MobilenetV2 - Preprocessing during Inference

I have fine-tuned an SSD-MobileNetV2 (train config with a fixed 300x300 resize) built using the TensorFlow Object Detection API and saved it in the TF SavedModel format. Questions:

- How, during inference, is it able to accept input images of any shape (not 300x300), without any preprocessing to first resize them to 300x300 before passing them to the model?
- Is it because the saved_model does the resize by default during inference? (If yes, does it also normalize the images before the convolution operations?) I am new to the saved_model format, but I don't think it is because of saved_model itself; then how is it possible, given that I think SSD-MobileNet includes FC layers which require a fixed input size? Or does the architecture use adaptive pooling in between to achieve this?

In simple words: the documentation is not clear about which preprocessing (resize / normalization) steps are required to run inference from the saved_model format. Here too, no preprocessing like resizing or normalization is applied to the input image: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_from_saved_model_tf2_colab.ipynb

created time in 2 months

issue opened: tensorflow/models

Get/save predictions made during evaluation for object detection models

At present, if someone wants to both evaluate a trained model and get/save its predictions on the same dataset, they need to run evaluation (model_main_tf2.py with the checkpoint_dir argument) and inference separately. If users had an option to save the predictions during evaluation itself, it would remove the need to run inference separately.

And a query: I have fine-tuned an SSD-MobileNetV2 (train config with a fixed 300x300 resize) built using the TensorFlow Object Detection API and saved it in the TF SavedModel format. Questions:

- How, during inference, is it able to accept input images of any shape (not 300x300), without any preprocessing to first resize them to 300x300 before passing them to the model?
- Is it because the saved_model does the resize by default during inference? (If yes, does it also normalize the images before the convolution operations?) I am new to the saved_model format, but I don't think it is because of saved_model itself; then how is it possible, given that I think SSD-MobileNet includes FC layers which require a fixed input size? Or does the architecture use adaptive pooling in between to achieve this?

created time in 2 months