Henrik Jönsson henkish Plug and Trade Sweden

PlugAndTrade/rabbitmq-copy 1

Export rabbitmq configuration from one server and import into another

henkish/jupyterlab 0

JupyterLab notebook

henkish/vagrant-devbox 0

Ubuntu development box

PlugAndTrade/Docker 0

Docker images

PlugAndTrade/win-elasticsearch 0

Windows container running Elasticsearch

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Hi! Not many. But as I understand it, even a single request can run into this issue. If there are no available resources in your region, the function cannot be invoked.

Another client of ours is using GKE in the same region as we use for this project, and had problems with the cluster being unable to add additional nodes because of no available resources.

When digging in my Cloud Logs I found messages logged by Cloud Functions about not being able to execute, but only for a few of the invocations that returned a 500 error. In most cases nothing was logged.

Possible solutions could be changing the region for Cloud Functions, or migrating to Cloud Run as we did (we haven't experienced any random 500/504 problems since migrating).

henkish

comment created time in 10 days

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

We finally found the cause of the random 500/504 errors. We used Cloud Functions as backend, and apparently Google can run out of available resources to invoke a Cloud Function. The Cloud Function backend will then return 500 or not respond at all within 15 seconds, which causes either a 500 or a 504 response from ESP.
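The failure mode described above can be summarised as a small decision function. This is only a toy model of the behaviour reported in this comment, not ESP's actual implementation: the function name and parameters are hypothetical, and the 15-second deadline is simply the figure quoted above.

```python
from typing import Optional

# Deadline quoted in the comment above; treated here as an assumption.
BACKEND_DEADLINE_SECONDS = 15.0


def gateway_status(backend_status: Optional[int], backend_seconds: float) -> int:
    """Return the status a client would see, per the behaviour described above.

    backend_status is None when the backend never produced a response.
    """
    if backend_status is None or backend_seconds > BACKEND_DEADLINE_SECONDS:
        # No answer within the deadline: the proxy gives up with a gateway timeout.
        return 504
    # Otherwise the backend's own status (including a 500) is passed through.
    return backend_status
```

Under this model, a Cloud Function that could not be invoked shows up to the client either as a passed-through 500 or as a 504, which matches the mix of errors seen in the logs.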

So that explains why everything was working fine most days, but some days we had lots of problems and errors.

Now we have migrated all backend services from Cloud Functions to Cloud Run. In Cloud Run it is possible to set a minimum number of instances, and each instance can handle concurrent requests (not supported in Cloud Functions). So setting the minimum number of instances to 1 "ensures" that you always have at least one instance running when Google is running out of resources.
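As a rough sketch of that setting, the helper below assembles a `gcloud run deploy` command line with the `--min-instances` and `--concurrency` flags. The service name, image and region are hypothetical placeholders, and the concurrency value is only illustrative; this is not the exact command used in this project.

```python
from typing import List


def cloud_run_deploy_command(
    service: str,
    image: str,
    region: str,
    min_instances: int = 1,
    concurrency: int = 80,
) -> List[str]:
    """Build the argument list for deploying a Cloud Run service.

    All argument values are supplied by the caller; nothing here is
    taken from the original project.
    """
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        # Keep at least this many instances warm at all times.
        f"--min-instances={min_instances}",
        # Maximum number of requests one instance handles at the same time.
        f"--concurrency={concurrency}",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` or joined with spaces and pasted into a shell.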

I will not close this issue though - because the original issue (ESP running in Cloud Run crashes on large HTTP requests) is still not solved.

henkish

comment created time in 13 days

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

I am sad to say that we still experience lots of random errors running ESPv2 on Cloud Run. Today was (for unknown reasons) an especially bad day for us, with lots of issues.

Some of our more common issues (my conclusions from looking at Cloud Logs) are:

  • ESP/Cloud Run responds with a 500 error without forwarding the request to the backend Cloud Function (the trace has no log entries from the Cloud Function that should've been invoked for that URL) [image]

  • ESP/Cloud Run responds with 504 Gateway Timeout for a request that seems to respond in time [image]

  • ESP/Cloud Run waits several seconds before forwarding the request to the Cloud Function, causing a timeout: [image]

As described in previous posts in this thread, we experienced lots of these problems when a large request was sent to ESP, causing it to restart and respond with 504 errors. But now we have removed most of those large requests, and are a bit clueless as to why we have so many problems. Any help would be greatly appreciated!

henkish

comment created time in 24 days

started metabase/metabase

started time in 2 months

started alyssaxuu/mapus

started time in 2 months

started apache/pinot

started time in 2 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Do you know if the problem also exists if I host ESP in App Engine instead of Cloud Run? Or is the problem isolated to Cloud Run?

henkish

comment created time in 2 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Any progress? 🤞

henkish

comment created time in 2 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Sorry, I pressed "Close issue" by accident.

Great news! If you need access to our logs from our cloud project, or if I can assist in any other way, just let me know!

We have made a temporary change in our infrastructure to route large requests directly to the backend service instead of via ESP, to avoid the 504 Gateway Timeout outages.

But even with these changes, we experience bursts of 504 responses. What seems to help is to re-deploy the Cloud Run service to force a restart of the containers. We have also scaled up to a minimum of 4 instances to

henkish

comment created time in 3 months

issue closed GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Hello! I am not sure if my issue is related to ESP v2 or Cloud Run - but I think it is related to ESP - and therefore I am posting it here :)

We are using the approach "Endpoints for Cloud Run with ESPv2" - meaning we are running ESP v2 as a container in Cloud Run, as an API proxy in front of code that we run in Cloud Functions and other Cloud Run instances.

We experienced problems with the API proxy responding with 504 Gateway Timeout on lots of requests. We pinpointed the problem to a partner that sends large JSON documents and had started to send a lot more requests than on previous days. It seemed that for the large files (>15 MB JSON), ESP sent back a 500 HTTP status code as response. And then other requests started to time out.

We tried solving the problem by removing the endpoint from the Google Endpoints service configuration - so it would return 404 - but the problem was still present.

I have tested this in a separate Cloud Run instance, posting large JSON requests to different endpoints, and I can replicate the 500 errors. And when I look at the Cloud Logs, it seems that directly after the 500 error is logged, ESP starts up a new instance - leading me to believe it has crashed.

I have tried changing different options (like memory, request timeouts and disabling metrics) but nothing has helped. I also tried enabling debug info, but I cannot see that it logged anything more about the 500 error.

I see the same behaviour for unauthenticated valid routes, authenticated valid routes (should return 401) and non-existing routes (should return 404).

I have been using Visual Studio Code with the plugin "humao.rest-client" to make the requests.

We use gcr.io/endpoints-release/endpoints-runtime-serverless:2 as base Docker image, and we run it without any arguments (ESPv2_ARGS).

Any ideas on what might be wrong (or how to continue debugging our issues) would be greatly appreciated!

closed time in 3 months

henkish

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

I just realized that I have another Cloud Run instance that runs Node.js with an Express web server. I tried sending a POST request to an undefined endpoint with the same JSON payload (15 MB). It also takes quite some time, but after ~15 seconds it responds with 404 as expected.

So maybe this indicates that the problem is in ESP and not in Cloud Run's internal software?

henkish

comment created time in 3 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Thank you for taking the time to help :)

I think that I have enabled debug (using the environment variable ESPv2_ARGS).

I now did a GET request to a non-existing endpoint and got 404 as expected. Then I did a POST to the same endpoint with a large JSON file and got a 500 error.

I downloaded the logs from "Log viewer" for my "Cloud Run revision". I have only changed my project ID and the Cloud Run URL in the logs.

downloaded-logs-20210701-143137.csv

Please note that the log entries are newest first in the CSV file. I also added rows with "*************" in all columns to mark where the two requests start and end in the log.
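For anyone reading such an export, here is a minimal sketch of how it could be put back into chronological order and split at the separator rows. It assumes only what is stated above (a header row, newest-first entries, and separator rows consisting of asterisks in every column); the function name and the column names in the usage note are hypothetical.

```python
import csv
import io
from typing import Dict, List


def split_log_segments(csv_text: str) -> List[List[Dict[str, str]]]:
    """Return oldest-first log segments, split on all-asterisk separator rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []

    rows = [row for row in reader if row]
    rows.reverse()  # the export is newest-first; flip it to oldest-first

    segments: List[List[Dict[str, str]]] = [[]]
    for row in rows:
        # A separator row has only asterisks in every column.
        is_separator = all(
            cell.strip() and set(cell.strip()) == {"*"} for cell in row
        )
        if is_separator:
            if segments[-1]:
                segments.append([])
        else:
            segments[-1].append(dict(zip(header, row)))
    return [segment for segment in segments if segment]
```

Calling `split_log_segments(open("downloaded-logs-20210701-143137.csv", encoding="utf-8").read())` would then give one list of entries per request, each in the order the events occurred.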

henkish

comment created time in 3 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

I am running it like this: [image]

But it only seems to log the 500 error (no exception with stack trace or similar), and then it looks like ESP/Envoy is starting up again:

[image]

henkish

comment created time in 3 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

I have now deployed it like this: [image]

And this is what the logs look like. There is no exception data or similar, just a 500 error logged, and then it looks like Envoy is starting up a new instance.

[image]

henkish

comment created time in 3 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Since our project was created over 1 year ago, and we have lots of routes configured, I decided to try setting up a new clean ESP v2 instance using the tutorial: https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-run

I only have one example route configured (as in the tutorial) in the OpenAPI specification (GET /hello).

I tried POSTing a 15 MB JSON file to / (this route would normally respond with 404 and "The current request is not defined by this API.") and I get 500 Internal Server Error as response, instead of the expected 404.

So it seems that our issue can be reproduced by just following the official tutorial and then sending a large request to any endpoint.
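A minimal sketch of that reproduction step, using only the Python standard library instead of the REST client plugin mentioned in the original issue. The function names and the shape of the JSON document are made up for illustration; any valid JSON of about 15 MB should behave the same way, and the URL must be replaced with that of your own ESP deployment.

```python
import json
import urllib.error
import urllib.request


def build_json_payload(target_bytes: int) -> bytes:
    """Build a valid JSON document that is at least target_bytes long."""
    sample = {"id": 0, "text": "x" * 1000}
    # Size of one serialised record plus the ", " separator between records.
    record_size = len(json.dumps(sample)) + 2
    count = target_bytes // record_size + 1
    records = [{"id": i, "text": "x" * 1000} for i in range(count)]
    return json.dumps({"items": records}).encode("utf-8")


def post_payload(url: str, payload: bytes, timeout: float = 120.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return error.code
```

For example, `post_payload("https://<your-esp-service-url>/", build_json_payload(15 * 1024 * 1024))` sends roughly 15 MB to the root route, where a 404 would be the expected answer.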

henkish

comment created time in 3 months

issue comment GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

Addition: If I send a file that is above the max limit (32 MB), it takes a long time, and then I get a "413 Request Entity Too Large" response, regardless of whether the route is unauthenticated, authenticated or non-existing.

So it seems that ESP handles the entire request before sending back a response. I would have guessed that the server would send back 404 if there is no matching route in the service configuration, before reading the request body, and 401 if authentication fails.

But maybe this is expected behaviour?

henkish

comment created time in 3 months

issue opened GoogleCloudPlatform/esp-v2

POST requests with large payloads returns 500

created time in 3 months