GCP API calls failing in GKE during shutdown; looks like the local metadata service is down

Cluster information:

Kubernetes version: v1.28.14-gke.1099000
Cloud being used: GKE
Installation method: bubble gum and chopsticks
Host OS: Google special Sauce
CNI and version: ??
CRI and version: ??

Hullo Community,

My app tries to make a GCP Pub/Sub “delete_subscription()” call after being told to shut down, but the call never completes. I suspect that the GKE cluster node’s metadata service (which proxies auth and API calls) is already down by then. We’re talking seconds here. GCP support has not been helpful.
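To pin down how many seconds we are actually talking about, something like the following could be logged from the shutdown path. This is only a rough Python sketch (the real app is .NET), and `run_cleanup_with_timing`, `install_sigterm_handler` and the `cleanup` callback are made-up names for illustration; `cleanup` stands in for whatever performs the real delete call.

```python
import signal
import threading
import time

# Set once SIGTERM arrives (Kubernetes sends SIGTERM when it terminates a pod).
shutdown_requested = threading.Event()


def install_sigterm_handler():
    """Register a handler that only flags the shutdown request."""
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown_requested.set())


def run_cleanup_with_timing(cleanup, attempts=3, delay_seconds=1.0):
    """Call cleanup() up to `attempts` times and record how each try went.

    Returns (succeeded, log). Each log entry is a tuple of
    (seconds_since_first_try, error_text_or_None).
    """
    started = time.monotonic()
    log = []
    for attempt in range(attempts):
        try:
            cleanup()
        except OSError as exc:  # ConnectionRefusedError is a subclass of OSError
            log.append((time.monotonic() - started, repr(exc)))
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
        else:
            log.append((time.monotonic() - started, None))
            return True, log
    return False, log
```

The idea would be to call `install_sigterm_handler()` at startup, wait on `shutdown_requested`, and then print the log returned by `run_cleanup_with_timing(...)`, so the pod logs show whether the refusals start immediately or only some seconds into shutdown.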

The error thrown by the app is:
“Connection refused (169.254.169.254:80)”

The stack trace is:

   at Google.Apis.Http.ConfigurableMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Google.Apis.Auth.OAuth2.ComputeCredential.RequestAccessTokenAsync(CancellationToken taskCancellationToken)
   at Google.Apis.Auth.OAuth2.TokenRefreshManager.RefreshTokenAsync()
   at Google.Apis.Auth.OAuth2.TokenRefreshManager.GetAccessTokenForRequestAsync(CancellationToken cancellationToken)
   at Google.Apis.Auth.OAuth2.ServiceCredential.GetAccessTokenWithHeadersForRequestAsync(String authUri, CancellationToken cancellationToken)
   at Grpc.Auth.GoogleAuthInterceptors.<>c__DisplayClass3_0.<<FromCredential>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Grpc.Net.Client.Internal.GrpcProtocolHelpers.ReadCredentialMetadata(DefaultCallCredentialsConfigurator configurator, GrpcChannel channel, HttpRequestMessage message, IMethod method, CallCredentials credentials, CancellationToken cancellationToken)
   at Google.Apis.Auth.OAuth2.TokenRefreshManager.<GetAccessTokenForRequestAsync>g__LogException|10_0(Task task)
   at Grpc.Net.Client.Internal.GrpcCall`2.ReadCredentials(HttpRequestMessage request)
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
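
The `ComputeCredential.RequestAccessTokenAsync` frame suggests the failure happens while fetching an access token from the metadata server, before the Pub/Sub call itself goes out. A standalone probe along these lines could be run from inside the pod during shutdown to tell "connection refused" apart from "server answered but rejected the request". It is a Python sketch under assumptions: `probe_metadata_token` is a made-up helper name, the path is the standard Compute Engine default service account token endpoint, and the base URL is a parameter so it can be pointed elsewhere.

```python
import json
import urllib.error
import urllib.request

# The IP and port match the error message in this post.
METADATA_BASE_URL = "http://169.254.169.254"
TOKEN_PATH = "/computeMetadata/v1/instance/service-accounts/default/token"

# Ignore any proxy environment variables; the metadata server is link-local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_metadata_token(base_url=METADATA_BASE_URL, timeout=2.0):
    """Try to fetch an access token from the metadata server.

    Returns (status, detail), where status is one of
    "ok", "http_error", "bad_response" or "unreachable".
    """
    request = urllib.request.Request(
        base_url + TOKEN_PATH,
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )
    try:
        with _opener.open(request, timeout=timeout) as response:
            payload = json.load(response)
        return "ok", "expires_in=%s" % payload.get("expires_in")
    except urllib.error.HTTPError as exc:
        # The server answered, so it is up, but it rejected the request.
        return "http_error", "HTTP %d" % exc.code
    except ValueError as exc:
        # The body was not valid JSON.
        return "bad_response", repr(exc)
    except OSError as exc:
        # Covers connection refused and timeouts (URLError is an OSError).
        return "unreachable", repr(exc)
```

Printing `probe_metadata_token()` once a second from a preStop hook or from the shutdown handler would show whether the endpoint really stops accepting connections, and roughly when.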

Has anyone seen an error like this: a pod that needs to make a GCP API call after its node has started shutting down?

David