Gateway API cross-namespace routes don't work

Hello everybody. I have been struggling for over a week now to get my cross-namespace HTTPRoutes working.

I have the following setup (the very same as the official examples):

 

 

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mgmt-internal-gw
  namespace: gateway
  annotations:
    cert-manager.io/issuer: cf-issuer
spec:
  gatewayClassName: gke-l7-rilb
  listeners:
  - name: http-listener
    port: 80
    protocol: HTTP
    hostname: "my.domain.com"
  - name: https-listener
    port: 443
    protocol: HTTPS
    hostname: "my.domain.com"
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate 
      certificateRefs:
        - name: internal-wildcard-cert
          kind: Secret
          group: ""
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-filter-redirect
  namespace: gateway
spec:
  parentRefs:
  - name: mgmt-internal-gw
    sectionName: http-listener
  hostnames:
  - "my.domain.com"
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        statusCode: 301
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: argocd-route
  namespace: system-argocd
spec:
  parentRefs:
    - name: mgmt-internal-gw
      namespace: gateway
      sectionName: https-listener
  hostnames:
  - "my.domain.com"
  rules:
  - matches:
    - path:
        value: /
        type: PathPrefix
    backendRefs:
      - name: argocd-argo-cd-server
        port: 8080

 

 

I can see that all routes and the gateway itself are healthy in the GCP console. When I try to reach my domain, I get a 503 "no healthy upstream". I have confirmed that the service behind it (ArgoCD) works fine: I can access it with port-forward or from another pod via curl. If I deploy ArgoCD (or any other service) in the same namespace as the gateway, so that the HTTPRoute, Gateway, and Service all share one namespace, everything works well. I have also tried creating ReferenceGrants, with the same result. I am using Autopilot and have tried versions 27, 28, and 29. The cluster is fully private, if that matters.
Do you have any suggestions about what could be missing in my setup?

1 ACCEPTED SOLUTION

Hello all, I am stopping by to tell you that I found the solution myself. The solution is not described in any documentation (I don't know why). Can someone here tell me how to give GCP feedback so that the documentation can be updated?

The solution:

- The exposed Services must be of type NodePort, not ClusterIP as in the documentation. Otherwise, the backend health checks do not pass. There is not even a meaningful alert in the GCP UI when browsing the gateway or the routes. You need to go to the services, click on the service you want to expose, go to backends, and expand the advanced options; only then will you see a small alert that the backend cannot access the service. Really annoying.

- The HealthCheckPolicy, GCPBackendPolicy, and GCPGatewayPolicy are NOT required (but of course they are best practice).

- The ReferenceGrant is not needed at all.
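
For anyone who wants to try the NodePort change described in the first point, here is a minimal sketch of what the Service could look like. The Service name, namespace, and port 8080 come from the HTTPRoute above; the selector label and the targetPort are assumptions and must be adjusted to match your actual ArgoCD pods.

```yaml
# Sketch only, not the exact manifest used in this thread.
apiVersion: v1
kind: Service
metadata:
  name: argocd-argo-cd-server     # Service referenced by the HTTPRoute backendRefs
  namespace: system-argocd
spec:
  type: NodePort                  # instead of the default ClusterIP
  selector:
    app.kubernetes.io/name: argocd-server   # hypothetical label, check your pods
  ports:
  - name: http
    protocol: TCP
    port: 8080                    # port referenced by backendRefs in the HTTPRoute
    targetPort: 8080              # assumption: the container listens on 8080
```

If the Service is created by a Helm chart, the type is normally set through the chart's values rather than by editing the rendered manifest, so a manual change like this one may be overwritten on the next upgrade.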

Good luck, everyone!


3 REPLIES

Hi @adimitrov,

Welcome to the Google Cloud Community!

It may be challenging to troubleshoot this issue without more visibility, but if I understand your issue correctly, you are receiving 503 "no healthy upstream" errors on your GKE cluster.

HTTP 503 Service Unavailable with "no healthy upstream" means that either there are no hosts available to serve the traffic or all hosts have failed the backend health checks.

  • HealthCheckPolicy and GCPBackendPolicy resources must exist in the same namespace as the target Service or ServiceImport resource. You might want to check out this documentation.

Add a health check configuration to your YAML file, as this helps ensure that traffic is only routed to backends that are able to respond.

Here's a sample YAML that is available in the documentation provided.

apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: lb-healthcheck
  namespace: lb-service-namespace
spec:
  default:
    checkIntervalSec: INTERVAL
    timeoutSec: TIMEOUT
    healthyThreshold: HEALTHY_THRESHOLD
    unhealthyThreshold: UNHEALTHY_THRESHOLD
    logConfig:
      enabled: ENABLED
    config:
      type: PROTOCOL
      httpHealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
      httpsHealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
      grpcHealthCheck:
        grpcServiceName: GRPC_SERVICE_NAME
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
      http2HealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
  targetRef:
    group: ""
    kind: Service
    name: lb-service

Make sure you review the restrictions and limitations before deploying GKE resources. Also, you might want to consider filing a ticket with our Google Support team, as they are well equipped to handle issues like these.

Hope you find this helpful.

Hello RFelizardo,

I do appreciate your help here 🙂 Following your suggestion and everything described in the documentation, I have also created the health check policy, backend policy, and gateway policy. Here are the specs:

 

 

 

---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: lb-healthcheck
  namespace: system-argocd
spec:
  default:
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 3
    unhealthyThreshold: 3
    logConfig:
      enabled: true
    config:
      type: HTTPS
      # httpHealthCheck:
      #   port: 8080
      #   requestPath: /
      httpsHealthCheck:
        port: 8443
        requestPath: /
  targetRef:
    group: ""
    kind: Service
    name: argocd-argo-cd-server
---
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: my-backend-policy
  namespace: system-argocd
spec:
  default:
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: argocd-argo-cd-server
---
apiVersion: networking.gke.io/v1
kind: GCPGatewayPolicy
metadata:
  name: my-gateway-policy
  namespace: default
spec:
  default:
    allowGlobalAccess: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: mgmt-internal-gw

 

 

Unfortunately, the behavior is still the same. I still get a 503.
I do not think it is a problem of insufficient nodes because, as I mentioned, if I use port-forwarding or move the service and pods into the same namespace as the Gateway, everything works fine.

One interesting thing is that I set up my own vanilla Kubernetes cluster on Compute Engine VMs and tested the cross-namespace scenario there, and it seems to work. So I guess it is a GKE-specific issue. It is really frustrating, as GKE promises to be the easiest and slickest way to consume Kubernetes, however...
I am going to try support next; fingers crossed that a solution comes from their end.
