gRPC for Service-to-Service Communication#

gRPC is a high-performance RPC framework that uses HTTP/2 for transport and Protocol Buffers (protobuf) for serialization. For service-to-service communication within a microservices architecture, gRPC offers significant advantages over REST: strongly typed contracts, efficient binary serialization, streaming support, and code generation in every major language.

Why gRPC for Internal Services#

REST with JSON is the standard for public APIs. For internal service-to-service calls, gRPC is often the better choice.

Performance. Protobuf messages are typically 3-10x smaller than equivalent JSON and often an order of magnitude faster to serialize and deserialize. Over millions of internal calls per minute, this adds up.

Contract enforcement. A .proto file is an unambiguous contract. Both client and server are generated from the same definition. There is no room for disagreement about field names, types, or required vs optional.

Streaming. gRPC natively supports four communication patterns: unary (request-response), server streaming, client streaming, and bidirectional streaming. REST requires workarounds (WebSockets, SSE) for anything beyond request-response.

Code generation. From a single .proto file, you generate client and server code in Go, Java, Python, Rust, C++, and more. No hand-written HTTP clients, no parsing JSON into structs.
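As an illustration, generating the Go stubs for a proto file such as orders/v1/orders.proto might look like this (assuming the protoc-gen-go and protoc-gen-go-grpc plugins are installed and on PATH; the path and module layout are illustrative):

```shell
# Generate Go message types (protoc-gen-go) and the gRPC client/server
# stubs (protoc-gen-go-grpc) next to the .proto file.
protoc \
  --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  orders/v1/orders.proto
```

Many teams wrap this invocation in a Makefile target or use Buf so every service generates stubs the same way.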

Protobuf Service Definitions#

A .proto file defines your service contract. It specifies the service methods, request/response message types, and field types.

syntax = "proto3";

package orders.v1;

option go_package = "github.com/example/orders/v1;ordersv1";

import "google/protobuf/timestamp.proto";

// The Order service manages customer orders.
service OrderService {
  // Unary: create a new order
  rpc CreateOrder(CreateOrderRequest) returns (CreateOrderResponse);

  // Unary: get a single order by ID
  rpc GetOrder(GetOrderRequest) returns (Order);

  // Server streaming: watch for order status changes
  rpc WatchOrderStatus(WatchOrderStatusRequest) returns (stream OrderStatusUpdate);

  // Client streaming: submit a batch of order line items
  rpc SubmitLineItems(stream LineItem) returns (SubmitLineItemsResponse);

  // Bidirectional streaming: real-time order processing
  rpc ProcessOrders(stream OrderAction) returns (stream OrderResult);
}

message CreateOrderRequest {
  string customer_id = 1;
  repeated LineItem items = 2;
  Address shipping_address = 3;
}

message CreateOrderResponse {
  string order_id = 1;
  google.protobuf.Timestamp created_at = 2;
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string order_id = 1;
  string customer_id = 2;
  OrderStatus status = 3;
  repeated LineItem items = 4;
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Timestamp updated_at = 6;
}

enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
  ORDER_STATUS_CANCELLED = 5;
}

message LineItem {
  string product_id = 1;
  int32 quantity = 2;
  int64 price_cents = 3;
}

message Address {
  string street = 1;
  string city = 2;
  string state = 3;
  string zip_code = 4;
  string country = 5;
}

Proto file conventions#

  • Package naming: Use service.version (e.g., orders.v1). The version in the package name lets you run old and new versions simultaneously during migrations.
  • Field numbering: Never reuse field numbers. When you remove a field, mark it as reserved so it cannot be accidentally reused.
  • Enums: Always include an UNSPECIFIED = 0 value. Proto3 uses 0 as the default, and you need to distinguish “field was not set” from “field was explicitly set to the first real value.”
  • Timestamps: Use google.protobuf.Timestamp, not int64 or string. It has well-defined semantics and library support in every language.
  • Money: Use int64 for cents, not float/double. Floating point arithmetic produces rounding errors in financial calculations.
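The field-numbering rule above looks like this in practice. Assuming a field was once removed from LineItem (the name discount_percent is purely illustrative), the proto would reserve both the number and the name:

```proto
// After removing a field, reserve its number (and optionally its name)
// so a future edit cannot reuse it with a different type or meaning.
message LineItem {
  reserved 4;                   // formerly "discount_percent", removed
  reserved "discount_percent";

  string product_id = 1;
  int32 quantity = 2;
  int64 price_cents = 3;
}
```

Reusing a retired field number would silently misinterpret old serialized payloads; `reserved` turns that mistake into a compile-time error in protoc.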

Streaming Patterns#

Unary (request-response)#

The simplest pattern. Client sends one request, server returns one response. Use this for most CRUD operations.

// Server implementation (Go)
func (s *server) GetOrder(ctx context.Context, req *pb.GetOrderRequest) (*pb.Order, error) {
    order, err := s.store.FindOrder(ctx, req.OrderId)
    if err != nil {
        if errors.Is(err, ErrNotFound) {
            return nil, status.Errorf(codes.NotFound, "order %s not found", req.OrderId)
        }
        return nil, status.Errorf(codes.Internal, "failed to fetch order: %v", err)
    }
    return orderToProto(order), nil
}

Server streaming#

Server sends a stream of messages in response to a single client request. Use this for watching for changes, streaming large result sets, or delivering real-time updates.

// Server implementation: stream order status updates
func (s *server) WatchOrderStatus(req *pb.WatchOrderStatusRequest, stream pb.OrderService_WatchOrderStatusServer) error {
    ctx := stream.Context()
    ch := s.statusWatcher.Subscribe(req.OrderId)
    defer s.statusWatcher.Unsubscribe(req.OrderId, ch)

    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case update, ok := <-ch:
            if !ok {
                return nil // channel closed, order finalized
            }
            if err := stream.Send(update); err != nil {
                return err
            }
        }
    }
}

Client streaming#

Client sends a stream of messages, server responds with a single message after the stream completes. Use this for batch operations, file uploads, or aggregation.

// Server implementation: receive batch of line items
func (s *server) SubmitLineItems(stream pb.OrderService_SubmitLineItemsServer) error {
    var items []*pb.LineItem
    var totalCents int64

    for {
        item, err := stream.Recv()
        if err == io.EOF {
            // Client finished sending, return the result
            return stream.SendAndClose(&pb.SubmitLineItemsResponse{
                ItemCount:  int32(len(items)),
                TotalCents: totalCents,
            })
        }
        if err != nil {
            return err
        }
        items = append(items, item)
        totalCents += item.PriceCents * int64(item.Quantity)
    }
}

Bidirectional streaming#

Both client and server send streams of messages independently. Use this for real-time collaborative workflows, chat-style communication, or processing pipelines where results come back as they are ready.

// Server implementation: process orders and return results as each completes
func (s *server) ProcessOrders(stream pb.OrderService_ProcessOrdersServer) error {
    for {
        action, err := stream.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }

        // Process each action and stream back the result
        result, err := s.processAction(stream.Context(), action)
        if err != nil {
            // Send an error result instead of killing the stream
            if sendErr := stream.Send(&pb.OrderResult{
                OrderId: action.OrderId,
                Success: false,
                Error:   err.Error(),
            }); sendErr != nil {
                return sendErr
            }
            continue
        }

        if err := stream.Send(result); err != nil {
            return err
        }
    }
}

Load Balancing#

gRPC uses long-lived HTTP/2 connections. This breaks traditional L4 (TCP) load balancers because all requests from a client flow over a single connection to a single backend. You need L7 (application-level) load balancing.

Client-side load balancing#

The gRPC client resolves multiple backend addresses and distributes requests across them.

// Go client with round-robin load balancing
import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    _ "google.golang.org/grpc/balancer/roundrobin" // ensures round_robin is registered
)

conn, err := grpc.Dial(
    "dns:///order-service.production.svc.cluster.local:50051",
    grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)

The dns:/// prefix tells the gRPC resolver to look up all A records for the hostname and connect to each one. In Kubernetes, use a headless Service (clusterIP: None) so DNS returns individual pod IPs.

apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  clusterIP: None   # headless -- DNS returns pod IPs
  selector:
    app: order-service
  ports:
    - port: 50051
      targetPort: 50051
      protocol: TCP

Proxy-based load balancing#

Use an L7 proxy such as Envoy, or a service mesh like Linkerd or Istio, for transparent load balancing. This is operationally simpler because clients do not need load-balancing configuration.

# Envoy L7 load balancing for gRPC
clusters:
  - name: order-service
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: order-service
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: order-service.production.svc.cluster.local
                    port_value: 50051

Health Checking#

gRPC has a standard health checking protocol defined in grpc.health.v1. Implement it so load balancers and Kubernetes can probe your service.

import "google.golang.org/grpc/health"
import healthpb "google.golang.org/grpc/health/grpc_health_v1"

func main() {
    server := grpc.NewServer()

    // Register your service
    pb.RegisterOrderServiceServer(server, &orderServer{})

    // Register health service
    healthServer := health.NewServer()
    healthpb.RegisterHealthServer(server, healthServer)

    // Set service health status
    healthServer.SetServingStatus("orders.v1.OrderService", healthpb.HealthCheckResponse_SERVING)

    // ...
}

Kubernetes liveness and readiness probes using the native gRPC probe type:

containers:
  - name: order-service
    ports:
      - containerPort: 50051
    livenessProbe:
      grpc:
        port: 50051
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      grpc:
        port: 50051
      initialDelaySeconds: 5
      periodSeconds: 5

Kubernetes 1.24+ supports native gRPC probes. For older versions, use the grpc-health-probe binary in an exec probe.

Deadline Propagation#

Deadlines prevent requests from hanging indefinitely. In a microservice chain (A calls B calls C), the deadline must propagate so that downstream services know how much time remains.

// Service A: set a 5-second deadline
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

resp, err := orderClient.GetOrder(ctx, &pb.GetOrderRequest{OrderId: "123"})

// Service B: called by A, calls C -- the deadline propagates automatically
func (s *server) GetOrder(ctx context.Context, req *pb.GetOrderRequest) (*pb.Order, error) {
    // ctx already carries the deadline from the caller.
    // Check remaining time before making a downstream call.
    deadline, ok := ctx.Deadline()
    if ok && time.Until(deadline) < 100*time.Millisecond {
        return nil, status.Errorf(codes.DeadlineExceeded, "insufficient time remaining")
    }

    // This call to the inventory service inherits the deadline from ctx.
    // (order is the result of an earlier store lookup, elided here --
    // GetOrderRequest itself carries only the order ID.)
    stock, err := s.inventoryClient.CheckStock(ctx, &inventorypb.CheckStockRequest{
        ProductId: order.Items[0].ProductId,
    })
    // ...
}

Rules for deadlines:

  • Always set a deadline on the initial call. A request without a deadline can hang forever.
  • Never extend a deadline downstream. If Service A gives you 5 seconds, you cannot give Service C 10 seconds.
  • Check remaining time before making downstream calls. If there is not enough time for the downstream call to complete, fail fast with DeadlineExceeded instead of starting a call that will time out.
  • Set deadlines based on SLOs. If your API must respond in 500ms, set a 500ms deadline. Downstream services should have shorter timeouts.

Error Handling#

gRPC uses a standard set of status codes. Map your application errors to the appropriate gRPC code.

| gRPC Code | HTTP Equivalent | Use When |
| --- | --- | --- |
| OK | 200 | Request succeeded |
| InvalidArgument | 400 | Client sent bad input |
| NotFound | 404 | Resource does not exist |
| AlreadyExists | 409 | Resource already exists (create conflicts) |
| PermissionDenied | 403 | Caller lacks permission |
| Unauthenticated | 401 | No valid credentials |
| ResourceExhausted | 429 | Rate limit exceeded or quota exhausted |
| FailedPrecondition | 400 | Operation rejected due to system state |
| Unavailable | 503 | Service temporarily unavailable (retryable) |
| Internal | 500 | Unexpected server error |
| DeadlineExceeded | 504 | Deadline expired before completion |

import (
    "google.golang.org/genproto/googleapis/rpc/errdetails"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// Return structured errors with details
func (s *server) CreateOrder(ctx context.Context, req *pb.CreateOrderRequest) (*pb.CreateOrderResponse, error) {
    if req.CustomerId == "" {
        st := status.New(codes.InvalidArgument, "customer_id is required")
        // Attach error details for richer error information
        detailed, _ := st.WithDetails(&errdetails.BadRequest_FieldViolation{
            Field:       "customer_id",
            Description: "must be a non-empty string",
        })
        return nil, detailed.Err()
    }

    // ...
}

Error handling principles:

  • Use Unavailable for transient errors that clients should retry. Use Internal for permanent errors.
  • Never expose internal error messages to clients in production. Log the full error server-side and return a sanitized message.
  • Use error details (the google.rpc.Status model) to provide structured error information for programmatic handling.
  • Interceptors (middleware) should catch panics and convert them to Internal errors rather than crashing the connection.

Debugging with grpcurl#

grpcurl is like curl for gRPC. By default it discovers services through server reflection, so enable reflection on the server; without reflection it can work from local .proto files instead.

// Enable server reflection
import "google.golang.org/grpc/reflection"

server := grpc.NewServer()
reflection.Register(server)

# List services
grpcurl -plaintext localhost:50051 list

# Describe a service
grpcurl -plaintext localhost:50051 describe orders.v1.OrderService

# Call a unary method
grpcurl -plaintext -d '{"order_id": "123"}' \
  localhost:50051 orders.v1.OrderService/GetOrder

# Call with headers (metadata)
grpcurl -plaintext -H "authorization: Bearer token123" \
  -d '{"customer_id": "cust-1", "items": [{"product_id": "p1", "quantity": 2, "price_cents": 1999}]}' \
  localhost:50051 orders.v1.OrderService/CreateOrder

Disable reflection in production to avoid exposing your API surface. Use grpcurl with proto files instead:

grpcurl -proto orders/v1/orders.proto -plaintext \
  -d '{"order_id": "123"}' \
  localhost:50051 orders.v1.OrderService/GetOrder