Tunnel architecture
Overview
The tunnel is used when Legion autonomous workers and pollers need access to customer tools that can't be reached from the public internet. This happens either because the tools are internal and self-hosted, or because the customer's networking, security, or compliance requirements mean we must connect to the tools from within the customer network. In those cases, our solution is to have the customer run the Legion tunnel, which is provided as a Docker image, on compute in their internal network. The tunnel container creates an outbound network tunnel that the workers can use to send calls into the private network. Cloudflare tunnels provide the networking infrastructure for this link, and the Legion-specific logic (DLP data masking, etc.) runs on top of it. On Legion's backend, a dedicated proxy VM for each customer routes the relevant traffic from the pollers and workers through the Cloudflare tunnel and into the tunnel container in the customer network.
Flow diagram
High-level flow overview
- Proxy configuration is saved per customer in MongoDB. It maps specific tools for that customer to the proxy instance in Legion's backend (if any) that traffic to each tool should use.
- When a poller needs to poll a tool, or when an autonomous investigation worker starts and receives the investigation configuration, the proxy mapping is fetched from the DB and used to decide whether traffic should go through the tunnel. This explanation covers only tunnel-type proxies; other proxy types are out of scope.
- The proxy instance in Legion's backend runs mitmproxy as a proxy server that intercepts traffic and reroutes it through the tunnel. It uses a custom CA to generate the leaf certificates presented to the caller, so that HTTPS traffic works correctly.
- The tunnel container running in the customer's environment contains 2 components:
  - cloudflared: a Cloudflare tunnel client that establishes an outbound connection to Cloudflare's servers, allowing us to send calls into the network that are routed to a fixed internal target
  - FastAPI server: receives calls forwarded from the proxy instance (routed through cloudflared), sends them to the internal target, and runs the DLP data masking rules on the response before sending it back out to the proxy instance
- To run the image, the customer sets environment variables (we provide the values) containing the Cloudflare tunnel access token (used by cloudflared to authenticate against Cloudflare's servers), a JSON file with the customer's DLP rules, and, optionally, custom CA certificates used by the customer that the tunnel should add to its trusted certificate store.
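The per-tool lookup described in the first two points can be sketched as a small pure function. This is a minimal sketch, not Legion's implementation: the document shape (`proxies`, `tool_proxies`, `type`), the proxy IDs, and the function name `resolve_tunnel_proxy` are all hypothetical, and a plain dict stands in for the document fetched from MongoDB.

```python
from typing import Mapping, Optional


def resolve_tunnel_proxy(customer_config: Mapping, tool_name: str) -> Optional[str]:
    """Return the id of the tunnel proxy instance a tool should use, or None.

    `customer_config` stands in for the customer's proxy configuration
    document in MongoDB; its shape here is hypothetical:
        {"customer_id": ..., "proxies": {proxy_id: {"type": ...}},
         "tool_proxies": {tool_name: proxy_id}}
    """
    proxy_id = customer_config.get("tool_proxies", {}).get(tool_name)
    if proxy_id is None:
        return None  # the tool is not mapped to any proxy instance
    proxy = customer_config.get("proxies", {}).get(proxy_id)
    if proxy is None or proxy.get("type") != "tunnel":
        return None  # missing or non-tunnel proxies are out of scope here
    return proxy_id


# Example document, as a poller or worker might fetch it for one customer.
config = {
    "customer_id": "acme",
    "proxies": {"proxy-1": {"type": "tunnel"}, "proxy-2": {"type": "http"}},
    "tool_proxies": {"jira": "proxy-1", "splunk": "proxy-2"},
}
print(resolve_tunnel_proxy(config, "jira"))    # proxy-1
print(resolve_tunnel_proxy(config, "splunk"))  # None
print(resolve_tunnel_proxy(config, "slack"))   # None
```

Keeping the lookup separate from the MongoDB fetch makes it easy to unit-test, and returning `None` for unmapped tools leaves a direct connection as the default path.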
Detailed flow
- When a poller or worker needs to run API or browser actions on a tool that requires a tunnel proxy, it will:
  - Set the tunnel proxy EC2 machine (taken from environment variables) as the proxy URL for Playwright/httpx
  - Add the tunnel proxy's server certificate (a single certificate used globally) as a trusted server certificate
- The mitmproxy server on the proxy instance receives the request and:
  - Packs the original request's properties (host, scheme, port, headers, etc.) into the body of a new request that replaces the original one
  - Sets the new request to be a POST to the customer's dedicated Cloudflare tunnel DNS name
  - Adds service auth headers (client ID and client secret) to authenticate against the Cloudflare tunnel
  - Assigns a request ID so that its logs can be correlated with the tunnel logs
- The Cloudflare backend receives all requests sent to the public DNS name assigned to each tunnel. For each request, it:
  - Verifies that the client ID and client secret headers and the source IP match the service auth policy protecting the tunnel via Cloudflare Zero Trust, and blocks the request if it is not authorized
  - Uses the tunnel configuration in Cloudflare to forward the request to the cloudflared client running in the customer network for that tunnel
- The tunnel container that the customer runs in their internal network handles the request as follows:
  - cloudflared receives the request from Cloudflare's servers and uses the tunnel configuration (which it pulls from Cloudflare's servers) to determine where to forward it. We configure the tunnel to forward all incoming requests to http://localhost:8080, where the FastAPI server is listening.
  - On init, the FastAPI server reads and parses the DLP rules JSON file, as well as the optional custom CA certificates
  - When a request is received, the FastAPI server unpacks the original request's properties from the body (where the tunnel proxy placed them), sets them back on the request's metadata, and sends the request to the internal target. It also removes all the headers we added to the request (such as the Cloudflare auth headers) so they are not needlessly exposed to the customer
  - When the target internal service returns a response, the FastAPI server tries to read its content as text where possible (e.g. not a binary stream response) and applies the DLP masking rules to the content before returning it
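The envelope that the mitmproxy server builds and the FastAPI server unpacks can be sketched with plain data classes, independent of mitmproxy and FastAPI. This is an illustrative sketch under assumptions: the JSON envelope fields, the Base64 body encoding, the `X-Legion-Request-Id` header name, port 443 and path `/` for the tunnel request, and the helper names `pack_request`/`unpack_request` are hypothetical. `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the headers Cloudflare Access accepts for service token authentication.

```python
import base64
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    """Minimal request model; the real components work on mitmproxy/FastAPI objects."""
    method: str
    scheme: str
    host: str
    port: int
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# Hypothetical correlation header name; the CF-Access-* headers are what
# Cloudflare Access checks for service token authentication.
REQUEST_ID_HEADER = "X-Legion-Request-Id"
TUNNEL_ONLY_HEADERS = {"cf-access-client-id", "cf-access-client-secret", REQUEST_ID_HEADER.lower()}


def pack_request(original: HttpRequest, tunnel_host: str, client_id: str, client_secret: str) -> HttpRequest:
    """Proxy side: wrap the original request in a POST to the tunnel hostname."""
    envelope = {
        "method": original.method,
        "scheme": original.scheme,
        "host": original.host,
        "port": original.port,
        "path": original.path,
        "headers": original.headers,
        # Base64 keeps binary request bodies intact inside JSON.
        "body_b64": base64.b64encode(original.body).decode("ascii"),
    }
    return HttpRequest(
        method="POST",
        scheme="https",
        host=tunnel_host,  # the customer's dedicated tunnel DNS name
        port=443,
        path="/",
        headers={
            "Content-Type": "application/json",
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
            REQUEST_ID_HEADER: str(uuid.uuid4()),
        },
        body=json.dumps(envelope).encode("utf-8"),
    )


def unpack_request(outer: HttpRequest) -> tuple[HttpRequest, str]:
    """Tunnel side: rebuild the original request and return it with its request ID."""
    envelope = json.loads(outer.body)
    # Only the envelope's headers are kept, so the outer auth headers are
    # discarded; tunnel-only header names are also filtered defensively.
    headers = {
        name: value
        for name, value in envelope["headers"].items()
        if name.lower() not in TUNNEL_ONLY_HEADERS
    }
    original = HttpRequest(
        method=envelope["method"],
        scheme=envelope["scheme"],
        host=envelope["host"],
        port=envelope["port"],
        path=envelope["path"],
        headers=headers,
        body=base64.b64decode(envelope["body_b64"]),
    )
    return original, outer.headers.get(REQUEST_ID_HEADER, "")


request = HttpRequest("GET", "https", "jira.corp.internal", 443, "/rest/api/2/issue/ABC-1",
                      {"Accept": "application/json"})
outer = pack_request(request, "acme-tunnel.example.com", "client-id", "client-secret")
restored, request_id = unpack_request(outer)
print(outer.method, outer.host)  # POST acme-tunnel.example.com
print(restored == request)       # True
```

In this sketch the request ID travels as an outer header, so either side can log it without parsing the body.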
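The DLP step at the end of the flow (parse the rules once at start-up, then mask only responses that can be read as text) can be sketched with regular expressions. The rules format used here (`name`, `pattern`, `replacement` fields) is a hypothetical stand-in for the customer's DLP rules JSON, and deciding that a body is text from its `Content-Type` plus a successful UTF-8 decode is an assumption about how "where possible" is determined.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DlpRule:
    name: str
    pattern: re.Pattern
    replacement: str


def load_dlp_rules(rules_json: str) -> list[DlpRule]:
    """Parse and compile the (hypothetical) rules format once, when the server starts."""
    return [
        DlpRule(rule["name"], re.compile(rule["pattern"]), rule.get("replacement", "***"))
        for rule in json.loads(rules_json)
    ]


# Media types treated as text in this sketch; everything else passes through.
TEXTUAL_PREFIXES = ("text/", "application/json", "application/xml", "application/javascript")


def mask_response_body(body: bytes, content_type: str, rules: list[DlpRule]) -> bytes:
    """Apply every rule to a textual response body; return other bodies unchanged."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith(TEXTUAL_PREFIXES):
        return body  # e.g. images, archives, octet streams
    try:
        text = body.decode("utf-8")  # assumption: textual bodies are UTF-8
    except UnicodeDecodeError:
        return body  # not readable as text after all
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text.encode("utf-8")


rules = load_dlp_rules(json.dumps([
    {"name": "email", "pattern": r"[\w.+-]+@[\w-]+\.[\w.]+", "replacement": "[EMAIL]"},
]))
print(mask_response_body(b'{"reporter": "jane.doe@corp.example"}', "application/json; charset=utf-8", rules))
# b'{"reporter": "[EMAIL]"}'
```

Compiling the patterns at start-up keeps the per-response cost to the substitutions themselves, and passing non-text bodies through unchanged avoids corrupting binary downloads.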