Three conventional discovery files are served at fixed URLs that every web crawler and disclosure pipeline already knows how to ask for: /robots.txt, /sitemap.xml and /.well-known/security.txt. On CodeB they are per-tenant by request host: one IIS site per tenant, one canonical host per sitemap. The endpoints require no caller authentication, take no parameters and set no cookies.
Multi-tenancy by domain. CodeB ships one IIS site per tenant FQDN (see CPaaS). The discovery files honour the request Host: header, so each tenant's crawlers see only that tenant's URLs — never a sister tenant's.
The /robots.txt response is a standard robots-exclusion file (RFC 9309). The Sitemap: line is rewritten on every request to point at the host the crawler asked on or, if the operator has set Site:CanonicalHost in the tenant's appsettings.json, at that canonical override.
The Disallow list is host-independent — it covers backend .ashx handlers, OAuth2 path aliases, tenant-scoped WebRTC + auth-flow pages, the admin / superadmin dashboards, and PWA support files. Only the trailing Sitemap: line varies between tenants.
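As an illustration of the split between the shared Disallow block and the per-host Sitemap: line, here is a minimal Python sketch. It is not the CodeB handler: the function name, the https scheme and the three Disallow paths are assumptions made for the example, since the real list is not reproduced on this page.

```python
# Hypothetical Disallow entries; the real list is not given in this document.
DISALLOW = ["/admin/", "/superadmin/", "/oauth2/"]


def build_robots_txt(request_host, canonical_host=None):
    """Return a robots.txt body whose only host-dependent line is Sitemap:."""
    # The canonical override wins when set; otherwise use the request host.
    host = canonical_host or request_host
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in DISALLOW]
    lines.append("")
    # Assumption: the sitemap is advertised over https.
    lines.append(f"Sitemap: https://{host}/sitemap.xml")
    return "\n".join(lines) + "\n"
```

Calling build_robots_txt("phone.codeb.io") and build_robots_txt("aloaha.com") yields bodies that differ only in the final line.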
Diagnostic headers
X-Build-Version — handler build slug. Bumped on every behaviour change. Smoke probes assert against it.
X-Tenant — the request Host: as parsed.
X-Canonical-Host — the resolved canonical host that ended up on the Sitemap: line.
Example
curl -i https://phone.codeb.io/robots.txt
curl -i https://aloaha.com/robots.txt # different Sitemap: line
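A smoke probe could check the diagnostic headers against the body along the following lines. This is a sketch under assumptions: the function name and the problem messages are invented for the example, X-Tenant is assumed to equal the request host exactly, and the build slug is whatever value the operator expects.

```python
from urllib.parse import urlsplit


def check_robots_response(headers, body, request_host, expected_build):
    """Return a list of problems found in a /robots.txt response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    problems = []
    if h.get("x-build-version") != expected_build:
        problems.append("unexpected X-Build-Version")
    # Assumption: X-Tenant echoes the request host without a port.
    if h.get("x-tenant") != request_host:
        problems.append("X-Tenant does not match the request host")
    # Collect the host of every Sitemap: URL in the body.
    sitemap_hosts = [
        urlsplit(line.split(":", 1)[1].strip()).hostname
        for line in body.splitlines()
        if line.lower().startswith("sitemap:")
    ]
    canonical = h.get("x-canonical-host")
    if not canonical or canonical.lower() not in sitemap_hosts:
        problems.append("Sitemap: line does not point at X-Canonical-Host")
    return problems
```

The function takes already-fetched headers and body, so it can be fed from any HTTP client and unit-tested without network access.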
URL Rewrite serves /robots.txt from a dynamic handler under the hood. The public URL stays at /robots.txt for crawlers that only know the conventional location.
The /sitemap.xml response is an XML sitemap generated on demand by a filesystem scan, scoped to the request host. Customer-facing pages at the root and under /de/ are included; admin, tenant-scoped, auth-flow and PWA-support files are filtered out by a deny list that mirrors the robots.txt Disallow set. The result is cached in-process for 60 seconds per canonical host, so requests within that window reuse the cached result instead of rescanning the disk.
<lastmod> — the HTML file's mtime, yyyy-MM-dd. Drift between disk and sitemap is bounded by the 60‑second cache.
<changefreq> + <priority> — heuristic per filename.
<xhtml:link rel="alternate"> — EN ↔ DE alternate emitted whenever a matching /de/<name> file is present on disk.
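The scan behind these elements can be sketched roughly as follows. This is an illustration, not the CodeB implementation: the deny-list entries, the function name, the https scheme and the use of UTC for the mtime are assumptions, and for brevity the sketch lists only root-level pages, attaching the /de/ counterpart as an alternate.

```python
import os
from datetime import datetime, timezone

# Hypothetical deny list; the real one mirrors the robots.txt Disallow set.
DENY = {"admin.html", "superadmin.html", "login.html"}


def scan_pages(root, host):
    """Collect sitemap entries for the .html files directly under `root`."""
    entries = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not name.endswith(".html") or name in DENY or not os.path.isfile(path):
            continue
        # <lastmod> comes from the file's mtime, formatted yyyy-MM-dd.
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        entry = {
            "loc": f"https://{host}/{name}",
            "lastmod": mtime.strftime("%Y-%m-%d"),
            "alternate_de": None,
        }
        # Emit the DE alternate only when a matching /de/<name> file exists.
        if os.path.isfile(os.path.join(root, "de", name)):
            entry["alternate_de"] = f"https://{host}/de/{name}"
        entries.append(entry)
    return entries
```

Because the deny list is applied during the scan itself, a denied file never reaches the serialisation step.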
Canonical host resolution
App_Data/<host>/appsettings.json → Site:CanonicalHost wins if present. Lets an operator point a sitemap at www.example.com instead of phone.example.com.
Otherwise Request.Url.Host — the host the crawler asked on.
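The two-step resolution above might look like this in Python. It is a sketch under stated assumptions: the function name is hypothetical, and the key Site:CanonicalHost is assumed to map to a nested JSON object {"Site": {"CanonicalHost": "..."}}, which is the usual .NET configuration convention but is not confirmed on this page.

```python
import json
import os


def resolve_canonical_host(app_data_dir, request_host):
    """Return the canonical override if configured, else the request host."""
    settings_path = os.path.join(app_data_dir, request_host, "appsettings.json")
    try:
        # utf-8-sig tolerates a byte-order mark at the start of the file.
        with open(settings_path, encoding="utf-8-sig") as fh:
            settings = json.load(fh)
    except (OSError, ValueError):
        # Missing or unreadable settings: fall back to the request host.
        return request_host
    # Assumption: "Site:CanonicalHost" is stored as a nested JSON object.
    site = settings.get("Site") if isinstance(settings, dict) else None
    override = site.get("CanonicalHost") if isinstance(site, dict) else None
    if isinstance(override, str) and override.strip():
        return override.strip()
    return request_host
```

Falling back to the request host on any read or parse failure keeps the discovery files available even when a tenant's settings file is absent or malformed; whether CodeB behaves this way is not stated here.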
Example
curl -i https://phone.codeb.io/sitemap.xml
The on-disk file scan is the source of truth, not a hand-maintained sitemap list. Drop a new .html file at the root and within one minute it appears in /sitemap.xml — subject to the deny list. Conversely, anything on the deny list cannot leak into a sitemap by accident.
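The one-minute bound comes from the per-host cache in front of the scan. A minimal sketch of such a cache, assuming a hypothetical SitemapCache class and an injectable clock so the expiry can be tested; the real in-process cache may be structured differently:

```python
import time


class SitemapCache:
    """Cache one built sitemap per canonical host for a fixed time-to-live."""

    def __init__(self, build, ttl_seconds=60.0, clock=time.monotonic):
        self._build = build        # callable: canonical host -> sitemap body
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # canonical host -> (built_at, body)

    def get(self, canonical_host):
        now = self._clock()
        hit = self._entries.get(canonical_host)
        # Serve the cached body while it is younger than the TTL.
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Otherwise rebuild from disk and remember when that happened.
        body = self._build(canonical_host)
        self._entries[canonical_host] = (now, body)
        return body
```

The sketch is not thread-safe; a real handler serving concurrent requests would need a lock or a concurrent map around the entries.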
Need an admin endpoint? Admin-only and OIDC Bearer-gated routes are documented inside the admin UI itself (visible only to signed-in admins on this host). The public API set on this page is the surface you can integrate against without provisioning a CodeB user.