Prometheus's Blackbox Exporter is a powerful tool for probing endpoints from an external perspective. It allows you to simulate user-facing requests to verify that services are not just running, but are also reachable, performant, and functionally correct.
Why Proactive Endpoint Monitoring Matters

In complex distributed systems, internal health checks are insufficient. An application process might be running perfectly, but a misconfigured firewall, DNS resolution failure, or a faulty load balancer could render it inaccessible to users. This highlights the critical difference between white-box and black-box monitoring.
- White-box monitoring involves instrumenting application code to expose internal metrics (e.g., CPU usage, memory, queue depth). It provides insight into how a service is performing internally.
- Black-box monitoring probes a service from the outside, with no knowledge of its internal state. It answers the crucial question: Is the service available and functional from a user's perspective?
The Blackbox Exporter and Prometheus combination is the de facto standard for this type of external probing. It provides Site Reliability Engineering (SRE) and DevOps teams with high-fidelity signals about service availability and correctness.
Validating The User Experience
Consider a scenario where an API returns a 200 OK status, but the response body is an empty JSON object due to a database connection timeout. Internal metrics might log a successful request, but the user experiences a broken application. Black-box probes address this by validating not just status codes, but also response headers and bodies, ensuring the service is functionally correct.
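To make the gap concrete, here is a minimal, illustrative Python sketch (the `probe_ok` function and payload shape are hypothetical, not exporter code) showing why a body-aware check catches failures that a status-only check misses:

```python
import json

def probe_ok(status_code: int, body: str) -> bool:
    """Illustrative body-aware check: fail on a 200 OK whose
    body is empty or not the expected JSON payload."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # An empty object means the backend silently failed.
    return bool(payload)

# A status-only check would call both of these "up":
print(probe_ok(200, '{"status": "ok"}'))  # True  -> genuinely healthy
print(probe_ok(200, '{}'))                # False -> broken despite 200 OK
```

The Blackbox Exporter expresses the same idea declaratively with body-matching options, covered later in this guide.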
A core objective of robust monitoring is to minimize the time required to detect and resolve incidents, a metric often tracked as Mean Time to Resolution (MTTR).
By simulating the user journey, black-box monitoring acts as the first line of defense. It detects issues that are invisible to internal metrics, directly impacting user experience and safeguarding Service Level Agreements (SLAs).
The Blackbox Exporter is a cornerstone of modern observability, enabling external service monitoring without requiring privileged access. Because synthetic probes traverse the same network path as real users, they surface availability issues — DNS failures, expired certificates, misrouted traffic — that agent-based, white-box monitoring alone cannot see, which in practice shortens incident detection and response times.
Now, let's transition from theory to practical implementation and configure the Blackbox Exporter.
Initial Setup and Deployment
Deploying the exporter is a straightforward process. The two most common methods are running it as a standalone binary or as a Docker container. Both approaches will result in a functional exporter ready to receive probe requests on its default port, 9115.
Installation via Pre-compiled Binaries
For bare-metal or traditional VM environments, using the pre-compiled binary provides direct control over the service lifecycle via systemd.
First, download the latest release from the official Prometheus GitHub repository. Always use the latest version to benefit from new features and security patches.
# Example for amd64 architecture
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz
cd blackbox_exporter-0.25.0.linux-amd64
Next, move the binary to a standard system path and create a dedicated configuration directory.
# Move the binary
sudo mv blackbox_exporter /usr/local/bin/
# Create configuration directory
sudo mkdir -p /etc/blackbox_exporter
# Move the default configuration file
sudo mv blackbox.yml /etc/blackbox_exporter/
To ensure the exporter runs as a service, create a systemd unit file at /etc/systemd/system/blackbox_exporter.service. This file defines how systemd should manage the exporter process, enabling it to start on boot and restart on failure.
[Unit]
Description=Prometheus Blackbox Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml
Restart=always
[Install]
WantedBy=multi-user.target
Finally, reload systemd and start the service.
sudo systemctl daemon-reload
sudo systemctl start blackbox_exporter
sudo systemctl enable blackbox_exporter
sudo systemctl status blackbox_exporter
Running with Docker
For container-native environments, running the Blackbox Exporter with Docker is the cleanest approach. It encapsulates the application and its dependencies, simplifying deployment and scaling.
A simple docker run command using the official prom/blackbox-exporter image is sufficient for initial testing:
docker run -d \
--name blackbox-exporter \
-p 9115:9115 \
prom/blackbox-exporter:latest
For production use, it is critical to provide a custom configuration file. A docker-compose.yml file is ideal for defining a declarative, version-controlled deployment.
version: '3.8'
services:
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    volumes:
      - ./blackbox.yml:/config/blackbox.yml
    command:
      - "--config.file=/config/blackbox.yml"
    ports:
      - "9115:9115"
    restart: unless-stopped
This configuration mounts a local blackbox.yml into the container and explicitly instructs the exporter to use it, providing a repeatable and robust deployment pattern.
Demystifying the Blackbox Configuration File
The core of the Blackbox Exporter's functionality lies in its configuration file, blackbox.yml. The configuration is structured around modules.
A module is a named configuration block that defines a specific type of probe. It specifies the prober (e.g., `http`, `tcp`), the timeout, and the success criteria for a test.
Here is a fundamental http_2xx module that checks for any successful 2xx HTTP status code.
# /etc/blackbox_exporter/blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      # An empty list defaults to any 2xx status code
      valid_status_codes: []
      follow_redirects: true
In this module, the http prober will time out after 5 seconds. When Prometheus scrapes a target, it will specify this http_2xx module, allowing a single exporter to perform diverse checks based on the requested module. Mastering this file is key to effective endpoint monitoring. For a deeper dive, our guide on comprehensive Prometheus network monitoring covers advanced configurations.
Blackbox Exporter includes several built-in probers for different protocols.
Common Blackbox Exporter Probe Modules
This table outlines the primary probers and their use cases.
| Probe Module | Protocol | Primary Use Case |
|---|---|---|
| http | HTTP/S | Checking website availability and API endpoints |
| tcp | TCP | Verifying that a specific port on a server is open |
| icmp | ICMP | Pinging hosts to check for basic network reachability |
| dns | DNS | Querying DNS records to ensure they resolve correctly |
These four probers cover the vast majority of real-world monitoring scenarios.
The exporter's widespread adoption is evident from its community metrics: since its first releases in 2016, the project has accumulated thousands of GitHub stars and well over a thousand forks, and it remains actively maintained under the official Prometheus organization, confirming its status as a reliable, production-grade tool. Current statistics are available on the project's GitHub page.
Connecting Prometheus To Your Probes
A functional Blackbox Exporter is only one half of the solution. The other half is configuring Prometheus to use it. This involves setting up a Prometheus scrape job that scrapes the exporter itself, passing the actual endpoint URL as a parameter. This elegant design allows a single exporter to probe a virtually unlimited number of targets dynamically.
Getting the exporter ready for this connection is a simple three-step flow: download, configure, run. Once the exporter is downloaded, configured, and running, it is ready to accept probe requests from Prometheus.
The Magic of Relabeling
The mechanism that enables this dynamic probing is a powerful Prometheus feature called relabel_configs. Relabeling allows you to rewrite labels and parameters of a target before the scrape occurs. For the Blackbox Exporter integration, we use it to redirect the scrape.
The process involves defining a scrape job that lists the desired endpoints (e.g., https://api.example.com) as targets. A series of relabeling rules then transforms the scrape request on the fly.
At its core, the process is: take the original target address, pass it to the exporter as a URL parameter named `target`, and then retarget the scrape to the exporter's `/probe` endpoint.
This architecture is highly scalable because it decouples the list of targets from the exporter's configuration. You manage your targets directly in Prometheus scrape configs or through service discovery.
A Static Scrape Configuration Example
Here is a prometheus.yml configuration for a scrape job that monitors a static list of targets using the http_2xx module.
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Specifies the module to use
    static_configs:
      - targets:
          - https://www.your-company.com
          - https://status.your-company.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115  # Address of the Blackbox Exporter
Let's dissect the relabel_configs block:
- `source_labels: [__address__]`: Prometheus populates the internal `__address__` label with the target's URL (e.g., `https://www.your-company.com`). This rule copies that value to a new label, `__param_target`.
- `source_labels: [__param_target]`: The value is then copied to the `instance` label. This is a critical step that ensures metrics in Grafana and alerts are correctly associated with the endpoint being probed, not with the exporter itself.
- `target_label: __address__`: Finally, the scrape address (`__address__`) is completely replaced with the address of the Blackbox Exporter.
When Prometheus executes this job for the first target, it constructs and sends a request to http://blackbox-exporter:9115/probe?module=http_2xx&target=https://www.your-company.com. The exporter then probes the target and returns a rich set of metrics to Prometheus.
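As a rough illustration (this is not Prometheus internals — the `build_probe_request` helper is hypothetical), the following Python sketch replays the three relabeling rules and builds the resulting probe URL:

```python
from urllib.parse import urlencode

def build_probe_request(target: str, module: str,
                        exporter: str = "blackbox-exporter:9115"):
    """Replay the relabel_configs from the scrape job above."""
    labels = {"__address__": target}                   # discovered target
    labels["__param_target"] = labels["__address__"]   # rule 1: URL param
    labels["instance"] = labels["__param_target"]      # rule 2: instance label
    labels["__address__"] = exporter                   # rule 3: scrape address
    url = (f"http://{labels['__address__']}/probe?"
           + urlencode({"module": module, "target": labels["__param_target"]}))
    return labels["instance"], url

instance, url = build_probe_request("https://www.your-company.com", "http_2xx")
print(instance)  # https://www.your-company.com
print(url)       # http://blackbox-exporter:9115/probe?module=http_2xx&target=https%3A%2F%2Fwww.your-company.com
```

Note that the `target` parameter is URL-encoded in the final request, which is why the `instance` label must be captured before the address is rewritten.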
Dynamic Probing in Kubernetes Environments
Static configurations do not scale in dynamic environments like Kubernetes. Here, Prometheus's service discovery capabilities are essential. The same relabeling logic applies, but targets are discovered automatically from Kubernetes Services or Ingresses.
When using the Prometheus Operator, this is best accomplished with the Probe Custom Resource Definition (CRD), which abstracts away the complex relabeling logic.
Here is an example Probe object that configures monitoring for a Kubernetes service named my-api-service:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: my-api-probe
  labels:
    release: prometheus # Ensures the Prometheus Operator discovers this resource
spec:
  jobName: kubernetes-services
  prober:
    url: blackbox-exporter.monitoring.svc:9115 # DNS name of the exporter service
  module: http_2xx
  targets:
    # The Probe CRD accepts staticConfig or ingress targets; here the
    # service is addressed by its cluster DNS name (namespace and port
    # shown are illustrative). Append a path such as /healthz if needed.
    staticConfig:
      static:
        - http://my-api-service.default.svc:80/
The Prometheus Operator automatically translates this Probe object into the necessary relabel_configs. This declarative approach is less error-prone and aligns with Kubernetes principles, enabling scalable management of hundreds of probes without configuration debt.
Crafting Advanced Probes For Real-World Scenarios

With Prometheus connected to the Blackbox Exporter, you can now define advanced probes that move beyond simple uptime checks to validate functional correctness and security posture.
A 200 OK response is a low-fidelity signal. Advanced probes answer more critical questions: Is the response body correct? Does the API response contain the expected JSON structure? Is the TLS certificate valid?
Advanced HTTP Probes
The http prober is highly versatile, with options to validate status codes, response bodies, and headers. This enables high-fidelity checks that confirm not just availability, but also functionality.
Consider an API endpoint that requires authentication. A basic probe would receive a 401 Unauthorized or 403 Forbidden response, triggering false-positive alerts. A correct probe must include authentication details.
Here is a module that uses a bearer token for probing a protected microservice:
# In blackbox.yml
modules:
  http_bearer_auth:
    prober: http
    timeout: 10s
    http:
      method: GET
      valid_status_codes: [200]
      # For production, always load secrets from a file
      bearer_token_file: /secrets/api_token
With this http_bearer_auth module, your Blackbox Exporter setup can validate that authenticated endpoints are responding correctly to authorized requests.
We can go further by validating response bodies using regular expressions. This is essential for confirming functional correctness, such as ensuring an API returns a JSON object with "status": "ok".
By crafting probes that validate response bodies, you transition from simple uptime monitoring to true synthetic monitoring. You're no longer just asking "Is the server on?" but "Is the service providing the correct response for a given request?"
This validation is handled by fail_if_body_not_matches_regexp and its inverse, fail_if_body_matches_regexp.
- `fail_if_body_not_matches_regexp`: Fails the probe if the regex does not find a match in the response body. Use this to ensure specific content is present.
- `fail_if_body_matches_regexp`: Fails the probe if the regex does find a match. Use this to ensure specific error messages or patterns are absent.
# In blackbox.yml
modules:
  http_json_validator:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      fail_if_body_not_matches_regexp:
        - '.*"status": ?"ok".*'
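The semantics of the two body-matching options can be sketched in Python. Note that the Blackbox Exporter itself uses Go's RE2 regex engine, so treat this as an approximation, and the `FORBIDDEN` pattern here is a hypothetical example rather than part of the module above:

```python
import re

REQUIRED = re.compile(r'.*"status": ?"ok".*')  # fail_if_body_not_matches_regexp
FORBIDDEN = re.compile(r'"error"')             # fail_if_body_matches_regexp (hypothetical)

def body_probe_success(body: str) -> bool:
    """Approximate the exporter's body validation logic."""
    if not REQUIRED.search(body):
        return False  # required content missing -> probe fails
    if FORBIDDEN.search(body):
        return False  # forbidden content present -> probe fails
    return True

print(body_probe_success('{"status": "ok", "items": []}'))  # True
print(body_probe_success('{"status": "degraded"}'))         # False
```

This is the crux of synthetic monitoring: the probe asserts on what the service says, not merely that it answered.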
Probing Beyond HTTP
While HTTP probes are common, real-world systems rely on a full stack of protocols. The Blackbox Exporter provides probers for TCP, ICMP, and DNS to achieve comprehensive coverage.
TCP probes are crucial for monitoring stateful services that do not use HTTP, such as databases (Redis, PostgreSQL) or message brokers (RabbitMQ). A simple TCP connection check to a service port can provide a powerful early warning of service degradation.
Here is a module to check a generic TCP port:
# In blackbox.yml
modules:
  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      # For protocols that send a banner or expect a client-side write,
      # you can define expect/send pairs for a deeper check than a
      # bare TCP handshake.
      query_response:
        - expect: ".*" # Expect any response to confirm connection
This tcp_connect module allows you to verify that critical backend services are accepting connections, providing visibility into parts of your infrastructure that HTTP probes cannot reach.
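Conceptually, the connection phase of the tcp prober behaves like this minimal Python sketch (the `tcp_port_open` helper name is ours; the real prober additionally handles TLS and the query/response exchange):

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Succeed if a TCP handshake to host:port completes
    within the timeout, mirroring the prober's timeout field."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A failing handshake (refused connection, firewall drop, dead host) is exactly the early-warning signal the tcp_connect module surfaces as `probe_success 0`.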
Verifying TLS Certificates
An often-overlooked but critical feature of the http prober is its ability to inspect TLS certificates. An expired certificate can cause a complete service outage for users. The Blackbox Exporter prevents this by exposing TLS-related metrics.
The probe_ssl_earliest_cert_expiry metric is a Unix timestamp indicating when the certificate will expire. You can create a Prometheus alert that notifies you weeks in advance, providing ample time for renewal.
A well-configured HTTPS probe should also validate the TLS configuration itself to enforce security standards.
# In blackbox.yml
modules:
  https_production:
    prober: http
    timeout: 10s
    http:
      # Probe fails if the connection is not over SSL/TLS
      fail_if_not_ssl: true
      tls_config:
        # Fails if cert is not valid for the hostname
        insecure_skip_verify: false
        # Enforce modern security standards
        min_version: TLS12
This https_production module enforces security best practices, such as requiring at least TLS 1.2. For internal services using self-signed certificates, a separate module with insecure_skip_verify: true can be created.
Finally, ICMP probes provide fundamental network reachability testing. A simple "ping" can instantly diagnose network segmentation, firewall misconfigurations, or routing errors. By combining these probe types, you can build a layered monitoring strategy that covers your application from the network layer up to the application layer.
Building Actionable Alerts From Probe Metrics
Collecting metrics is the first step; the real value comes from transforming them into actionable alerts. A well-crafted alerting strategy turns your Blackbox Exporter and Prometheus setup into a proactive incident prevention system.
An effective alert notifies you of a problem before users are impacted, changing monitoring from a passive data collection exercise into an active defense of service quality.
Writing Prometheus Alerting Rules
In a Kubernetes environment managed by the Prometheus Operator, alerts are defined within a PrometheusRule custom resource. This allows you to manage alerting rules declaratively, in a version-controlled manner, just like any other Kubernetes object.
These rules use the Prometheus Query Language (PromQL) to define trigger conditions. A strong understanding of PromQL is essential for writing alerts that are both sensitive and low-noise. For a detailed guide, review our deep dive into the Prometheus Query Language.
The alert's logic resides in the expr field. When the PromQL query in this field returns a result for a specified duration (the for clause), the alert transitions to a pending state and then to firing, at which point Alertmanager dispatches notifications.
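The pending-to-firing transition can be illustrated with a toy Python model (this is not Prometheus code; real evaluation tracks wall-clock duration, not sample counts):

```python
def alert_state(series: list, for_intervals: int) -> str:
    """Toy model of the `for` clause: an alert reaches `firing` only
    after its expression has been true (1) for `for_intervals`
    consecutive evaluation cycles; shorter runs stay `pending`."""
    consecutive = 0
    for sample in series:
        consecutive = consecutive + 1 if sample else 0
    if consecutive == 0:
        return "inactive"
    return "firing" if consecutive >= for_intervals else "pending"

# A one-cycle blip stays pending; a sustained failure fires.
print(alert_state([0, 0, 1], for_intervals=2))     # pending
print(alert_state([0, 1, 1, 1], for_intervals=2))  # firing
```

This is why the `for` duration is such an effective noise filter: transient flaps never reach Alertmanager.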
Critical Alerts for Endpoint Health
Here are three essential alerting rules that cover the most critical failure modes for external endpoints:
- Persistent Probe Failures: This is the most fundamental alert. It fires when the `probe_success` metric is `0` (indicating failure) for a sustained period.
- High Probe Latency: A slow service is often a precursor to a full outage. This alert detects performance degradation by monitoring the `probe_duration_seconds` metric.
- Impending SSL Certificate Expiration: An expired SSL certificate can cause a hard outage. This proactive alert monitors `probe_ssl_earliest_cert_expiry` to provide weeks of advance notice.
A layered alerting strategy is key. It starts with a basic "down" check but adds alerts for performance degradation and security issues like certificate expiry. This approach provides deep insight into the actual user experience.
Here is a PrometheusRule YAML manifest that bundles these critical alerts into a single deployable resource.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-exporter-alerts
  labels:
    release: prometheus # Ensures the Prometheus Operator discovers it
spec:
  groups:
    - name: blackbox.rules
      rules:
        - alert: EndpointDown
          expr: probe_success == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Endpoint {{ $labels.instance }} is down"
            description: "The probe for {{ $labels.instance }} has been failing for 2 minutes."
        - alert: HighProbeLatency
          expr: probe_duration_seconds > 1.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High probe latency for {{ $labels.instance }}"
            description: "Probe duration for {{ $labels.instance }} is {{ $value }}s, which is higher than the 1.5s threshold."
        - alert: SSLCertificateExpiringSoon
          expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
          for: 24h
          labels:
            severity: warning
          annotations:
            summary: "SSL certificate for {{ $labels.instance }} is expiring soon"
            description: "The certificate for {{ $labels.instance }} will expire in less than 30 days."
        - alert: SSLCertificateExpiringVerySoon
          expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: "SSL certificate for {{ $labels.instance }} is expiring very soon"
            description: "CRITICAL: The certificate for {{ $labels.instance }} will expire in less than 7 days!"
This configuration provides a solid foundation. The for duration is a critical tool for reducing alert fatigue by ensuring a problem is persistent before notifying on-call engineers. Adjust these thresholds and durations to match your Service Level Objectives (SLOs).
Visualizing Endpoint Health With Grafana

Metrics without effective visualization are just noise. Grafana is the tool that transforms raw Blackbox Exporter metrics into an intuitive, actionable narrative of service health. A well-designed dashboard provides at-a-glance visibility into the state of your endpoints.
Before building, identify the most critical signals. These are almost always probe_success (availability), probe_duration_seconds (performance), and probe_ssl_earliest_cert_expiry (security posture).
Creating Essential Dashboard Panels
A good dashboard combines different visualizations to answer key questions without requiring deep analysis. Here are three essential panels for Blackbox Exporter monitoring:
- Stat Panel (Uptime): Displays a single, bold number representing your uptime percentage. This is the primary indicator of overall reliability.
- Time Series Graph (Latency): Tracks probe latency over time. It is invaluable for spotting performance degradation before it becomes a major incident.
- Bar Gauge or Table (SSL Expiry): Visualizes the time remaining before a TLS certificate expires, turning a critical deadline into an impossible-to-ignore countdown.
These three panels provide a consolidated view of availability, performance, and security.
PromQL Queries for Grafana
Grafana's power comes from its deep integration with PromQL, allowing you to craft precise queries that extract meaningful insights.
To calculate the average uptime percentage over the last 24 hours for a Stat panel, you can use the avg_over_time function:
# Calculates uptime percentage over the last 24 hours for a specific job
avg_over_time(probe_success{job="blackbox-http"}[24h]) * 100
This query averages the probe_success metric (where 1 is success and 0 is failure) and multiplies it by 100. In Grafana, you can configure color thresholds to make the panel turn red if uptime falls below your SLO.
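The arithmetic behind this query can be checked by hand with a short sketch, where each sample stands for one scraped value of `probe_success` in the 24-hour window:

```python
def uptime_percent(samples: list) -> float:
    """avg_over_time(probe_success[...]) * 100, computed by hand:
    probe_success is 1 on success and 0 on failure, so the mean
    of the samples in the window is the success ratio."""
    return sum(samples) / len(samples) * 100

# 1 failed probe out of 20 in the window -> 95% uptime
print(uptime_percent([1] * 19 + [0]))  # 95.0
```

This also shows why the metric's 0/1 encoding is so convenient: averaging it directly yields availability with no extra query logic.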
A great visualization provides context. A latency graph should display not just the average, but also the 95th or 99th percentile (p95, p99). This reveals the worst-case user experience, which is often masked by simple averages.
For SSL certificate monitoring, a simple PromQL query against probe_ssl_earliest_cert_expiry calculates the days until expiry:
# Calculates the number of days until a certificate expires
(probe_ssl_earliest_cert_expiry{job="blackbox-http"} - time()) / 86400
This query subtracts the current Unix time from the certificate's expiry timestamp and divides by 86400 (seconds in a day). Visualizing this in a Bar Gauge or Table panel provides an immediate, actionable countdown.
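The same arithmetic, spelled out as plain code with an assumed "now" timestamp for illustration:

```python
def days_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """(probe_ssl_earliest_cert_expiry - time()) / 86400: seconds
    until expiry divided by seconds in a day."""
    return (expiry_ts - now_ts) / 86400

# A certificate expiring exactly 30 days from "now" sits right on
# the warning threshold used by the alerting rules earlier.
now = 1_700_000_000  # assumed current Unix time
print(days_until_expiry(now + 30 * 86400, now))  # 30.0
```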
Building a complete picture of service health is a core practice of observability. To learn more, explore our guide on building an open-source observability platform.
Common Blackbox Exporter Questions Answered
This section provides concise, technical answers to common questions that arise during Blackbox Exporter deployments.
How Do I Monitor Services Inside A Private Network?
To probe internal services, you must deploy an instance of the Blackbox Exporter inside the same private network as the target services. Your Prometheus instance can reside elsewhere, but it must have network-level access to scrape that internal exporter on its :9115 port.
Common architectural patterns to enable this include:
- VPC Peering/Transit Gateway: Connects the VPC where Prometheus is deployed with the private VPC containing the internal services and exporter.
- VPN/Direct Connect: Establishes a secure tunnel between your networks.
- Prometheus Federation: A local Prometheus instance scrapes the internal targets and exporter, and a central, global Prometheus scrapes a summarized set of metrics from the local instance via its `/federate` endpoint.
The most straightforward solution is to ensure your Prometheus server has a direct network route to the internal exporter's IP address and port (e.g., 10.0.1.50:9115).
Can A Single Exporter Handle Thousands Of Targets?
Yes, a single Blackbox Exporter instance is highly efficient and can handle a large number of targets. However, at a scale of thousands of targets, you may encounter resource constraints, typically CPU saturation from TLS handshakes or network I/O limits.
For any large-scale deployment, the recommended architecture is to run multiple replicas of the Blackbox Exporter behind a load balancer. This provides both high availability and horizontal scalability for the probe workload.
In Kubernetes, this is achieved by setting the replicas count in the exporter's Deployment to 2 or more. The Prometheus scrape configuration should then target the Kubernetes Service (which acts as a load balancer) fronting these pods. Prometheus will automatically distribute scrape requests across the available exporter instances.
What Is The Difference Between A ServiceMonitor And A Probe CRD?
In a Prometheus Operator environment, these two Custom Resource Definitions (CRDs) serve distinct purposes.
- A `ServiceMonitor` is a generic CRD used to tell Prometheus how to scrape metrics from an existing metrics endpoint exposed by a Kubernetes Service. You would use a `ServiceMonitor` to scrape the Blackbox Exporter's own `/metrics` endpoint to monitor its internal health.
- A `Probe` is a specialized CRD designed specifically for black-box monitoring. It provides a higher-level abstraction where you define what you want to probe (e.g., a Kubernetes Service or Ingress) and which Blackbox Exporter module to use. The Prometheus Operator then automatically generates the complex `relabel_configs` required to perform the probe.
Best Practice: Always use the Probe CRD for black-box monitoring when using the Prometheus Operator. It is the modern, recommended approach that simplifies configuration, reduces human error, and makes your monitoring setup more declarative and maintainable.
Managing a resilient DevOps infrastructure, from observability stacks to CI/CD pipelines, requires specialized expertise. OpsMoon connects you with top-tier remote engineers from the top 0.7% of the global talent pool, ensuring you have the right skills for your project. Start with a free work planning session to map your roadmap and see how our flexible engagement models can accelerate your software delivery. Find your expert DevOps engineer today.