# Exec
The exec scraper runs a script and scrapes the output as configuration items. This is useful for custom integrations where you need to fetch configuration from external systems using scripts.
```yaml title="exec-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-scraper
spec:
  schedule: "@every 1h"
  exec:
    - type: $.type
      id: $.id
      name: $.name
      script: |
        #!/bin/bash
        curl -s https://api.example.com/configs | jq '[.[] | {id: .id, type: "API::Config", name: .name, config: .}]'
```
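The script must print a JSON array to stdout; each element becomes a config item once the `type`, `id`, and `name` mappings are applied. For the example above, the expected output would look something like this (the field values are illustrative):

```json
[
  {
    "id": "cfg-123",
    "type": "API::Config",
    "name": "payments-service",
    "config": { "id": "cfg-123", "name": "payments-service", "replicas": 3 }
  }
]
```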
## Advanced Example with Git Checkout and Python
```yaml title="exec-backstage-catalog.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-backstage-catalog
  annotations:
    trace: "true"
spec:
  schedule: "@every 1h"
  exec:
    - type: $.type
      id: $.id
      name: $.name
      description: $.metadata.description
      script: |
        #!/usr/bin/env python3
        # /// script
        # dependencies = [
        #     "pyyaml",
        # ]
        # ///
        import json
        import yaml
        from pathlib import Path
        import sys

        # Target kinds to scrape
        TARGET_KINDS = ['Component', 'API', 'System', 'Domain']

        entities = []
        catalog_path = Path('packages/catalog-model/examples')

        # Check if catalog path exists
        if not catalog_path.exists():
            print(f"Error: Catalog path {catalog_path} does not exist", file=sys.stderr)
            sys.exit(1)

        # Recursively find all YAML files
        for yaml_file in catalog_path.rglob('*.yaml'):
            try:
                with open(yaml_file, 'r') as f:
                    # Handle multi-document YAML files
                    docs = list(yaml.safe_load_all(f))

                for data in docs:
                    # Skip if not a valid Backstage entity
                    if not isinstance(data, dict) or 'kind' not in data or 'metadata' not in data:
                        continue

                    kind = data.get('kind', '')

                    # Filter by target kinds
                    if kind not in TARGET_KINDS:
                        continue

                    # Extract metadata and spec
                    metadata = data.get('metadata', {})
                    spec = data.get('spec', {})

                    # Build tags list from multiple sources
                    tags = []

                    # Add metadata.tags
                    if 'tags' in metadata and isinstance(metadata['tags'], list):
                        tags.extend(metadata['tags'])

                    # Add lifecycle as tag
                    if 'lifecycle' in spec:
                        tags.append(f"lifecycle:{spec['lifecycle']}")

                    # Add type as tag
                    if 'type' in spec:
                        tags.append(f"type:{spec['type']}")

                    # Add labels as tags (key:value format)
                    if 'labels' in metadata and isinstance(metadata['labels'], dict):
                        for key, value in metadata['labels'].items():
                            tags.append(f"{key}:{value}")

                    # Build entity object
                    entity = {
                        'id': metadata.get('name', ''),
                        'type': f"Backstage::{kind}",
                        'name': metadata.get('name', ''),
                        'metadata': metadata,
                        'spec': spec,
                        'tags': tags,
                        'source_file': str(yaml_file.relative_to(catalog_path))
                    }
                    entities.append(entity)
            except Exception as e:
                # Log error but continue processing other files
                print(f"Error parsing {yaml_file}: {e}", file=sys.stderr)
                continue

        # Output JSON array
        print(json.dumps(entities, indent=2))
      checkout:
        url: https://github.com/backstage/backstage
        branch: master
        depth: 1 # https://github.com/flanksource/duty/issues/1688
      tags:
        - name: lifecycle
          jsonpath: $.spec.lifecycle
        - name: type
          jsonpath: $.spec.type
        - name: domain
          jsonpath: $.spec.domain
        - name: owner
          jsonpath: $.spec.owner
      # properties:
      #   - name: type
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.type
      #   - name: lifecycle
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.lifecycle
      #   - name: source_file
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.source_file
      transform:
        exclude:
          - jsonpath: $.tags
          - jsonpath: $.source_file
        relationship:
          # Component/API/System/Domain -> Owner (Team)
          - filter: has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # Component/System -> Domain
          - filter: (config_type == 'Backstage::Component' || config_type == 'Backstage::System') && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Component -> System (belongs to)
          - filter: config_type == 'Backstage::Component' && has(config.spec.system)
            expr: |
              [{
                "type": "Backstage::System",
                "name": config.spec.system
              }].toJSON()
          # Component -> Dependencies (dependsOn)
          - filter: config_type == 'Backstage::Component' && has(config.spec.dependsOn)
            expr: |
              config.spec.dependsOn.map(dep, {
                "type": dep.startsWith("component:") ? "Backstage::Component" :
                        dep.startsWith("resource:") ? "Backstage::Resource" :
                        dep.startsWith("api:") ? "Backstage::API" :
                        "Backstage::Component",
                "name": dep.contains(":") ? dep.split(":")[1] : dep
              }).toJSON()
          # Component -> APIs consumed (consumesApis / apiConsumedBy)
          - filter: config_type == 'Backstage::Component' && has(config.spec.consumesApis)
            expr: |
              config.spec.consumesApis.map(api, {
                "type": "Backstage::API",
                "name": api.contains(":") ? api.split(":")[1] : api
              }).toJSON()
          # API -> Owner
          - filter: config_type == 'Backstage::API' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Owner
          - filter: config_type == 'Backstage::System' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Domain
          - filter: config_type == 'Backstage::System' && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Domain -> Owner
          - filter: config_type == 'Backstage::Domain' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
```
## Scraper

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| schedule | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | string | |
| full | Set to true to extract changes and access logs from scraped configurations. Defaults to false. | bool | |
| retention | Settings for retaining changes, analysis and scraped items. | Retention | |
| exec | Specifies the list of Exec configurations to scrape. | []Exec | |
| logLevel | Specify the level of logging. | string | |
## Exec

| Field | Description | Scheme |
| ----- | ----------- | ------ |
| script* | The script to execute. The script should output JSON to stdout. | string |
| artifacts | Artifacts to collect after execution. | []Artifact |
| checkout | Checkout a git repository before running the script. | |
| connections | Connections for AWS/GCP/Azure credential injection. | ExecConnections |
| env | Environment variables to set during execution. | |
| setup | Install runtime dependencies before execution. | Setup |
| labels | Labels for each config item. | |
| properties | Custom templatable properties for the scraped config items. | |
| tags | Tags for each config item. Max allowed: 5. | |
| transform | Transform configs after they've been scraped. | |
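As a minimal sketch of how some of these fields combine, the example below injects an environment variable into the script; the endpoint and the `API_BASE` variable name are hypothetical:

```yaml
exec:
  - type: $.type
    id: $.id
    name: $.name
    env:
      - name: API_BASE # hypothetical variable
        value: https://api.example.com
    script: |
      #!/bin/bash
      # $API_BASE is injected from the env block above
      curl -s "$API_BASE/configs" | jq '[.[] | {id: .id, type: "API::Config", name: .name, config: .}]'
```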
## Setup

The setup field allows you to automatically install runtime environments before executing your script. This is useful when your script requires specific versions of Bun or Python.

| Field | Description | Scheme |
| ----- | ----------- | ------ |
| bun | Install the Bun runtime. | RuntimeSetup |
| python | Install the Python runtime via uv. | RuntimeSetup |
## RuntimeSetup

| Field | Description | Scheme |
| ----- | ----------- | ------ |
| version | Version to install (e.g., "1.3.5" for Bun, "3.10.19" for Python, or "latest"). | string |
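For example, a Bun-based scraper might look like the following sketch, where setup.bun pins the runtime version; the endpoint and output shape are illustrative:

```yaml
exec:
  - type: $.type
    id: $.id
    name: $.name
    script: |
      #!/usr/bin/env bun
      // Hypothetical endpoint; replace with your own source
      const res = await fetch("https://api.example.com/configs");
      const items = await res.json();
      // Print the JSON array the scraper expects on stdout
      console.log(JSON.stringify(items.map(i => ({
        id: i.id,
        type: "API::Config",
        name: i.name,
        config: i,
      }))));
    setup:
      bun:
        version: "1.3.5"
```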
## Python with Inline Dependencies

For Python scripts, you can declare dependencies inline using PEP 723 inline script metadata. The uv package manager is used to manage Python environments.

```yaml
exec:
  - script: |
      #!/usr/bin/env python3
      # /// script
      # dependencies = [
      #     "pyyaml",
      #     "requests",
      # ]
      # ///
      import yaml
      import json

      # Your script logic here
      print(json.dumps([{"id": "1", "type": "Example", "name": "test"}]))
    setup:
      python:
        version: "3.10.19"
```
## Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration.
| Field | Description | Scheme |
| ----- | ----------- | ------ |
| id* | A static value or JSONPath expression to use as the ID for the resource. | jsonpath |
| name* | A static value or JSONPath expression to use as the name for the resource. | jsonpath |
| type* | A static value or JSONPath expression to use as the type for the resource. | jsonpath |
| class | A static value or JSONPath expression to use as the class for the resource. | jsonpath |
| createFields | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| deleteFields | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| description | A static value or JSONPath expression to use as the description for the resource. | jsonpath |
| format | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats. | string |
| health | A static value or JSONPath expression to use as the health of the config item. | jsonpath |
| items | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | jsonpath |
| status | A static value or JSONPath expression to use as the status of the config item. | jsonpath |
| timestampFormat | A Go time format string used to parse timestamps in createFields and deleteFields. Defaults to RFC3339. | string |
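As a sketch, the mapping below extracts individual items from a wrapped API response and derives each item's identity from its fields; the endpoint and the uid, displayName, and createdAt fields are hypothetical:

```yaml
exec:
  - script: |
      #!/bin/bash
      # Hypothetical endpoint returning {"items": [{"uid": "...", "displayName": "...", ...}]}
      curl -s https://api.example.com/inventory
    items: $.items
    id: $.uid
    name: $.displayName
    type: Example::Device
    createFields:
      - $.createdAt
    timestampFormat: "2006-01-02 15:04:05"
```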
## Formats

### JSON

The scraper stores config items as `jsonb` fields in PostgreSQL.

Resource providers typically return JSON natively, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

### XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
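For instance, since the same config object is available to the CEL expressions used in transform.relationship (see the advanced example above), a filter can key off the raw content. A hypothetical sketch:

```yaml
transform:
  relationship:
    # Hypothetical: link XML-backed items whose content mentions a named server
    - filter: config_type == 'Example::XMLConfig' && config.content.contains('payments')
      expr: |
        [{
          "type": "Example::Server",
          "name": "payments"
        }].toJSON()
```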
## Full Scraper Output

When you enable `full: true`, custom scrapers can return complex objects containing config data, changes, access logs, and external entities.

See the Custom Scraper page for the full output schema, shorthand keys, external entity schemas, and alias resolution.
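As a rough illustration only (the authoritative schema lives on the Custom Scraper page), a script running under full: true might emit objects that wrap each config alongside its changes; the key and field names below are illustrative:

```json
[
  {
    "config": { "id": "cfg-123", "name": "payments-service" },
    "changes": [
      { "change_type": "diff", "summary": "replicas increased from 2 to 3" }
    ]
  }
]
```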