# Exec
The exec scraper runs a script and scrapes the output as configuration items. This is useful for custom integrations where you need to fetch configuration from external systems using scripts.
```yaml title="exec-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-scraper
spec:
  schedule: "@every 1h"
  exec:
    - type: $.type
      id: $.id
      name: $.name
      script: |
        #!/bin/bash
        curl -s https://api.example.com/configs | jq '[.[] | {id: .id, type: "API::Config", name: .name, config: .}]'
```
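The script's stdout is parsed as JSON, and the `id`, `type`, and `name` mappings are evaluated against each array element. For the example above, the `jq` filter emits an array shaped like this (all values are illustrative):

```json
[
  {
    "id": "cfg-1",
    "type": "API::Config",
    "name": "payments",
    "config": { "id": "cfg-1", "name": "payments", "replicas": 3 }
  }
]
```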
## Advanced Example with Git Checkout and Python
```yaml title="exec-backstage-catalog.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-backstage-catalog
  annotations:
    trace: 'true'
spec:
  schedule: '@every 1h'
  exec:
    - type: $.type
      id: $.id
      name: $.name
      description: $.metadata.description
      setup:
        python:
          version: latest
      script: |
        #!/usr/bin/env python3
        # /// script
        # dependencies = [
        #     "pyyaml",
        # ]
        # ///
        import json
        import yaml
        from pathlib import Path
        import sys

        # Target kinds to scrape
        TARGET_KINDS = ['Component', 'API', 'System', 'Domain']

        entities = []
        catalog_path = Path('packages/catalog-model/examples')

        # Check if catalog path exists
        if not catalog_path.exists():
            print(f"Error: Catalog path {catalog_path} does not exist", file=sys.stderr)
            sys.exit(1)

        # Recursively find all YAML files
        for yaml_file in catalog_path.rglob('*.yaml'):
            try:
                with open(yaml_file, 'r') as f:
                    # Handle multi-document YAML files
                    docs = list(yaml.safe_load_all(f))

                for data in docs:
                    # Skip if not a valid Backstage entity
                    if not isinstance(data, dict) or 'kind' not in data or 'metadata' not in data:
                        continue

                    kind = data.get('kind', '')

                    # Filter by target kinds
                    if kind not in TARGET_KINDS:
                        continue

                    # Extract metadata and spec
                    metadata = data.get('metadata', {})
                    spec = data.get('spec', {})

                    # Build tags list from multiple sources
                    tags = []

                    # Add metadata.tags
                    if 'tags' in metadata and isinstance(metadata['tags'], list):
                        tags.extend(metadata['tags'])

                    # Add lifecycle as tag
                    if 'lifecycle' in spec:
                        tags.append(f"lifecycle:{spec['lifecycle']}")

                    # Add type as tag
                    if 'type' in spec:
                        tags.append(f"type:{spec['type']}")

                    # Add labels as tags (key:value format)
                    if 'labels' in metadata and isinstance(metadata['labels'], dict):
                        for key, value in metadata['labels'].items():
                            tags.append(f"{key}:{value}")

                    # Build entity object
                    entity = {
                        'id': metadata.get('name', ''),
                        'type': f"Backstage::{kind}",
                        'name': metadata.get('name', ''),
                        'metadata': metadata,
                        'spec': spec,
                        'tags': tags,
                        'source_file': str(yaml_file.relative_to(catalog_path))
                    }
                    entities.append(entity)
            except Exception as e:
                # Log error but continue processing other files
                print(f"Error parsing {yaml_file}: {e}", file=sys.stderr)
                continue

        # Output JSON array
        print(json.dumps(entities, indent=2))
      checkout:
        url: https://github.com/backstage/backstage
        branch: master
        # depth: 1 # https://github.com/flanksource/duty/issues/1688
      tags:
        - name: lifecycle
          jsonpath: $.spec.lifecycle
        - name: type
          jsonpath: $.spec.type
        - name: domain
          jsonpath: $.spec.domain
        - name: owner
          jsonpath: $.spec.owner
      # properties:
      #   - name: type
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.type
      #   - name: lifecycle
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.lifecycle
      #   - name: source_file
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.source_file
      transform:
        exclude:
          - jsonpath: $.tags
          - jsonpath: $.source_file
        relationship:
          # Component/API/System/Domain -> Owner (Team)
          - filter: has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # Component/System -> Domain
          - filter: (config_type == 'Backstage::Component' || config_type == 'Backstage::System') && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Component -> System (belongs to)
          - filter: config_type == 'Backstage::Component' && has(config.spec.system)
            expr: |
              [{
                "type": "Backstage::System",
                "name": config.spec.system
              }].toJSON()
          # Component -> Dependencies (dependsOn)
          - filter: config_type == 'Backstage::Component' && has(config.spec.dependsOn)
            expr: |
              config.spec.dependsOn.map(dep, {
                "type": dep.startsWith("component:") ? "Backstage::Component" :
                        dep.startsWith("resource:") ? "Backstage::Resource" :
                        dep.startsWith("api:") ? "Backstage::API" :
                        "Backstage::Component",
                "name": dep.contains(":") ? dep.split(":")[1] : dep
              }).toJSON()
          # Component -> APIs consumed (consumesApis / apiConsumedBy)
          - filter: config_type == 'Backstage::Component' && has(config.spec.consumesApis)
            expr: |
              config.spec.consumesApis.map(api, {
                "type": "Backstage::API",
                "name": api.contains(":") ? api.split(":")[1] : api
              }).toJSON()
          # API -> Owner
          - filter: config_type == 'Backstage::API' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Owner
          - filter: config_type == 'Backstage::System' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Domain
          - filter: config_type == 'Backstage::System' && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Domain -> Owner
          - filter: config_type == 'Backstage::Domain' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
```
## Scraper

| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | `string` | |
| `full` | Set to `true` to extract changes and access logs from scraped configurations. Defaults to `false`. | `bool` | |
| `retention` | Settings for retaining changes, analysis and scraped items. | Retention | |
| `exec` | Specifies the list of Exec configurations to scrape. | `[]Exec` | |
| `logLevel` | Specify the level of logging. | `string` | |
## Exec

| Field | Description | Scheme |
|-------|-------------|--------|
| `script`* | The script to execute. The script should output JSON to stdout. | `string` |
| `artifacts` | Artifacts to collect after execution. | `[]Artifact` |
| `checkout` | Checkout a git repository before running the script. | |
| `connections` | Connections for AWS/GCP/Azure credential injection. | ExecConnections |
| `env` | Environment variables to set during execution. | |
| `setup` | Install runtime dependencies before execution. | Setup |
| `labels` | Labels for each config item. | |
| `properties` | Custom templatable properties for the scraped config items. | |
| `tags` | Tags for each config item. Max allowed: 5. | |
| `transform` | Transform configs after they've been scraped. | |
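For example, the `env` field can inject credentials into a script. A minimal sketch, assuming Kubernetes-style `EnvVar` entries with `valueFrom.secretKeyRef` (the secret name, key, and API endpoint are hypothetical):

```yaml
exec:
  - type: API::Config
    id: $.id
    name: $.name
    env:
      # Hypothetical secret reference; point this at your own secret
      - name: API_TOKEN
        valueFrom:
          secretKeyRef:
            name: example-api-credentials
            key: token
    script: |
      #!/bin/bash
      curl -s -H "Authorization: Bearer $API_TOKEN" https://api.example.com/configs
```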
### Setup
The setup field allows you to automatically install runtime environments before executing your script. This is useful when your script requires specific versions of Bun or Python.
| Field | Description | Scheme |
|-------|-------------|--------|
| `bun` | Install Bun runtime. | RuntimeSetup |
| `python` | Install Python runtime via uv. | RuntimeSetup |
### RuntimeSetup

| Field | Description | Scheme |
|-------|-------------|--------|
| `version` | Version to install (e.g., `"1.3.5"` for Bun, `"3.10.19"` for Python, or `"latest"`). | `string` |
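For instance, a minimal sketch that installs the latest Bun before running an inline script (the script body and type name are illustrative):

```yaml
exec:
  - type: $.type
    id: $.id
    name: $.name
    setup:
      bun:
        version: latest
    script: |
      #!/usr/bin/env bun
      // Illustrative output; a real script would fetch or compute these items
      console.log(JSON.stringify([{ id: "1", type: "Example::Item", name: "demo" }]))
```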
### Python with Inline Dependencies
For Python scripts, you can declare dependencies inline using PEP 723 inline script metadata. The uv package manager is used to manage Python environments.
```yaml
exec:
  - script: |
      #!/usr/bin/env python3
      # /// script
      # dependencies = [
      #     "pyyaml",
      #     "requests",
      # ]
      # ///
      import yaml
      import json

      # Your script logic here
      print(json.dumps([{"id": "1", "type": "Example", "name": "test"}]))
    setup:
      python:
        version: "3.10.19"
```
## Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration; see the example after the field table below.
| Field | Description | Scheme |
|-------|-------------|--------|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | `jsonpath` |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | `jsonpath` |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | `jsonpath` |
| `class` | A static value or JSONPath expression to use as the class for the resource. | `jsonpath` |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `description` | A static value or JSONPath expression to use as the description for the resource. | `jsonpath` |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats. | `string` |
| `health` | A static value or JSONPath expression to use as the health of the config item. | `jsonpath` |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | `jsonpath` |
| `status` | A static value or JSONPath expression to use as the status of the config item. | `jsonpath` |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339. | `string` |
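For example, suppose a script prints a JSON object that wraps the records of interest in a `hosts` array (the output shape and type name here are hypothetical). The `items` expression extracts the array elements first, and the remaining mappings are then applied to each element:

```yaml
exec:
  - script: |
      #!/bin/bash
      # Hypothetical inventory output with one host record
      echo '{"hosts": [{"hostId": "h-1", "hostname": "web-1", "state": "running"}]}'
    items: $.hosts
    id: $.hostId
    type: Example::Host
    name: $.hostname
    status: $.state
```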
## Formats

### JSON

The scraper stores config items as `jsonb` fields in PostgreSQL.

Resource providers typically return JSON, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.
### XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
## Extracting Changes & Access Logs

When you enable `full: true`, custom scrapers can ingest changes and access logs from external systems by separating the config data from the change events in your source, as sketched below.
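A minimal sketch of that separation, assuming the script emits an envelope with a `config` document and a `changes` array, and that the mappings are evaluated against that envelope (the change field names are indicative and all values are illustrative):

```yaml
exec:
  - type: $.config.type
    id: $.config.id
    name: $.config.name
    script: |
      #!/bin/bash
      # The "config" key carries the config item; "changes" carries its change events
      cat <<'EOF'
      {
        "config": { "id": "db-1", "type": "Example::Database", "name": "orders-db", "version": "14.1" },
        "changes": [
          { "change_type": "UPDATE", "summary": "Upgraded to 14.1", "created_at": "2024-01-01T00:00:00Z" }
        ]
      }
      EOF
```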