
Exec

The exec scraper runs a script and scrapes the output as configuration items. This is useful for custom integrations where you need to fetch configuration from external systems using scripts.

exec-scraper.yaml

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-scraper
spec:
  schedule: "@every 1h"
  exec:
    - type: $.type
      id: $.id
      name: $.name
      script: |
        #!/bin/bash
        curl -s https://api.example.com/configs | jq '[.[] | {id: .id, type: "API::Config", name: .name, config: .}]'
```
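The jq pipeline reshapes the API response into an array whose elements carry the fields that the mappings above reference. Its output would look something like this (values are illustrative):

```json
[
  {
    "id": "cfg-1",
    "type": "API::Config",
    "name": "payments-service",
    "config": { "id": "cfg-1", "name": "payments-service", "replicas": 3 }
  }
]
```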
Advanced Example with Git Checkout and Python
exec-backstage-catalog.yaml

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-backstage-catalog
  annotations:
    trace: 'true'
spec:
  schedule: '@every 1h'
  exec:
    - type: $.type
      id: $.id
      name: $.name
      description: $.metadata.description
      setup:
        python:
          version: latest
      script: |
        #!/usr/bin/env python3
        # /// script
        # dependencies = [
        #     "pyyaml",
        # ]
        # ///

        import json
        import yaml
        from pathlib import Path
        import sys

        # Target kinds to scrape
        TARGET_KINDS = ['Component', 'API', 'System', 'Domain']

        entities = []
        catalog_path = Path('packages/catalog-model/examples')

        # Check if catalog path exists
        if not catalog_path.exists():
            print(f"Error: Catalog path {catalog_path} does not exist", file=sys.stderr)
            sys.exit(1)

        # Recursively find all YAML files
        for yaml_file in catalog_path.rglob('*.yaml'):
            try:
                with open(yaml_file, 'r') as f:
                    # Handle multi-document YAML files
                    docs = list(yaml.safe_load_all(f))

                for data in docs:
                    # Skip if not a valid Backstage entity
                    if not isinstance(data, dict) or 'kind' not in data or 'metadata' not in data:
                        continue

                    kind = data.get('kind', '')

                    # Filter by target kinds
                    if kind not in TARGET_KINDS:
                        continue

                    # Extract metadata and spec
                    metadata = data.get('metadata', {})
                    spec = data.get('spec', {})

                    # Build tags list from multiple sources
                    tags = []

                    # Add metadata.tags
                    if 'tags' in metadata and isinstance(metadata['tags'], list):
                        tags.extend(metadata['tags'])

                    # Add lifecycle as tag
                    if 'lifecycle' in spec:
                        tags.append(f"lifecycle:{spec['lifecycle']}")

                    # Add type as tag
                    if 'type' in spec:
                        tags.append(f"type:{spec['type']}")

                    # Add labels as tags (key:value format)
                    if 'labels' in metadata and isinstance(metadata['labels'], dict):
                        for key, value in metadata['labels'].items():
                            tags.append(f"{key}:{value}")

                    # Build entity object
                    entity = {
                        'id': metadata.get('name', ''),
                        'type': f"Backstage::{kind}",
                        'name': metadata.get('name', ''),
                        'metadata': metadata,
                        'spec': spec,
                        'tags': tags,
                        'source_file': str(yaml_file.relative_to(catalog_path))
                    }

                    entities.append(entity)

            except Exception as e:
                # Log the error but continue processing other files
                print(f"Error parsing {yaml_file}: {e}", file=sys.stderr)
                continue

        # Output JSON array
        print(json.dumps(entities, indent=2))

      checkout:
        url: https://github.com/backstage/backstage
        branch: master
        # depth: 1 # https://github.com/flanksource/duty/issues/1688

      tags:
        - name: lifecycle
          jsonpath: $.spec.lifecycle
        - name: type
          jsonpath: $.spec.type
        - name: domain
          jsonpath: $.spec.domain
        - name: owner
          jsonpath: $.spec.owner

      # properties:
      #   - name: type
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.type
      #   - name: lifecycle
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.lifecycle
      #   - name: source_file
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.source_file

      transform:
        exclude:
          - jsonpath: $.tags
          - jsonpath: $.source_file

        relationship:
          # Component/API/System/Domain -> Owner (Team)
          - filter: has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()

          # Component/System -> Domain
          - filter: (config_type == 'Backstage::Component' || config_type == 'Backstage::System') && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()

          # Component -> System (belongs to)
          - filter: config_type == 'Backstage::Component' && has(config.spec.system)
            expr: |
              [{
                "type": "Backstage::System",
                "name": config.spec.system
              }].toJSON()

          # Component -> Dependencies (dependsOn)
          - filter: config_type == 'Backstage::Component' && has(config.spec.dependsOn)
            expr: |
              config.spec.dependsOn.map(dep, {
                "type": dep.startsWith("component:") ? "Backstage::Component" :
                        dep.startsWith("resource:") ? "Backstage::Resource" :
                        dep.startsWith("api:") ? "Backstage::API" :
                        "Backstage::Component",
                "name": dep.contains(":") ? dep.split(":")[1] : dep
              }).toJSON()

          # Component -> APIs consumed (consumesApis / apiConsumedBy)
          - filter: config_type == 'Backstage::Component' && has(config.spec.consumesApis)
            expr: |
              config.spec.consumesApis.map(api, {
                "type": "Backstage::API",
                "name": api.contains(":") ? api.split(":")[1] : api
              }).toJSON()

          # API -> Owner
          - filter: config_type == 'Backstage::API' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()

          # System -> Owner
          - filter: config_type == 'Backstage::System' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()

          # System -> Domain
          - filter: config_type == 'Backstage::System' && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()

          # Domain -> Owner
          - filter: config_type == 'Backstage::Domain' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
```

Scraper

| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | `string` | |
| `full` | Set to `true` to extract changes and access logs from scraped configurations. Defaults to `false`. | `bool` | |
| `retention` | Settings for retaining changes, analysis and scraped items. | `Retention` | |
| `exec` | Specifies the list of Exec configurations to scrape. | `[]Exec` | |
| `logLevel` | Specify the level of logging. | `string` | |

Exec

| Field | Description | Scheme |
|-------|-------------|--------|
| `script*` | The script to execute. The script should output JSON to stdout. | `string` |
| `artifacts` | Artifacts to collect after execution. | `[]Artifact` |
| `checkout` | Checkout a git repository before running the script. | `GitConnection` |
| `connections` | Connections for AWS/GCP/Azure credential injection. | `ExecConnections` |
| `env` | Environment variables to set during execution. | `[]EnvVar` |
| `setup` | Install runtime dependencies before execution. | `Setup` |
| `labels` | Labels for each config item. | `map[string]string` |
| `properties` | Custom templatable properties for the scraped config items. | `[]ConfigProperty` |
| `tags` | Tags for each config item. Max allowed: 5. | `[]ConfigTag` |
| `transform` | Transform configs after they've been scraped. | `Transform` |
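As a sketch of how `env` and `labels` fit together, the following scraper injects an API token via a Kubernetes-style `valueFrom` secret reference and attaches a static label to every scraped item. The endpoint, secret name, and type are hypothetical:

```yaml
exec:
  - type: 'Example::Host'
    id: $.id
    name: $.name
    labels:
      team: platform
    env:
      - name: API_TOKEN
        valueFrom:
          secretKeyRef:
            name: example-secrets
            key: token
    script: |
      #!/bin/bash
      # API_TOKEN is resolved from the secret before the script runs
      curl -s -H "Authorization: Bearer $API_TOKEN" https://api.example.com/hosts
```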

Setup

The setup field allows you to automatically install runtime environments before executing your script. This is useful when your script requires specific versions of Bun or Python.

| Field | Description | Scheme |
|-------|-------------|--------|
| `bun` | Install the Bun runtime. | `RuntimeSetup` |
| `python` | Install the Python runtime via uv. | `RuntimeSetup` |

RuntimeSetup

| Field | Description | Scheme |
|-------|-------------|--------|
| `version` | Version to install (e.g. `"1.3.5"` for Bun, `"3.10.19"` for Python, or `"latest"`). | `string` |
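For example, to pin the Bun runtime before running a script (the type name and output below are illustrative):

```yaml
exec:
  - type: 'Example::Service'
    id: $.id
    name: $.name
    setup:
      bun:
        version: '1.3.5'
    script: |
      #!/usr/bin/env bun
      // Emit a JSON array of config items on stdout
      console.log(JSON.stringify([{ id: '1', type: 'Example::Service', name: 'demo' }]))
```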

Python with Inline Dependencies

For Python scripts, you can declare dependencies inline using PEP 723 inline script metadata. The uv package manager is used to manage Python environments.

```yaml
exec:
  - script: |
      #!/usr/bin/env python3
      # /// script
      # dependencies = [
      #     "pyyaml",
      #     "requests",
      # ]
      # ///

      import yaml
      import json

      # Your script logic here
      print(json.dumps([{"id": "1", "type": "Example", "name": "test"}]))
    setup:
      python:
        version: "3.10.19"
```

Mapping

Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items. You can achieve this by using mappings in your custom scraper configuration.

| Field | Description | Scheme |
|-------|-------------|--------|
| `id*` | A static value or JSONPath expression to use as the ID for the resource. | `string` or `JSONPath` |
| `name*` | A static value or JSONPath expression to use as the name for the resource. | `string` or `JSONPath` |
| `type*` | A static value or JSONPath expression to use as the type for the resource. | `string` or `JSONPath` |
| `class` | A static value or JSONPath expression to use as the class for the resource. | `string` or `JSONPath` |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | `[]JSONPath` |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | `[]JSONPath` |
| `description` | A static value or JSONPath expression to use as the description for the resource. | `string` or `JSONPath` |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats below. | `string` |
| `health` | A static value or JSONPath expression to use as the health of the config item. | `string` or `JSONPath` |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | `JSONPath` |
| `status` | A static value or JSONPath expression to use as the status of the config item. | `string` or `JSONPath` |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields` (default: RFC3339). | `string` |
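For example, if a script returns a wrapper object instead of a bare array, `items` extracts the individual elements before the identity mappings run. The endpoint and field names below are illustrative:

```yaml
exec:
  - items: $.results
    id: $.uuid
    type: 'Example::Server'
    name: $.hostname
    createFields:
      - $.created_at
    deleteFields:
      - $.deleted_at
    timestampFormat: '2006-01-02 15:04:05'
    script: |
      #!/bin/bash
      # Returns {"results": [{"uuid": "...", "hostname": "...", "created_at": "..."}]}
      curl -s https://api.example.com/servers
```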

Formats

JSON

The scraper stores config items as `jsonb` fields in PostgreSQL.

Resource providers typically return JSON natively, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.

Extracting Changes & Access Logs

When you enable `full: true`, custom scrapers can also ingest changes and access logs from external systems by separating the config data from the change events in the script's output.
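As a rough sketch, each scraped object then carries the config body and its change events under separate keys. The key names and values below are illustrative of that split, not a definitive schema:

```json
{
  "config": { "id": "device-1", "firmware": "1.2.3" },
  "changes": [
    { "change_type": "FirmwareUpdate", "created_at": "2024-01-01T00:00:00Z" }
  ]
}
```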