# Exec
The exec scraper runs a script and scrapes the output as configuration items. This is useful for custom integrations where you need to fetch configuration from external systems using scripts.
```yaml title="exec-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-scraper
spec:
  schedule: "@every 1h"
  exec:
    - type: $.type
      id: $.id
      name: $.name
      script: |
        #!/bin/bash
        curl -s https://api.example.com/configs | jq '[.[] | {id: .id, type: "API::Config", name: .name, config: .}]'
```
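The script's stdout is parsed as JSON, and the `id`, `type`, and `name` mappings are evaluated against each array element. For the example above, the `jq` filter emits an array shaped like this (all values are illustrative):

```json
[
  {
    "id": "cfg-1",
    "type": "API::Config",
    "name": "payments",
    "config": { "id": "cfg-1", "name": "payments", "replicas": 3 }
  }
]
```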
## Advanced Example with Git Checkout and Python
```yaml title="exec-backstage-catalog.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: exec-backstage-catalog
  annotations:
    trace: 'true'
spec:
  schedule: '@every 1h'
  exec:
    - type: $.type
      id: $.id
      name: $.name
      description: $.metadata.description
      setup:
        python:
          version: latest
      script: |
        #!/usr/bin/env python3
        # /// script
        # dependencies = [
        #     "pyyaml",
        # ]
        # ///
        import json
        import yaml
        from pathlib import Path
        import sys

        # Target kinds to scrape
        TARGET_KINDS = ['Component', 'API', 'System', 'Domain']

        entities = []
        catalog_path = Path('packages/catalog-model/examples')

        # Check if catalog path exists
        if not catalog_path.exists():
            print(f"Error: Catalog path {catalog_path} does not exist", file=sys.stderr)
            sys.exit(1)

        # Recursively find all YAML files
        for yaml_file in catalog_path.rglob('*.yaml'):
            try:
                with open(yaml_file, 'r') as f:
                    # Handle multi-document YAML files
                    docs = list(yaml.safe_load_all(f))

                for data in docs:
                    # Skip if not a valid Backstage entity
                    if not isinstance(data, dict) or 'kind' not in data or 'metadata' not in data:
                        continue

                    kind = data.get('kind', '')

                    # Filter by target kinds
                    if kind not in TARGET_KINDS:
                        continue

                    # Extract metadata and spec
                    metadata = data.get('metadata', {})
                    spec = data.get('spec', {})

                    # Build tags list from multiple sources
                    tags = []

                    # Add metadata.tags
                    if 'tags' in metadata and isinstance(metadata['tags'], list):
                        tags.extend(metadata['tags'])

                    # Add lifecycle as tag
                    if 'lifecycle' in spec:
                        tags.append(f"lifecycle:{spec['lifecycle']}")

                    # Add type as tag
                    if 'type' in spec:
                        tags.append(f"type:{spec['type']}")

                    # Add labels as tags (key:value format)
                    if 'labels' in metadata and isinstance(metadata['labels'], dict):
                        for key, value in metadata['labels'].items():
                            tags.append(f"{key}:{value}")

                    # Build entity object
                    entity = {
                        'id': metadata.get('name', ''),
                        'type': f"Backstage::{kind}",
                        'name': metadata.get('name', ''),
                        'metadata': metadata,
                        'spec': spec,
                        'tags': tags,
                        'source_file': str(yaml_file.relative_to(catalog_path))
                    }
                    entities.append(entity)
            except Exception as e:
                # Log error but continue processing other files
                print(f"Error parsing {yaml_file}: {e}", file=sys.stderr)
                continue

        # Output JSON array
        print(json.dumps(entities, indent=2))
      checkout:
        url: https://github.com/backstage/backstage
        branch: master
        # depth: 1 # https://github.com/flanksource/duty/issues/1688
      tags:
        - name: lifecycle
          jsonpath: $.spec.lifecycle
        - name: type
          jsonpath: $.spec.type
        - name: domain
          jsonpath: $.spec.domain
        - name: owner
          jsonpath: $.spec.owner
      # properties:
      #   - name: type
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.type
      #   - name: lifecycle
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.spec.lifecycle
      #   - name: source_file
      #     filter: config_type startsWith 'Backstage::'
      #     jsonpath: $.source_file
      transform:
        exclude:
          - jsonpath: $.tags
          - jsonpath: $.source_file
        relationship:
          # Component/API/System/Domain -> Owner (Team)
          - filter: has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # Component/System -> Domain
          - filter: (config_type == 'Backstage::Component' || config_type == 'Backstage::System') && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Component -> System (belongs to)
          - filter: config_type == 'Backstage::Component' && has(config.spec.system)
            expr: |
              [{
                "type": "Backstage::System",
                "name": config.spec.system
              }].toJSON()
          # Component -> Dependencies (dependsOn)
          - filter: config_type == 'Backstage::Component' && has(config.spec.dependsOn)
            expr: |
              config.spec.dependsOn.map(dep, {
                "type": dep.startsWith("component:") ? "Backstage::Component" :
                        dep.startsWith("resource:") ? "Backstage::Resource" :
                        dep.startsWith("api:") ? "Backstage::API" :
                        "Backstage::Component",
                "name": dep.contains(":") ? dep.split(":")[1] : dep
              }).toJSON()
          # Component -> APIs consumed (consumesApis / apiConsumedBy)
          - filter: config_type == 'Backstage::Component' && has(config.spec.consumesApis)
            expr: |
              config.spec.consumesApis.map(api, {
                "type": "Backstage::API",
                "name": api.contains(":") ? api.split(":")[1] : api
              }).toJSON()
          # API -> Owner
          - filter: config_type == 'Backstage::API' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Owner
          - filter: config_type == 'Backstage::System' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
          # System -> Domain
          - filter: config_type == 'Backstage::System' && has(config.spec.domain)
            expr: |
              [{
                "type": "Backstage::Domain",
                "name": config.spec.domain
              }].toJSON()
          # Domain -> Owner
          - filter: config_type == 'Backstage::Domain' && has(config.spec.owner)
            expr: |
              [{
                "type": "Backstage::Team",
                "name": config.spec.owner
              }].toJSON()
```
## Scraper

| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | `string` | |
| `full` | Set to `true` to extract changes and access logs from scraped configurations. Defaults to `false`. | `bool` | |
| `retention` | Settings for retaining changes, analysis and scraped items. | Retention | |
| `exec` | Specifies the list of Exec configurations to scrape. | `[]Exec` | |
| `logLevel` | Specify the level of logging. | `string` | |
## Exec

| Field | Description | Scheme |
|-------|-------------|--------|
| `script`* | The script to execute. The script should output JSON to stdout. | `string` |
| `artifacts` | Artifacts to collect after execution. | `[]Artifact` |
| `checkout` | Checkout a git repository before running the script. | |
| `connections` | Connections for AWS/GCP/Azure credential injection. | ExecConnections |
| `env` | Environment variables to set during execution. | |
| `setup` | Install runtime dependencies before execution. | Setup |
| `labels` | Labels for each config item. | |
| `properties` | Custom templatable properties for the scraped config items. | |
| `tags` | Tags for each config item. Max allowed: 5. | |
| `transform` | Transform configs after they've been scraped. | |
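For example, the `env` field can inject credentials into a script. A minimal sketch, assuming Kubernetes-style `EnvVar` entries with `valueFrom.secretKeyRef` (the secret name, key, and API endpoint are hypothetical):

```yaml
exec:
  - type: API::Config
    id: $.id
    name: $.name
    env:
      # Hypothetical secret reference; point this at your own secret
      - name: API_TOKEN
        valueFrom:
          secretKeyRef:
            name: example-api-credentials
            key: token
    script: |
      #!/bin/bash
      curl -s -H "Authorization: Bearer $API_TOKEN" https://api.example.com/configs
```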
### Setup
The setup field allows you to automatically install runtime environments before executing your script. This is useful when your script requires specific versions of Bun or Python.
| Field | Description | Scheme |
|-------|-------------|--------|
| `bun` | Install Bun runtime. | RuntimeSetup |
| `python` | Install Python runtime via uv. | RuntimeSetup |
### RuntimeSetup

| Field | Description | Scheme |
|-------|-------------|--------|
| `version` | Version to install (e.g., `"1.3.5"` for Bun, `"3.10.19"` for Python, or `"latest"`). | `string` |
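For instance, a minimal sketch that installs the latest Bun before running an inline script (the script body and type name are illustrative):

```yaml
exec:
  - type: $.type
    id: $.id
    name: $.name
    setup:
      bun:
        version: latest
    script: |
      #!/usr/bin/env bun
      // Illustrative output; a real script would fetch or compute these items
      console.log(JSON.stringify([{ id: "1", type: "Example::Item", name: "demo" }]))
```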
### Python with Inline Dependencies
For Python scripts, you can declare dependencies inline using PEP 723 inline script metadata. The uv package manager is used to manage Python environments.
```yaml
exec:
  - script: |
      #!/usr/bin/env python3
      # /// script
      # dependencies = [
      #     "pyyaml",
      #     "requests",
      # ]
      # ///
      import yaml
      import json

      # Your script logic here
      print(json.dumps([{"id": "1", "type": "Example", "name": "test"}]))
    setup:
      python:
        version: "3.10.19"
```
## Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration; see the example after the field table below.
| Field | Description | Scheme |
|-------|-------------|--------|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | `jsonpath` |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | `jsonpath` |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | `jsonpath` |
| `class` | A static value or JSONPath expression to use as the class for the resource. | `jsonpath` |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `description` | A static value or JSONPath expression to use as the description for the resource. | `jsonpath` |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats. | `string` |
| `health` | A static value or JSONPath expression to use as the health of the config item. | `jsonpath` |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | `jsonpath` |
| `status` | A static value or JSONPath expression to use as the status of the config item. | `jsonpath` |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339. | `string` |
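For example, suppose a script prints a JSON object that wraps the records of interest in a `hosts` array (the output shape and type name here are hypothetical). The `items` expression extracts the array elements first, and the remaining mappings are then applied to each element:

```yaml
exec:
  - script: |
      #!/bin/bash
      # Hypothetical inventory output with one host record
      echo '{"hosts": [{"hostId": "h-1", "hostname": "web-1", "state": "running"}]}'
    items: $.hosts
    id: $.hostId
    type: Example::Host
    name: $.hostname
    status: $.state
```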
## Formats

### JSON

The scraper stores config items as `jsonb` fields in PostgreSQL.

Resource providers typically return JSON, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.
### XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
## Extracting Changes & Access Logs

When you enable `full: true`, custom scrapers can ingest changes and access logs from external systems by separating the config data from the change events in your source, as sketched below.
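A minimal sketch of that separation, assuming the script emits an envelope with a `config` document and a `changes` array, and that the mappings are evaluated against that envelope (the change field names are indicative and all values are illustrative):

```yaml
exec:
  - type: $.config.type
    id: $.config.id
    name: $.config.name
    script: |
      #!/bin/bash
      # The "config" key carries the config item; "changes" carries its change events
      cat <<'EOF'
      {
        "config": { "id": "db-1", "type": "Example::Database", "name": "orders-db", "version": "14.1" },
        "changes": [
          { "change_type": "UPDATE", "summary": "Upgraded to 14.1", "created_at": "2024-01-01T00:00:00Z" }
        ]
      }
      EOF
```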