# File
The file scraper is used to create config items from files in a local folder (or git). Use it to track changes in files like `/etc/hosts` or `/etc/passwd`, or for service metadata stored in git.

See Kubernetes Files for scraping files inside running Kubernetes pods.
```yaml title="file-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-git-scraper
spec:
  file:
    - type: $.kind
      id: $.metadata.name
      url: github.com/flanksource/canary-checker?ref=076cf8b888f2dbaca26a7cc98a4153c154220a22
      paths:
        - fixtures/minimal/http_pass.yaml
```
## Scraper
| Field | Description | Scheme | Required |
|---|---|---|---|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | `string` | |
| `full` | Set to `true` to extract changes and access logs from scraped configurations. Defaults to `false`. | `bool` | |
| `retention` | Settings for retaining changes, analysis and scraped items. | `Retention` | |
| `file` | Specifies the list of File configurations to scrape. | `[]File` | |
| `logLevel` | Specify the level of logging. | `string` | |
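For example, a scraper that runs every 30 minutes with change and access-log extraction enabled could look like this (a sketch; the name and glob are placeholders, and only fields from the table above are used):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-scraper-frequent # placeholder name
spec:
  schedule: "*/30 * * * *" # cron format; defaults to every 60 minutes
  full: true # also extract changes and access logs
  logLevel: debug
  file:
    - type: $.kind
      id: $.metadata.name
      paths:
        - fixtures/**/*.yaml # placeholder glob
```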
### File
| Field | Description | Scheme |
|---|---|---|
| `paths`* | Specify paths to configuration(s) for scraping. | `[]glob` |
| `connection` | Connection name to use for URL credentials. | |
| `format` | File format (e.g. `json`, `yaml`). | |
| `icon` | Icon for the scraped config items. | |
| `ignore` | Patterns to ignore when scraping files. | |
| `url` | URL, e.g. a GitHub repository, containing the configuration(s). | |
| `labels` | Labels for each config item. | |
| `properties` | Custom templatable properties for the scraped config items. | |
| `tags` | Tags for each config item. Max allowed: 5. | |
| `transform` | Transform configs after they've been scraped. | |
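As an illustrative sketch of these fields used together (the type, icon, label, and tag values are hypothetical), a local-folder scraper tracking `/etc/hosts` might look like:

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: etc-hosts-scraper # hypothetical name
spec:
  file:
    - type: Config::HostsFile # static value; hypothetical type name
      id: etc-hosts # static value
      paths:
        - /etc/hosts
      ignore:
        - "**/*.bak" # hypothetical ignore pattern
      icon: file # hypothetical icon name
      labels:
        team: platform # hypothetical label
      tags:
        environment: production # hypothetical tag; max 5 tags allowed
```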
## Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration.
| Field | Description | Scheme |
|---|---|---|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | |
| `class` | A static value or JSONPath expression to use as the class for the resource. | |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | `[]jsonpath` |
| `description` | A static value or JSONPath expression to use as the description for the resource. | |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats. | |
| `health` | A static value or JSONPath expression to use as the health of the config item. | |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, then the ID, name, type, and transformations are applied to each item. | |
| `status` | A static value or JSONPath expression to use as the status of the config item. | |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339. | |
## Formats

### JSON
The scraper stores config items as `jsonb` fields in PostgreSQL, typically in the JSON the resource provider returns, e.g. `kubectl get -o json` or `aws --output=json`.
When you display the config, the UI automatically converts the JSON data to YAML for improved readability.
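For example (an illustrative object, not real scraped data), a namespace returned by `kubectl get -o json` is stored as-is:

```json
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "default"
  }
}
```

and rendered as YAML when you view the config item.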
### XML / Properties
The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```
You can still access non-JSON content in scripts using `config.content`.
The UI formats and renders XML appropriately.
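For instance (illustrative values), a scraped `.properties` file is wrapped the same way:

```json
{ "format": "properties", "content": "db.host=localhost\ndb.port=5432" }
```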
## Full Scraper Output
When you enable `full: true`, custom scrapers can return complex objects containing config data, changes, access logs, and external entities.
See the Custom Scraper page for the full output schema, shorthand keys, external entity schemas, and alias resolution.