File
The file scraper creates config items from files in a local folder (or a git repository). Use it to track changes in files like `/etc/hosts` or `/etc/passwd`, or to ingest service metadata stored in git.
See Kubernetes Files for scraping files inside running kubernetes pods.
file-scraper.yaml

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-git-scraper
spec:
  file:
    - type: $.kind
      id: $.metadata.name
      url: github.com/flanksource/canary-checker?ref=076cf8b888f2dbaca26a7cc98a4153c154220a22
      paths:
        - fixtures/minimal/http_pass.yaml
```
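The same scraper can read from a local folder by specifying only `paths`. The following is a sketch of that variant; the directory and glob are hypothetical placeholders, and the assumption that omitting `url` reads from the local filesystem follows from the intro above:

```yaml
# Sketch: scraping a local directory instead of git.
# /etc/config/*.yaml is a hypothetical path, not a shipped fixture.
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-local-scraper
spec:
  file:
    - type: $.kind
      id: $.metadata.name
      paths:
        - /etc/config/*.yaml
```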
Scraper
| Field | Description | Scheme | Required |
| --- | --- | --- | --- |
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | `string` | |
| `full` | Set to `true` to extract changes from scraped configurations. Defaults to `false`. | `bool` | |
| `retention` | Settings for retaining changes, analysis and scraped items | Retention | |
| `file` | Specifies the list of File configurations to scrape. | `[]File` | |
File
| Field | Description | Scheme |
| --- | --- | --- |
| `id`* | A deterministic or natural id for the resource | JSONPath |
| `paths`* | Specify paths to configuration(s) for scraping | |
| `type`* | Type for the config item | JSONPath |
| `url` | Specify URL e.g. a github repository containing the configuration(s) | |
| `class` | Class for the config item. Defaults to `type` | JSONPath |
| `createFields` | Identify the created time for a resource (if different to scrape time). If multiple fields are specified, the first non-empty value is used | []JSONPath |
| `deleteFields` | Identify when a config item was deleted. If multiple fields are specified, the first non-empty value is used | []JSONPath |
| `format` | Format of the config item. Defaults to `json` | `json`, `xml` or `properties` |
| `items` | Extract multiple config items from this array | JSONPath |
| `labels` | Labels for each config item | |
| `name` | Name for the config item | JSONPath |
| `properties` | Custom templatable properties for the scraped config items | |
| `tags` | Tags for each config item. Max allowed: 5 | |
| `timestampFormat` | Format used to parse timestamps in `createFields` and `deleteFields` | time.Format |
| `transform` | Transform configs after they've been scraped | |
Mapping
Custom scrapers require defining the `id`, `type`, and `class` for each scraped item. For example, when scraping a file containing a JSON array where each element represents a config item, you need to specify the `id`, `type`, and config `class` for these items. You do this with mappings in your custom scraper configuration.
| Field | Description | Scheme | Required |
| --- | --- | --- | --- |
| `items` | A path pointing to an array; each item is created as a separate config item, and all other JSONPaths are evaluated from the new item's root | JSONPath | `true` |
| `id` | ID for the config item | JSONPath | `true` |
| `type` | Type for the config item | JSONPath | `true` |
| `class` | Class for the config item. Defaults to `type` | JSONPath | |
| `name` | Name for the config item | JSONPath | |
| `format` | Format of the config source. Defaults to `json` | `json`, `xml` or `properties`. See Formats | |
| `createFields` | Fields used to determine the item's created date; if not specified or the field is not found, defaults to scrape time | []JSONPath | |
| `deleteFields` | Fields used to determine when an item was deleted; if not specified or the field is not found, defaults to the scrape time at which the item is no longer detected | []JSONPath | |
| `timestampFormat` | Timestamp format of `createFields` and `deleteFields`. Defaults to `2006-01-02T15:04:05Z07:00` | time.Format | |
| `full` | Scrape result includes the full metadata of a config, including possible changes. See Change Extraction | `bool` | |
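A brief sketch of the timestamp-related fields, assuming the scraped JSON exposes `created_at` and `deleted_at` fields (the file path and field names are hypothetical):

```yaml
spec:
  file:
    - type: Car
      id: $.reg_no
      paths:
        - fixtures/data/cars.json # hypothetical path
      createFields:
        - $.created_at # first non-empty field is used
      deleteFields:
        - $.deleted_at
      timestampFormat: '2006-01-02' # Go reference-time layout
```

The `timestampFormat` uses Go's reference-time notation, where the layout string itself is written as the date `2006-01-02 15:04:05`.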
Formats
JSON
Config items are stored as jsonb
fields in PostgreSQL.
The JSON used is typically returned by a resource provider. e.g. kubectl get -o json
or aws --output=json
.
When displaying the config, the UI will automatically convert the JSON data to YAML for improved readability.
XML / Properties
Non-JSON files are stored as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

Non-JSON content can still be accessed in scripts using `config.content`.

The UI formats and renders XML appropriately.
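By the same wrapping convention, a Java-style `.properties` file would be stored as shown below; the keys and values are an assumption for illustration only:

```json
{ "format": "properties", "content": "db.host=localhost\ndb.port=5432" }
```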
Change Extraction
Custom scrapers can also ingest changes from external systems by using the `full` option. In this example, the scraped JSON contains the actual config under `config` and a list of changes under `changes`.
```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-scraper
spec:
  full: true
  file:
    - type: Car
      id: $.reg_no
      paths:
        - fixtures/data/car_changes.json
```
```json
{
  "reg_no": "A123",
  "config": {
    "meta": "this is the actual config that'll be stored."
  },
  "changes": [
    {
      "action": "drive",
      "summary": "car color changed to blue",
      "unrelated_stuff": 123
    }
  ]
}
```
Since `full=true`, Config DB extracts the `config` and `changes` from the scraped JSON. The resulting config is:
```json
{
  "meta": "this is the actual config that'll be stored."
}
```
and the following new config change is recorded:
```json
{
  "action": "drive",
  "summary": "car color changed to blue",
  "unrelated_stuff": 123
}
```