HTTP
The HTTP scraper collects config items from HTTP endpoints and APIs. It supports Basic, Digest, NTLM, bearer token, and OAuth 2.0 authentication, and can transform responses before they are stored.
```yaml title="aws-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - type: 'LastFM::Singer'
      name: '$.name'
      id: '$.url'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      transform:
        expr: |
          dyn(config).artists.artist.map(item, item).toJSON()
```

| Field | Description | Scheme | Required |
|---|---|---|---|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 15 minutes. | Cron | |
| `retention` | Settings for retaining changes, analysis and scraped items | Retention | |
| `http` | Specifies the list of HTTP configurations to scrape. | []HTTP | |
| `logLevel` | Specify the level of logging. | string | |
| `full` | Set to true to extract changes and access logs from scraped configurations. Defaults to false. | bool | |
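
As a rough illustration of how the top-level fields combine, the sketch below sets `schedule` next to the `http` list. The scraper name, type, endpoint, and JSONPath expressions are hypothetical placeholders, and the five-field cron expression is an assumption about the accepted syntax; check it against your deployment.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: example-scheduled-scraper # hypothetical name
spec:
  # Assumed standard five-field cron syntax: minute 0 of every hour
  schedule: '0 * * * *'
  http:
    - type: 'Example::Service' # hypothetical type
      id: '$.id' # assumes the response has an "id" field
      name: '$.name' # assumes the response has a "name" field
      url: 'https://api.example.com/v1/services' # placeholder endpoint
```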
HTTP Configuration
| Field | Description | Scheme |
|---|---|---|
| `url`* | The URL to send the HTTP request to. Must include the scheme (`http://` or `https://`) | |
| `bearer` | Bearer token for authentication | |
| `body` | Request body for POST/PUT requests | |
| `connection` | Reference to a pre-configured HTTP connection. Use this to reuse connection settings across multiple scrapers | |
| `digest` | Enable Digest authentication, a more secure alternative to Basic authentication | boolean |
| `env` | Environment variables to be used in the templating | |
| `headers` | HTTP headers to include in the request | |
| `method` | HTTP method to use (GET, POST, etc.) | |
| `ntlm` | Enable Windows NTLM authentication protocol. Typically used in corporate environments | boolean |
| `ntlmv2` | Enable NTLMv2 authentication protocol, a more secure version of NTLM | boolean |
| `oauth.clientID` | OAuth 2.0 client identifier | |
| `oauth.clientSecret` | OAuth 2.0 client secret | |
| `oauth.params` | Additional OAuth 2.0 parameters to include in the token request | |
| `oauth.scopes` | List of OAuth 2.0 scopes | |
| `oauth.tokenURL` | OAuth 2.0 token endpoint URL | |
| `password` | Password for Basic or Digest authentication. | |
| `tls.ca` | Custom Certificate Authority (CA) certificate for TLS verification. Used for self-signed or internal certificates | |
| `tls.cert` | Client TLS certificate for mutual TLS authentication (mTLS) | |
| `tls.handshakeTimeout` | Maximum time to wait for TLS handshake completion. Example: "30s", "1m" | |
| `tls.insecureSkipVerify` | Skip TLS certificate verification. Use with caution - only enable in trusted environments or for testing | boolean |
| `tls.key` | Private key corresponding to the client TLS certificate for mTLS | |
| `username` | Username for Basic or Digest authentication. | |
| `labels` | Labels for each config item. | |
| `properties` | Custom templatable properties for the scraped config items. | |
| `tags` | Tags for each config item. Max allowed: 5 | |
| `transform` | Transform configs after they've been scraped | |
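
The fragment below sketches how some of the authentication and TLS fields from the table might be combined. It assumes that dotted field names such as `oauth.tokenURL` map to nested YAML keys, and that `username`, `password`, `oauth.clientID`, and `oauth.clientSecret` accept the same `valueFrom.secretKeyRef` shape shown for `env` in the LastFM example; the table does not state their schemes, so treat both as assumptions. All endpoints, types, and secret names are hypothetical.

```yaml
spec:
  http:
    # Entry 1: OAuth 2.0 protected API
    - type: 'Example::Order' # hypothetical type
      id: '$.id'
      name: '$.name'
      url: 'https://api.example.com/v1/orders' # placeholder endpoint
      method: GET
      oauth:
        tokenURL: 'https://auth.example.com/oauth/token' # placeholder
        scopes:
          - read
        clientID: # assumed to accept valueFrom, like env entries
          valueFrom:
            secretKeyRef:
              name: example-oauth # hypothetical secret
              key: CLIENT_ID
        clientSecret:
          valueFrom:
            secretKeyRef:
              name: example-oauth
              key: CLIENT_SECRET
    # Entry 2: Digest authentication with a TLS handshake timeout
    - type: 'Example::Device' # hypothetical type
      id: '$.serial'
      name: '$.hostname'
      url: 'https://inventory.example.com/devices' # placeholder endpoint
      digest: true
      username: # assumed to accept valueFrom, like env entries
        valueFrom:
          secretKeyRef:
            name: example-inventory # hypothetical secret
            key: USERNAME
      password:
        valueFrom:
          secretKeyRef:
            name: example-inventory
            key: PASSWORD
      tls:
        handshakeTimeout: 30s
```

Keeping credentials in secrets rather than inline follows the pattern of the LastFM example, where the API key is read from a secret and templated into the URL.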
Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration.
| Field | Description | Scheme |
|---|---|---|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | |
| `class` | A static value or JSONPath expression to use as the class for the resource. | |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value will be used. | []jsonpath |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value will be used. | []jsonpath |
| `description` | A static value or JSONPath expression to use as the description for the resource. | |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats | |
| `health` | A static value or JSONPath expression to use as the health of the config item. | |
| `items` | A JSONPath expression to use to extract individual items from the resource. Items are extracted first and then the ID, Name, Type and transformations are applied for each item. | |
| `status` | A static value or JSONPath expression to use as the status of the config item. | |
| `timestampFormat` | A Go time format string used to parse timestamps in createFields and deleteFields. (Default: RFC3339) | |
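
To make the mapping fields concrete, here is a minimal sketch. It assumes a hypothetical endpoint whose response wraps the records in a `data` array, with each record carrying `uuid`, `hostname`, `state`, and a `created_at` timestamp such as `2024-01-15 10:30:00`. None of these names come from a real API, and whether `items` needs the trailing `[*]` depends on the JSONPath implementation in use.

```yaml
spec:
  http:
    - url: 'https://api.example.com/v1/servers' # placeholder endpoint
      # Split the response into one config item per array element
      items: '$.data[*]'
      # The remaining expressions are evaluated against each extracted item
      id: '$.uuid'
      name: '$.hostname'
      type: 'Example::Server' # static value, hypothetical type
      status: '$.state'
      # The first non-empty value is used as the created time
      createFields:
        - '$.created_at'
        - '$.registered_at'
      deleteFields:
        - '$.deleted_at'
      # Go reference-time layout matching "2024-01-15 10:30:00"
      timestampFormat: '2006-01-02 15:04:05'
```

Because `items` is applied first, `id`, `name`, and the other expressions are written relative to a single record rather than to the whole response.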
Formats
JSON
The scraper stores config items as jsonb fields in PostgreSQL.
Resource providers typically return JSON directly, e.g. `kubectl get -o json` or `aws --output=json`.
When you display the config, the UI automatically converts the JSON data to YAML for improved readability.
XML / Properties
The scraper stores non-JSON files as JSON using:
{ 'format': 'xml', 'content': '<root>..</root>' }
You can still access non-JSON content in scripts using config.content.
The UI formats and renders XML appropriately.
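
As a small sketch of a non-JSON source, the fragment below sets `format: properties` for an endpoint assumed to serve a properties file. The endpoint and type are hypothetical, and `id` and `name` use static values because there is no JSON structure to query.

```yaml
spec:
  http:
    - type: 'Example::AppConfig' # hypothetical type
      id: 'app-config' # static value
      name: 'app-config' # static value
      url: 'https://config.example.com/app.properties' # placeholder endpoint
      format: properties
```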
Extracting Changes & Access Logs
When you enable full: true, custom scrapers can ingest changes and access logs from external systems by separating the config data from change events in your source.
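
A minimal sketch of enabling this mode, assuming `full` sits at the `spec` level alongside `http` as listed in the first table. The scraper name, type, and endpoint are hypothetical, and the layout the source payload must follow to separate config data from change events is not shown here.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: example-full-scraper # hypothetical name
spec:
  # Extract changes and access logs in addition to the config itself
  full: true
  http:
    - type: 'Example::Vehicle' # hypothetical type
      id: '$.id'
      name: '$.name'
      url: 'https://api.example.com/v1/vehicles' # placeholder endpoint
```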