HTTP
The HTTP scraper collects data from HTTP endpoints and APIs. It supports Basic, Digest, NTLM, bearer-token, and OAuth 2.0 authentication, and can transform responses after they are scraped.
```yaml title="aws-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - type: 'LastFM::Singer'
      name: '$.name'
      id: '$.url'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      transform:
        expr: |
          dyn(config).artists.artist.map(item, item).toJSON()
```
| Field | Description | Scheme | Required |
|---|---|---|---|
| schedule | Specify the interval to scrape in cron format. Defaults to every 15 minutes. | Cron | |
| retention | Settings for retaining changes, analysis and scraped items. | Retention | |
| http | Specifies the list of HTTP configurations to scrape. | []HTTP | |
| logLevel | Specify the level of logging. | string | |
| full | Set to true to extract changes and access logs from scraped configurations. Defaults to false. | bool | |
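
The spec-level fields sit alongside `http` under `spec`. The following is a minimal sketch, assuming that a standard five-field cron expression is accepted for `schedule` and that `debug` is a valid `logLevel` (the table only says the value is a string). The scraper name, type, and URL are hypothetical.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: example-http-scraper # hypothetical name
spec:
  schedule: '*/30 * * * *' # every 30 minutes instead of the 15-minute default
  logLevel: debug # assumed value; the table only specifies a string
  full: true # also extract changes and access logs
  http:
    - type: 'Example::Item' # hypothetical type
      id: '$.id'
      name: '$.name'
      url: 'https://api.example.com/items' # hypothetical endpoint
```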
HTTP Configuration
| Field | Description | Scheme |
|---|---|---|
| url* | The URL to send the HTTP request to. Must include the scheme (http:// or https://). | |
| bearer | Bearer token for authentication. | |
| body | Request body for POST/PUT requests. | |
| connection | Reference to a pre-configured HTTP connection. Use this to reuse connection settings across multiple scrapers. | |
| digest | Enable Digest authentication, a more secure alternative to Basic authentication. | boolean |
| env | Environment variables to be used in the templating. | |
| headers | HTTP headers to include in the request. | |
| method | HTTP method to use (GET, POST, etc.). | |
| ntlm | Enable Windows NTLM authentication protocol. Typically used in corporate environments. | boolean |
| ntlmv2 | Enable NTLMv2 authentication protocol, a more secure version of NTLM. | boolean |
| oauth.clientID | OAuth 2.0 client identifier. | |
| oauth.clientSecret | OAuth 2.0 client secret. | |
| oauth.params | Additional OAuth 2.0 parameters to include in the token request. | |
| oauth.scopes | List of OAuth 2.0 scopes. | |
| oauth.tokenURL | OAuth 2.0 token endpoint URL. | |
| password | Password for Basic or Digest authentication. | |
| tls.ca | Custom Certificate Authority (CA) certificate for TLS verification. Used for self-signed or internal certificates. | |
| tls.cert | Client TLS certificate for mutual TLS authentication (mTLS). | |
| tls.handshakeTimeout | Maximum time to wait for TLS handshake completion. Example: "30s", "1m". | |
| tls.insecureSkipVerify | Skip TLS certificate verification. Use with caution - only enable in trusted environments or for testing. | boolean |
| tls.key | Private key corresponding to the client TLS certificate for mTLS. | |
| username | Username for Basic or Digest authentication. | |
| labels | Labels for each config item. | |
| properties | Custom templatable properties for the scraped config items. | |
| tags | Tags for each config item. Max allowed: 5. | |
| transform | Transform configs after they've been scraped. | |
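
As a rough illustration of how several of these fields combine, the sketch below sends a POST request with a body and skips TLS verification for a test environment. The scraper name, type, URL, and body are hypothetical, and the nesting of `insecureSkipVerify` under a `tls` key is an assumption inferred from the dotted field names in the table.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: inventory-scraper # hypothetical name
spec:
  http:
    - type: 'Inventory::Host' # hypothetical type
      id: '$.id'
      name: '$.hostname'
      url: 'https://inventory.example.internal/api/search' # hypothetical endpoint
      method: POST
      body: '{"status": "active"}' # hypothetical request body
      tls:
        # Assumed nesting for tls.insecureSkipVerify; enable only for testing.
        insecureSkipVerify: true
```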
Mapping
Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items.
You can achieve this by using mappings in your custom scraper configuration.
| Field | Description | Scheme |
|---|---|---|
| id* | A static value or JSONPath expression to use as the ID for the resource. | |
| name* | A static value or JSONPath expression to use as the name for the resource. | |
| type* | A static value or JSONPath expression to use as the type for the resource. | |
| class | A static value or JSONPath expression to use as the class for the resource. | |
| createFields | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value will be used. | []jsonpath |
| deleteFields | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value will be used. | []jsonpath |
| description | A static value or JSONPath expression to use as the description for the resource. | |
| format | Format of config item, defaults to JSON, available options are JSON, properties. See Formats. | |
| health | A static value or JSONPath expression to use as the health of the config item. | |
| items | A JSONPath expression to use to extract individual items from the resource. Items are extracted first and then the ID, Name, Type and transformations are applied for each item. | |
| status | A static value or JSONPath expression to use as the status of the config item. | |
| timestampFormat | A Go time format string used to parse timestamps in createFields and deleteFields. (Default: RFC3339) | |
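
To make the mapping fields concrete, here is a sketch that uses `items` to split a response into one config item per array element, then applies `id`, `name`, and `type` to each element. It reuses the LastFM endpoint from the example at the top of this page and assumes the response nests the artist array under `artists.artist`, as that example's transform expression implies. Whether `items` fully replaces that transform for this endpoint has not been verified.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      # Extract each artist first; the mappings below are then applied per item.
      items: '$.artists.artist' # assumed response shape
      type: 'LastFM::Singer' # static value
      id: '$.url' # JSONPath evaluated against each extracted item
      name: '$.name'
```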
Formats
JSON
The scraper stores config items as jsonb fields in PostgreSQL.
Resource providers typically return JSON directly, e.g. kubectl get -o json or aws --output=json.
When you display the config, the UI automatically converts the JSON data to YAML for improved readability.
XML / Properties
The scraper wraps non-JSON content in a JSON object of the following form:
{ "format": "xml", "content": "<root>..</root>" }
You can still access non-JSON content in scripts using config.content.
The UI formats and renders XML appropriately.
Full Scraper Output
When you set full: true, custom scrapers can return complex objects containing config data, changes, access logs, and external entities.
See the Custom Scraper page for the full output schema, shorthand keys, external entity schemas, and alias resolution.