HTTP

The HTTP scraper collects data from HTTP endpoints and APIs. It supports Basic, Digest, NTLM, bearer token, and OAuth 2.0 authentication, and can transform responses after they are scraped.

lastfm-scraper.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - type: 'LastFM::Singer'
      name: '$.name'
      id: '$.url'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      transform:
        expr: |
          dyn(config).artists.artist.map(item, item).toJSON()

Field	Description	Scheme
`schedule`	Specify the interval to scrape in cron format. Defaults to every 15 minutes.	`Cron`
`retention`	Settings for retaining changes, analysis and scraped items	`Retention`
`http`	Specifies the list of HTTP configurations to scrape.	`[]HTTP`
`logLevel`	Specify the level of logging.	`string`
`full`	Set to `true` to extract changes and access logs from scraped configurations. Defaults to `false`.	`bool`
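
As a minimal sketch of how these top-level fields sit next to the `http` list, the config below scrapes every 30 minutes instead of the default 15. The scraper name, type, and URL are hypothetical placeholders, and the schedule assumes a standard five-field cron expression is accepted.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: example-http-scraper # hypothetical name
spec:
  # Assumed syntax: standard five-field cron, here every 30 minutes.
  schedule: '*/30 * * * *'
  http:
    - type: 'Example::Service' # hypothetical type
      name: '$.name'
      id: '$.id'
      url: 'https://example.com/api/service' # placeholder endpoint
```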

HTTP Configuration

Field	Description	Scheme
`url*`	The URL to send the HTTP request to. Must include the scheme (`http://` or `https://`).	`string`
`bearer`	Bearer token for authentication	`EnvVar`
`body`	Request body for POST/PUT requests	`string`
`connection`	Reference to a pre-configured HTTP connection. Use this to reuse connection settings across multiple scrapers	`string`
`digest`	Enable Digest authentication, a more secure alternative to Basic authentication	`bool`
`env`	Environment variables available to templated fields such as `url`	`[]EnvVar`
`headers`	HTTP headers to include in the request	`map[string]EnvVar`
`method`	HTTP method to use (GET, POST, etc.)	`string`
`ntlm`	Enable Windows NTLM authentication protocol. Typically used in corporate environments	`bool`
`ntlmv2`	Enable NTLMv2 authentication protocol, a more secure version of NTLM	`bool`
`oauth.clientID`	OAuth 2.0 client identifier	`EnvVar`
`oauth.clientSecret`	OAuth 2.0 client secret	`EnvVar`
`oauth.params`	Additional OAuth 2.0 parameters to include in the token request	`map[string]string`
`oauth.scopes`	List of OAuth 2.0 scopes	`[]string`
`oauth.tokenURL`	OAuth 2.0 token endpoint URL	`string`
`password`	Password for Basic or Digest authentication.	`EnvVar`
`tls.ca`	Custom Certificate Authority (CA) certificate for TLS verification. Used for self-signed or internal certificates	`EnvVar`
`tls.cert`	Client TLS certificate for mutual TLS authentication (mTLS)	`EnvVar`
`tls.handshakeTimeout`	Maximum time to wait for TLS handshake completion. Example: "30s", "1m"	`Duration`
`tls.insecureSkipVerify`	Skip TLS certificate verification. Use with caution: only enable in trusted environments or for testing	`bool`
`tls.key`	Private key corresponding to the client TLS certificate for mTLS	`EnvVar`
`username`	Username for Basic or Digest authentication.	`EnvVar`
`labels`	Labels for each config item.	`map[string]string`
`properties`	Custom templatable properties for the scraped config items.	`[]ConfigProperty`
`tags`	Tags for each config item. Max allowed: 5	`[]ConfigTag`
`transform`	Transform configs after they've been scraped	`Transform`
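
The fields above can be combined as in the following sketch, which sends an authenticated POST request. It assumes that dotted field names such as `tls.handshakeTimeout` nest under a `tls` key, and that `bearer` takes the same `valueFrom.secretKeyRef` form that `env` uses in the example at the top of this page. The endpoint, type, and secret name are hypothetical.

```yaml
spec:
  http:
    - type: 'Example::Report' # hypothetical type
      name: '$.name'
      id: '$.id'
      url: 'https://example.com/api/reports' # placeholder endpoint
      method: POST
      body: '{"status": "active"}'
      # Bearer token read from a Kubernetes secret (hypothetical secret name).
      bearer:
        valueFrom:
          secretKeyRef:
            name: example-api
            key: TOKEN
      tls:
        # Assumed nesting for the dotted `tls.handshakeTimeout` field.
        handshakeTimeout: 30s
```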

Mapping

Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items. You can achieve this by using mappings in your custom scraper configuration.

Field	Description	Scheme
`id*`	A static value or JSONPath expression to use as the ID for the resource.	`string` or JSONPath
`name*`	A static value or JSONPath expression to use as the name for the resource.	`string` or JSONPath
`type*`	A static value or JSONPath expression to use as the type for the resource.	`string` or JSONPath
`class`	A static value or JSONPath expression to use as the class for the resource.	`string` or JSONPath
`createFields`	A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value will be used.	`[]JSONPath`
`deleteFields`	A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value will be used.	`[]JSONPath`
`description`	A static value or JSONPath expression to use as the description for the resource.	`string` or JSONPath
`format`	Format of the config item. Defaults to JSON; available options are JSON and properties. See Formats.	`string`
`health`	A static value or JSONPath expression to use as the health of the config item.	`string` or JSONPath
`items`	A JSONPath expression to use to extract individual items from the resource. Items are extracted first and then the ID, Name, Type and transformations are applied for each item.	JSONPath
`status`	A static value or JSONPath expression to use as the status of the config item.	`string` or JSONPath
`timestampFormat`	A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339.	`string`
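
A minimal sketch of a mapping follows, assuming the mapping fields sit alongside `url` as they do in the example at the top of this page. The endpoint, the response shape described in the comments, and the field names `services` and `created_at` are all hypothetical. Whether `items` needs the trailing `[*]` depends on the JSONPath implementation, so treat that part as an assumption too.

```yaml
spec:
  http:
    # Assumed response shape:
    #   {"services": [{"id": "svc-1", "name": "billing",
    #                  "created_at": "2024-01-15 10:30:00"}]}
    - url: 'https://example.com/api/services' # placeholder endpoint
      # Split the response into one config item per array element.
      items: '$.services[*]'
      # The fields below are evaluated against each extracted item.
      id: '$.id'
      name: '$.name'
      type: 'Example::Service' # static value instead of a JSONPath
      createFields:
        - '$.created_at'
      # Go reference-time layout matching "2024-01-15 10:30:00".
      timestampFormat: '2006-01-02 15:04:05'
```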

Formats

JSON

The scraper stores config items as jsonb fields in PostgreSQL.

The stored JSON is typically what the resource provider returns, e.g. from `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

XML / Properties

The scraper stores non-JSON files as JSON using:

{ 'format': 'xml', 'content': '<root>..</root>' }

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
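
As a sketch of selecting a non-JSON format, the mapping below sets `format: properties` for an endpoint that returns a Java-style properties file. The endpoint and type are hypothetical, and `id` and `name` use static values because a single properties file maps to one config item. The stored shape noted in the comment is assumed by analogy with the XML wrapper shown above.

```yaml
spec:
  http:
    - url: 'https://example.com/app/settings.properties' # placeholder endpoint
      type: 'Example::Settings' # hypothetical type
      id: 'app-settings'   # static value
      name: 'app-settings' # static value
      # Assumed result: { 'format': 'properties', 'content': '...' }
      format: properties
```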

Extracting Changes & Access Logs

When you enable `full: true`, custom scrapers can ingest changes and access logs from external systems by separating the config data from change events in your source.
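
A minimal sketch of enabling this on an HTTP scraper follows. The endpoint and type are hypothetical, and the payload layout in the comments (a `config` object alongside a `changes` array) is an assumption about the expected shape rather than something this page confirms, so verify it before relying on it.

```yaml
spec:
  # Extract changes and access logs in addition to the config itself.
  full: true
  http:
    - url: 'https://example.com/api/cars/A123' # placeholder endpoint
      type: 'Example::Car' # hypothetical type
      id: '$.reg_no'
      name: '$.reg_no'
      # Assumed response shape (not confirmed by this page):
      #   {
      #     "reg_no": "A123",
      #     "config": { "colour": "blue" },
      #     "changes": [
      #       { "action": "drive", "summary": "car colour changed to blue" }
      #     ]
      #   }
```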
