Skip to main content

SQL

The SQL scraper executes queries against relational databases and creates configuration items from each row. It supports PostgreSQL, MySQL, and SQL Server, enabling you to:

  • Scrape application configurations stored in databases
  • Track database schema, users, roles, and permissions
  • Ingest audit logs and change history
  • Create relationships between database entities
  • Monitor access patterns and compliance

Use Cases

  • Application Config Management: Track application settings stored in database tables
  • Database Inventory: Catalog databases, schemas, tables, and views
  • IAM/RBAC Tracking: Monitor database users, roles, and permission assignments
  • Audit & Compliance: Ingest change logs and access history for compliance reporting
  • Multi-tenant Monitoring: Track tenant configurations across shared databases

Database-Specific Pages

Configuration

FieldDescriptionSchemeRequired
logLevelSpecify the level of loggingstring
scheduleScrape interval in cron format. Defaults to every 60 minutesCron
fullExtract changes, access logs, users, groups from scraped configurationsbool
retentionSettings for retaining changes, analysis and scraped itemsRetention
sqlList of SQL configurations to scrape[]SQL

SQL

FieldDescriptionScheme
query*

SQL query to execute. Each row becomes a config item

string

auth.password

EnvVar

auth.username

EnvVar

connection

The connection url or name to use`

Connection

driver

Database driver: postgres, mysql, or sqlserver

string

url

If connection is specified and it also includes a url, this field will take precedence

string

labels

Labels for each config item.

map[string]string

properties

Custom templatable properties for the scraped config items.

[]ConfigProperty

tags

Tags for each config item. Max allowed: 5

[]ConfigTag

transform

Transform configs after they've been scraped

Transform

Mapping

Custom scrapers require you to define the id and type for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the id and type for those items. You can achieve this by using mappings in your custom scraper configuration.

FieldDescriptionScheme
id*

A static value or JSONPath expression to use as the ID for the resource.

string or JSONPath

name*

A static value or JSONPath expression to use as the name for the resource.

string or JSONPath

type*

A static value or JSONPath expression to use as the type for the resource.

string or JSONPath

class

A static value or JSONPath expression to use as the class for the resource.

string or JSONPath

createFields

A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value will be used.

[]jsonpath

deleteFields

A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value will be used.

[]jsonpath

description

A static value or JSONPath expression to use as the description for the resource.

string or JSONPath

format

Format of config item, defaults to JSON, available options are JSON, properties. See Formats

string

health

A static value or JSONPath expression to use as the health of the config item.

string or JSONPath

items

A JSONPath expression to use to extract individual items from the resource. Items are extracted first and then the ID, Name, Type and transformations are applied for each item.

JSONPath

status

A static value or JSONPath expression to use as the status of the config item.

string or JSONPath

timestampFormat

A Go time format string used to parse timestamps in createFields and deleteFields. (Default: RFC3339)

string

Formats

JSON

The scraper stores config items as jsonb fields in PostgreSQL.

Resource providers typically return the JSON used. e.g. kubectl get -o json or aws --output=json.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

XML / Properties

The scraper stores non-JSON files as JSON using:

{ 'format': 'xml', 'content': '<root>..</root>' }

You can still access non-JSON content in scripts using config.content.

The UI formats and renders XML appropriately.

Extracting Changes & Access Logs

When you enable full: true, custom scrapers can ingest changes and access logs from external systems by separating the config data from change events in your source.

Transform

The transform section allows you to enrich scraped configurations with relationships, locations, aliases, and to extract changes and access data.

Transform Scripts

Transform scripts modify the scraped data before creating config items. Available script types:

TypeDescription
javascriptJavaScript with lodash (_) available
exprCEL expressions
jsonpathJSONPath expressions
gotemplateGo template syntax

When full: true is set, scripts receive the entire config structure and can return multiple config items with changes, relationships, and access data.

transform-example.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: app-configs
spec:
sql:
- connection: connection://postgres/production
type: Application
id: $.app_id
name: $.app_name
transform:
full: true
script:
javascript: |+
// Transform each row into a full config item
var results = [];
for (var i = 0; i < config.rows.length; i++) {
var row = config.rows[i];
results.push({
id: row.app_id,
name: row.app_name,
config: {
version: row.version,
environment: row.environment,
settings: JSON.parse(row.settings_json)
},
changes: row.changes || [],
aliases: [row.app_code, row.legacy_id]
});
}
JSON.stringify(results)
query: |
SELECT
app_id,
app_name,
app_code,
legacy_id,
version,
environment,
settings_json,
(
SELECT json_agg(json_build_object(
'change_type', change_type,
'summary', description,
'created_at', changed_at,
'created_by', changed_by
))
FROM app_changes
WHERE app_changes.app_id = apps.app_id
AND changed_at > NOW() - INTERVAL '24 hours'
) as changes
FROM apps

Changes

To scrape change history from your database, use full: true and include a changes array in your transformed output.

Change Fields

FieldDescriptionType
external_change_idUnique identifier for the changestring
actionAction type: create, update, deletestring
change_typeCategory of change (e.g., ConfigUpdate, SchemaChange)string
summaryHuman-readable descriptionstring
severitySeverity level: info, low, medium, high, criticalstring
sourceOrigin of the changestring
created_byUser or system that made the changestring
created_atTimestamp of the changetime
detailsAdditional structured datamap[string]any
diffDiff content showing what changedstring

Change Mapping & Exclusion

Use transform.changes to map or exclude changes based on their properties:

change-mapping.yaml
transform:
changes:
# Map certain change types to specific severities
mapping:
- filter: change_type == 'SECURITY_UPDATE'
severity: high
type: SecurityChange
- filter: change_type == 'CONFIG_UPDATE' && details.auto_generated == true
severity: info
action: ignore # Ignore auto-generated changes
- filter: change_type == 'DELETE'
action: delete # Mark config as deleted
summary: "Config deleted: {{.summary}}"

# Exclude noisy changes
exclude:
- change_type == 'HEARTBEAT'
- source == 'monitoring-system'

Relationships

Create relationships between config items using transform.relationship. Relationships can link configs within the same scraper or across different scrapers.

Relationship Configuration

FieldDescription
idMatch by config ID (static or CEL expression)
external_idMatch by external ID (static or CEL expression)
nameMatch by name (static or CEL expression)
typeMatch by config type (static or CEL expression)
filterCEL expression to select which configs this relationship applies to
parentIf true, matched configs become parents of the filtered configs
exprCEL expression returning a list of relationship selectors

Relationship Example

relationships.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: database-with-relationships
spec:
sql:
- connection: connection://postgres/production
type: Database::Table
id: $.table_id
name: $.table_name
transform:
relationship:
# Link tables to their parent schema
- filter: config_type == 'Database::Table'
type:
expr: "'Database::Schema'"
name:
expr: config.schema_name
parent: true

# Link tables to the database they belong to
- filter: config_type == 'Database::Table'
type:
expr: "'Database'"
external_id:
expr: config.database_id
parent: true

# Dynamic relationships via CEL expression
- filter: config_type == 'Database::Table' && has(config.foreign_keys)
expr: |
config.foreign_keys.map(fk, {
"type": "Database::Table",
"external_id": fk.referenced_table_id
})
query: |
SELECT
t.table_id,
t.table_name,
t.schema_name,
t.database_id,
(
SELECT json_agg(json_build_object(
'referenced_table_id', fk.referenced_table_id,
'column', fk.column_name
))
FROM foreign_keys fk
WHERE fk.table_id = t.table_id
) as foreign_keys
FROM tables t

Locations & Aliases

Assign geographic or organizational locations and alternative names to config items.

LocationOrAlias Configuration

FieldDescription
typeMatch expression for config types (e.g., Database::*, Application)
filterCEL expression to further filter which configs this applies to
valuesList of location/alias values (supports Go templates)
withParentInherit location/alias from parent config
locations-aliases.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: geo-aware-configs
spec:
sql:
- connection: connection://postgres/production
type: Application
id: $.app_id
name: $.app_name
transform:
locations:
- type: Application
filter: config.region != ''
values:
- "{{.config.region}}"
- "{{.config.datacenter}}"

- type: Database::*
filter: config.availability_zone != ''
values:
- "{{.config.availability_zone}}"

aliases:
- type: Application
values:
- "{{.config.app_code}}"
- "{{.config.legacy_id}}"
- "app-{{.config.tenant_id}}-{{.config.app_id}}"
query: |
SELECT
app_id,
app_name,
app_code,
legacy_id,
tenant_id,
region,
datacenter,
availability_zone
FROM applications

External Users, Groups & Roles

Scrape IAM/RBAC data to track who has access to your configurations. This enables access reviews, compliance reporting, and security monitoring.

Data Structures

ExternalUser

FieldDescription
nameDisplay name of the user
account_idUnique account identifier
user_typeType of user (e.g., human, service, system)
emailEmail address
aliasesAlternative identifiers

ExternalGroup

FieldDescription
nameGroup name
account_idUnique group identifier
group_typeType of group (e.g., role, team, department)
aliasesAlternative identifiers

ExternalRole

FieldDescription
nameRole name
account_idUnique role identifier
role_typeType of role (e.g., database, application, admin)
descriptionRole description
aliasesAlternative identifiers

Scraping Users and Roles

users-roles.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: database-iam
spec:
full: true
sql:
- connection: connection://postgres/production
type: Postgres::Database
id: $.database_name
transform:
full: true
script:
javascript: |+
var db = config.database;
var result = {
id: db.name,
name: db.name,
config: {
version: db.version,
encoding: db.encoding
},
// External users from database roles
external_users: db.users.map(function(u) {
return {
name: u.username,
account_id: u.username,
user_type: u.is_service ? 'service' : 'human',
email: u.email,
aliases: [u.oid]
};
}),
// External groups (role groups)
external_groups: db.role_groups.map(function(g) {
return {
name: g.group_name,
account_id: g.group_name,
group_type: 'role_group',
aliases: [g.oid]
};
}),
// External roles
external_roles: db.roles.map(function(r) {
return {
name: r.role_name,
account_id: r.role_name,
role_type: 'database_role',
description: r.description,
aliases: [r.oid]
};
}),
// User-group memberships
external_user_groups: db.memberships.map(function(m) {
return {
external_user_id: m.user_id,
external_group_id: m.group_id
};
})
};
JSON.stringify([result])
query: |
SELECT json_build_object(
'name', current_database(),
'version', version(),
'encoding', pg_encoding_to_char(encoding),
'users', (
SELECT json_agg(json_build_object(
'username', usename,
'oid', usesysid::text,
'is_service', usename LIKE 'svc_%',
'email', NULL
))
FROM pg_user
),
'role_groups', (
SELECT json_agg(json_build_object(
'group_name', rolname,
'oid', oid::text
))
FROM pg_roles
WHERE rolcanlogin = false
),
'roles', (
SELECT json_agg(json_build_object(
'role_name', rolname,
'oid', oid::text,
'description', pg_catalog.shobj_description(oid, 'pg_authid')
))
FROM pg_roles
),
'memberships', (
SELECT json_agg(json_build_object(
'user_id', m.member::regrole::text,
'group_id', m.roleid::regrole::text
))
FROM pg_auth_members m
)
) as database

Config Access & Access Logs

Track which users/groups have access to which configurations and monitor access patterns.

ConfigAccess

Links users, groups, or roles to specific config items:

FieldDescription
config_idID of the config item
external_user_idUser who has access
external_group_idGroup that has access
external_role_idRole that grants access

ConfigAccessLog

Records access events for compliance and auditing:

FieldDescription
config_idID of the accessed config
external_user_idUser who accessed the config
created_atTimestamp of access
mfaWhether MFA was used
propertiesAdditional access metadata
access-tracking.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: access-audit
spec:
full: true
sql:
- connection: connection://postgres/production
type: Application
id: $.app_id
transform:
full: true
script:
javascript: |+
var results = [];
for (var i = 0; i < config.rows.length; i++) {
var row = config.rows[i];
results.push({
id: row.app_id,
name: row.app_name,
config: row.settings,
// Config access permissions
config_access: row.permissions.map(function(p) {
return {
external_user_id: p.user_id,
external_role_id: p.role_id
};
}),
// Access logs from audit table
access_logs: row.access_logs.map(function(log) {
return {
external_user_id: log.user_id,
created_at: log.accessed_at,
mfa: log.mfa_used,
properties: {
ip_address: log.ip_address,
action: log.action,
user_agent: log.user_agent
}
};
})
});
}
JSON.stringify(results)
query: |
SELECT
a.app_id,
a.app_name,
a.settings,
(
SELECT json_agg(json_build_object(
'user_id', p.user_id,
'role_id', p.role_id
))
FROM app_permissions p
WHERE p.app_id = a.app_id
) as permissions,
(
SELECT json_agg(json_build_object(
'user_id', l.user_id,
'accessed_at', l.accessed_at,
'mfa_used', l.mfa_used,
'ip_address', l.ip_address,
'action', l.action,
'user_agent', l.user_agent
))
FROM access_audit_log l
WHERE l.resource_id = a.app_id
AND l.accessed_at > NOW() - INTERVAL '7 days'
) as access_logs
FROM applications a

Comprehensive Example

This example demonstrates all features working together: scraping application configs with users, roles, relationships, locations, changes, and access tracking.

comprehensive-sql-scraper.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
name: comprehensive-app-scraper
spec:
schedule: "*/30 * * * *" # Every 30 minutes
full: true
retention:
changes:
count: 100
staleItemAge: 7d
sql:
- connection: connection://postgres/production
type: Application
id: $.app_id
name: $.app_name
labels:
environment: production
managed-by: config-db
tags:
- name: team
jsonpath: $.team_name
- name: criticality
jsonpath: $.criticality
transform:
full: true

# Assign locations based on region
locations:
- type: Application
filter: config.region != ''
values:
- "{{.config.region}}"
- "{{.config.datacenter}}"

# Create aliases for lookups
aliases:
- type: Application
values:
- "{{.config.app_code}}"
- "{{.config.legacy_id}}"

# Define relationships
relationship:
# Link to parent organization
- filter: config_type == 'Application'
type:
value: Organization
name:
expr: config.org_name
parent: true

# Link to dependent services
- filter: config_type == 'Application' && has(config.dependencies)
expr: |
config.dependencies.map(dep, {
"type": "Application",
"external_id": dep.app_id
})

# Map and filter changes
changes:
mapping:
- filter: change_type == 'SECURITY'
severity: high
- filter: change_type == 'DEPLOYMENT'
type: Deployment
exclude:
- change_type == 'HEALTHCHECK'

script:
javascript: |+
var results = [];
for (var i = 0; i < config.rows.length; i++) {
var row = config.rows[i];
results.push({
id: row.app_id,
name: row.app_name,
config: {
version: row.version,
environment: row.environment,
settings: JSON.parse(row.settings_json || '{}'),
region: row.region,
datacenter: row.datacenter,
app_code: row.app_code,
legacy_id: row.legacy_id,
org_name: row.org_name,
dependencies: row.dependencies,
team_name: row.team_name,
criticality: row.criticality
},
// Changes from audit log
changes: (row.changes || []).map(function(c) {
return {
external_change_id: c.change_id,
change_type: c.change_type,
summary: c.description,
severity: c.severity || 'info',
created_by: c.changed_by,
created_at: c.changed_at,
details: c.details
};
}),
// Users with access
external_users: (row.users || []).map(function(u) {
return {
name: u.display_name,
account_id: u.user_id,
user_type: u.user_type,
email: u.email
};
}),
// Roles
external_roles: (row.roles || []).map(function(r) {
return {
name: r.role_name,
account_id: r.role_id,
role_type: 'application',
description: r.description
};
}),
// Access permissions
config_access: (row.permissions || []).map(function(p) {
return {
external_user_id: p.user_id,
external_role_id: p.role_id
};
}),
// Recent access logs
access_logs: (row.access_logs || []).map(function(l) {
return {
external_user_id: l.user_id,
created_at: l.accessed_at,
mfa: l.mfa_used,
properties: {
ip: l.ip_address,
action: l.action
}
};
})
});
}
JSON.stringify(results)

query: |
SELECT
a.app_id,
a.app_name,
a.app_code,
a.legacy_id,
a.version,
a.environment,
a.settings_json,
a.region,
a.datacenter,
o.org_name,
t.team_name,
a.criticality,

-- Dependencies
(
SELECT json_agg(json_build_object('app_id', d.dependency_app_id))
FROM app_dependencies d
WHERE d.app_id = a.app_id
) as dependencies,

-- Recent changes
(
SELECT json_agg(json_build_object(
'change_id', c.change_id,
'change_type', c.change_type,
'description', c.description,
'severity', c.severity,
'changed_by', c.changed_by,
'changed_at', c.changed_at,
'details', c.details
) ORDER BY c.changed_at DESC)
FROM app_changes c
WHERE c.app_id = a.app_id
AND c.changed_at > NOW() - INTERVAL '24 hours'
) as changes,

-- Users with access
(
SELECT json_agg(DISTINCT json_build_object(
'user_id', u.user_id,
'display_name', u.display_name,
'user_type', u.user_type,
'email', u.email
))
FROM users u
JOIN app_permissions p ON p.user_id = u.user_id
WHERE p.app_id = a.app_id
) as users,

-- Available roles
(
SELECT json_agg(json_build_object(
'role_id', r.role_id,
'role_name', r.role_name,
'description', r.description
))
FROM app_roles r
WHERE r.app_id = a.app_id
) as roles,

-- Permission assignments
(
SELECT json_agg(json_build_object(
'user_id', p.user_id,
'role_id', p.role_id
))
FROM app_permissions p
WHERE p.app_id = a.app_id
) as permissions,

-- Access audit logs
(
SELECT json_agg(json_build_object(
'user_id', l.user_id,
'accessed_at', l.accessed_at,
'mfa_used', l.mfa_used,
'ip_address', l.ip_address,
'action', l.action
) ORDER BY l.accessed_at DESC)
FROM access_logs l
WHERE l.resource_id = a.app_id
AND l.accessed_at > NOW() - INTERVAL '7 days'
LIMIT 100
) as access_logs

FROM applications a
LEFT JOIN organizations o ON o.org_id = a.org_id
LEFT JOIN teams t ON t.team_id = a.team_id
WHERE a.is_active = true

See Also