Scraping

Control how graph8 scrapes websites and which pages to include or exclude.

Scraping Settings

graph8 can scrape website data to enrich contact and company records. Configure global scraping preferences for your organization.

Global Preferences

Go to Settings → Scraping
Configure your scraping behavior:
- Enable/Disable — toggle scraping on or off for your organization
- Concurrency — number of simultaneous scrape requests
- Delay — wait time between requests to the same domain

Skip Rules

Define which URLs or patterns should be excluded from scraping.

Adding Skip Rules

Click Add Rule
Choose the rule type:
- Exact URL — skip a specific page
- URL Pattern — skip pages matching a pattern (e.g., /blog/*)
- Domain — skip an entire domain
Enter the URL or pattern
Save
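
The three rule types can be pictured as a small matching function. This is a minimal sketch for illustration, not graph8's implementation: the `SkipRule` class, the `is_skipped` function, the glob-style handling of `*`, and the choice to treat subdomains as part of a skipped domain are all assumptions.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


@dataclass
class SkipRule:
    """Hypothetical model of a skip rule; not graph8's real schema."""
    kind: str            # "exact", "pattern", or "domain"
    value: str           # the URL, path pattern, or domain entered in the rule
    enabled: bool = True


def is_skipped(url: str, rules: list[SkipRule]) -> bool:
    """Return True if any enabled rule excludes the URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for rule in rules:
        if not rule.enabled:
            continue  # toggled-off rules are kept but ignored
        if rule.kind == "exact" and url == rule.value:
            return True
        # Assumption: patterns are glob-style and apply to the URL path.
        if rule.kind == "pattern" and fnmatchcase(parts.path, rule.value):
            return True
        # Assumption: a domain rule also covers its subdomains.
        if rule.kind == "domain" and (
            host == rule.value or host.endswith("." + rule.value)
        ):
            return True
    return False


rules = [
    SkipRule("pattern", "/blog/*"),
    SkipRule("domain", "example.org"),
    SkipRule("exact", "https://example.com/login"),
]
print(is_skipped("https://example.com/blog/post-1", rules))
print(is_skipped("https://example.com/pricing", rules))
```

Under these assumptions the first URL is skipped by the `/blog/*` pattern and the second is not matched by any rule.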

Common Skip Patterns

Login and authentication pages
Terms of service and legal pages
Internal tool URLs
Social media profile pages
Pages with sensitive information

Managing Rules

View all active skip rules in the rules list
Toggle rules on/off without deleting them
Edit patterns as your needs change
Delete rules that are no longer needed

Allowed Domains

Restrict scraping to specific domains for targeted data collection.

Whitelist Mode

When enabled, graph8 only scrapes domains you’ve explicitly allowed:

Toggle Whitelist Mode on
Add domains to the allowed list
graph8 then scrapes only the domains on the allowed list

When Whitelist Mode is disabled, graph8 scrapes all domains except those matching skip rules.
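
The interaction between Whitelist Mode and skip rules can be sketched as a single decision function. The function name `should_scrape` and its parameters are hypothetical. The text above does not say whether skip rules still apply while Whitelist Mode is on, so this sketch assumes they do.

```python
from typing import Callable
from urllib.parse import urlsplit


def should_scrape(
    url: str,
    whitelist_mode: bool,
    allowed_domains: set[str],
    is_skipped: Callable[[str], bool],
) -> bool:
    """Decide whether a URL is eligible for scraping (illustrative only)."""
    host = (urlsplit(url).hostname or "").lower()
    # Whitelist Mode on: anything outside the allowed list is rejected.
    if whitelist_mode and host not in allowed_domains:
        return False
    # Assumption: skip rules are checked in both modes.
    return not is_skipped(url)


# Stand-in skip check for the example: exclude login pages.
skip_login = lambda u: "/login" in u

print(should_scrape("https://acme.com/about", True, {"acme.com"}, skip_login))
print(should_scrape("https://other.com/about", True, {"acme.com"}, skip_login))
print(should_scrape("https://other.com/about", False, {"acme.com"}, skip_login))
```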

Rate Limits

Control scraping speed to avoid overwhelming target websites.

Settings

Requests per second — maximum scrape requests per second per domain
Concurrent connections — maximum simultaneous connections
Retry attempts — how many times to retry a failed scrape
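
To make the per-domain setting concrete: a limit of N requests per second corresponds to a minimum spacing of 1/N seconds between requests to the same domain. The sketch below shows one way such a limiter could work. `DomainRateLimiter` and `wait_time` are hypothetical names, and this is not a description of graph8's internals.

```python
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Spaces out requests to the same domain (illustrative sketch)."""

    def __init__(self, requests_per_second: float, clock=time.monotonic):
        self.min_interval = 1.0 / requests_per_second
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.next_allowed: dict[str, float] = {}

    def wait_time(self, url: str) -> float:
        """Reserve the next slot for this URL's domain and return the wait in seconds."""
        host = (urlsplit(url).hostname or "").lower()
        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + self.min_interval
        return start - now


# With 2 requests per second, back-to-back requests to one domain are spaced 0.5 s apart.
limiter = DomainRateLimiter(requests_per_second=2, clock=lambda: 100.0)
print(limiter.wait_time("https://a.example/1"))
print(limiter.wait_time("https://a.example/2"))
print(limiter.wait_time("https://b.example/1"))
```

A caller would sleep for the returned duration before sending the request. Other domains are unaffected because each domain tracks its own next slot.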

graph8 automatically respects robots.txt directives on target websites.
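
For readers unfamiliar with robots.txt, the check amounts to asking whether a given user agent may fetch a given URL. The sketch below uses Python's standard `urllib.robotparser` to show the idea. The helper name `allowed_by_robots` and the user agent string are placeholders, and graph8's own checker may differ.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt content permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /private/\n"

# "example-bot" is a placeholder, not graph8's actual user agent.
print(allowed_by_robots(robots_txt, "example-bot", "https://example.com/private/report"))
print(allowed_by_robots(robots_txt, "example-bot", "https://example.com/about"))
```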

Data Handling

Configure how scraped data maps to your records.

Field Mapping

Scraped data can populate:

Company fields — website description, industry, technologies, employee count
Contact fields — job title, social profiles, bio

Data Quality

graph8 deduplicates scraped data against existing records
New data fills only empty fields unless you enable overwrite mode
All scraped data is logged for audit purposes
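
The fill-empty-fields rule and overwrite mode can be illustrated with a small merge function. This is a sketch under stated assumptions: `merge_scraped` is a hypothetical name, records are modeled as plain dictionaries, and "empty" is taken to mean `None` or an empty string.

```python
def merge_scraped(record: dict, scraped: dict, overwrite: bool = False) -> tuple[dict, list]:
    """Merge scraped values into a record and return (merged_record, audit_entries)."""
    merged = dict(record)
    audit = []  # (field, old_value, new_value) for every change applied
    for field, value in scraped.items():
        if value in (None, ""):
            continue  # nothing useful was scraped for this field
        current = merged.get(field)
        is_empty = current in (None, "")
        # Fill empty fields always; replace existing values only in overwrite mode.
        if (is_empty or overwrite) and current != value:
            merged[field] = value
            audit.append((field, current, value))
    return merged, audit


record = {"industry": "", "employee_count": 50}
scraped = {"industry": "Software", "employee_count": 75, "bio": None}

merged, audit = merge_scraped(record, scraped)
print(merged)
print(audit)
```

With overwrite mode off, only `industry` changes because `employee_count` already has a value. Passing `overwrite=True` would also replace `employee_count`.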

Frequently Asked Questions

Does graph8 respect robots.txt?

Yes. graph8 follows robots.txt directives by default. Pages disallowed by robots.txt are not scraped.

What happens if a scrape fails?

Failed scrapes are retried based on your retry settings. After all retries are exhausted, the URL is logged as failed and skipped until the next scheduled run.
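
The retry behavior described above can be sketched as a loop that makes one initial attempt plus the configured number of retries, then records the URL as failed. The names `scrape_with_retries`, `fetch`, and `failed_log` are hypothetical, and details such as backoff between attempts are left out.

```python
def scrape_with_retries(url, fetch, retry_attempts, failed_log):
    """Try fetch(url) once plus retry_attempts retries; log and return None on failure."""
    last_error = None
    for _ in range(1 + retry_attempts):
        try:
            return fetch(url)
        except Exception as exc:  # a sketch; real code would catch narrower errors
            last_error = exc
    # All attempts exhausted: record the failure so the URL can be skipped for now.
    failed_log.append((url, repr(last_error)))
    return None


calls = {"count": 0}

def flaky_fetch(url):
    """Stand-in fetcher that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


failed = []
print(scrape_with_retries("https://example.com", flaky_fetch, retry_attempts=2, failed_log=failed))
print(failed)
```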

Can I schedule scraping?

Scraping runs on demand, for example when a new company is added or during enrichment. For batch operations, you can configure recurring schedules.

Will scraping affect my website’s performance?

No. graph8 scrapes only external websites (your prospects’ sites), not your own. Rate limits keep scraping from overloading target sites.

Tip: Start with conservative rate limits and increase them gradually based on your needs.