
Scraping

Control how graph8 scrapes websites and which pages to include or exclude.

Scraping Settings

graph8 can scrape website data to enrich contact and company records. Configure global scraping preferences for your organization.

Global Preferences

  1. Go to Settings → Scraping
  2. Configure your scraping behavior:
    • Enable/Disable — toggle scraping on or off for your organization
    • Concurrency — number of simultaneous scrape requests
    • Delay — wait time between requests to the same domain

Skip Rules

Define which URLs or patterns should be excluded from scraping.

Adding Skip Rules

  1. Click Add Rule
  2. Choose the rule type:
    • Exact URL — skip a specific page
    • URL Pattern — skip pages matching a pattern (e.g., /blog/*)
    • Domain — skip an entire domain
  3. Enter the URL or pattern
  4. Save
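
As a rough illustration of how the three rule types could be evaluated, here is a minimal Python sketch. It is not graph8's implementation: the `SkipRule` class, the `kind` names, and the choice of shell-style wildcard matching against the URL path are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


@dataclass
class SkipRule:
    """Hypothetical model of one skip rule; not graph8's actual schema."""
    kind: str            # "exact", "pattern", or "domain" (assumed names)
    value: str
    enabled: bool = True  # rules can be toggled off without deleting them


def is_skipped(url: str, rules: list[SkipRule]) -> bool:
    """Return True if any enabled rule excludes the URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for rule in rules:
        if not rule.enabled:
            continue
        if rule.kind == "exact" and url == rule.value:
            return True
        # Assumption: patterns such as /blog/* are matched against the path
        # using shell-style wildcards.
        if rule.kind == "pattern" and fnmatchcase(parts.path, rule.value):
            return True
        if rule.kind == "domain" and host == rule.value.lower():
            return True
    return False


rules = [
    SkipRule("exact", "https://example.com/login"),
    SkipRule("pattern", "/blog/*"),
    SkipRule("domain", "internal.example.com"),
]
print(is_skipped("https://example.com/blog/post-1", rules))
print(is_skipped("https://example.com/about", rules))
```

Under these assumptions, the first call matches the /blog/* pattern and the second matches no rule.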

Common Skip Patterns

  • Login and authentication pages
  • Terms of service and legal pages
  • Internal tool URLs
  • Social media profile pages
  • Pages with sensitive information

Managing Rules

  • View all active skip rules in the rules list
  • Toggle rules on/off without deleting them
  • Edit patterns as your needs change
  • Delete rules that are no longer needed

Allowed Domains

Restrict scraping to specific domains for targeted data collection.

Whitelist Mode

When enabled, graph8 only scrapes domains you’ve explicitly allowed:

  1. Toggle Whitelist Mode on
  2. Add domains to the allowed list
  3. Save. graph8 then scrapes only the domains on the allowed list

When disabled, all domains are scraped except those matching skip rules.
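
The two modes can be summarized as a small decision function. This is a sketch under stated assumptions, not graph8's code: `should_scrape` and its parameters are hypothetical, skip rules are reduced to a set of skipped domains for brevity, and because the text above does not say whether skip rules still apply when Whitelist Mode is on, the sketch leaves them out of that branch.

```python
from urllib.parse import urlsplit


def should_scrape(url: str,
                  whitelist_mode: bool,
                  allowed_domains: set[str],
                  skipped_domains: set[str]) -> bool:
    """Decide whether a URL is eligible for scraping (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    if whitelist_mode:
        # Whitelist Mode on: only explicitly allowed domains are scraped.
        return host in allowed_domains
    # Whitelist Mode off: everything is scraped except skipped domains.
    return host not in skipped_domains


allowed = {"example.com"}
skipped = {"internal.example.com"}
print(should_scrape("https://example.com/about", True, allowed, skipped))
print(should_scrape("https://other.example.org/", True, allowed, skipped))
print(should_scrape("https://other.example.org/", False, allowed, skipped))
```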

Rate Limits

Control scraping speed to avoid overwhelming target websites.

Settings

  • Requests per second — maximum scrape requests per second per domain
  • Concurrent connections — maximum simultaneous connections
  • Retry attempts — how many times to retry a failed scrape
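
To make the "requests per second per domain" setting concrete, the sketch below spaces requests to the same domain by a minimum interval. It is one simple way to implement such a limit, not a description of graph8's scheduler; the `DomainThrottle` class and its methods are hypothetical names.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Spaces requests to the same domain by at least 1 / requests_per_second."""

    def __init__(self, requests_per_second: float, clock=time.monotonic):
        self.min_interval = 1.0 / requests_per_second
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.next_allowed: dict[str, float] = {}

    def wait_time(self, url: str) -> float:
        """Return how long to wait before requesting url, and reserve that slot."""
        host = (urlsplit(url).hostname or "").lower()
        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + self.min_interval
        return start - now


throttle = DomainThrottle(requests_per_second=2.0)
for url in ["https://a.example/1", "https://a.example/2", "https://b.example/1"]:
    delay = throttle.wait_time(url)
    time.sleep(delay)  # a real scraper would issue the request after this wait
    print(url, "waited", round(delay, 2), "seconds")
```

Because the limit is tracked per domain, the request to the second domain does not wait behind the first domain's queue.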

graph8 automatically respects robots.txt directives on target websites.
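
For readers unfamiliar with robots.txt, the following sketch shows what such a check looks like using Python's standard `urllib.robotparser` module. It illustrates the general mechanism only; the robots.txt content and the `example-bot` user agent are made up for the example, and graph8's actual user agent is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (invented for this sketch).
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "example-bot" is a placeholder user agent.
blocked = parser.can_fetch("example-bot", "https://example.com/private/page")
allowed = parser.can_fetch("example-bot", "https://example.com/about")
print(blocked, allowed)  # False True
```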

Data Handling

Configure how scraped data maps to your records.

Field Mapping

Scraped data can populate:

  • Company fields — website description, industry, technologies, employee count
  • Contact fields — job title, social profiles, bio

Data Quality

  • graph8 deduplicates scraped data against existing records
  • New data fills only empty fields unless you enable overwrite mode
  • All scraped data is logged for audit purposes
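
The fill-empty-fields behavior can be sketched as a merge over two dictionaries. This is an illustration only: `merge_scraped` is a hypothetical helper, and treating `None` and the empty string as "empty" is an assumption.

```python
def merge_scraped(record: dict, scraped: dict, overwrite: bool = False) -> dict:
    """Return a copy of record updated with scraped values.

    By default only empty fields are filled; overwrite=True lets scraped
    values replace existing ones.
    """
    merged = dict(record)
    for field, value in scraped.items():
        if value in (None, ""):
            continue  # nothing useful was scraped for this field
        if overwrite or merged.get(field) in (None, ""):
            merged[field] = value
    return merged


existing = {"industry": "Software", "employee_count": None}
scraped = {"industry": "SaaS", "employee_count": 120, "description": "Example Co."}

print(merge_scraped(existing, scraped))
print(merge_scraped(existing, scraped, overwrite=True))
```

With the default settings, `industry` keeps its existing value while the empty `employee_count` and the missing `description` are filled in. With `overwrite=True`, `industry` is replaced as well.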

Frequently Asked Questions

Does graph8 respect robots.txt?

Yes. graph8 follows robots.txt directives by default. Pages disallowed by robots.txt are not scraped.

What happens if a scrape fails?

Failed scrapes are retried up to the number of retry attempts configured under Rate Limits. After all retries are exhausted, the URL is logged as failed and skipped until the next scheduled run.
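
The retry-then-log behavior can be pictured with the short sketch below. It is not graph8's implementation: `scrape_with_retries`, the `fetch` callable, and the `failed_log` list are hypothetical, and the sketch assumes that retry attempts are counted in addition to the initial attempt.

```python
def scrape_with_retries(url, fetch, retry_attempts, failed_log):
    """Try fetch(url) once plus up to retry_attempts retries.

    Returns the fetched result, or None after logging the URL as failed.
    """
    for _ in range(1 + retry_attempts):
        try:
            return fetch(url)
        except Exception:
            continue  # a real scraper would typically wait before retrying
    failed_log.append(url)
    return None


def flaky_fetch_factory(failures_before_success):
    """Build a fake fetch function that fails a fixed number of times."""
    state = {"calls": 0}

    def fetch(url):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("simulated failure")
        return "content of " + url

    return fetch


failed = []
print(scrape_with_retries("https://example.com/", flaky_fetch_factory(2), 2, failed))
print(scrape_with_retries("https://example.org/", flaky_fetch_factory(5), 2, failed))
print(failed)
```

The first call succeeds on its third attempt. The second exhausts its attempts, so its URL ends up in the failure log.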

Can I schedule scraping?

Yes. You can configure recurring schedules for batch operations. Otherwise, scraping runs on demand (e.g., when a new company is added or during enrichment).

Will scraping affect my website’s performance?

No. graph8 scrapes only external websites (your prospects’ sites), not your own. Rate limits help keep scraping from overloading target sites.


Tip: Start with conservative rate limits and increase gradually based on your needs.