Scraping
Control how graph8 scrapes websites and which pages to include or exclude.
Scraping Settings
graph8 can scrape website data to enrich contact and company records. Configure global scraping preferences for your organization.
Global Preferences
- Go to Settings → Scraping
- Configure your scraping behavior:
- Enable/Disable — toggle scraping on or off for your organization
- Concurrency — number of simultaneous scrape requests
- Delay — wait time between requests to the same domain
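To make the three settings concrete, here is an illustrative sketch of the preferences as a data structure. The field names are our own shorthand for the options above, not graph8's actual API:

```python
from dataclasses import dataclass

# Hypothetical representation of the global scraping preferences;
# field names are illustrative, not graph8's real schema.
@dataclass
class ScrapingPreferences:
    enabled: bool = True        # organization-wide on/off switch
    concurrency: int = 4        # simultaneous scrape requests
    delay_seconds: float = 1.0  # wait between requests to the same domain

# Example: a conservative configuration
prefs = ScrapingPreferences(concurrency=2, delay_seconds=2.0)
```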
Skip Rules
Define which URLs or patterns should be excluded from scraping.
Adding Skip Rules
- Click Add Rule
- Choose the rule type:
- Exact URL — skip a specific page
- URL Pattern — skip pages matching a pattern (e.g., /blog/*)
- Domain — skip an entire domain
- Enter the URL or pattern
- Save
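The three rule types behave like this in principle. The sketch below is illustrative, assuming glob-style matching for URL patterns; graph8's internal matching logic may differ:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def is_skipped(url: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if any (rule_type, value) pair matches the URL."""
    parsed = urlparse(url)
    for rule_type, value in rules:
        if rule_type == "exact" and url == value:
            return True                            # skip a specific page
        if rule_type == "pattern" and fnmatchcase(parsed.path, value):
            return True                            # skip matching paths
        if rule_type == "domain" and parsed.netloc == value:
            return True                            # skip an entire domain
    return False

rules = [
    ("pattern", "/blog/*"),
    ("domain", "internal.example.com"),
]
```

For example, `/blog/post-1` on any domain would be skipped by the first rule, while any page on `internal.example.com` would be skipped by the second.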
Common Skip Patterns
- Login and authentication pages
- Terms of service and legal pages
- Internal tool URLs
- Social media profile pages
- Pages with sensitive information
Managing Rules
- View all active skip rules in the rules list
- Toggle rules on/off without deleting them
- Edit patterns as your needs change
- Delete rules that are no longer needed
Allowed Domains
Restrict scraping to specific domains for targeted data collection.
Whitelist Mode
When enabled, graph8 only scrapes domains you’ve explicitly allowed:
- Toggle Whitelist Mode on
- Add domains to the allowed list
- Only these domains are scraped
When disabled, all domains are scraped except those matching skip rules.
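The whitelist decision described above can be sketched as a single check. Names here are illustrative, not graph8's implementation:

```python
from urllib.parse import urlparse

def should_scrape(url: str, whitelist_mode: bool,
                  allowed_domains: set[str],
                  skipped_domains: set[str]) -> bool:
    domain = urlparse(url).netloc
    if whitelist_mode:
        # Whitelist mode on: only explicitly allowed domains are scraped
        return domain in allowed_domains
    # Whitelist mode off: everything except domains matching skip rules
    return domain not in skipped_domains
```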
Rate Limits
Control scraping speed to avoid overwhelming target websites.
Settings
- Requests per second — maximum scrape requests per second per domain
- Concurrent connections — maximum simultaneous connections
- Retry attempts — how many times to retry a failed scrape
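A per-domain "requests per second" limit works roughly like the minimal sketch below, which spaces requests to the same domain at least one interval apart. This is an illustration of the concept, not graph8's actual rate limiter:

```python
import time
from collections import defaultdict

class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = defaultdict(float)  # domain -> last request time

    def wait(self, domain: str) -> None:
        elapsed = time.monotonic() - self.last_request[domain]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)  # pace this domain
        self.last_request[domain] = time.monotonic()

limiter = DomainRateLimiter(requests_per_second=10)
```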
graph8 automatically respects robots.txt directives on target websites.
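graph8 performs this check for you; for reference, here is how a robots.txt check works in principle, using Python's standard-library parser:

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt that disallows the /private/ section
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

rp.can_fetch("*", "https://example.com/private/page")  # disallowed
rp.can_fetch("*", "https://example.com/about")         # allowed
```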
Data Handling
Configure how scraped data maps to your records.
Field Mapping
Scraped data can populate:
- Company fields — website description, industry, technologies, employee count
- Contact fields — job title, social profiles, bio
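Conceptually, field mapping translates scraped keys into record fields and drops everything unmapped. The key and field names below mirror the lists above but are illustrative, not graph8's exact schema:

```python
# Hypothetical mapping from scraped keys to company record fields
COMPANY_FIELD_MAP = {
    "meta_description": "website_description",
    "detected_industry": "industry",
    "tech_stack": "technologies",
    "headcount": "employee_count",
}

def map_scraped(scraped: dict, field_map: dict) -> dict:
    """Keep only mapped keys, renamed to their record field names."""
    return {field_map[k]: v for k, v in scraped.items() if k in field_map}
```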
Data Quality
- graph8 deduplicates scraped data against existing records
- New data only fills empty fields unless you enable overwrite mode
- All scraped data is logged for audit purposes
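The fill-only-empty behavior can be sketched as a merge that leaves populated fields alone unless overwrite mode is on. This is an illustration of the rule above, not graph8's implementation:

```python
def merge(record: dict, scraped: dict, overwrite: bool = False) -> dict:
    """Fill empty fields from scraped data; replace all fields if overwrite."""
    merged = dict(record)
    for key, value in scraped.items():
        if overwrite or not merged.get(key):  # empty or missing field
            merged[key] = value
    return merged
```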
Frequently Asked Questions
Does graph8 respect robots.txt?
Yes. graph8 follows robots.txt directives by default. Pages disallowed by robots.txt are not scraped.
What happens if a scrape fails?
Failed scrapes are retried based on your retry settings. After all retries are exhausted, the URL is logged as failed and skipped until the next scheduled run.
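The retry behavior described here looks roughly like the loop below. The exponential backoff schedule is an assumption for illustration; graph8's actual delays between retries are not documented here:

```python
import time

def scrape_with_retries(fetch, url, retries=3, base_delay=1.0):
    """Try fetch(url) up to retries+1 times; re-raise after exhausting retries."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: caller logs the URL as failed
            time.sleep(base_delay * 2 ** attempt)  # assumed backoff schedule
```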
Can I schedule scraping?
Scraping runs as needed (e.g., when a new company is added or during enrichment). Recurring schedules can be configured for batch operations.
Will scraping affect my website’s performance?
graph8 only scrapes external websites (your prospects’ sites), not your own. Rate limits ensure scraping doesn’t overload target sites.
Tip: Start with conservative rate limits and increase gradually based on your needs.