Integrating Crawlio in Your Node.js App: A Practical Guide Using the SDK
If you're working on a Node.js project and need a dependable way to scrape or crawl web pages, Crawlio’s JavaScript SDK provides a concise, no-frills interface to get the job done. It’s built for developers who want predictable behavior, clear error handling, and enough flexibility to deal with real-world websites—including those that require interaction before scraping.
This guide walks through integrating Crawlio into a Node.js app using the `crawlio.js` SDK. It covers basic usage, advanced workflow automation, and how Crawlio compares in practice to tools like Firecrawl.
📦 Installing the SDK
```bash
npm install crawlio.js
```
🔑 Setup
Before using the SDK, you'll need an API key from Crawlio. To create one:
- Go to the Crawlio dashboard.
- Navigate to the API Keys section.
- Generate a new key and copy it securely.
Once you have your key:
```js
import Crawlio from 'crawlio.js'

const client = new Crawlio({ apiKey: process.env.CRAWLIO_API_KEY })
```
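For local development, one common approach (independent of Crawlio) is to keep the key in a `.env` file and load it with the `dotenv` package before constructing the client:

```js
// npm install dotenv
import 'dotenv/config' // populates process.env from a local .env file
import Crawlio from 'crawlio.js'

// .env contains a line like: CRAWLIO_API_KEY=your-key-here
const client = new Crawlio({ apiKey: process.env.CRAWLIO_API_KEY })
```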
🧹 Scraping a Page
```js
const result = await client.scrape({ url: 'https://example.com' })
console.log(result.html)
```
The `scrape` method returns the full HTML content along with optional metadata, discovered URLs, and a Markdown version if requested:
```js
const result = await client.scrape({
  url: 'https://example.com',
  markdown: true,                   // also return a Markdown rendering
  returnUrls: true,                 // include URLs discovered on the page
  includeOnly: ['main', 'article']  // limit extraction to these elements
})
```
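The extra fields can then be read off the result. The property names below are an assumption based on the option names (`markdown`, `returnUrls`); confirm them against the SDK's response types:

```js
// Assumed response fields; check the SDK's type definitions.
console.log(result.markdown) // Markdown version, when markdown: true
console.log(result.urls)     // discovered links, when returnUrls: true
```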
🗺️ Crawling a Site
```js
const crawl = await client.crawl({
  url: 'https://example.com',
  count: 10,                    // stop after 10 pages
  sameSite: true,               // stay on the same domain
  patterns: ['/blog', '/docs']  // only follow paths matching these patterns
})
```
You can track progress with:
```js
const status = await client.crawlStatus(crawl.id)
const data = await client.crawlResults(crawl.id)
```
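For longer crawls, a simple polling loop works well. Here is a minimal sketch, assuming the status response exposes a `status` field with terminal values like `'completed'` and `'failed'` (the exact field name and values are assumptions; check the API reference):

```js
// Poll until the crawl reaches a terminal state.
// The `status` field and its values are assumptions, not documented names.
async function waitForCrawl(client, crawlId, intervalMs = 2000) {
  for (;;) {
    const status = await client.crawlStatus(crawlId)
    if (status.status === 'completed' || status.status === 'failed') {
      return status
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}

await waitForCrawl(client, crawl.id)
const data = await client.crawlResults(crawl.id)
```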
🧰 Advanced Use: Workflow Automation
Many sites rely on JavaScript to render or reveal content. Crawlio supports a `workflow` field in the `scrape` method to automate browser interactions before scraping begins.
Example:
```js
const result = await client.scrape({
  url: 'https://example.com',
  workflow: [
    { type: 'wait', duration: 1500 },           // let the page settle
    { type: 'scroll', selector: '#comments' },  // bring comments into view
    { type: 'click', selector: '#loadMore' },   // reveal more content
    { type: 'wait', duration: 1000 },           // give it time to render
    { type: 'screenshot', selector: '#comments', id: 'comments-ss' }, // capture that element
    { type: 'eval', script: 'document.title' }  // run JS in the page context
  ]
})
```
Results include any screenshots and eval outputs:
```js
result.screenshots['comments-ss'] // screenshot URL, keyed by the step's id
result.evaluation['5'].result     // "Page Title" ('5' is the eval step's index in the workflow array)
```
This is especially useful for single-page apps or lazy-loaded content that would otherwise be missed in a static scrape.
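For instance, a feed that loads more items as you scroll can often be handled with repeated scroll-and-wait steps before the final HTML is captured. A sketch using only the step types shown above (the URL and `footer` selector are illustrative assumptions):

```js
// Sketch: nudge a lazy-loading page a few times before scraping.
// Selectors and timings are assumptions; tune them per site.
const result = await client.scrape({
  url: 'https://example.com/feed', // hypothetical lazy-loaded listing
  workflow: [
    { type: 'scroll', selector: 'footer' }, // jump toward the bottom
    { type: 'wait', duration: 1000 },       // let the next batch render
    { type: 'scroll', selector: 'footer' },
    { type: 'wait', duration: 1000 },
    { type: 'scroll', selector: 'footer' },
    { type: 'wait', duration: 1000 }
  ],
  markdown: true
})
```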
🧱 Building Blocks for Scraping Infrastructure
Crawlio isn’t trying to replace browser automation libraries or all-in-one scraping platforms—it focuses on providing clean, structured access to web content with enough flexibility to cover real-world use cases.
Whether you're:
- extracting articles for a personal blog aggregator,
- indexing content for internal search,
- or automating price tracking on multiple domains,
Crawlio’s SDK gives you a solid foundation with minimal setup.
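The multi-domain price-tracking case, for example, boils down to mapping `scrape` over a list of URLs. A minimal sketch (the URLs are illustrative, and production code would add retry and rate-limit handling):

```js
// Sketch: scrape the same product across several domains in parallel.
const urls = [
  'https://shop-a.example/product/123',
  'https://shop-b.example/product/123'
]

const pages = await Promise.all(
  urls.map((url) => client.scrape({ url, includeOnly: ['main'] }))
)

for (const page of pages) {
  console.log(page.html.length) // parse prices out of page.html here
}
```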
🧪 Error Handling
Crawlio’s SDK exports several error types, which can be caught for precise handling:
```js
import Crawlio, { CrawlioAuthenticationError, CrawlioRateLimit } from 'crawlio.js'

try {
  await client.scrape({ url: 'https://example.com' })
} catch (err) {
  if (err instanceof CrawlioAuthenticationError) {
    console.error('Invalid API key')
  } else if (err instanceof CrawlioRateLimit) {
    console.warn('Rate limit exceeded')
  } else {
    console.error(err)
  }
}
```
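Since rate limits are usually transient, one pragmatic pattern is a small retry wrapper with exponential backoff. This is an application-level sketch, not an SDK feature:

```js
// Retry a scrape on rate-limit errors with exponential backoff.
async function scrapeWithRetry(client, options, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.scrape(options)
    } catch (err) {
      if (err instanceof CrawlioRateLimit && attempt < maxAttempts) {
        const delay = 500 * 2 ** attempt // 1s, 2s, ...
        await new Promise((resolve) => setTimeout(resolve, delay))
        continue
      }
      throw err // non-rate-limit error, or attempts exhausted
    }
  }
}
```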
🧵 Final Thoughts
Crawlio doesn’t try to be everything—it’s designed to be a solid, composable tool for scraping and crawling from code. If you’re building a content pipeline, search tool, or automation service using Node.js, the SDK gives you a reliable interface for working with the web at scale.
It’s worth noting that advanced features like the `workflow` system are powerful but currently under-documented. Until that improves, reviewing examples (like the one above) or experimenting directly is the best way to understand what’s possible.
Scrape Smarter, Not Harder
Get the web data you need — without the headaches.
Start with zero setup. Use Crawlio’s API to scrape dynamic pages, search results, or full sites in minutes.