
firecrawl/firecrawl

84,034 stars · 6,093 forks · TypeScript · agpl-3.0
firecrawl.dev

Firecrawl

Features

  • Autonomous Web Agents: Interpret natural-language prompts to carry out complex data-gathering tasks, navigating the web and deciding independently where to find specific information across multiple sources.
  • Autonomous Web Researchers: Automate complex information-gathering tasks by letting agents navigate, map, and extract data from websites without manual intervention.
  • Agentic Web Browsing: Give AI agents the ability to run live web searches and interact with pages to solve real-time information-retrieval problems.
  • Web Access Interfaces: Set up a standardized command-line interface that exposes web-browsing capabilities to local development environments or coding agents, giving them access to external networks.
  • Application Integration SDKs: Connect web data extraction to backend services or agent loops through software development kits (SDKs), authenticated with API keys for programmatic access.
  • Automated Workflow Generators: Route requests to specialized workflow skills that automate data collection and synthesis, producing finished artifacts such as research briefs or SEO audits.
  • LLM-Ready Data Extractors: A data extraction engine that converts unstructured web content into clean, structured formats optimized for ingestion by large language models (LLMs).
  • LLM Data Preparation Tools: Convert unstructured web content into clean, structured formats such as Markdown or JSON that can be fed directly into LLMs.
  • LLM-Driven Data Extractors: Transform unstructured HTML into clean, semantic Markdown or structured JSON, using large language models for intelligent content parsing.
  • Web Content Scrapers: Extract content from a single URL and convert it into structured formats such as Markdown, preparing raw data for downstream applications or processing pipelines.
  • Web Search APIs: Retrieve relevant links and content for natural-language queries, with a configurable result limit, to gather targeted information from across the internet.
  • Web Data Connectors: Connect web crawling and scraping to AI agents and automation platforms through pre-built connectors and server-side tools for data ingestion.
  • Web Data Pipelines: Connect live web data sources to backend services and automation workflows through standardized APIs and protocols for consistent ingestion.
  • Browser Automation Interfaces: Scrape a page, then interact with it using AI prompts or code.
    ```python
    from firecrawl import Firecrawl

    app = Firecrawl(api_key="fc-YOUR_API_KEY")
    result = app.scrape("https://amazon.com")
    scrape_id = result.metadata.sc
    ```
  • Website Crawlers: Discover all URLs on a website instantly.
    ```bash
    curl -X POST 'https://api.firecrawl.dev/v2/map' \
      -H 'Authorization: Bearer fc-YOUR_API_KEY' \
      -H 'Content-Type: application/json' \
      -d '{"url": "https://firecrawl.dev"}'
    ```
  • Autonomous Web Crawlers: A recursive navigation service that maps site structures and traverses domains to aggregate comprehensive datasets from many interconnected pages.
  • Distributed Crawling Infrastructures: A scalable architecture for running large-scale web data collection in self-hosted or managed environments, with built-in concurrency control and error handling.
  • Large-Scale Domain Crawlers: Systematically discover and index entire domains to retrieve comprehensive datasets for training, analysis, or content-migration projects.
  • Web Crawlers: Crawl an entire website and get content from all of its pages.
    ```bash
    curl -X POST 'https://api.firecrawl.dev/v2/crawl' \
      -H 'Authorization: Bearer fc-YOUR_API_KEY' \
      -H 'Content-Type: application/json' \
      -d '{ "url": "https:/
    ```
  • Web Scraping APIs: Get LLM-ready data from any website: Markdown, JSON, screenshots, and more.
    ```python
    from firecrawl import Firecrawl

    app = Firecrawl(api_key="fc-YOUR_API_KEY")
    result = app.scrape('firecrawl.dev')
    ```
  • Autonomous Research Agents: Retrieve structured data from target URLs by supplying prompts and output schemas, automating complex research tasks without manual navigation.
  • Headless Browser Orchestrators: Render and interact with dynamic pages by controlling isolated browser instances, capturing content from JavaScript-heavy web applications.
  • Recursive Web Crawlers: Follow links up to a specified depth to retrieve structured data from multiple pages, waiting for the crawl to complete so that all content is captured.
  • Batch Scrapers: Scrape multiple URLs at once:
    ```python
    from firecrawl import Firecrawl

    app = Firecrawl(api_key="fc-YOUR_API_KEY")
    job = app.batch_scrape([
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pr
    ```
  • Batch Web Scrapers: Extract content from many web pages at once by supplying a list of URLs, converting unstructured web data into structured formats such as Markdown.
  • Crawl API Endpoints: The crawl endpoint, POST https://api.firecrawl.dev/v2/crawl, crawls an entire website and returns content from all of its pages.
  • Screenshot Capture Services: Capture page screenshots through the scrape API, which returns them alongside formats such as Markdown and JSON.
  • Distributed Crawl Coordination: Scale web discovery across multiple nodes by partitioning URL frontiers and enforcing concurrency limits for efficient site indexing.
  • Model Context Protocol Integrations: Connect AI-powered clients to web data sources through the Model Context Protocol (MCP), a standardized protocol for tool integration and real-time information retrieval.
  • Asynchronous Job Queues: Manage long-running crawl and scrape tasks by decoupling request submission from execution through persistent background worker processes.
  • Web Data Service Integrations: Connect web scraping and data-cleaning tools to automation workflows and agent frameworks through standardized protocols, keeping data ingestion consistent across development environments.
  • Agentic Browsing Interfaces: A programmatic layer that exposes web navigation and interaction to autonomous agents through standardized protocols and tool definitions.
  • Stateful Session Persistence: Maintain browser context and authentication state across multiple interactions, enabling complex navigation flows and multi-step web tasks.
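
The Website Crawlers entry shows the map endpoint as a curl call. As a hedged sketch, the same request can be assembled with Python's standard library; the endpoint, headers, and body are copied from that curl example, while the helper name build_map_request is hypothetical. The request is only built here, not sent, and no response format is assumed.

```python
import json
import urllib.request

# Endpoint, headers, and body mirror the curl example for /v2/map.
API_BASE = "https://api.firecrawl.dev/v2"


def build_map_request(site_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/map request for site_url."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/map",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_map_request("https://firecrawl.dev", "fc-YOUR_API_KEY")
print(req.get_method(), req.full_url)  # prints: POST https://api.firecrawl.dev/v2/map

# Sending it requires network access and a real key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```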
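
The Autonomous Web Crawlers and Recursive Web Crawlers entries describe following links up to a set depth. A minimal sketch of that idea is a breadth-first traversal that records the depth at which each URL is first reached and stops expanding pages at max_depth. The get_links callback and the in-memory SITE dictionary are hypothetical stand-ins for fetching a page and extracting its links; this is not Firecrawl's crawler.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> Dict[str, int]:
    """Breadth-first crawl from start_url, following links at most max_depth hops.

    Returns each discovered URL mapped to the depth at which it was first reached.
    """
    depths = {start_url: 0}
    frontier = deque([start_url])
    while frontier:
        url = frontier.popleft()
        if depths[url] >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in depths:  # each page is visited once
                depths[link] = depths[url] + 1
                frontier.append(link)
    return depths


# Hypothetical in-memory site: page -> links found on that page.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
    "/docs/api": ["/docs/api/v2"],
}

pages = crawl("/", lambda url: SITE.get(url, []), max_depth=2)
print(pages)  # prints: {'/': 0, '/docs': 1, '/blog': 1, '/docs/api': 2, '/blog/post-1': 2}
```

Breadth-first order records every page at its shortest link distance from the start, so the depth limit behaves predictably even when the site links back on itself.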
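
The Recursive Web Crawlers entry mentions waiting for completion. A common client-side pattern is to poll a job's status until it reaches a terminal state, with a deadline so a stuck job cannot block forever. The status values "scraping", "completed", and "failed" and the get_status callback are assumptions for illustration, not a documented Firecrawl response format.

```python
import time
from typing import Any, Callable, Dict


def wait_for_completion(get_status: Callable[[], Dict[str, Any]],
                        poll_interval: float = 2.0,
                        timeout: float = 300.0) -> Dict[str, Any]:
    """Poll get_status() until it reports a terminal state or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in ("completed", "failed"):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(poll_interval)


# Simulated status responses standing in for repeated API calls.
responses = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "data": ["page-1", "page-2"]},
])
final = wait_for_completion(lambda: next(responses), poll_interval=0.01, timeout=5)
print(final["status"], len(final["data"]))  # prints: completed 2
```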
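
For the Batch Scrapers and Batch Web Scrapers entries, here is a sketch of bounded-concurrency batching under stated assumptions: a thread pool caps how many pages are in flight, and each URL's failure is captured in its own result instead of aborting the whole batch. fake_scrape is a hypothetical placeholder for a real fetch-and-convert step, and batch_scrape here is a local helper, not the SDK method of the same name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def batch_scrape(urls: List[str],
                 scrape_one: Callable[[str], str],
                 max_concurrency: int = 4) -> Dict[str, dict]:
    """Scrape urls with at most max_concurrency requests in flight.

    A failing URL is reported in its own result instead of aborting the batch.
    """
    def safe(url: str) -> dict:
        try:
            return {"ok": True, "content": scrape_one(url)}
        except Exception as exc:
            return {"ok": False, "error": repr(exc)}

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order, so they line up with urls.
        return dict(zip(urls, pool.map(safe, urls)))


def fake_scrape(url: str) -> str:
    """Hypothetical stand-in for fetching a page and converting it to Markdown."""
    if url.endswith("/broken"):
        raise RuntimeError("HTTP 500")
    return f"# Content of {url}"


results = batch_scrape(
    ["https://example.com/a", "https://example.com/broken", "https://example.com/b"],
    fake_scrape,
    max_concurrency=2,
)
for url, outcome in results.items():
    print(url, outcome)
```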
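
The LLM data preparation entries describe turning HTML into clean Markdown. The sketch below uses Python's built-in html.parser to handle a handful of tags (h1 to h3, p, ul/ol, li, a) and to drop script, style, nav, and footer blocks. It is a deliberately small illustration under those assumptions; real pages need far broader tag coverage, and the entries above note that Firecrawl may also use LLMs for parsing.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter for headings, paragraphs, lists, and links."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style", "nav", "footer"}  # dropped entirely as boilerplate

    def __init__(self) -> None:
        super().__init__()
        self.out = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return  # inside a skipped block
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")  # ordered items are also emitted as "-" bullets
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag == "a":
            self.out.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(re.sub(r"\s+", " ", data))  # collapse whitespace runs


def html_to_markdown(html: str) -> str:
    conv = MarkdownConverter()
    conv.feed(html)
    conv.close()
    lines = [line.strip() for line in "".join(conv.out).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


page = """
<html><head><style>body {}</style></head><body>
<nav>Home | Docs</nav>
<h1>Firecrawl</h1>
<p>Turn websites into <a href="https://example.com/md">LLM-ready</a> data.</p>
<ul><li>Scrape</li><li>Crawl</li></ul>
<script>track()</script>
</body></html>
"""
print(html_to_markdown(page))
```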
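
Distributed Crawl Coordination mentions partitioning URL frontiers across nodes. One common scheme, shown here as an assumption rather than Firecrawl's actual design, hashes each URL's host so all URLs for a host land on the same worker; that worker can then enforce per-host concurrency limits locally. The hash is SHA-256 rather than Python's built-in hash(), because hash() of a string is randomized per process and separate nodes would disagree about assignments. worker_for and partition are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def worker_for(url: str, num_workers: int) -> int:
    """Map a URL to a worker index by hashing its host name."""
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls: Iterable[str], num_workers: int) -> Dict[int, List[str]]:
    """Split a URL frontier into per-worker shards; a host never spans two shards."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        shards[worker_for(url, num_workers)].append(url)
    return dict(shards)


frontier = [
    "https://a.example/page-1",
    "https://a.example/page-2",
    "https://b.example/",
    "https://c.example/docs",
]
for worker, shard in sorted(partition(frontier, num_workers=3).items()):
    print(worker, shard)
```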
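
The Asynchronous Job Queues entry describes decoupling submission from execution. A minimal in-process sketch: submit() returns a job ID right away, background worker threads run the jobs, and callers check status later. Threads stand in for the persistent worker processes mentioned above; a production system would typically back the queue with durable external storage so jobs survive restarts. The JobQueue class and its method names are hypothetical.

```python
import queue
import threading
import uuid
from typing import Any, Callable, Dict


class JobQueue:
    """In-process job queue: submit() returns an ID at once, worker threads run jobs later."""

    def __init__(self, num_workers: int = 2) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, fn: Callable[[], Any]) -> str:
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None, "error": None}
        self._tasks.put((job_id, fn))
        return job_id

    def status(self, job_id: str) -> Dict[str, Any]:
        with self._lock:
            return dict(self._jobs[job_id])  # copy so callers cannot mutate shared state

    def wait(self) -> None:
        """Block until every submitted job has finished."""
        self._tasks.join()

    def _worker(self) -> None:
        while True:
            job_id, fn = self._tasks.get()
            self._update(job_id, status="running")
            try:
                self._update(job_id, status="completed", result=fn())
            except Exception as exc:  # record the failure; keep the worker alive
                self._update(job_id, status="failed", error=repr(exc))
            finally:
                self._tasks.task_done()

    def _update(self, job_id: str, **fields: Any) -> None:
        with self._lock:
            self._jobs[job_id].update(fields)


jobs = JobQueue()
ok_id = jobs.submit(lambda: "scraped 3 pages")
bad_id = jobs.submit(lambda: 1 / 0)
jobs.wait()
print(jobs.status(ok_id)["status"], jobs.status(bad_id)["status"])  # prints: completed failed
```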
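
For the Model Context Protocol Integrations entry: MCP messages use JSON-RPC 2.0, servers advertise tools with a name, a description, and a JSON Schema inputSchema, and clients invoke a tool with a tools/call request. The sketch builds such a request after a minimal required-argument check; the tool name scrape_url and its schema are hypothetical and are not taken from Firecrawl's MCP server.

```python
import json
from typing import Any, Dict

# Hypothetical tool definition in the shape MCP servers advertise via "tools/list".
SCRAPE_TOOL: Dict[str, Any] = {
    "name": "scrape_url",  # hypothetical name, not Firecrawl's actual tool
    "description": "Scrape a single URL and return its content as Markdown.",
    "inputSchema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}


def tools_call_request(request_id: int, tool: Dict[str, Any], arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 "tools/call" request after checking required arguments."""
    missing = [k for k in tool["inputSchema"].get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


msg = tools_call_request(1, SCRAPE_TOOL, {"url": "https://firecrawl.dev"})
print(msg)
```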
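
Stateful Session Persistence covers keeping authentication state across steps. One piece of that is cookie handling: store cookies from each response's Set-Cookie headers and send them back on later requests. This hypothetical Session helper uses the standard library's http.cookies.SimpleCookie parser and ignores attributes such as Path, domain scoping, and expiry, which a real browser context would honor.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


class Session:
    """Hypothetical helper that carries cookies from earlier responses into later requests."""

    def __init__(self) -> None:
        self.cookies: Dict[str, str] = {}

    def absorb(self, set_cookie_headers: Iterable[str]) -> None:
        """Store name/value pairs from a response's Set-Cookie headers."""
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)  # attributes such as Path are parsed but not kept
            for name, morsel in parsed.items():
                self.cookies[name] = morsel.value

    def cookie_header(self) -> str:
        """Cookie header value to send with the next request."""
        return "; ".join(f"{name}={value}" for name, value in self.cookies.items())


session = Session()
session.absorb(["session_id=abc123; Path=/"])  # e.g. returned by a login step
session.absorb(["cart=2; Path=/"])              # returned by a later page
print(session.cookie_header())  # prints: session_id=abc123; cart=2
```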