Olostep provides a programmatic layer to access and interact with the web. Fetch clean, ready-to-use data for your AI from any website in 1–5 seconds, and scale up to 100K parallel requests in minutes. You can:
- Scrape pages with various output formats including structured data
- Batch scrape hundreds of websites simultaneously
- Crawl websites with intelligent filtering and depth control
- Create sitemaps even for websites with hundreds of thousands of pages
- Get answers from web pages using AI-powered extraction
- Breaking Changes in v1.0.0
- Installation
- Quick Start
- API Reference
- Error Handling
- Advanced Features
- Olostep via API vs SDK
- Contributing
- License
Important: Version 1.0.0 introduces breaking changes from v0.x.x. Please update your code accordingly.
Based on user feedback, release 1.0.0 will not only include more features but also provide a homogenized API that emulates patterns you find in other SDKs you might already be using. This introduces the following three types of breaking changes:
- client class renaming,
- and endpoint namespace standardization (plural)
- endpoint method standardization (create)
# Change: The async client now uses the "Async" prefix much like other SDKs of similar design. We also took the opportunity to drop the "Client" part
# Old
from olostep import OlostepClient, SyncOlostepClient
# New
from olostep import AsyncOlostep, Olostep
# So
# OlostepClient -> AsyncOlostep and
# SyncOlostepClient -> Olostep# Change: All endpoint namespaces are now plural for consistency. Also, sitemap was renamed to maps.
# Old
result = (await) client.scrape.ENDPOINT
batch = (await) client.batch.ENDPOINT
crawl = (await) client.crawl.ENDPOINT
sitemap = (await) client.sitemap.ENDPOINT
content = (await) client.retrieve.ENDPOINT
answers = (await) client.answers.ENDPOINT
# New
result = (await) client.scrapes.ENDPOINT
batch = (await) client.batches.ENDPOINT
crawl = (await) client.crawls.ENDPOINT
maps = (await) client.maps.ENDPOINT
content = (await) client.retrieve.ENDPOINT
answers = (await) client.answers.ENDPOINT# Change: All creation methods now use .create() for consistency. Previously, batch and crawl used .start(). Scrape, answers, and sitemap/maps already used .create() and remain unchanged.
# Old
batch = (await) client.batch.start(urls)
crawl = (await) client.crawl.start(url)
# New
batch = (await) client.batches.create(urls)
crawl = (await) client.crawls.create(url)pip install olostepgit clone https://github.com/olostep-api/olostep-py.git
cd olostep-python
pip install -e .- Python 3.11+
- API key from olostep.com
The SDK provides two client options depending on your use case:
|
Best for: Scripts, beginners, and simple use cases where you prefer blocking operations. The sync client provides a simpler, blocking interface that's easier to get started with if you're new to async/await. |
Best for: Performance-critical applications, backend services, and handling many concurrent requests. The async client provides non-blocking operations and is the recommended choice for production applications that need high throughput. |
Both SDK clients provide the same clean, pythonic interface organized into logical namespaces:
| Namespace | Purpose | Key Methods |
|---|---|---|
scrapes |
Single URL extraction | create(), get() |
batches |
Multi-URL processing | create(), info(), items() |
crawls |
Website traversal | create(), info(), pages() |
maps |
Link extraction | create(), urls() |
answers |
AI-powered extraction | create(), get() |
retrieve |
Content retrieval | get() |
Each operation returns stateful objects with ergonomic methods for follow-up operations.
Catch all SDK errors using the base exception class:
from olostep import Olostep, Olostep_BaseError
client = Olostep(api_key="your-api-key")
try:
result = client.scrapes.create(url_to_scrape="https://example.com")
except Olostep_BaseError as e:
print(f"Error has occurred: {type(e).__name__}")
print(f"Error message: {e}")For detailed error handling information, including the full exception hierarchy and granular error handling options, see Error Handling.
The SDK automatically retries on transient errors (network issues, temporary server problems) based on the RetryStrategy configuration. You can customize the retry behavior by passing a RetryStrategy instance when creating the client:
from olostep import Olostep, RetryStrategy
retry_strategy = RetryStrategy(
max_retries=3,
initial_delay=1.0,
jitter_min=0.2,
jitter_max=0.8
)
client = Olostep(api_key="your-api-key", retry_strategy=retry_strategy)
result = client.scrapes.create("https://example.com")For detailed retry configuration options and best practices, see Retry Strategy.
The SDK intelligently handles various input formats for maximum convenience:
from olostep import Olostep, Country
client = Olostep(api_key="your-api-key")
# Formats: string, list, or enum
client.scrapes.create(url_to_scrape="https://example.com", formats="html")
client.scrapes.create(url_to_scrape="https://example.com", formats=["html", "markdown"])
# Countries: case-insensitive strings or enums
client.scrapes.create(url_to_scrape="https://example.com", country="us")
client.scrapes.create(url_to_scrape="https://example.com", country=Country.US)
# Lists: single values or lists
client.batches.create(urls="https://example.com") # Single URL
client.batches.create(urls=["https://a.com", "https://b.com"]) # Multiple URLsfrom olostep import Olostep, Format, Country, WaitAction, FillInputAction
client = Olostep(api_key="your-api-key")
# Full control over scraping behavior
result = client.scrapes.create(
url_to_scrape="https://news.google.com/",
wait_before_scraping=3000,
formats=[Format.HTML, Format.MARKDOWN],
remove_css_selectors=["script", ".popup"],
actions=[
WaitAction(milliseconds=1500),
FillInputAction(selector="searchbox", value="olostep")
],
parser="@olostep/google-news",
country=Country.US,
remove_images=True
)from olostep import Olostep, Country
client = Olostep(api_key="your-api-key")
batch = client.batches.create([
{"url": "https://www.google.com/search?q=python", "custom_id": "search_1"},
{"url": "https://www.google.com/search?q=javascript", "custom_id": "search_2"},
{"url": "https://www.google.com/search?q=typescript", "custom_id": "search_3"}
],
country=Country.US,
parser="@olostep/google-search"
)
# Process results by custom ID
# When using a parser, retrieve JSON content instead of HTML
for item in batch.items():
if item.custom_id == "search_2":
content = item.retrieve(["json"])
print(f"Search result: {content.json_content}")from olostep import Olostep
client = Olostep(api_key="your-api-key")
# Crawl with intelligent filtering
crawl = client.crawls.create(
start_url="https://www.bbc.com",
max_pages=1000,
max_depth=3,
include_urls=["/articles/**", "/news/**"],
exclude_urls=["/ads/**", "/tracking/**"],
include_external=False,
include_subdomain=True,
)
for page in crawl.pages():
content = page.retrieve(["html"])
print(f"Crawled: {page.url}")from olostep import Olostep
client = Olostep(api_key="your-api-key")
# Extract all links with advanced filtering
maps = client.maps.create(
url="https://www.bbc.com",
include_subdomain=True,
include_urls=["/articles/**", "/news/**"],
exclude_urls=["/ads/**", "/tracking/**"]
)
# Get filtered URLs
urls = []
for url in maps.urls():
urls.append(url)
print(f"Found {len(urls)} relevant URLs")from olostep import Olostep
client = Olostep(api_key="your-api-key")
# First create an answer
created_answer = client.answers.create(
task="What is the main topic of https://example.com?"
)
# Then retrieve it using the ID
answer = client.answers.get(answer_id=created_answer.id)
print(f"Answer: {answer.answer}")from olostep import Olostep
client = Olostep(api_key="your-api-key")
# Get content by retrieve ID
result = client.retrieve.get(retrieve_id="ret_123")
# Get multiple formats
result = client.retrieve.get(retrieve_id="ret_123", formats=["html", "markdown", "text", "json"])The Olostep Python SDK offers significant advantages over direct API usage:
| Feature | SDK Advantage | Direct API |
|---|---|---|
| Interface | Discoverable dot-notation namespaces | REST endpoints |
| Type Safety | Full Pydantic validation & type hints | Manual validation |
| State Management | Stateful return objects with rich __repr__s |
Manual state tracking |
| Error Handling | Automatic retries & connection management | Manual implementation |
| Pagination | Elegant (async) for loops |
Manual cursor handling |
| Input Coercion | Smart type coercion & validation | Manual data preparation |
| Performance | Async-first with sync facade | HTTP request overhead |
- Developer Experience: Type-safe, ergonomic Python API with intelligent input handling
- Production Ready: Automatic retries, connection pooling, and error recovery
- AI-Optimized: Clean, structured data extraction perfect for LLM applications
- Enterprise Scale: Handle 100K+ concurrent requests with robust error handling
- Future-Proof: Built on modern Python patterns (async/await, type hints, dataclasses)
Enable logging to debug issues:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("olostep")
logger.setLevel(logging.INFO) # Use DEBUG for verbose outputLog Levels: INFO (recommended), DEBUG (verbose), WARNING, ERROR
| Variable | Description | Default |
|---|---|---|
OLOSTEP_API_KEY |
Your API key | Required |
OLOSTEP_BASE_API_URL |
API base URL | https://api.olostep.com/v1 |
OLOSTEP_API_TIMEOUT |
Request timeout (seconds) | 150 |
We love contributions! Here's how you can help improve the Olostep Python SDK:
git clone https://github.com/olostep-api/olostep-py/issues.git
cd olostep-python
pip install -e ".[dev]"# Run all tests
pytest
# Run specific test categories
pytest tests/unit/
# DO NOT RUN THE API CONTRACT TEST!# Format code
ruff format olostep/
# Lint code
ruff check olostep/
mypy olostep/- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-enhancement) - Make your changes with comprehensive tests
- Run the full test suite
- Update documentation for API changes
- Submit a pull request with a clear description
- Follow PEP 8 and existing code style
- Add type hints for new functions
- Write comprehensive tests for new functionality
- Update documentation for API changes
- Use conventional commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by the Olostep team