13 changes: 11 additions & 2 deletions docs/01_introduction/index.mdx
@@ -9,7 +9,16 @@ import CodeBlock from '@theme/CodeBlock';

import IntroductionExample from '!!raw-loader!./code/01_introduction.py';

The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It gives you everything you need to build an Actor and run it both locally and on the [Apify platform](https://docs.apify.com/platform), including:

- **Actor lifecycle management** — initialization, graceful shutdown, status messages, rebooting, and metamorphing.
- **Storage access** — datasets, key-value stores, and request queues, with automatic local emulation when running outside the platform.
- **Actor input** — convenient access to the Actor input, including automatic decryption of secret fields.
- **Events & state persistence** — react to platform events (system info, migration, abort) and persist state across migrations and restarts.
- **Proxy management** — Apify Proxy and custom proxies, with session and tiered-proxy support.
- **Platform interaction** — start, call, and abort other Actors and tasks, create webhooks, and reach the full Apify API client.
- **Monetization** — charge users with the pay-per-event pricing model.
- **Framework integrations** — first-class support for [Crawlee](../guides/crawlee) and [Scrapy](../guides/scrapy).

<CodeBlock className="language-python">
{IntroductionExample}
@@ -29,7 +38,7 @@ Explore the Guides section in the sidebar for a deeper understanding of the SDK'

## Installation

The Apify SDK for Python requires Python version 3.10 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:
The Apify SDK for Python requires Python version 3.11 or above. It is typically installed when you create a new Actor project using the [Apify CLI](https://docs.apify.com/cli). To install it manually in an existing project, use:

```bash
pip install apify
7 changes: 4 additions & 3 deletions docs/01_introduction/quick-start.mdx
@@ -59,15 +59,15 @@ The Actor's runtime dependencies are specified in the `requirements.txt` file, w
The Actor's source code is in the `src` folder. This folder contains two important files:

- `main.py` - which contains the main function of the Actor
- `__main__.py` - which is the entrypoint of the Actor package setting up the Actor [logger](../concepts/logging) and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
- `__main__.py` - which is the entrypoint of the Actor package, executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).

<Tabs>
<TabItem value="main.py" label="main.py" default>
<CodeBlock className="language-python">
{MainExample}
</CodeBlock>
</TabItem>
<TabItem value="__main__.py" label="__main.py__">
<TabItem value="__main__.py" label="__main__.py">
<CodeBlock className="language-python">
{UnderscoreMainExample}
</CodeBlock>
@@ -85,7 +85,7 @@ To learn more about the features of the Apify SDK and how to use them, check out

- [Actor lifecycle](../concepts/actor-lifecycle)
- [Actor input](../concepts/actor-input)
- [Working with storages](../concepts/storages)
- [Storages](../concepts/storages)
- [Actor events & state persistence](../concepts/actor-events)
- [Proxy management](../concepts/proxy-management)
- [Interacting with other Actors](../concepts/interacting-with-other-actors)
@@ -94,6 +94,7 @@ To learn more about the features of the Apify SDK and how to use them, check out
- [Logging](../concepts/logging)
- [Actor configuration](../concepts/actor-configuration)
- [Pay-per-event monetization](../concepts/pay-per-event)
- [Storage clients](../concepts/storage-clients)

### Guides

2 changes: 1 addition & 1 deletion docs/02_concepts/01_actor_lifecycle.mdx
@@ -106,4 +106,4 @@ Update the status only when the user's understanding of progress changes - avoid

## Conclusion

This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="">reference docs</ApiLink>, [guides](https://docs.apify.com/sdk/python/docs/guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
This page has presented the full Actor lifecycle: initialization, execution, error handling, rebooting, shutdown and status messages. You've seen how the SDK supports both context-based and manual control patterns. For deeper dives, explore the <ApiLink to="class/Actor">`Actor` API reference</ApiLink>, [guides](../guides/beautifulsoup-httpx), and [platform documentation](https://docs.apify.com/platform).
6 changes: 5 additions & 1 deletion docs/02_concepts/02_actor_input.mdx
@@ -12,7 +12,7 @@ import ApiLink from '@theme/ApiLink';

The Actor gets its [input](https://docs.apify.com/platform/actors/running/input) from the input record in its default [key-value store](https://docs.apify.com/platform/storage/key-value-store).

To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It will get the input record key from the Actor configuration, read the record from the default key-value store,and decrypt any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).
To access it, instead of reading the record manually, you can use the <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> convenience method. It will get the input record key from the Actor configuration, read the record from the default key-value store, and decrypt any [secret input fields](https://docs.apify.com/platform/actors/development/secret-input).

For example, if an Actor received a JSON input with two fields, `{ "firstNumber": 1, "secondNumber": 2 }`, this is how you might process it:

@@ -34,4 +34,8 @@ The Apify platform supports [secret input fields](https://docs.apify.com/platfor

No special handling is needed in your code — when you call <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, encrypted fields are automatically decrypted using the Actor's private key, which is provided by the platform via environment variables. You receive the plaintext values directly.

## Conclusion

This page has shown how to read Actor input with <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, how to load URL sources with <ApiLink to="class/ApifyRequestList">`ApifyRequestList`</ApiLink>, and how secret input fields are decrypted automatically when you read them.

For more details on Actor input and how to define input schemas, see the [Actor input](https://docs.apify.com/platform/actors/running/input) and [input schema](https://docs.apify.com/platform/actors/development/input-schema) documentation on the Apify platform.
16 changes: 10 additions & 6 deletions docs/02_concepts/03_storages.mdx
@@ -1,6 +1,6 @@
---
id: storages
title: Working with storages
title: Storages
description: Use datasets, key-value stores, and request queues to persist Actor data.
---

@@ -45,11 +45,11 @@ Each dataset item, key-value store record, or request in a request queue is then

When developing locally, opening any storage will by default use local storage. To change this behavior and to use remote storage you have to use `force_cloud=True` argument in <ApiLink to="class/Actor#open_dataset">`Actor.open_dataset`</ApiLink>, <ApiLink to="class/Actor#open_request_queue">`Actor.open_request_queue`</ApiLink> or <ApiLink to="class/Actor#open_key_value_store">`Actor.open_key_value_store`</ApiLink>. Proper use of this argument allows you to work with both local and remote storages.

Calling another remote Actor and accessing its default storage is typical use-case for using `force-cloud=True` argument to open remote Actor's storages.
Calling another remote Actor and accessing its default storage is a typical use-case for using `force_cloud=True` argument to open remote Actor's storages.
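
As a rough sketch of that use case, the function below opens a dataset that lives on the platform even when the Actor itself runs locally. The function name `read_remote_items` and its `dataset_id` parameter are hypothetical; it assumes it is called inside an initialized Actor (for example within `async with Actor:`) and that the ID comes from a finished run of another Actor.

```python
async def read_remote_items(dataset_id: str) -> list[dict]:
    """Read all items from a platform dataset (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from apify import Actor

    # force_cloud=True bypasses the local storage emulation.
    dataset = await Actor.open_dataset(id=dataset_id, force_cloud=True)
    page = await dataset.get_data()
    return page.items
```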

### Local storage persistence

By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before the running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.
By default, the storage contents are persisted across multiple Actor runs. To clean up the Actor storages before running the Actor, use the `--purge` flag of the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command of the Apify CLI.

```bash
apify run --purge
@@ -106,8 +106,8 @@ To get an iterator of the data, you can use the <ApiLink to="class/Dataset#itera
### Exporting items

You can also export the dataset items into a key-value store, as either a CSV or a JSON record,
using the <ApiLink to="class/Dataset#export_to_csv">`Dataset.export_to_csv`</ApiLink>
or <ApiLink to="class/Dataset#export_to_json">`Dataset.export_to_json`</ApiLink> method.
using the <ApiLink to="class/Dataset#export_to">`Dataset.export_to`</ApiLink> method with the
`content_type` argument set to `'csv'` or `'json'`.

<RunnableCodeBlock className="language-python" language="python">
{DatasetExportsExample}
@@ -183,6 +183,10 @@ To check if all the requests in the queue are handled, you can use the <ApiLink

## Storage clients

Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment — on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. If you want to learn more about how storage clients work, the available implementations, or how to configure them, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients). The Apify-specific clients are available in the `apify.storage_clients` module.
Behind the scenes, the SDK uses storage clients to communicate with the storage backend. The appropriate client is selected automatically based on the runtime environment — on the Apify platform, data is persisted via the Apify API, while local runs use the filesystem. For most use cases, you don't need to think about storage clients at all. To learn about the available implementations, how to switch between a single and shared request queue, or how to configure a custom client, see the [Storage clients](./storage-clients) page. For a deeper look at how storage clients work internally, see the [Crawlee storage clients guide](https://crawlee.dev/python/docs/guides/storage-clients).

## Conclusion

This page has covered the three storage types — datasets, key-value stores, and request queues — how they are emulated on the local filesystem, how to open named and unnamed storages, and how to read from and write to each through the `Actor` shortcuts and the storage classes.

For comprehensive information about storage on the Apify platform, see the [storage documentation](https://docs.apify.com/platform/storage), including the pages on [datasets](https://docs.apify.com/platform/storage/dataset), [key-value stores](https://docs.apify.com/platform/storage/key-value-store), and [request queues](https://docs.apify.com/platform/storage/request-queue).
32 changes: 18 additions & 14 deletions docs/02_concepts/04_actor_events.mdx
@@ -14,6 +14,8 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o

## Event types

A listener can optionally receive a single argument — a Pydantic model with the event's data. The table below lists the events, the type of that data object, and when each event is emitted.

<table>
<thead>
<tr>
@@ -25,25 +27,23 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
<tbody>
<tr>
<td><code>SYSTEM_INFO</code></td>
<td><pre>{`{
"created_at": datetime,
"cpu_current_usage": float,
"mem_current_bytes": int,
"is_cpu_overloaded": bool
}`}
</pre></td>
<td><ApiLink to="class/EventSystemInfoData"><code>EventSystemInfoData</code></ApiLink></td>
<td>
<p>This event is emitted regularly and it indicates the current resource usage of the Actor.</p>
The <code>is_cpu_overloaded</code> argument indicates whether the current CPU usage is higher than <code>Config.max_used_cpu_ratio</code>
<p>Emitted regularly to report the Actor's current resource usage. The
<code>cpu_info.used_ratio</code> field reports the fraction of CPU currently in use
(a float between <code>0.0</code> and <code>1.0</code>), and <code>memory_info.current_size</code>
reports the current memory usage. Compare <code>cpu_info.used_ratio</code> against
<code>Configuration.max_used_cpu_ratio</code> to detect CPU overload.</p>
</td>
</tr>
<tr>
<td><code>MIGRATING</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventMigratingData"><code>EventMigratingData</code></ApiLink></td>
<td>
<p>Emitted when the Actor running on the Apify platform
is going to be <a href="https://docs.apify.com/platform/actors/development/state-persistence#what-is-a-migration">migrated</a>
{' '}to another worker server soon.</p>
{' '}to another worker server soon. The <code>time_remaining</code> field reports how much time
the Actor has left before it is force-migrated.</p>
You can use it to persist the state of the Actor so that once it is executed again on the new server,
it doesn't have to start over from the beginning.
Once you have persisted the state of your Actor, you can call <ApiLink to="class/Actor#reboot">`Actor.reboot`</ApiLink>
@@ -52,7 +52,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>ABORTING</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventAbortingData"><code>EventAbortingData</code></ApiLink></td>
<td>
When a user aborts an Actor run on the Apify platform,
they can choose to abort gracefully to allow the Actor some time before getting killed.
@@ -61,7 +61,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>PERSIST_STATE</code></td>
<td><pre>{`{ "is_migrating": bool }`}</pre></td>
<td><ApiLink to="class/EventPersistStateData"><code>EventPersistStateData</code></ApiLink></td>
<td>
<p>Emitted in regular intervals (by default 60 seconds) to notify the Actor that it should persist its state,
in order to avoid repeating all work when the Actor restarts.</p>
@@ -73,7 +73,7 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
</tr>
<tr>
<td><code>EXIT</code></td>
<td><code>None</code></td>
<td><ApiLink to="class/EventExitData"><code>EventExitData</code></ApiLink></td>
<td>
Emitted by the SDK (not the platform) when the Actor is about to exit. You can use this event to perform final cleanup tasks,
such as closing external connections or sending notifications, before the Actor shuts down.
@@ -103,4 +103,8 @@ You can optionally specify a `key` (the key-value store key under which the stat
{UseStateExample}
</RunnableCodeBlock>

## Conclusion

This page has described the events emitted during a run — `SYSTEM_INFO`, `MIGRATING`, `ABORTING`, `PERSIST_STATE`, and `EXIT` — how to handle them with <ApiLink to="class/Actor#on">`Actor.on`</ApiLink>, and how to persist state automatically with <ApiLink to="class/Actor#use_state">`Actor.use_state`</ApiLink>.

For more details on platform events and state persistence, see the [system events](https://docs.apify.com/platform/actors/development/programming-interface/system-events) and [state persistence](https://docs.apify.com/platform/actors/development/state-persistence) documentation on the Apify platform.
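
The CPU-overload check described for `SYSTEM_INFO` can be sketched as follows. The constant `MAX_USED_CPU_RATIO`, its value, and the helper `is_cpu_overloaded` are assumptions made for illustration; in a real Actor the threshold would come from `Configuration.max_used_cpu_ratio`. The field `cpu_info.used_ratio` is taken from the event table in this change.

```python
# Hypothetical threshold; a real Actor would read Configuration.max_used_cpu_ratio.
MAX_USED_CPU_RATIO = 0.95


def is_cpu_overloaded(used_ratio: float, max_used_cpu_ratio: float = MAX_USED_CPU_RATIO) -> bool:
    """Return True when the reported CPU ratio exceeds the allowed maximum."""
    return used_ratio > max_used_cpu_ratio


async def main() -> None:
    # Imported lazily so the helper above works without the SDK installed.
    from apify import Actor, Event

    def on_system_info(event_data) -> None:
        # event_data is the Pydantic model delivered with SYSTEM_INFO.
        if is_cpu_overloaded(event_data.cpu_info.used_ratio):
            Actor.log.warning('CPU is overloaded, consider slowing down.')

    async with Actor:
        Actor.on(Event.SYSTEM_INFO, on_system_info)
        # ... the Actor's actual work would go here ...
```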
10 changes: 7 additions & 3 deletions docs/02_concepts/05_proxy_management.mdx
@@ -22,7 +22,7 @@ The Apify SDK provides built-in proxy management through the <ApiLink to="class/

If you want to use Apify Proxy locally, make sure that you run your Actors via the Apify CLI and that you are [logged in](https://docs.apify.com/cli/docs/installation#login-with-your-apify-account) with your Apify account in the CLI.

### Using Apify proxy
### Using Apify Proxy

<RunnableCodeBlock className="language-python" language="python">
{ApifyProxyExample}
@@ -38,7 +38,7 @@ If you want to use Apify Proxy locally, make sure that you run your Actors via t

All your proxy needs are managed by the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class. You create an instance using the <ApiLink to="class/Actor#create_proxy_configuration">`Actor.create_proxy_configuration()`</ApiLink> method. Then you generate proxy URLs using the <ApiLink to="class/ProxyConfiguration#new_url">`ProxyConfiguration.new_url()`</ApiLink> method.

### Apify proxy vs. your own proxies
### Apify Proxy vs. your own proxies

The `ProxyConfiguration` class covers both Apify Proxy and custom proxy URLs, so that you can easily switch between proxy providers. However, some features of the class are available only to Apify Proxy users, mainly because Apify Proxy is what one would call a super-proxy. It's not a single proxy server, but an API endpoint that allows connection through millions of different IP addresses. So the class essentially has two modes: Apify Proxy or Your proxy.

@@ -54,7 +54,7 @@ When no `session_id` is provided, your custom proxy URLs are rotated round-robin
{ProxyRotationExample}
</RunnableCodeBlock>

### Apify proxy configuration
### Apify Proxy configuration

With Apify Proxy, you can select specific proxy groups to use, or countries to connect from. For even finer control, you can also target a specific subdivision (e.g. a US state) using the `subdivision_code` parameter alongside `country_code`. This allows you to get better proxy performance after some initial research.

@@ -120,4 +120,8 @@ Make sure you have the `httpx` library installed:
pip install httpx
```

## Conclusion

This page has explained how to manage proxies with the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> class — using Apify Proxy or your own servers, keeping sessions sticky across requests, configuring tiered proxy rotation, and feeding proxy settings from Actor input.

For full details on proxy configuration options, see the <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> API reference and the [Apify Proxy documentation](https://docs.apify.com/proxy).
4 changes: 4 additions & 0 deletions docs/02_concepts/06_interacting_with_other_actors.mdx
@@ -63,4 +63,8 @@ When you set `gracefully=True`, the platform sends `ABORTING` and `PERSIST_STATE
{InteractingAbortExample}
</RunnableCodeBlock>

## Conclusion

This page has shown how to interact with other Actors from your code — starting a run with <ApiLink to="class/Actor#start">`Actor.start`</ApiLink>, waiting for it to finish with <ApiLink to="class/Actor#call">`Actor.call`</ApiLink> or <ApiLink to="class/Actor#call_task">`Actor.call_task`</ApiLink>, transforming a run with <ApiLink to="class/Actor#metamorph">`Actor.metamorph`</ApiLink>, and stopping one with <ApiLink to="class/Actor#abort">`Actor.abort`</ApiLink>.

For the full list of methods for interacting with other Actors, see the <ApiLink to="class/Actor">`Actor`</ApiLink> API reference. For more details on running Actors and Actor tasks on the platform, see the [Actors](https://docs.apify.com/platform/actors) and [Actor tasks](https://docs.apify.com/platform/actors/tasks) documentation.