Scrape
It’s simple to use: you only need to submit your access_key and the url of a webpage. The API will return the content of the webpage.
Getting started
REST
The Scrape API, like all of ScreenshotMAX’s APIs, is organized around REST. It is designed to use predictable, resource-oriented URLs and to use HTTP status codes to indicate errors.
HTTPS
The Scrape API requires all communications to be secured with TLS 1.2 or greater.
API Versions
All of ScreenshotMAX’s APIs are versioned. The Scrape API is currently on Version 1.
Your Access Key
Your access key is your unique authentication key to be used to access ScreenshotMAX APIs.
Include your access key in the request body, which is sent as a JSON object.
Alternatively, you can pass your access key in the X-Access-Key header.
You can find your access key in your account dashboard.
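The two ways of passing the access key can be sketched as follows. This is a minimal illustration using only Python's standard library: the helper name build_request is hypothetical, the Content-Type header is an assumption (the source only says the body is JSON), and the request is built but not sent.

```python
import json
import urllib.request

# Endpoint as given in this documentation.
API_URL = "https://api.screenshotmax.com/v1/scrape"


def build_request(access_key: str, url: str, key_in_header: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Scrape API.

    build_request is a hypothetical helper, not part of any official SDK.
    """
    payload = {"url": url}
    # Assumption: JSON bodies are sent with this content type.
    headers = {"Content-Type": "application/json"}
    if key_in_header:
        # Option 2: pass the key in the X-Access-Key header.
        headers["X-Access-Key"] = access_key
    else:
        # Option 1: pass the key inside the JSON request body.
        payload["access_key"] = access_key
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")
```

Passing the key in the header keeps it out of the request body, which can be convenient when bodies are logged.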
Base URL
https://api.screenshotmax.com/v1/scrape
Validation endpoint
ScreenshotMAX’s Scrape API simply requires your unique access key and url to be passed in the request body. The API will return the content of the webpage.
POST https://api.screenshotmax.com/v1/scrape
{
"access_key": "YOUR_ACCESS_KEY",
"url": "https://example.com"
}
This was a successful request, so the API returned a 200 OK response. The content of the webpage is returned in the body of the response.
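The request above could be sent from Python roughly like this. It is a sketch using only the standard library: the function name scrape, the Content-Type header, the 60-second client timeout, and the UTF-8 decoding of the response are all assumptions rather than documented behavior.

```python
import json
import urllib.request

# Endpoint as given in this documentation.
API_URL = "https://api.screenshotmax.com/v1/scrape"


def scrape(access_key: str, url: str, timeout: float = 60.0) -> str:
    """POST the JSON body shown above and return the response body as text.

    scrape is a hypothetical helper. urlopen raises urllib.error.HTTPError
    when the API answers with a 4xx or 5xx status code.
    """
    body = json.dumps({"access_key": access_key, "url": url}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the returned HTML or Markdown is UTF-8 encoded.
        return response.read().decode("utf-8", errors="replace")
```

A call such as scrape("YOUR_ACCESS_KEY", "https://example.com") would then return the page content as a string on a 200 OK response.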
Request parameters
access_key (string, Body, Required): Your unique access key. You can find your access key in your account dashboard.
url (string, Body, Required): The URL of the webpage you want to scrape. Must be a valid URL and accessible from the internet. If the URL contains a query string, it must be URL-encoded.
For example, https://example.com/test?param=1 should be passed as https%3A%2F%2Fexample.com%2Ftest%3Fparam%3D1.
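The encoding in the example above matches what Python's urllib.parse.quote produces when no characters are treated as safe. A short sketch, where encode_target_url is a hypothetical helper name:

```python
from urllib.parse import quote


def encode_target_url(url: str) -> str:
    """Percent-encode every reserved character, including ':', '/', '?' and '='."""
    # safe="" overrides the default, which would leave '/' unencoded.
    return quote(url, safe="")


print(encode_target_url("https://example.com/test?param=1"))
# prints: https%3A%2F%2Fexample.com%2Ftest%3Fparam%3D1
```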
format (string, Body, Default: html): The format of the returned content. Available formats are html, md.
The html format returns the HTML content of the webpage, while the md format returns the content in Markdown format.
js_enabled (bool, Body, Default: true): Whether to enable JavaScript on the page. If set to false, the API will return the HTML content of the page without executing any JavaScript.
gpu_rendering (bool, Body, Default: false): Whether to use GPU rendering. Only available on the scale paid plan.
capture_beyond_viewport (bool, Body, Default: false): Whether to capture content beyond the viewport.
viewport_device (string, Body): The device type for the viewport.
viewport_width (number, Body, Default: 1280): The width of the viewport in pixels.
viewport_height (number, Body, Default: 1080): The height of the viewport in pixels.
viewport_landscape (bool, Body): Whether the viewport should be in landscape mode.
viewport_has_touch (bool, Body): Whether the viewport has touch capabilities.
viewport_mobile (bool, Body): Whether the viewport is a mobile device.
device_scale_factor (number, Body): The device scale factor for the viewport.
block_annoyance (string, Body, Default: cookies_banner): The annoyance to block. Options include none, cookies_banner, ads, tracking.
block_ressources (string, Body): The resources to block. Options include document, stylesheet, image, media, font, script, texttrack, xhr, fetch, eventsource, websocket, manifest and other.
media_type (string, Body, Default: screen): The media type for the rendering. Options include screen and print.
vision_deficiency (string, Body): The vision deficiency for the rendering. Options include reduced_contrast, blurred_vision, deuteranopia, achromatopsia.
dark_mode (bool, Body, Default: false): Whether to use dark mode for the rendering.
reduced_motion (bool, Body, Default: false): Whether to reduce motion for the rendering.
geolocation_accuracy (number, Body): The accuracy of the geolocation in meters. Minimum is 0. Maximum is 1000.
geolocation_latitude (number, Body): The latitude of the geolocation. Minimum is -90. Maximum is 90.
geolocation_longitude (number, Body): The longitude of the geolocation. Minimum is -180. Maximum is 180.
attachment_name (string, Body): The name of the attachment, without the file extension. This is the name that will be used when downloading the response.
The extension is added automatically based on the format parameter.
timezone (string, Body): The time zone for the request. This allows you to simulate different time zones. Time zone names are taken from the IANA Time Zone Database.
authorization (string, Body): The authorization header to use for the request. This should be a base64-encoded string (e.g., for Basic Auth, encode "username:password" using base64). This allows you to authenticate with the webpage before capturing the content.
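For Basic Auth, the base64 encoding described above can be sketched in Python as follows. The helper name basic_auth_value is hypothetical, and whether the API expects the bare encoded string or a value prefixed with "Basic " is not stated here, so this sketch returns only the encoded credentials.

```python
import base64


def basic_auth_value(username: str, password: str) -> str:
    """Return the base64 encoding of "username:password"."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


print(basic_auth_value("username", "password"))
# prints: dXNlcm5hbWU6cGFzc3dvcmQ=
```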
user_agent (string, Body): The user agent to use for the request. This allows you to simulate different browsers and devices.
cookies (string[], Body): The cookies to use for the request. This allows you to simulate different sessions and states.
Example: cookies=name=value; name2=value2.
headers (string[], Body): The headers to use for the request. This allows you to simulate different requests and responses.
Example: headers=header1:value1; header2:value2.
ip_location (string, Body): The IP location to use for the request. This allows you to simulate requests from different countries by routing them through proxy servers with corresponding IP addresses. This feature is only available on the scale paid plan.
Supported locations:
- United States (us)
- China (cn)
- Europe (eu), random EU country
- Canada (ca)
- Mexico (mx)
- United Kingdom (gb)
- Germany (de)
- France (fr)
- Switzerland (ch)
- India (in)
- Japan (jp)
- South Korea (kr)
- Russia (ru)
- Brazil (br)
- Australia (au)
proxy (string, Body): The proxy to use for the request. This allows you to route the request through a different IP address.
The proxy must be in the format http://username:password@host:port or https://username:password@host:port.
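A client-side check of the proxy format described above might look like this. It is only a sketch: is_valid_proxy_url is a hypothetical helper, and the API may apply stricter or different validation on its side.

```python
from urllib.parse import urlsplit


def is_valid_proxy_url(value: str) -> bool:
    """Check for the shape scheme://username:password@host:port with http or https."""
    parts = urlsplit(value)
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme in ("http", "https")
        and bool(parts.username)
        and parts.password is not None
        and bool(parts.hostname)
        and port is not None
    )
```

For instance, is_valid_proxy_url("http://user:secret@proxy.example.com:8080") passes the check, while a URL without credentials or with another scheme does not.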
bypass_csp (bool, Body, Default: false): Whether to bypass the Content Security Policy (CSP) of the webpage. This allows you to capture content of webpages with strict CSPs.
delay (number, Body, Default: 0): The delay in seconds before rendering. This allows you to wait for specific elements to load before capturing the content. Maximum is 30.
timeout (number, Body, Default: 30): The timeout in seconds for the rendering. This allows you to set a maximum time for the request to complete. Maximum is 30.
wait_until (string[], Body, Default: ['domcontentloaded']): The conditions to wait for before rendering. This allows you to ensure that specific elements are loaded before capturing the content. Available options include:
- load: Wait for the load event to be fired.
- domcontentloaded: Wait for the DOMContentLoaded event to be fired.
- networkidle0: Wait for no network connections for at least 500 ms.
- networkidle2: Wait for no more than 2 network connections to be active for at least 500 ms.
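The delay, timeout, and wait_until limits can be checked before a request is sent. The sketch below assumes a hypothetical helper named timing_params; the lower bounds (delay of at least 0, timeout above 0) are assumptions, since only the maximums are documented.

```python
WAIT_UNTIL_OPTIONS = ("load", "domcontentloaded", "networkidle0", "networkidle2")
MAX_SECONDS = 30  # documented maximum for both delay and timeout


def timing_params(delay=0, timeout=30, wait_until=("domcontentloaded",)) -> dict:
    """Validate the timing parameters and return them as a request-body fragment."""
    if not 0 <= delay <= MAX_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_SECONDS} seconds")
    if not 0 < timeout <= MAX_SECONDS:
        raise ValueError(f"timeout must be above 0 and at most {MAX_SECONDS} seconds")
    unknown = [option for option in wait_until if option not in WAIT_UNTIL_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported wait_until options: {unknown}")
    return {"delay": delay, "timeout": timeout, "wait_until": list(wait_until)}
```

The returned dictionary can be merged into the JSON body alongside access_key and url, for example with timing_params(delay=2, wait_until=("load", "networkidle0")).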
metadata_icon (bool, Body, Default: false): Whether to include the metadata icon in the response. This allows you to capture the favicon of the webpage. The link to the icon is returned in the X-Screenshotmax-Metadata-Icon header.
metadata_title (bool, Body, Default: false): Whether to include the metadata title in the response. This allows you to capture the title of the webpage. The title is returned in the X-Screenshotmax-Metadata-Title header.
metadata_fonts (bool, Body, Default: false): Whether to include the metadata fonts in the response. This allows you to capture the fonts used on the webpage. The fonts are returned in the X-Screenshotmax-Metadata-Fonts header.
metadata_hash (bool, Body, Default: false): Whether to include the metadata hash in the response. This allows you to capture the hash of the webpage. The hash is returned in the X-Screenshotmax-Metadata-Hash header.
metadata_status (bool, Body, Default: false): Whether to include the metadata status in the response. This allows you to capture the HTTP status code of the webpage. The status code is returned in the X-Screenshotmax-Metadata-Status header.
metadata_headers (bool, Body, Default: false): Whether to include the metadata headers in the response. This allows you to capture the headers of the webpage. The headers are returned in the X-Screenshotmax-Metadata-Headers header.
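Because each metadata value comes back in its own X-Screenshotmax-Metadata-* header, a client can collect them with a simple prefix filter. In this sketch extract_metadata is a hypothetical helper, and the header values are kept as raw strings because their exact encoding is not described here.

```python
METADATA_PREFIX = "x-screenshotmax-metadata-"


def extract_metadata(headers) -> dict:
    """Collect metadata headers from an iterable of (name, value) pairs.

    Header names are matched case-insensitively; the prefix is stripped so
    that "X-Screenshotmax-Metadata-Title" becomes the key "title".
    """
    metadata = {}
    for name, value in headers:
        lowered = name.lower()
        if lowered.startswith(METADATA_PREFIX):
            metadata[lowered[len(METADATA_PREFIX):]] = value
    return metadata
```

With a standard-library response object, the pairs could be supplied as extract_metadata(response.headers.items()).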
cache (bool, Body, Default: false): Whether to store the content of the rendering in the cache. This allows you to store the rendered content for a specified time-to-live (TTL) period.
cache_ttl (number, Body, Default: 604800): The time-to-live (TTL) for the cache in seconds. This allows you to set a maximum time for the cached resources to be valid. Maximum is 30 days in seconds (2592000).
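The TTL values above are plain second counts: the default 604800 is 7 days and the maximum 2592000 is 30 days. A small sketch of converting a duration into cache parameters, where cache_params is a hypothetical helper and the requirement that the TTL be positive is an assumption:

```python
from datetime import timedelta

DEFAULT_CACHE_TTL = int(timedelta(days=7).total_seconds())   # 604800
MAX_CACHE_TTL = int(timedelta(days=30).total_seconds())      # 2592000


def cache_params(ttl: timedelta = timedelta(days=7)) -> dict:
    """Return a request-body fragment that enables caching for the given duration."""
    seconds = int(ttl.total_seconds())
    if not 0 < seconds <= MAX_CACHE_TTL:
        raise ValueError(f"cache_ttl must be between 1 and {MAX_CACHE_TTL} seconds")
    return {"cache": True, "cache_ttl": seconds}
```

For example, cache_params(timedelta(hours=1)) yields a cache_ttl of 3600.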
async (bool, Body, Default: false): Whether to use asynchronous processing for the request. This allows you to scrape content without blocking the request.
webhook_url (string, Body): The callback URL for asynchronous processing. This allows you to receive the response via a webhook.
The webhook will be triggered when the response is ready.
The webhook URL must be a valid URL and must be accessible from the internet.
The webhook URL must be HTTPS and must support the POST method.
More information about webhooks can be found in the async & webhook documentation.
webhook_signed (bool, Body, Default: true): Indicates whether the webhook request should be signed. Enabling this option allows you to verify the authenticity of incoming webhook requests. For more details, refer to the async & webhook documentation.
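An asynchronous request combines the async, webhook_url, and webhook_signed parameters. The sketch below only assembles the request-body fields and checks the HTTPS requirement locally; async_params is a hypothetical helper, and signature verification is left to the async & webhook documentation because its details are not covered here.

```python
from urllib.parse import urlsplit


def async_params(webhook_url: str, signed: bool = True) -> dict:
    """Return a request-body fragment for asynchronous processing."""
    parts = urlsplit(webhook_url)
    # The webhook URL must be HTTPS; reachability cannot be checked locally.
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("webhook_url must be a valid HTTPS URL")
    return {"async": True, "webhook_url": webhook_url, "webhook_signed": signed}


# Merged with the required fields into one JSON-serializable body.
payload = {
    "access_key": "YOUR_ACCESS_KEY",
    "url": "https://example.com",
    **async_params("https://hooks.example.com/scrape-done"),
}
```

The hooks.example.com address is a placeholder for an endpoint that accepts POST requests.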
Response and error codes
Error Codes
Whenever a request fails, the API returns an error in JSON format. Each error includes an error code and a description, which are detailed below.
| Code | Type | Details |
|---|---|---|
| 200 | OK | The request was successful. |
| 400 | Bad request | The request was malformed or invalid. |
| 401 | Unauthorized | The request was rejected due to an invalid access key or missing signature when signed requests are enabled. |
| 402 | Payment Required | Access denied due to an unpaid invoice. Applies to paid plans. |
| 403 | Forbidden | The signature provided is invalid. Occurs when signed requests are enabled. |
| 423 | Locked | The request was denied due to insufficient quota. |
| 429 | Too Many Requests | The rate limit has been exceeded (too many requests per minute). |
| 500 | Internal server error | The request failed due to an internal server error. |
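A client can map these status codes to readable messages and decide which failures are worth retrying. In this sketch the descriptions are taken from the table above, while the choice to treat 429 and 500 as retryable is an assumption, not documented API guidance; both helper names are hypothetical.

```python
STATUS_DETAILS = {
    200: "The request was successful.",
    400: "The request was malformed or invalid.",
    401: "Invalid access key, or missing signature when signed requests are enabled.",
    402: "Access denied due to an unpaid invoice.",
    403: "The signature provided is invalid.",
    423: "The request was denied due to insufficient quota.",
    429: "The rate limit has been exceeded.",
    500: "The request failed due to an internal server error.",
}

# Assumption: only rate limiting and server errors are transient.
RETRYABLE_STATUSES = frozenset({429, 500})


def describe_status(status: int) -> str:
    """Return the documented description for a status code, if there is one."""
    return STATUS_DETAILS.get(status, f"Undocumented status code {status}.")


def should_retry(status: int) -> bool:
    """Return True for failures that may succeed when the request is repeated later."""
    return status in RETRYABLE_STATUSES
```

With urllib, the status is available as the code attribute of the urllib.error.HTTPError raised for a failed request.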