The Crawl Tool API lets you access data about crawled sites programmatically.


The Public API

Sometimes you may want to access your data programmatically rather than through The Crawl Tool itself. It's your data - so why not? The Crawl Tool public API allows you to do exactly that.

Forming Requests

Every request must pass your Crawl Tool API key as the bearer token. To find the key, log in to The Crawl Tool, open the drop-down menu with your name at the top left, and select Settings. The API key is on the settings screen (currently under the title "WordPress Connection", but this will change soon). Alternatively, email support@thecrawltool.com for assistance.

The API is a JSON REST API. Each endpoint has the form https://www.thecrawltool.com/public_api/<function>, where <function> is the name of the function to call. There are currently three functions:

  • getProjects

  • getReports

  • getReportRows

Each API request costs 1 credit from your Crawl Tool account. This nominal amount is just to encourage efficiency in making requests.
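As a rough illustration of the request format described above, here is a minimal Python sketch that builds a request carrying the bearer token and JSON headers. The helper names build_request and call_api are invented for this example and are not part of The Crawl Tool. Using GET when there is no body and POST when there is a JSON body simply mirrors the curl examples on this page.

```python
import json
import urllib.request

BASE_URL = "https://www.thecrawltool.com/public_api/"


def build_request(function, api_key, body=None):
    """Build (but do not send) a request for one public API function.

    Hypothetical helper: `function` is one of getProjects, getReports
    or getReportRows, and `body` is an optional dict sent as JSON.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # Assumption taken from the curl examples: GET without a body, POST with one.
    method = "GET" if body is None else "POST"
    return urllib.request.Request(
        BASE_URL + function, data=data, headers=headers, method=method
    )


def call_api(function, api_key, body=None):
    """Send the request and decode the JSON reply. Each call costs 1 credit."""
    with urllib.request.urlopen(build_request(function, api_key, body)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling call_api("getProjects", "<api key here>") would then send the request. Keeping build_request separate makes it possible to inspect the URL and headers without spending a credit.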

getProjects

The getProjects call will return the projects on the account that matches the API key in the bearer token.

curl -X GET https://www.thecrawltool.com/public_api/getProjects -H 'Authorization: Bearer <api key here>' -H 'Accept: application/json'

This returns JSON like:

{"success":true,"data":[{"id":x,"name":"Sitename","baseurl":"https:\/\/somesitenamehere","crawling":0},{"id":x,"name":"Sitename","baseurl":"https:\/\/somesitenamehere\/","crawling":0}]}

The key fields are:

id - the project id, a numeric identifier for this project

name - the name of the project

baseurl - the project base URL (the URL the crawl starts from)

crawling - 0 or 1. If this is set to 1, the project is currently crawling. Report data may change while a crawl is running, so you should probably wait until it finishes.
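The crawling flag can be checked before fetching reports. The sketch below works on an already decoded getProjects response and keeps only the projects that are not crawling. The function name idle_projects and the sample ids and names are made up for illustration. Treating a false success value as an error is an assumption, since this page does not describe failure responses.

```python
def idle_projects(response):
    """Return the projects whose "crawling" flag is 0.

    `response` is the decoded JSON from getProjects.
    """
    # Assumption: a false "success" value means the call failed.
    if not response.get("success"):
        raise RuntimeError("getProjects did not report success")
    return [project for project in response["data"] if project["crawling"] == 0]


# Sample shaped like the getProjects reply; ids and names are placeholders.
sample_projects = {
    "success": True,
    "data": [
        {"id": 1, "name": "Sitename", "baseurl": "https://somesitenamehere", "crawling": 0},
        {"id": 2, "name": "Othersite", "baseurl": "https://othersitenamehere/", "crawling": 1},
    ],
}

for project in idle_projects(sample_projects):
    print(project["id"], project["name"])
```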

getReports

The getReports call will return the reports available for a given project id (from getProjects). The caller must pass the project_id in the request JSON body.

curl -X POST https://www.thecrawltool.com/public_api/getReports -H 'Authorization: Bearer <api key here>' -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"project_id": <id here>}'

This returns JSON like:

{"success":true,"data":[{"id":x,"name":"Broken Links"},{"id":x,"name":"Meta Descriptions"},{"id":20,"name":"Meta Keywords"}]}

The key fields are:

id - the report id is a numeric identifier for the report

name - the name of the report as it appears in the dropdown at the top of The Crawl Tool.
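Since getReportRows needs a report id, a common step is to look one up by its name in the getReports response. A minimal sketch, assuming the response has already been decoded into a Python dict; report_id_by_name is a hypothetical helper, and the ids 18 and 19 in the sample are invented placeholders.

```python
def report_id_by_name(response, name):
    """Return the id of the report called `name`, or None if it is absent.

    `response` is the decoded JSON from getReports.
    """
    for report in response["data"]:
        if report["name"] == name:
            return report["id"]
    return None


# Sample shaped like the getReports reply; ids 18 and 19 are placeholders.
sample_reports = {
    "success": True,
    "data": [
        {"id": 18, "name": "Broken Links"},
        {"id": 19, "name": "Meta Descriptions"},
        {"id": 20, "name": "Meta Keywords"},
    ],
}

report_id = report_id_by_name(sample_reports, "Broken Links")
# This dict is what would be sent as the JSON body of a getReportRows request.
request_body = {"report_id": report_id}
```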

getReportRows

The getReportRows call will return the rows from a report for a given report id. The caller must pass the report_id in the request JSON body. A single response is limited to 100000 rows. To fetch further rows, the caller can optionally pass the page parameter in the request JSON body.

curl -X POST https://www.thecrawltool.com/public_api/getReportRows -H 'Authorization: Bearer <api key here>' -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"report_id": <report id here>}'

This returns JSON like:

{"success":true,"data":{"rows":[{"cell_1":"Source URL","cell_2":"Broken Link","cell_3":"Anchor","cell_4":"","cell_5":"","cell_6":"","cell_7":"","cell_8":"","cell_9":"","cell_10":"","cell_11":"","cell_12":"","cell_13":"","cell_14":"","cell_15":"","cell_16":"","cell_17":"","cell_18":"","cell_19":"","cell_20":""},{"cell_1":"data1","cell_2":"data2","cell_3":"data3","cell_4":"","cell_5":"","cell_6":"","cell_7":"","cell_8":"","cell_9":"","cell_10":"","cell_11":"","cell_12":"","cell_13":"","cell_14":"","cell_15":"","cell_16":"","cell_17":"","cell_18":"","cell_19":"","cell_20":""}],"total_rows":3,"current_page":1,"total_pages":1}}

The key fields are:

total_rows, current_page, total_pages - allow you to work out whether you need to send the optional page parameter to fetch more rows.

rows - contains all the rows of the report. Each row has columns labelled cell_1, cell_2, cell_3 ... cell_20. Which data appears in which column varies by report. The first row always contains the headers of the report columns.
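The two ideas above, using the header row to label columns and using the page counters to decide on further requests, can be sketched in Python as follows. The function names are invented for this example. The code assumes the header row is the first row of the data it is given, and it drops columns whose header cell is empty. Whether later pages repeat the header row is not stated on this page, so check that before reusing rows_to_records on them.

```python
def rows_to_records(data):
    """Turn the "rows" list into dicts keyed by the header row's titles."""
    rows = data["rows"]
    if not rows:
        return []
    header = rows[0]
    # Walk cell_1 .. cell_20 in order and keep columns with a non-empty header.
    columns = []
    for index in range(1, 21):
        key = f"cell_{index}"
        title = header.get(key, "")
        if title:
            columns.append((key, title))
    return [{title: row.get(key, "") for key, title in columns} for row in rows[1:]]


def next_page_bodies(report_id, data):
    """Build the JSON bodies for the pages after current_page, if any."""
    first = data["current_page"] + 1
    last = data["total_pages"]
    return [{"report_id": report_id, "page": page} for page in range(first, last + 1)]


# Trimmed version of the sample reply's "data" object (cell_5 onward omitted).
sample_data = {
    "rows": [
        {"cell_1": "Source URL", "cell_2": "Broken Link", "cell_3": "Anchor", "cell_4": ""},
        {"cell_1": "data1", "cell_2": "data2", "cell_3": "data3", "cell_4": ""},
    ],
    "total_rows": 3,
    "current_page": 1,
    "total_pages": 1,
}

records = rows_to_records(sample_data)
more = next_page_bodies(7, sample_data)  # 7 is a placeholder report id
```

With the sample shown, total_pages equals current_page, so no further request bodies are produced.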

Future and Support

This API will likely expand over time. For now, it should allow access to all data from your crawled projects. If you have any issues or questions, please contact support@thecrawltool.com.