Structured Web Data API Reference

Our API is a great way to programmatically monitor continuously updated public web sources and extract fresh structured data.

Preface

Structured web data is useful for various workflows, pipelines or platform data ingestion. New Sloth helps monitor public websites with continuously updated news, articles, blogs, insights, press releases, discussion forums, reviews, jobs, events, products, or other public updates, and extract fresh structured data feeds (JSON and XML/RSS) of source changes and updates.

Our API is a great way for developers, data teams and AI agents to programmatically define, manage and receive structured web data for reliable processing, analysis, reporting etc.

The API enables instant monitoring and data extraction from public webpages, using our AI and ML capabilities to auto-detect, parse, and extract relevant content as structured data.

Structured data feeds can also be created using our Web-based feed builder, which offers two modes: a simple point-and-click Visual Selector, and the more powerful HTML-tags-based Advanced Refinement.

Overview

The New Sloth API is organized around REST, which builds on the HTTP application protocol, keeping the API model simple, high-performance, and language-agnostic.

Our API has resource-oriented URLs, and uses HTTP response codes to indicate an API operation status or error. We use built-in HTTP features, like HTTP authentication and HTTP verbs, which are understood by most HTTP clients and libraries.

The API base endpoint is:

https://app.newsloth.com/api/v1

API responses are returned as JSON (default) or XML, based on the request's HTTP 'Accept' header value for content negotiation (note: use 'application/json' or 'application/xml' MIME types). API request and response examples are provided below.
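The content negotiation described above can be sketched in Python with the standard library. This is a minimal illustration, not an official client: the helper name `build_request` and its `fmt` parameter are assumptions made for this example, and the request is only built, not sent.

```python
import urllib.request

BASE_URL = "https://app.newsloth.com/api/v1"

# MIME types named in the text above
MIME_TYPES = {"json": "application/json", "xml": "application/xml"}


def build_request(path: str, fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a GET request asking for JSON or XML.

    `build_request` and `fmt` are hypothetical names used for illustration.
    """
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Accept": MIME_TYPES[fmt]},
        method="GET",
    )


req = build_request("/sources", fmt="xml")
print(req.full_url, req.get_header("Accept"))
```

Authentication headers still need to be added before such a request is sent.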

We may make backwards-compatible changes to the API at any time. We will release a new version of the API endpoint if any major backwards-incompatible changes are made. API-related announcements are made via our newsletter (sent to all customers), on our blog, and on our Twitter profile.

Authentication

You must authenticate your account by including your API key and secret in all API requests.

Your API key and secret can be found under your account profile, once you've signed up for a plan that supports API access. You can change the API secret from your account at any time. Be sure to keep your API key and secret safe, as they allow access to your account data. Do not share your API key or secret in publicly accessible areas, client-side code, etc. If you suspect that your API secret has been exposed, change it immediately from your account profile.

Authentication to the API is performed via HTTP Basic Auth. Provide your API key as the basic auth username, and the secret as the password. Most HTTP clients also let you set the HTTP 'Authorization' header directly: its value should be the word 'Basic', a space, and the Base64-encoded string of your API key and secret joined by a colon.

Here's a 'curl' command line example with the HTTP 'Authorization' header, that can be translated to any programming language or library:

curl https://app.newsloth.com/api/v1/sources -H "Authorization: Basic [Base64 string of API_key:Secret]"

Here's another 'curl' command line example, that uses its '-u' flag to directly pass basic auth credentials:

curl https://app.newsloth.com/api/v1/sources -u [API_key]:[Secret]

You can also authenticate via Bearer auth, in which case, set the HTTP 'Authorization' header value to a Base64 encoded string token of your API key and secret combined with a colon:

curl https://app.newsloth.com/api/v1/sources -H "Authorization: Bearer [Base64 string of API_key:Secret]"
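For clients without built-in basic auth support, the 'Authorization' header value can be computed by hand. The sketch below assumes nothing beyond the rules stated above; the function name `auth_header` and the credentials `my_key` / `my_secret` are placeholders invented for this example.

```python
import base64
from typing import Dict


def auth_header(api_key: str, secret: str, scheme: str = "Basic") -> Dict[str, str]:
    """Return the 'Authorization' header for Basic or Bearer auth.

    Both schemes use the same token here: Base64 of "API_key:Secret".
    `auth_header` is a hypothetical helper, not part of any official SDK.
    """
    if scheme not in ("Basic", "Bearer"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    raw = f"{api_key}:{secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"{scheme} {token}"}


# Placeholder credentials; substitute the values from your account profile.
print(auth_header("my_key", "my_secret"))
print(auth_header("my_key", "my_secret", scheme="Bearer"))
```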

API requests without authentication, or with an invalid authentication, will fail.

All API requests must be made over HTTPS. Calls made over plain HTTP are insecure and should not be used.

Errors

We use conventional HTTP response codes to indicate the success or failure of an API request.

Codes in the 2xx range indicate success, in general.

Codes in the 4xx range indicate a client error based on the request or data sent (e.g., authentication failure, a required parameter was not specified, a validation failed etc.). For example:

  • 400 - Bad Request: The request was unacceptable, often due to missing a required parameter or invalid data.
  • 401 - Unauthorized: No API key or secret provided, or invalid authentication credentials.
  • 404 - Not Found: The requested resource doesn't exist.
  • 415 - Unsupported Media Type: The request was unacceptable due to unsupported content type.
  • 429 - Too Many Requests: Too many requests hit the API too quickly. We recommend an exponential backoff of your requests.

Codes in the 5xx range, although rare, indicate a server error caused when our API servers fail to fulfill a request.

API access is rate-limited, but the limits are sufficient for high-throughput use. Excessive API calls in a very short period, or any type of abuse, will automatically revoke access for the violating account, and eventually lead to account termination without notice.
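The exponential backoff recommended for 429 responses could look roughly like the following. This is a sketch under assumptions: the retry count, delays, and jitter are arbitrary illustrative values (the API does not document specific numbers here), and `call_with_backoff` with its `send` callable is a hypothetical helper.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,       # assumed value, tune for your workload
    base_delay: float = 1.0,     # seconds before the first retry (assumed)
    max_delay: float = 60.0,     # upper bound for a single wait (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` and retry with doubling delays while it returns HTTP 429.

    `send` performs one request and returns (status_code, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Small random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 10))
        status, body = send()
    return status, body


# Demo with a fake sender: two 429 responses, then success.
# No network traffic and no real sleeping takes place here.
fake_responses = iter([(429, b""), (429, b""), (200, b"ok")])
waits = []
result = call_with_backoff(lambda: next(fake_responses), sleep=waits.append)
print(result, len(waits))
```

Only 429 is retried in this sketch; whether to also retry 5xx responses is a design choice left to the caller.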

Sources

A source object describes a public webpage source.

The source object
Attributes

title

string
Given title/name of the source.

sourceUrl

string
(Input) URL of the source webpage.

feedRssUrl

string
(Output) Structured data feed URL. Also used as a unique identifier for the source object. The data feed is available as XML/RSS (default) or JSON (replace the URL's file extension .rss with .json).

created

timestamp
Time (UTC) at which the source object was created.
Measured in seconds since the Unix epoch.

checked

timestamp
Time (UTC) at which the source was last checked for an update.
Measured in seconds since the Unix epoch.

extractImages

boolean
Has the value 'true' if images can be extracted from the source webpage, else has the value 'false'.

merged

boolean
Has the value 'true' if the source is selected by the user to be merged in a combined 'stream' feed, else has the value 'false'.

active

boolean
Has the value 'true' if the source is active, else has the value 'false' if the source is inactive or broken.

lastCheckFailed

boolean
Has the value 'true' if the last check of the source failed (due to being repeatedly inaccessible or no articles found), else has the value 'false'.
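The attributes above map naturally onto a small typed record. The following sketch is one possible client-side representation, not an official model: the class name `Source` and the helper properties `feed_json_url`, `created_at`, and `checked_at` are names invented for this example. It shows the .rss to .json swap and the conversion of Unix timestamps to UTC datetimes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Client-side mirror of the source object (field names match the API)."""
    title: str
    sourceUrl: str
    feedRssUrl: str
    created: int
    checked: int
    extractImages: bool
    merged: bool
    active: bool
    lastCheckFailed: bool

    @property
    def feed_json_url(self) -> str:
        """JSON variant of the feed: swap the .rss extension for .json."""
        if self.feedRssUrl.endswith(".rss"):
            return self.feedRssUrl[: -len(".rss")] + ".json"
        return self.feedRssUrl

    @property
    def created_at(self) -> datetime:
        return datetime.fromtimestamp(self.created, tz=timezone.utc)

    @property
    def checked_at(self) -> Optional[datetime]:
        # A zero timestamp means the first check is still pending.
        if not self.checked:
            return None
        return datetime.fromtimestamp(self.checked, tz=timezone.utc)


example = Source(
    title="Example Source",
    sourceUrl="https://example.com",
    feedRssUrl="https://app.newsloth.com/example-com/AbCdEfGh.rss",
    created=1447160981,
    checked=0,
    extractImages=True,
    merged=False,
    active=True,
    lastCheckFailed=False,
)
print(example.feed_json_url, example.created_at.isoformat(), example.checked_at)
```

Because the field names match the API attributes, a decoded JSON object can be passed directly as `Source(**obj)`.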

List all sources

Retrieve a list of existing sources from the user account, as a collection of source objects, along with a total count.

Method & endpoint
GET https://app.newsloth.com/api/v1/sources
No arguments required.
Returns
A dictionary is returned, with a 'data' property that contains an array of source objects, and a 'count' property that contains the number of source objects. If no sources exist, the resulting array will be empty.
Example response
JSON

{
  "count": 78,
  "data": [
    {
      "title": "Example Source",
      "sourceUrl": "https://example.com",
      "feedRssUrl": "https://app.newsloth.com/example-com/AbCdEfGh.rss",
      "created": 1447160981,
      "checked": 1519408387,
      "extractImages": true,
      "merged": false,
      "active": true,
      "lastCheckFailed": false
    }
  ]
}
XML

<sources xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <count>78</count>
  <data>
    <source>
      <title>Example Source</title>
      <sourceUrl>https://example.com</sourceUrl>
      <feedRssUrl>https://app.newsloth.com/example-com/AbCdEfGh.rss</feedRssUrl>
      <created>1447160981</created>
      <checked>1519408387</checked>
      <extractImages>true</extractImages>
      <merged>false</merged>
      <active>true</active>
      <lastCheckFailed>false</lastCheckFailed>
    </source>
  </data>
</sources>
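When the XML representation is requested, the list response can be parsed with Python's standard `xml.etree.ElementTree`. This is a sketch that assumes real responses have the same shape as the example above; `parse_sources` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Tuple

SAMPLE_XML = """<sources xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <count>78</count>
  <data>
    <source>
      <title>Example Source</title>
      <sourceUrl>https://example.com</sourceUrl>
      <feedRssUrl>https://app.newsloth.com/example-com/AbCdEfGh.rss</feedRssUrl>
      <created>1447160981</created>
      <checked>1519408387</checked>
      <extractImages>true</extractImages>
      <merged>false</merged>
      <active>true</active>
      <lastCheckFailed>false</lastCheckFailed>
    </source>
  </data>
</sources>"""

BOOLEAN_FIELDS = ("extractImages", "merged", "active", "lastCheckFailed")
TIMESTAMP_FIELDS = ("created", "checked")


def parse_sources(xml_text: str) -> Tuple[int, List[Dict[str, Any]]]:
    """Return (count, sources) from a 'List all sources' XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("count", default="0"))
    sources = []
    for node in root.iterfind("./data/source"):
        item: Dict[str, Any] = {child.tag: child.text for child in node}
        # XML carries everything as text, so convert the typed fields.
        for name in BOOLEAN_FIELDS:
            item[name] = item.get(name) == "true"
        for name in TIMESTAMP_FIELDS:
            item[name] = int(item.get(name) or 0)
        sources.append(item)
    return count, sources


count, sources = parse_sources(SAMPLE_XML)
print(count, sources[0]["title"], sources[0]["active"])
```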

Create a source

Create a new source, and get its resulting object. The source will be saved in the user account if the plan's source allowance permits (i.e., the account is within its limit).

For a webpage source, our data engine will use AI and machine learning to detect relevant content settings for the source from its webpage. If auto-detection cannot find any relevant articles, then the source is marked as broken, so that it can be manually edited with the feed builder.

Method & endpoint
POST https://app.newsloth.com/api/v1/sources
Arguments (body/form data)

title

required string
Given title/name for the new source.

sourceUrl

required string
URL of the new source webpage.

detectAll

optional boolean
For a webpage source, set to 'true' to have the data engine auto-detect all possible item elements (title, link, summary/description, image and publication date). Otherwise, set to 'false' (default) to detect only the title and link of feed items.
Returns
A source object is returned if the new source is successfully saved in the user account. The object's 'checked' timestamp will be zero (0), because the first source check is still pending.
Example request
curl https://app.newsloth.com/api/v1/sources -u [API_key]:[Secret] -d title="..." -d sourceUrl="..." -d detectAll=false
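The same request can be assembled in Python with the standard library. This sketch mirrors the curl command above, which sends the arguments as URL-encoded form data; the function name `build_create_request` and the placeholder credentials are assumptions for illustration, and the request is built but not sent.

```python
import base64
import urllib.parse
import urllib.request

API_URL = "https://app.newsloth.com/api/v1/sources"


def build_create_request(
    api_key: str,
    secret: str,
    title: str,
    source_url: str,
    detect_all: bool = False,
) -> urllib.request.Request:
    """Build a POST request for 'Create a source' (hypothetical helper)."""
    form = {
        "title": title,
        "sourceUrl": source_url,
        "detectAll": "true" if detect_all else "false",
    }
    # Same body encoding that `curl -d` produces.
    body = urllib.parse.urlencode(form).encode("ascii")
    token = base64.b64encode(f"{api_key}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


req = build_create_request("my_key", "my_secret", "Example Source", "https://example.com")
print(req.get_method(), req.full_url, req.data)
# To actually send it: urllib.request.urlopen(req)
```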
Example response
JSON

{
  "title": "Example Source",
  "sourceUrl": "https://example.com",
  "feedRssUrl": "https://app.newsloth.com/example-com/AbCdEfGh.rss",
  "created": 1447160981,
  "checked": 0,
  "extractImages": true,
  "merged": false,
  "active": true,
  "lastCheckFailed": false
}
XML

<source xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <title>Example Source</title>
  <sourceUrl>https://example.com</sourceUrl>
  <feedRssUrl>https://app.newsloth.com/example-com/AbCdEfGh.rss</feedRssUrl>
  <created>1447160981</created>
  <checked>0</checked>
  <extractImages>true</extractImages>
  <merged>false</merged>
  <active>true</active>
  <lastCheckFailed>false</lastCheckFailed>
</source>

Delete a source

Delete an existing source from the user account. Deleted sources cannot be restored.

Method & endpoint
DELETE https://app.newsloth.com/api/v1/sources
Arguments (query-string parameters)

feedRssUrl

required string
Unique feed URL (XML/RSS) of the source to be deleted, sent as a URL-encoded query-string parameter (see example below).
Returns
An empty body with a 'success' status code is returned if the source was found and deleted. Otherwise, this call returns an error: the source was not found in the user account, was already deleted, or couldn't be deleted due to a server error.
Example request
curl -X DELETE "https://app.newsloth.com/api/v1/sources?feedRssUrl=[URL]" -u [API_key]:[Secret]
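Because the feed URL is itself a URL, it is safest to percent-encode it when placing it in the query string. The sketch below builds such a DELETE request in Python; `build_delete_request` and the placeholder credentials are assumptions for illustration, and nothing is sent.

```python
import base64
import urllib.parse
import urllib.request

API_URL = "https://app.newsloth.com/api/v1/sources"


def build_delete_request(api_key: str, secret: str, feed_rss_url: str) -> urllib.request.Request:
    """Build a DELETE request for 'Delete a source' (hypothetical helper)."""
    # urlencode percent-encodes ':' and '/' inside the feed URL.
    query = urllib.parse.urlencode({"feedRssUrl": feed_rss_url})
    token = base64.b64encode(f"{api_key}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{API_URL}?{query}",
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )


req = build_delete_request(
    "my_key",
    "my_secret",
    "https://app.newsloth.com/example-com/AbCdEfGh.rss",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Since deleted sources cannot be restored, it may be worth confirming the feed URL against the 'List all sources' response before sending the request.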

Contact us if you have any questions, feature requests, or if you need any help with using the API or integrating structured data feeds.