From media and brand monitoring to machine learning and security applications, up-to-date web data plays an integral role in any modern information pipeline that depends on public data. Whether you need to find the latest trends, measure sentiment for a recent product release, understand the general reaction to certain news, or monitor the growth and popularity of your brand, you will have to go through millions of relevant articles and posts from all around the web quickly, without breaking the bank!
The news table is our answer to the incredibly difficult and complex problem of finding, aggregating, and processing the latest articles, posts, and pages from all around the web. Using the news table, you will never have to run another news crawler, blog scraper, or RSS feed aggregator. Every day we visit hundreds of millions of web pages, find and cache the latest published posts and articles, and provide them to you as a simple database table that you can run SQL queries against.
Example
Similar to the other tables in the Mixnode ecosystem, every row in the news table corresponds to a page from the web, identified by the url value. Many other columns are provided to help you write flexible queries and process the data. For example, publication_date corresponds to the date the page was published, so to retrieve the URLs of all pages published on May 25, 2019, you could simply run the following query:
select
    url
from
    news
where
    cast(publication_date as varchar) = '2019-05-25'
content_language is another useful column that allows you to narrow your queries down further based on the language of the page. If you wanted to find all the English news pages published on May 25, 2019, you could simply modify the previous query as follows:
select
    url
from
    news
where
    cast(publication_date as varchar) = '2019-05-25'
    and content_language = 'en'
Do you only need English articles from May 25, 2019 that mention 'bitcoin' in the title? No problem!
select
    url
from
    news
where
    cast(publication_date as varchar) = '2019-05-25'
    and content_language = 'en'
    and lower(title) like '%bitcoin%'
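As a variation on the query above, here is a minimal sketch that returns the title alongside the URL and matches either of two keywords. The second keyword, 'ethereum', is purely illustrative. The parentheses matter: and binds more tightly than or, so without them the date and language filters would apply only to the first keyword.

```sql
-- Sketch: English pages from May 25, 2019 whose title mentions either keyword.
-- 'ethereum' is an illustrative keyword, not one taken from the article.
select
    url,
    title
from
    news
where
    cast(publication_date as varchar) = '2019-05-25'
    and content_language = 'en'
    and (
        lower(title) like '%bitcoin%'
        or lower(title) like '%ethereum%'
    )
```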
The news table provides many more columns, such as author_meta_tag, description_meta_tag, url_host, and more. Using these columns, you can write and execute queries with a variety of conditions to extract data from millions of news pages and blog posts. Additionally, if you prefer other processing methods, you can always request firehose access to the news table and use your own tools to process the data.
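To illustrate how an extra column such as url_host might be used, the sketch below counts English pages per host for a single day. It assumes the query engine supports count(*), group by, order by, and limit, which this post does not confirm, so treat it as a starting point and not as a guaranteed-to-run query.

```sql
-- Sketch: the ten hosts with the most English pages published on May 25, 2019.
-- Assumes the SQL dialect supports count(*), group by, order by, and limit.
select
    url_host,
    count(*) as page_count
from
    news
where
    cast(publication_date as varchar) = '2019-05-25'
    and content_language = 'en'
group by
    url_host
order by
    page_count desc
limit 10
```

A per-host breakdown like this can help spot which sources dominate coverage on a given day before you pull individual URLs.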
Give it a try!
We are incredibly excited to share the new news table with our users and look forward to all the innovation that will be unleashed by simple, affordable access to large-scale news data. Give it a try and contact us at hi@mixnode.com if you have any questions or comments.