Everything you always wanted to know about the Twitter API but were afraid to ask
Nowadays Twitter, fed by millions of users, is one of the largest real-time information sources on the Internet. For the last month I’ve been trying to find the answers to these questions:
- How many tweets are created per day?
- What data can be obtained from the Twitter API?
- What limits does the Twitter API have?
- What persistence do tweets have?
I created an infographic addressing these questions, along with a textual explanation for people to read.
The growth in Twitter users and in the number of tweets is amazing. The explosion in tweets started in 2009, and a year later the number of tweets per day had multiplied by 25. With this much incoming traffic, every global event is a huge test for Twitter: during the World Cup a record of 3,283 TPS (tweets per second) was reached, against an average of 750 TPS. This volume of data makes it almost impossible for us to retrieve all of this information, let alone store it. We’ll need to think of some creative solutions to get the data we need.
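As a rough check on the first question, the rates quoted above can be converted into a daily volume. This is only a back-of-the-envelope sketch: it assumes the rate stays constant over a full day, which real traffic does not, and the function name is my own.

```python
# Back-of-the-envelope conversion from tweets per second to tweets per day.
# Assumption: the rate is constant across the whole day.
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_day(tweets_per_second: float) -> float:
    return tweets_per_second * SECONDS_PER_DAY


average = tweets_per_day(750)   # average rate quoted in the text
peak = tweets_per_day(3283)     # if the World Cup record were sustained all day

print(f"average rate: {average:,.0f} tweets/day")
print(f"record rate sustained: {peak:,.0f} tweets/day")
```

At the average rate this comes to roughly 65 million tweets per day.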
What data can be obtained from the Twitter API?
The Streaming API allows near-real-time access to various subsets of Twitter public statuses. A permanent connection is established between the user and the Twitter servers, and a continuous flow of tweets in JSON format is received in response to a single HTTP request. It is possible to get a random sample of tweets (statuses/sample) or tweets filtered by keywords or by users (statuses/filter). However, the most interesting methods, which return all statuses (statuses/firehose), all statuses containing http: and https: (statuses/links), or all retweets (statuses/retweet), are each “not a generally available resource”.
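To give an idea of what consuming that flow involves, here is a minimal sketch of the parsing side only. It assumes the stream delivers one JSON-encoded status per line, with blank lines acting as keep-alives. The function name `iter_statuses` and the sample lines are made up for illustration, and no connection to Twitter is opened.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_statuses(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded status per non-empty line of a streaming response.

    Assumption: each status arrives as a single line of JSON, and blank
    lines are keep-alives that can be skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # keep-alive, nothing to decode
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line, skip it
        yield status


# Made-up sample standing in for the body of a long-lived HTTP response.
sample_stream = [
    '{"id": 1, "text": "hello world"}',
    "",
    '{"id": 2, "text": "second tweet"}',
    '{"id": 3, "text": "cut off',
]

for status in iter_statuses(sample_stream):
    print(status["id"], status["text"])
```

Because the function takes any iterable of lines, the same loop could in principle be fed from the line iterator of whatever HTTP client holds the connection open.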
The Search API returns tweets that match a specified query. It reaches back 7 days. It allows filtering by source, language and location. Authentication is not required, and the formats provided are JSON or Atom.
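A Search API call is just an HTTP GET with the query encoded in the URL. The sketch below only builds such a URL and does not send the request. The base address `http://search.twitter.com/search.json` and the parameter names `q`, `lang`, `rpp` and `page` are assumptions based on my understanding of the API, so check them against the official documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Search API documentation.
SEARCH_URL = "http://search.twitter.com/search.json"


def build_search_url(query, lang=None, per_page=None, page=None):
    """Build a Search API URL (assumed parameter names: q, lang, rpp, page)."""
    params = [("q", query)]
    if lang is not None:
        params.append(("lang", lang))
    if per_page is not None:
        params.append(("rpp", per_page))
    if page is not None:
        params.append(("page", page))
    # urlencode takes care of escaping spaces and special characters.
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("world cup", lang="en", page=2))
```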
The REST API gives access to core Twitter data. This includes update timelines, status data, and user information. Whatever you can do on the Twitter website, you can also do through the REST API. Whether authentication is required depends on the type of operation. It provides XML, JSON, RSS and Atom formats.
The Search API returns less information per tweet than the other two APIs. For each author, only the ID, screen_name and avatar URL are given.
What limits does the Twitter API have?
The Streaming API is a continuous flow of tweets from the Twitter servers to the user. The rate depends on the bandwidth of the connection and on the load on Twitter’s servers. Right now I’m receiving two streams on two different servers at the Carlos III University and measuring the rate. When I have results I’ll publish them.
In the Search API and the REST API there is a limit of 150 requests per hour, counted per user, or per IP address if the call is unauthenticated. It is therefore important to know how to paginate in the optimal way.
|API||Request||Maximum page size||Maximum total|
|Search||search||200 tweets||1,500 tweets|
|REST||statuses||200 tweets||3,200 tweets|
|REST||friends/ids||5,000 user IDs||All (*)|
|REST||followers/ids||5,000 user IDs||All (*)|
(*) It was tested with up to 4.5 million followers of Barack Obama
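The table and the rate limit combine into a simple budget: how many requests a full download needs, and how long that takes at 150 requests per hour. A small sketch using only the figures quoted above; the function names are my own.

```python
import math

REQUESTS_PER_HOUR = 150  # limit quoted above for the Search and REST APIs


def requests_needed(total_items: int, page_size: int) -> int:
    """Number of paged requests needed to fetch total_items."""
    return math.ceil(total_items / page_size)


def hours_needed(requests: int, per_hour: int = REQUESTS_PER_HOUR) -> float:
    """Minimum time in hours to issue that many requests under the limit."""
    return requests / per_hour


# A full user timeline: 3,200 tweets at 200 per page.
timeline = requests_needed(3200, 200)
# All follower IDs of an account with 4.5 million followers, 5,000 per page.
followers = requests_needed(4_500_000, 5000)

print(timeline, "requests for a full timeline")
print(followers, "requests,", hours_needed(followers), "hours for the follower IDs")
```

So a full timeline fits comfortably within one hour's quota (16 requests), while the follower IDs of an account with 4.5 million followers need 900 requests, which is at least six hours at the limit.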
What persistence do tweets have?
Although tweets are stored in Twitter’s databases, there is a time limit on retrieving them.
|API||Time limit||Size limit|
|Streaming||Real time only||–|
|Search||Last 7 days||Last 1,500 tweets|
|REST||None||Last 3,200 tweets|
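Read as a rule, the table says which API can still return a tweet that has already been published. The sketch below encodes it directly. The function and its arguments are hypothetical names, and the thresholds are just the figures from the table.

```python
from typing import List


def reachable_via(age_days: float, search_rank: int, timeline_rank: int) -> List[str]:
    """Return the APIs that can still return an already-published tweet.

    age_days:      how long ago the tweet was posted
    search_rank:   its position among the results for a query (1 = newest)
    timeline_rank: its position in the author's timeline (1 = newest)
    """
    apis = []
    # The Streaming API only delivers tweets as they happen, so a tweet
    # that already exists can never be fetched from it.
    if age_days <= 7 and search_rank <= 1500:
        apis.append("Search")
    if timeline_rank <= 3200:
        apis.append("REST")
    return apis


print(reachable_via(age_days=2, search_rank=300, timeline_rank=40))
print(reachable_via(age_days=30, search_rank=300, timeline_rank=40))
print(reachable_via(age_days=30, search_rank=300, timeline_rank=5000))
```

The last case is the one to worry about: a tweet that is both older than a week and deeper than 3,200 positions in its author's timeline cannot be retrieved through any of the three APIs.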