added calufa2011 to the missing list

added a big missing one to the list
@philshem 2019-12-15 21:04:37 +01:00 committed by GitHub
parent f7c5f0ad33
commit 9466c9dbaf
GPG Key ID: 4AEE18F83AFDEB23

@@ -99,7 +99,7 @@ Lost Datasets
* burger2011 - A corpus consisting of 213 million tweets from 18.5 million users, in many different languages. Collected as part of `[John D. Burger, John C. Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1301–1309] <http://www.aclweb.org/anthology/D11-1120>`_.
* calufa2011 - 200+ million tweets from 13+ million users, 173 GB uncompressed, in MySQL format (543 million rows). The archive.org copy has been taken down: https://archive.org/details/2011-05-calufa-twitter-sql. It was mentioned several times on Hacker News: https://news.ycombinator.com/item?id=2633384
Other Lists
===========