For researchers

All text and media content of this project can be downloaded by anyone. Because of the significant level of interest in this dataset, downloads are requester-pays: you will have to cover your own bandwidth costs from an AWS account.

aws s3 cp --request-payer requester s3://adatascientist/parler/v1/post.ndjson.zip .

The text files are over 60 GB uncompressed, and media files are over 1 TB uncompressed.
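For researchers who prefer to script the download, the same requester-pays request can be sketched in Python. This is a minimal sketch, not an official client: it assumes the third-party boto3 package is installed and that AWS credentials are already configured, and the helper names split_s3_url and download_all are hypothetical. The bucket URL and file names are the ones given on this page.

```python
from urllib.parse import urlparse

# Base URL and archive names as listed on this page.
BASE_URL = "s3://adatascientist/parler/v1/"
FILES = ["post.ndjson.zip", "comment.ndjson.zip", "echo.ndjson.zip"]


def split_s3_url(url):
    """Split 's3://bucket/some/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_all(dest_dir="."):
    """Download every archive in FILES, billing the transfer to the caller."""
    import os

    import boto3  # third-party; assumed installed (pip install boto3)

    s3 = boto3.client("s3")
    for name in FILES:
        bucket, key = split_s3_url(BASE_URL + name)
        # RequestPayer mirrors the CLI flag `--request-payer requester`.
        s3.download_file(
            bucket,
            key,
            os.path.join(dest_dir, name),
            ExtraArgs={"RequestPayer": "requester"},
        )
```

Calling download_all() fetches all three text archives into the current directory; as with the CLI command, the transfer is charged to the AWS account whose credentials are in use.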

The current base S3 bucket URL is s3://adatascientist/parler/v1/. The ZIP files are:

  • post.ndjson.zip - all original posts (3.27 GB zipped, 7.83 GB unzipped)
  • comment.ndjson.zip - all comments on posts (13.14 GB zipped, 32.4 GB unzipped)
  • echo.ndjson.zip - all echos, where a user reposts another post (3.27 GB zipped, 7.83 GB unzipped)
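
Because the archives hold newline-delimited JSON, they can be read one record at a time without unzipping them to disk first. The following is a minimal sketch using only the Python standard library. It assumes each ZIP contains a single UTF-8 NDJSON member (the first member is used unless one is named), and it makes no assumption about the field names inside each record. The function name iter_records is hypothetical.

```python
import io
import json
import zipfile


def iter_records(zip_path, member=None):
    """Yield one parsed JSON object per non-empty line of an NDJSON member.

    Assumption: the archive holds a single UTF-8 NDJSON file; pass `member`
    to pick a specific file if that turns out not to be the case.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Decode the compressed stream lazily, line by line.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:
                    yield json.loads(line)
```

For a first look at the data, something like `next(iter_records("post.ndjson.zip"))` returns a single record as a dict, which is a cheap way to inspect the available fields before processing a whole file.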

Please contact me if you would like to discuss:

  • other potential open research goals to add to the list on this page
  • alternate ways to share this data with researchers
  • contributing any scraped data to this archive
  • ways to reduce my AWS server bill
  • comments, suggestions, project ideas, or anything else

Open Research Goals

  • Differentiate alt-right speech from conservative discourse. Is there a reliable way to tell a strong opinion from a call to violence? Distinguishing the two automatically would make it possible to estimate the approximate percentage of Parler users who actually engaged in violent rhetoric.
  • Quantify the prevalence of violent speech on Parler. Media coverage has focused intensely on the most shocking content written on Parler, which may not be truly representative of the typical discourse that took place on it.
  • Measure real-world effects from disinformation spread on Parler. For example, there may be a relation between coronavirus disinformation and hospital admissions in a community.

History

The weekend before AWS took Parler offline, an Internet activist noticed a way to legally scrape most of Parler's data, and a worldwide group of volunteers carried out the scrape. Thanks to the collective effort of the archivists who saved this information, we can now get a clear retrospective picture of the discourse that was actually taking place.