How to download a Twitter account's tweets

Started by Retinend, April 23, 2023, 08:59:39 AM

Retinend

I'm looking for a solution for how to download circa 35 000 tweets over a two decade span from an individual account. What is the best way to download these? Ideally it would also save the pictures and videos, and record URLs and metadata such as likes and retweets - not just the plain text of the tweets.

Retinend

...there's this thing called "Twint", but I looked at the Readme.txt and I saw this:

## Requirements
- Python 3.6;
- aiohttp;
- aiodns;
- beautifulsoup4;
- cchardet;
- elasticsearch;
- pysocks;
- pandas (>=0.23.0);
- aiohttp_socks;
- schedule;
- geopy;
- fake-useragent.

## Installing

**Git:**
```bash
git clone https://github.com/twintproject/twint.git
cd twint  # requirements.txt lives inside the cloned directory
pip3 install -r requirements.txt
```

What does this even mean? All of those things need to be installed before I install this thing? I need to install this thing via apostrophe apostrophe apostrophe "bash"?

Consignia

No, the requirements are fetched by the Python installer (pip), but you will need Python installed, and possibly Git as well if you use the Git option. Once you've got those installed, you can run the install commands in either cmd (the command line tool) on Windows or a terminal if you are on macOS or Linux.
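If it helps, here is a minimal Python sketch for checking that those prerequisites are actually present before trying the install commands. The function name `check_prerequisites` and the tool list are my own assumptions for illustration, not anything Twint provides; the 3.6 minimum is taken from the README quoted above.

```python
import shutil
import sys


def check_prerequisites(tools=("git", "pip3"), min_python=(3, 6)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    label = f"python>={min_python[0]}.{min_python[1]}"
    # sys.version_info compares like a tuple, e.g. (3, 11, 4, ...) >= (3, 6)
    status = {label: sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns None when the command is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING would need installing first. On Windows the pip command may be `pip` rather than `pip3`, so adjust the tool list if needed.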

I'm just having a play with it, but it might not be up to date. The last change was in 2021, and these things can easily get out of date.

The thing is, this sort of data is very valuable, and you are usually charged for being able to access it easily. So I don't think you'll find any easy options for what you want. Tools like this just pretend to be web browsers and keep requesting the web page version. But Twitter can change the underlying web page structure or apply rate limiting, which could easily break something like this.
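To illustrate the rate limiting point, here is a rough Python sketch of how a scraper might retry with exponential backoff when the site starts refusing requests. It is not how Twint does it; `RateLimited`, `fetch_with_backoff` and the `fetch` callable are hypothetical names made up for the example, and the real page download is left out on purpose.

```python
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) fetcher when the site says 'slow down'."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each rate-limit response."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller deal with it
            # waits base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
```

Backoff only helps with rate limiting. If the page structure itself changes, the scraper has to be rewritten, which is why an unmaintained tool tends to stop working.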

The only other easy option is to request your data from Twitter. But I'm guessing this isn't your account?

Retinend

That's right, I am not talking about my own account, but a public figure's account.

Thanks a lot for your help. I guess I was naive to think this was a problem with a simple solution.

edit: I am not giving up, but it is discouraging that I might go through all the effort of learning how this Twint thing works, only to find out it has been rendered inoperable by Twitter.

Consignia

You may want to try this: https://github.com/twintproject/twint-zero

It's been updated more recently, so it has a better chance of working. It needs Go (https://go.dev/) instead of Python, and I think it's probably a lot harder to use.

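Consignia

There is also this: https://www.followersanalysis.com/old-tweets But you are in the realms of having to pay for it.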

Memorex MP3

I think even before Twitter started to close off the APIs in a big way it was only possible to get the most recent 3,200 tweets on an account. Even that might be a challenge now.

Scraping from a source that has scraped from Twitter might be the most effective way.

Consignia

Quote from: Memorex MP3 on April 23, 2023, 01:16:10 PM
Scraping from a source that has scraped from Twitter might be the most effective way.

Yeah, that's what Twint-zero does, I believe.

Retinend

Quote from: Consignia on April 23, 2023, 12:25:35 PM
You may want to try this: https://github.com/twintproject/twint-zero

It's been updated more recently, so has a better chance of working. It needs Go (https://go.dev/) instead of Python, and I think it's probably a lot harder to use.

Merci beaucoup

If old cuntchops wants to make money he should sell this as an official Twitter feature for blue checkers.

The existence of services like this one (from which I have just asked for a quote) proves that there is some demand here:

Quote from: Consignia on April 23, 2023, 12:31:53 PM
There is also this: https://www.followersanalysis.com/old-tweets But you are in the realms of having to pay for it.

Retinend

Update: the followersanalysis people got back to me and told me that they would render their service to me for this individual's 35k tweets in CSV, Excel and JSON formats for 400 USD.
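For anyone wondering what an export like that might contain, here is a small Python sketch of one tweet record written out as JSON and CSV. The field names are guesses based on what was asked for at the top of the thread (text, URL, likes, retweets, media); they are not the actual format followersanalysis uses, and `TweetRecord`, `to_json` and `to_csv` are made-up names.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TweetRecord:
    """One tweet plus the metadata mentioned in the thread (assumed fields)."""
    url: str
    created_at: str
    text: str
    likes: int = 0
    retweets: int = 0
    media_urls: list = field(default_factory=list)


def to_json(records):
    """Serialise records as a JSON array, keeping non-ASCII text readable."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialise records as CSV text; media URLs are joined with spaces."""
    buffer = io.StringIO()
    columns = ["url", "created_at", "text", "likes", "retweets", "media_urls"]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["media_urls"] = " ".join(row["media_urls"])
        writer.writerow(row)
    return buffer.getvalue()
```

The CSV output opens directly in Excel, which would cover the third format mentioned.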

touchingcloth

It's so expensive due to Twitter doing so many poorly-batched RPCs.