25 September 2023

Export your Twitter archive – and practical ways to use it

As many feared, Twitter has been in a perpetual state of disarray under Elon Musk’s management, to the point of having its former identity erased when Musk rebranded the site as X. There doesn’t currently seem to be a clear alternative on the horizon, despite numerous attempts with different approaches. The constant erosion of the platform and rumors that Musk is considering removing free accounts have made me reflect on my options for when the site – inevitably – goes dark. While I obviously can’t do anything to prevent it, like any other user I can at least back up my data. I collect links and other information in my Twitter likes, so this data is particularly important to keep for future reference.

X archive ready to download

The process is fairly straightforward: on the home page, open ‘More’ in the left-hand menu, then go to ‘Settings and privacy’, ‘Your account’ and finally ‘Download an archive of your data’. After you confirm with your password and a verification code sent by email, Twitter starts generating the archive and sends an in-app notification when it’s ready for download. You receive a .zip file which can be quite large and contains a lot of files – mine is over 900MB and has more than 12,000 files.

The question then becomes: what do you do with that much data? The archive comes with a handy overview file called ‘Your archive.html’ where you can visually browse various categories, from your tweets and likes to ads and personalization data – Twitter doesn’t seem to know that much about me. Interestingly, this archive lets you search your liked tweets, something that was never possible on the live site.

X archive HTML viewer

If you need something more in-depth, you can browse the subfolders included in the archive. I’m pretty sure ‘assets’ is only used for rendering the HTML archive visualizer, while personal data is exported to the ‘data’ folder. In my case, most of the files were images and clips attached to my tweets, gathered in the ‘tweets_media’ folder. I have no real use for these, since I haven’t uploaded personal images to Twitter; the media here are just images from the links I tweeted over the years.
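If you’d like an overview of where those 12,000 files sit before extracting anything, Python’s standard zipfile module can list the archive contents in place. This is a minimal sketch, assuming only the layout described above (an ‘assets’ folder and a ‘data’ folder with subfolders such as ‘tweets_media’); the function name summarize_archive is hypothetical.

```python
import zipfile
from collections import defaultdict


def summarize_archive(zip_source):
    """Return {folder: (file_count, total_bytes)} for the archive contents.

    zip_source can be a path or a file-like object. Directory entries are
    skipped, and files at the root of the archive are grouped under ".".
    Sizes are uncompressed sizes in bytes.
    """
    summary = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Everything before the last "/" is the containing folder.
            folder = info.filename.rsplit("/", 1)[0] if "/" in info.filename else "."
            summary[folder][0] += 1
            summary[folder][1] += info.file_size
    return {folder: tuple(stats) for folder, stats in summary.items()}
```

Calling it with the path of the downloaded .zip returns one entry per folder (for example data/tweets_media) with its file count and size, which makes it easy to see how much of the archive is media versus text.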

The text contents of tweets are exported in a series of .js files that hold JSON data. To make them easier to parse, I used what some may consider an unlikely method: importing them into Excel via Power Query. For like.js, this results in a simple table with three columns: tweetId, fullText, and expandedUrl.
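For those who prefer a script to Power Query, the same three-column table can be produced with Python’s standard library. This is a minimal sketch, assuming each entry in like.js has the shape {"like": {"tweetId": ..., "fullText": ..., "expandedUrl": ...}} (matching the three columns Excel shows) and that the array has already been parsed into Python objects; the file needs a small fix before a JSON parser accepts it, as described next. The function names are hypothetical, and missing fields are simply left blank.

```python
import csv
import io

COLUMNS = ("tweetId", "fullText", "expandedUrl")


def likes_to_rows(entries):
    """Flatten [{"like": {...}}, ...] into a list of three-column dicts."""
    rows = []
    for entry in entries:
        like = entry.get("like", {})
        # If an entry lacks one of the fields, leave that cell empty.
        rows.append({col: like.get(col, "") for col in COLUMNS})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

When saving the CSV to disk, writing it with encoding="utf-8-sig" adds a byte-order mark that helps Excel recognize UTF-8, so accents and emoji in fullText display correctly.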

One thing to note: Excel rejects the original file from the archive with the following error: ‘DataFormat.Error: We found an unexpected character in the JSON input’. The reason is that like.js isn’t pure JSON: it starts with a JavaScript assignment (window.YTD.like.part0 = [) rather than a bare array or object. After some poking around, I managed to work around it with a small change to the file:

  • at the start of the file, replace the first line window.YTD.like.part0 = [ with { "window.YTD.like.part0": [
  • at the end of the file, add a closing curly bracket } to match the opening bracket added at the start.
X archive edit JSON file containing likes
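The two edits above can also be scripted, which helps if you download a fresh archive every few months. This is a minimal sketch, assuming the file begins exactly with the window.YTD.like.part0 = [ assignment described above; it wraps the array in an object under that same key, checks that the result parses as JSON, and returns the fixed text for you to save under a new name. The function name js_to_json_text is hypothetical.

```python
import json

PREFIX = "window.YTD.like.part0 = "


def js_to_json_text(js_text, prefix=PREFIX):
    """Turn 'window.YTD.like.part0 = [...]' into
    '{ "window.YTD.like.part0": [...] }' and verify it is valid JSON."""
    text = js_text.lstrip("\ufeff")  # drop a byte-order mark, if any
    if not text.startswith(prefix):
        raise ValueError("unexpected file header: " + text[:40])
    key = prefix.split("=")[0].strip()  # "window.YTD.like.part0"
    # Same fix as done by hand: opening brace and key at the start,
    # closing brace at the end.
    fixed = '{ "' + key + '": ' + text[len(prefix):].rstrip() + " }"
    json.loads(fixed)  # raises json.JSONDecodeError if the result is not valid JSON
    return fixed
```

Read data/like.js with encoding="utf-8", pass its contents to the function and write the result to a new file before pointing Power Query at it. The prefix is a parameter in case other files in the data folder use a different variable name.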

Once imported into Excel, your database of likes becomes much more practical to query: you can search or filter the fullText column for keywords, then copy the corresponding expandedUrl and open it in a browser. As mentioned before, Twitter never had a search function for likes, so to find an older liked tweet I had to scroll mindlessly through the entire list, waiting for it to load. Naturally, it’s cumbersome to re-download the archive constantly just for search, but with over 7,000 liked tweets, downloading it occasionally still has its benefits.
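The Excel filter step can be approximated in a few lines as well. This is a minimal sketch, assuming the likes are rows with the tweetId, fullText and expandedUrl keys shown in the table; it does a case-insensitive match on every keyword given and returns the text and URL of each hit. The function name search_likes is hypothetical.

```python
def search_likes(rows, *keywords):
    """Return (fullText, expandedUrl) pairs whose text contains every keyword.

    Matching is case-insensitive via str.casefold(); with no keywords,
    every row matches.
    """
    needles = [k.casefold() for k in keywords]
    hits = []
    for row in rows:
        text = row.get("fullText", "")
        folded = text.casefold()
        if all(n in folded for n in needles):
            hits.append((text, row.get("expandedUrl", "")))
    return hits
```

Requiring all keywords (rather than any) mirrors stacking several text filters on the fullText column, which narrows the list quickly when you have thousands of likes.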

X archive of likes as table in Excel

A bigger caveat is that the URLs in the export are either links to the Twitter status or wrapped in Twitter’s t.co shortener, meaning that whenever Twitter goes down (temporarily or permanently) these links will stop working… The same would happen if the author deletes the tweet, makes their account private, or closes it for good. We’ve already seen how Musk can leverage the shortener to influence content on the site: it was recently revealed that links to The New York Times were artificially throttled to discourage people from sharing and reading its articles. Unfortunately, even if you regularly back up your Twitter data, there’s no guarantee that it will remain usable in the future…
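Since the t.co wrapping is what makes these links fragile, it may help to at least know which liked tweets depend on it, for example to save the most important destinations while the shortener still works. This is a minimal sketch, assuming shortened links appear in fullText or expandedUrl as https://t.co/ followed by an alphanumeric code; the regex is an approximation rather than an official format, and the function name tco_links is hypothetical.

```python
import re

# Approximate pattern for t.co links: scheme, host, then an alphanumeric code.
TCO_PATTERN = re.compile(r"https?://t\.co/[A-Za-z0-9]+")


def tco_links(rows):
    """Map tweetId -> list of t.co links found in fullText or expandedUrl."""
    found = {}
    for row in rows:
        text = row.get("fullText", "") + " " + row.get("expandedUrl", "")
        links = TCO_PATTERN.findall(text)
        if links:
            found[row.get("tweetId", "")] = links
    return found
```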
