Learn How to Scrape Twitter Following Data | Full Code Provided

Effortlessly Scrape Twitter Following Data with Python & Selenium

Example of Scraped Data

{
  "userId": "95092020",
  "isBlueVerified": true,
  "following": false,
  "canDm": false,
  "canMediaTag": false,
  "createdAt": "Sun Dec 06 23:33:02 +0000 2009",
  "defaultProfile": false,
  "defaultProfileImage": false,
  "description": "Best-Selling Author | Clinical Psychologist | #1 Education Podcast | Enroll to @petersonacademy now:",
  "fastFollowersCount": 0,
  "favouritesCount": 161,
  "followersCount": 5613000,
  "friendCount": 1686,
  "hasCustomTimelines": true,
  "isTranslator": false,
  "listedCount": 14572,
  "location": "",
  "mediaCount": 7318,
  "name": "Dr Jordan B Peterson",
  "normalFollowersCount": 5613000,
  "pinnedTweetIdsStr": [
    "1849105729438790067"
  ],
  "possiblySensitive": false,
  "profileImageUrlHttps": "https://pbs.twimg.com/profile_images/1407056014776614923/TKBC60e1_normal.jpg",
  "profileInterstitialType": "",
  "username": "jordanbpeterson",
  "statusesCount": 51343,
  "translatorType": "none",
  "verified": false,
  "wantRetweets": false,
  "withheldInCountries": []
}

Run Code Directly Without Setup
This guide provides complete, ready-to-run code for scraping Twitter following data. Using Python and Selenium, you automate the browser, capture its network performance logs, and extract the following list from the intercepted responses, with no extra setup required.

Step 1: Set Up Your Environment

First, install the project's dependencies (chief among them Selenium, for browser automation):

pip install -r requirements.txt
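The command above assumes a requirements.txt file alongside the code. If you are setting the project up from scratch, a minimal file only needs Selenium (the version pin is an assumption; Selenium 4.6+ also bundles Selenium Manager, which simplifies driver setup):

```text
selenium>=4.6
```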

Step 2: Download ChromeDriver

Download the ChromeDriver build that matches your installed Chrome version so Selenium can interact with the browser. Get it here: ChromeDriver Download. (Note: Selenium 4.6+ ships with Selenium Manager, which can download a matching driver automatically.)

Step 3: Setting Chrome Options

self.options = webdriver.ChromeOptions()
# Spoof a common desktop user agent so the session looks like a regular browser
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
self.options.add_argument(f'user-agent={user_agent}')
self.options.add_argument('--disable-gpu')
self.options.add_argument('--no-sandbox')
self.options.add_argument('--disable-dev-shm-usage')
self.options.add_argument(f"--remote-debugging-port={remote_debugging_port}")


# modify_random_canvas_js() and get_browser() are helpers from this project:
# the first generates a script that randomizes the canvas fingerprint, the
# second launches Chrome with network (performance) logging enabled.
js_script_name = modify_random_canvas_js()
self.browser = self.get_browser(script_files=[js_script_name], record_network_log=True, headless=True)
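The get_browser(...) call above is a project-specific helper. In stock Selenium 4, the equivalent network capture is enabled through the "goog:loggingPrefs" capability; a minimal sketch (the browser-launching calls are shown as comments so the fragment stands alone):

```python
# Sketch: enabling Chrome's performance log in plain Selenium 4.
# Setting "performance": "ALL" makes driver.get_log("performance") return
# Chrome DevTools Protocol events, including the Network.* messages this
# guide filters in Step 5.
logging_prefs = {"performance": "ALL"}

# With selenium installed, the capability is applied like so:
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   options.set_capability("goog:loggingPrefs", logging_prefs)
#   driver = webdriver.Chrome(options=options)
```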

Step 4: Access the Target Page

self.browser.switch_to.new_window('tab')
url = 'https://x.com/1_usd_promotion/following'
self.browser.get(url=url)

# Give the page a moment to issue its first "Following" request
time.sleep(2)

# IDs already collected, and the accumulator for scraped users
exist_entry_id = []
result_list = []

# get_network() (project helper) reads the performance log and fills result_list
self.get_network(exist_entry_id, result_list)

print(f'tweet result length = {len(result_list)}')
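The exist_entry_id list passed to get_network() exists to avoid re-collecting entries already seen on earlier scrolls of the page. The deduplication itself can be sketched as a plain function (the entry shape and field names here mirror the timeline response, but the sample IDs are illustrative):

```python
def collect_new_entries(entries, exist_entry_id, result_list):
    """Append only entries whose entryId has not been seen yet.

    entries: list of dicts with an "entryId" key (as in the timeline response)
    exist_entry_id / result_list: mutated in place across successive scrolls
    """
    for entry in entries:
        entry_id = entry.get("entryId")
        if entry_id is None or entry_id in exist_entry_id:
            continue  # already collected on a previous pass
        exist_entry_id.append(entry_id)
        result_list.append(entry)


seen, results = [], []
collect_new_entries([{"entryId": "user-1"}, {"entryId": "user-2"}], seen, results)
collect_new_entries([{"entryId": "user-2"}, {"entryId": "user-3"}], seen, results)
# results now holds user-1, user-2, user-3 exactly once each
```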

Step 5: Get the Browser Performance Log

# Requires `import json` at the top of the module
performance_log = self.browser.get_log("performance")
for packet in performance_log:

    raw = packet.get("message")
    message = json.loads(raw).get("message")
    packet_method = message.get("method")

    # Keep only Network events whose raw payload mentions the Following endpoint
    if "Network" in packet_method and 'Following' in raw:

        request_id = message.get("params").get("requestId")

        # Fetch the response body via the Chrome DevTools Protocol
        resp = self.browser.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
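The filtering step can be exercised without a browser: performance-log packets are dicts whose "message" value is itself a JSON string. A small helper, fed synthetic packets shaped like what get_log("performance") returns (the URLs and request IDs below are made up):

```python
import json


def find_following_request_ids(performance_log):
    """Return requestIds of Network events whose raw message mentions 'Following'."""
    request_ids = []
    for packet in performance_log:
        raw = packet.get("message", "")
        message = json.loads(raw).get("message", {})
        method = message.get("method", "")
        if "Network" in method and "Following" in raw:
            request_ids.append(message.get("params", {}).get("requestId"))
    return request_ids


# Synthetic packets shaped like Chrome's performance log entries:
log = [
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "42.1",
                   "response": {"url": "https://x.com/i/api/graphql/abc/Following"}}}})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "42.2",
                   "response": {"url": "https://x.com/i/api/graphql/def/HomeTimeline"}}}})},
]
print(find_following_request_ids(log))  # ['42.1']
```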

Step 6: Extract Data from the Response

# Runs inside the Step 5 loop, once per matching response
body = resp.get("body")
body = json.loads(body)
instructions = body['data']['user']['result']['timeline']['timeline'].get('instructions', None)
if not instructions:
    continue
for instruction in instructions:
    entries = instruction.get('entries', None)
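Within each instruction, the user records sit a few levels deeper in each entry. The exact nesting below (content → itemContent → user_results → result → legacy) reflects the GraphQL response shape at the time of writing and may change; treat it as an assumption to verify against a live response:

```python
def extract_users(body):
    """Pull per-user dicts out of a Following timeline response body.

    Assumed nesting: data.user.result.timeline.timeline.instructions[]
      -> entries[] -> content.itemContent.user_results.result.legacy
    """
    users = []
    timeline = body["data"]["user"]["result"]["timeline"]["timeline"]
    for instruction in timeline.get("instructions", []):
        for entry in instruction.get("entries", []):
            item = entry.get("content", {}).get("itemContent", {})
            result = item.get("user_results", {}).get("result")
            if not result:
                continue  # cursor entries etc. carry no user record
            # "legacy" holds the flat profile fields shown in the sample data
            users.append(result.get("legacy", {}))
    return users


# Synthetic body mimicking the response structure from this step:
body = {"data": {"user": {"result": {"timeline": {"timeline": {"instructions": [
    {"entries": [
        {"content": {"itemContent": {"user_results": {"result": {
            "legacy": {"screen_name": "jordanbpeterson"}}}}}},
        {"content": {"cursorType": "Bottom"}},  # pagination entry, no user
    ]}
]}}}}}}
print([u["screen_name"] for u in extract_users(body)])  # ['jordanbpeterson']
```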

Step 7: Sample Response Data

Each extracted user object matches the "Example of Scraped Data" shown at the top of this guide.

Step 8: Important Considerations

Before running the scraper, log in to a Twitter/X account and obtain its auth_token cookie; following lists are not visible to unauthenticated sessions. Also respect the platform's limits: add delays between requests and avoid hammering the endpoint, or your account and IP may be blocked.

FAQ: Frequently Asked Questions

  • Q: What is Web Scraping?

    Web scraping is like using a special tool to collect information from websites automatically. Imagine a robot that helps gather data from a page so you don’t have to do it manually. In this case, we're focusing on Twitter data using Python and Selenium.

  • Q: How do I start scraping Twitter data?

    To start scraping Twitter data, you first need to set up your computer. This includes installing software called Selenium, which helps you control web browsers. Then, you download ChromeDriver, a helper tool for Google Chrome that allows Selenium to work with it.

  • Q: What is ChromeDriver and why do I need it?

    ChromeDriver is like a translator for Selenium and Google Chrome. It helps Selenium understand how to interact with the Chrome browser. You need it so that Selenium can automate actions like clicking buttons or entering information on Twitter.

  • Q: What is a performance log in scraping?

    A performance log is like a diary that records everything happening during your web scraping. It keeps track of all the data exchanges between your scraper (Selenium) and the Twitter page, helping you understand what requests your program is making.

  • Q: What should I consider before scraping Twitter?

    Before scraping Twitter, you need to log in to your Twitter account and get something called an auth_token, which proves you are allowed to access Twitter's data. Also, be careful to respect Twitter's rules so you do not get blocked.

  • Q: How do I avoid getting blocked while scraping?

    To avoid getting blocked, make sure to introduce delays between requests, rotate proxies, and avoid overwhelming Twitter's servers with too many requests in a short period.
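As the last answer suggests, a simple way to space out requests is a randomized delay between scrolls, so the timing doesn't look machine-regular. A minimal sketch (the 2-5 second bounds are illustrative, not a guaranteed-safe rate):

```python
import random
import time


def polite_sleep(min_s=2.0, max_s=5.0):
    """Sleep for a random interval between min_s and max_s seconds."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# polite_sleep()  # call between successive page scrolls / requests
```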