Scrape Twitter Followers with Python and Selenium
Example of Scraped Data
{
"userId": "1710236730010349568",
"isBlueVerified": false,
"following": false,
"canDm": false,
"canMediaTag": true,
"createdAt": "Fri Oct 06 10:13:15 +0000 2023",
"defaultProfile": true,
"defaultProfileImage": true,
"description": "",
"fastFollowersCount": 0,
"favouritesCount": 456,
"followersCount": 64,
"friendCount": 7320,
"hasCustomTimelines": false,
"isTranslator": false,
"listedCount": 0,
"location": "",
"mediaCount": 0,
"name": "Paislie Dimitrov",
"normalFollowersCount": 64,
"pinnedTweetIdsStr": [],
"possiblySensitive": false,
"profileImageUrlHttps": "https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png",
"profileInterstitialType": "",
"username": "PaisliDimit",
"statusesCount": 0,
"translatorType": "none",
"verified": false,
"wantRetweets": false,
"withheldInCountries": []
}
Run Code Directly Without Setup
This guide provides full, ready-to-use code to scrape Twitter follower data. With Python and Selenium you can automate data collection and capture browser performance logs, unlocking Twitter insights with no extra setup required.
Step 1: Set Up Your Environment
First, install the dependencies listed in requirements.txt, which include Selenium, the library that lets us automate browser actions:
pip install -r requirements.txt
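If you are not working from the project's requirements.txt, installing Selenium directly is enough to follow the snippets below:

pip install selenium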
Step 2: Download ChromeDriver
Download the ChromeDriver build that matches your installed Chrome version from the official ChromeDriver download page.
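If Selenium does not pick the driver up automatically, you can point it at the downloaded binary yourself. A minimal sketch, assuming Selenium 4 and an example path you should replace with your own:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# The executable path below is an example; use wherever you saved ChromeDriver
service = Service(executable_path='/path/to/chromedriver')
browser = webdriver.Chrome(service=service)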
Step 3: Set Chrome Options
from selenium import webdriver

self.options = webdriver.ChromeOptions()
# Spoof a common desktop user agent so requests look like a regular browser
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
self.options.add_argument(f'user-agent={user_agent}')
self.options.add_argument('--disable-gpu')
self.options.add_argument('--no-sandbox')
self.options.add_argument('--disable-dev-shm-usage')
self.options.add_argument(f"--remote-debugging-port={remote_debugging_port}")
# modify_random_canvas_js() and get_browser() are helpers from the full project;
# record_network_log=True enables the performance log read in Step 5
js_script_name = modify_random_canvas_js()
self.browser = self.get_browser(script_files=[js_script_name], record_network_log=True, headless=True)
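The get_browser() helper above presumably starts Chrome with performance (network) logging turned on. If you build the driver yourself instead, a minimal sketch of enabling the same log with plain Selenium 4 looks like this:

# Assumes plain Selenium 4 without the project's get_browser() helper
self.options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
self.browser = webdriver.Chrome(options=self.options)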
Step 4: Access The Target Page
import time

self.browser.switch_to.new_window('tab')
url = 'https://x.com/1_usd_promotion/verified_followers'
self.browser.get(url=url)
time.sleep(2)
exist_entry_id = []
result_list = []
# get_network() is a project helper that parses the performance log (Steps 5 and 6)
# and appends follower records to result_list
self.get_network(exist_entry_id, result_list)
print(f'follower result length = {len(result_list)}')
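Only the first batch of followers is loaded when the page opens; the rest arrive as you scroll. A short sketch of triggering more results (the scroll count and delay here are arbitrary choices, not values from the original project):

# Scroll a few times so the page requests more followers, then re-read the log
for _ in range(5):
    self.browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(2)
    self.get_network(exist_entry_id, result_list)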
Step 5: Get The Browser Performance Log
import json

performance_log = self.browser.get_log("performance")
for packet in performance_log:
    msg = packet.get("message")
    message = json.loads(msg).get("message")
    packet_method = message.get("method")
    # Keep only Chrome DevTools Network events whose payload mentions the Following timeline
    if "Network" in packet_method and 'Following' in msg:
        request_id = message.get("params").get("requestId")
        # Fetch the response body for that request via the Chrome DevTools Protocol
        resp = self.browser.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
Step 6: Extract Data From Response
body = resp.get("body")
body = json.loads(body)
instructions = body['data']['user']['result']['timeline']['timeline'].get('instructions', None)
if not instructions:
continue
for instruction in instructions:
entries = instruction.get('entries', None)
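From here, each entry holds one follower. A sketch of pulling the user object out of an entry is shown below; the nesting mirrors the GraphQL followers timeline at the time of writing and may change, so treat the key names as assumptions rather than the project's exact parsing code:

for entry in entries or []:
    item_content = entry.get('content', {}).get('itemContent', {})
    user_result = item_content.get('user_results', {}).get('result', {})
    legacy = user_result.get('legacy', {})
    if not legacy:
        continue
    # Collect a few of the fields shown in the example output above
    result_list.append({
        'userId': user_result.get('rest_id'),
        'username': legacy.get('screen_name'),
        'name': legacy.get('name'),
        'followersCount': legacy.get('followers_count'),
        'description': legacy.get('description'),
    })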
Step 7: Important Considerations
- Log in to Twitter, then get your auth_token (a sketch of injecting it into the browser session appears after this list). Learn How to Get Auth Token
- You can use the ready-made API from Apify
- You can get the full code from GitHub
- Join our discussion group! Click Here
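A minimal sketch of logging the Selenium session in by injecting an existing auth_token cookie; the token value is a placeholder you must replace with your own:

self.browser.get('https://x.com')
self.browser.add_cookie({
    'name': 'auth_token',
    'value': 'YOUR_AUTH_TOKEN',  # placeholder: paste your own token here
    'domain': '.x.com',
    'path': '/',
    'secure': True,
})
self.browser.refresh()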
FAQ: Frequently Asked Questions
- Q: What is Web Scraping?
Web scraping is like using a special tool to collect information from websites automatically. Imagine a robot that helps gather data from a page so you don’t have to do it manually. In this case, we're focusing on Twitter data using Python and Selenium.
- Q: How do I start scraping Twitter data?
To start scraping Twitter data, you first need to set up your computer. This includes installing software called Selenium, which helps you control web browsers. Then, you download ChromeDriver, a helper tool for Google Chrome that allows Selenium to work with it.
- Q: What is ChromeDriver and why do I need it?
ChromeDriver is like a translator for Selenium and Google Chrome. It helps Selenium understand how to interact with the Chrome browser. You need it so that Selenium can automate actions like clicking buttons or entering information on Twitter.
- Q: What is a performance log in scraping?
A performance log is like a diary that records everything happening during your web scraping. It keeps track of all the data exchanges between your scraper (Selenium) and the Twitter page, helping you understand what requests your program is making.
- Q: What should I consider before scraping Twitter?
Before scraping Twitter, you need to log in to your Twitter account and get something called an auth_token, which proves you are allowed to access Twitter's data. Also, be careful to respect Twitter's rules so you do not get blocked.
- Q: How do I avoid getting blocked while scraping?
To avoid getting blocked, make sure to introduce delays between requests, rotate proxies, and avoid overwhelming Twitter's servers with too many requests in a short period.
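For example, a rough sketch of both ideas (the proxy address is a placeholder and the delay range is only a suggestion):

import random
import time

self.options.add_argument('--proxy-server=http://proxy.example.com:8080')  # placeholder proxy
time.sleep(random.uniform(2, 5))  # random pause between page loads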