Scrape Twitter Tweets with Python and Selenium
Example of Scraped Data
{
"type": "tweet",
"id": 1843447413824209200,
"viewCount": "51275823",
"url": "https://x.com/elonmusk/status/1843447413824209160",
"twitterUrl": "https://twitter.com/elonmusk/status/1843447413824209160",
"text": "It is a surefire way for the Dems to turn America in a one-party state, just like California",
"isQuote": true,
"retweetCount": 59493,
"replyCount": 11090,
"likeCount": 250068,
"quoteCount": 1661,
"createdAt": "Tue Oct 08 00:24:47 +0000 2024",
"lang": "en",
"quoteId": "1843379457605939258",
"bookmarkCount": 11177,
"isReply": false,
"source": "Twitter for iPhone",
"author": {
"type": "user",
"username": "elonmusk",
"url": "https://x.com/elonmusk",
"twitterUrl": "https://x.com/elonmusk",
"id": "44196397",
"name": "Elon Musk",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": "",
"profilePicture": "https://pbs.twimg.com/profile_images/1849727333617573888/HBgPUrjG_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/44196397/1726163678",
"description": "Read @America to understand why I’m supporting Trump for President",
"location": "",
"followers": 202400789,
"following": 794,
"protected": false,
"status": "",
"canDm": false,
"canMediaTag": false,
"createdAt": "Tue Jun 02 20:12:29 +0000 2009",
"advertiserAccountType": "",
"analyticsType": "",
"entities": {
"description": {
"urls": []
},
"url": {
"urls": [
{
"display_url": "TheAmericaPAC.org",
"expanded_url": "http://TheAmericaPAC.org",
"url": "https://t.co/DjyKIO6ePx",
"indices": [
0,
23
]
}
]
}
},
"fastFollowersCount": 0,
"favouritesCount": 83676,
"geoEnabled": false,
"hasCustomTimelines": true,
"hasExtendedProfile": false,
"isTranslator": false,
"mediaCount": 2637,
"profileBackgroundColor": "",
"statusesCount": 55447,
"translatorTypeEnum": "",
"withheldInCountries": [],
"affiliatesHighlightedLabel": {
"label": {
"url": {
"url": "https://twitter.com/X",
"urlType": "DeepLink"
},
"badge": {
"url": "https://pbs.twimg.com/profile_images/1683899100922511378/5lY42eHs_bigger.jpg"
},
"description": "X",
"userLabelType": "BusinessLabel",
"userLabelDisplayType": "Badge"
}
}
},
"quote": {
"type": "tweet",
"id": "1843379457605939258",
"text": "Elon Musk explains how this will be our last real election if Kamala Harris wins.\n\nEveryone must watch this. https://t.co/DoBh9qM7K7",
"retweetCount": 10725,
"replyCount": 1848,
"likeCount": 38268,
"quoteCount": 790,
"createdAt": "Mon Oct 07 19:54:45 +0000 2024",
"lang": "en",
"bookmarkCount": 5143,
"author": {
"type": "user",
"username": "EndWokeness",
"url": "https://x.com/EndWokeness",
"twitterUrl": "https://x.com/EndWokeness",
"id": "1552795969959636992",
"name": "End Wokeness",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": "",
"profilePicture": "https://pbs.twimg.com/profile_images/1563691268793946117/OedvhFeS_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/1552795969959636992/1720913469",
"description": "Fighting, exposing, and mocking wokeness. DM for submissions",
"location": "",
"followers": 3107102,
"following": 1177,
"protected": false,
"status": "",
"canDm": true,
"canMediaTag": true,
"createdAt": "Thu Jul 28 23:20:28 +0000 2022",
"advertiserAccountType": "",
"analyticsType": "",
"entities": {
"description": {
"urls": []
}
},
"fastFollowersCount": 0,
"favouritesCount": 13138,
"geoEnabled": false,
"hasCustomTimelines": true,
"hasExtendedProfile": false,
"isTranslator": false,
"mediaCount": 7219,
"profileBackgroundColor": "",
"statusesCount": 15502,
"translatorTypeEnum": "",
"withheldInCountries": [],
"affiliatesHighlightedLabel": {}
}
}
}
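A record shaped like the one above can be loaded with Python's standard json module. The following is a minimal sketch, not part of the original scraper: `summarize_tweet` is a hypothetical helper, and the inline sample keeps only a few fields copied from the example. It reads the status ID from `url` because the numeric `id` in the example does not match the ID in the URL in its last digits.

```python
import json

# A trimmed copy of the example record; only the fields used below are kept.
SAMPLE = """
{
  "url": "https://x.com/elonmusk/status/1843447413824209160",
  "viewCount": "51275823",
  "retweetCount": 59493,
  "replyCount": 11090,
  "likeCount": 250068,
  "quoteCount": 1661,
  "author": {"username": "elonmusk"},
  "quote": {"author": {"username": "EndWokeness"}}
}
"""

ENGAGEMENT_FIELDS = ("retweetCount", "replyCount", "likeCount", "quoteCount")


def summarize_tweet(tweet):
    """Collect a few headline values from one scraped tweet record."""
    quote = tweet.get("quote") or {}  # absent when the tweet quotes nothing
    return {
        # The last path segment of the URL is the status ID.
        "status_id": tweet["url"].rsplit("/", 1)[-1],
        "author": tweet["author"]["username"],
        # viewCount is stored as a string in the example record.
        "views": int(tweet.get("viewCount") or 0),
        "interactions": sum(tweet.get(field, 0) for field in ENGAGEMENT_FIELDS),
        "quoted_author": (quote.get("author") or {}).get("username"),
    }


summary = summarize_tweet(json.loads(SAMPLE))
print(summary)
```

Keeping the ID as a string avoids any loss of precision when the record is later passed through tools that treat large numbers as floating point.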
Run Code Directly Without Setup
This guide provides complete, ready-to-use code for scraping tweet data from Twitter. It uses Python and Selenium to automate the browser and reads the tweet data from Chrome's performance logs, so nothing beyond the steps below needs to be set up.
Advanced Tweet Filtering
Use Twitter's advanced search to target the tweets that match your criteria. Filtering by keywords, dates, and hashtags keeps the collected data relevant and focused.
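As an illustration of such a filter, the sketch below assembles a search string from keywords, hashtags, an author, and a date range. The helper names are hypothetical. The `from:`, `since:`, and `until:` operators are the usual Twitter search operators, and the URL layout with `f=live` is an assumption based on the `&f=live` check that appears later in this guide.

```python
from urllib.parse import urlencode


def build_search_query(keywords="", hashtags=(), from_user=None, since=None, until=None):
    """Join the given filters into one search string."""
    parts = []
    if keywords:
        parts.append(keywords)
    # Accept hashtags with or without the leading '#'.
    parts.extend("#" + tag.lstrip("#") for tag in hashtags)
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if since:
        parts.append("since:" + since)  # date as YYYY-MM-DD
    if until:
        parts.append("until:" + until)  # date as YYYY-MM-DD
    return " ".join(parts)


def build_search_url(query, live=True):
    """Assumed URL layout for a search page; 'f=live' selects the live results."""
    params = {"q": query}
    if live:
        params["f"] = "live"
    return "https://x.com/search?" + urlencode(params)


query = build_search_query(
    "election",
    hashtags=["AI"],
    from_user="@elonmusk",
    since="2024-10-01",
    until="2024-10-08",
)
url = build_search_url(query)
print(query)
print(url)
```

The resulting string can be used as the `search_query` value that Step 5 types into the search box.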
Step 1: Set Up Your Environment
First, install the project's dependencies, including Selenium, which lets us automate browser actions:
pip install -r requirements.txt
Step 2: Download ChromeDriver
Download the ChromeDriver version that matches your installed Chrome version from the ChromeDriver download page.
Step 3: Run Chrome for Testing
This step starts Chrome with a remote debugging port so that you can watch the scraper work while debugging. If you don’t need to see the browser, you can skip this step.
@echo off
start C:\software\chrome-win64\chrome.exe --remote-debugging-port=9223
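Before attaching Selenium to this Chrome instance, it can help to confirm that the debugging port is actually listening. A small sketch, assuming the port 9223 from the script above; `is_debug_port_open` is a hypothetical helper and not part of the original code:

```python
import socket


def is_debug_port_open(host="localhost", port=9223, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError when the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_debug_port_open()` right after running the batch file returns True once Chrome is ready. A True result only shows that some process is listening on the port, not that the process is Chrome.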
Step 4: Setting Chrome Options
from selenium import webdriver

# Inside the scraper class: configure Chrome before starting the browser.
self.options = webdriver.ChromeOptions()
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
self.options.add_argument(f'user-agent={user_agent}')
self.options.add_argument('--disable-gpu')
self.options.add_argument('--no-sandbox')
self.options.add_argument('--disable-dev-shm-usage')
# Attach to the Chrome instance started in Step 3 (remote debugging port 9223).
self.options.add_experimental_option("debuggerAddress", "localhost:9223")
# modify_random_canvas_js() and get_browser() are helpers from the full project code.
js_script_name = modify_random_canvas_js()
self.browser = self.get_browser(script_files=[js_script_name], record_network_log=True, headless=True)
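The `get_browser` helper is not shown here, so how it enables network logging is unknown. As a rough sketch of one way to do it with plain Selenium 4, the `goog:loggingPrefs` capability asks ChromeDriver to record performance (DevTools) events, which Step 6 reads with `get_log("performance")`. The function names below are hypothetical, and the `--headless=new` flag assumes a reasonably recent Chrome.

```python
# Arguments copied from the snippet above.
BASE_ARGS = ["--disable-gpu", "--no-sandbox", "--disable-dev-shm-usage"]


def chrome_arguments(headless=False):
    """Return the command-line arguments to pass to Chrome."""
    args = list(BASE_ARGS)
    if headless:
        args.append("--headless=new")  # run Chrome without a visible window
    return args


def build_options(headless=False):
    """Build ChromeOptions with performance logging enabled."""
    # Imported here so chrome_arguments() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # Ask ChromeDriver to keep the performance log for get_log("performance").
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    return options
```

A browser would then be started with `webdriver.Chrome(options=build_options(headless=True))`. When attaching to an already running Chrome through `debuggerAddress`, command-line arguments such as the headless flag have no effect, because that browser was started separately.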
Step 5: Search for Tweet Data Using Selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

self.browser.switch_to.new_window('tab')
# Wait up to 20 seconds for elements to appear; set this before the first lookup.
self.browser.implicitly_wait(20)
url = "https://x.com/explore"
self.browser.get(url=url)
search_box = self.browser.find_element(By.CSS_SELECTOR, '[data-testid="SearchBox_Search_Input"]')
search_box.send_keys(Keys.CONTROL + "a")  # Select all text
search_box.send_keys(Keys.DELETE)  # Clear the existing text
search_box.send_keys(search_query)
# Press Enter to submit the search
search_box.send_keys(Keys.RETURN)
# Locate the second tab in the results tab list
second_div = self.browser.find_element(By.CSS_SELECTOR, '[data-testid="ScrollSnap-List"] [role="presentation"]:nth-of-type(2)')
Step 6: Monitor the Browser Network Response
import json

performance_log = self.browser.get_log("performance")
for packet in performance_log:
    msg = packet.get("message")
    message = json.loads(msg).get("message")
    packet_method = message.get("method")
    # Keep only network events that belong to a SearchTimeline request
    if "Network" in packet_method and 'SearchTimeline' in msg:
        document_url = message['params'].get('documentURL')
        # Skip requests that did not come from a live ("&f=live") search page
        if (not document_url) or ('&f=live' not in document_url):
            continue
        request_id = message.get("params").get("requestId")
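The same filtering can be written as a standalone function that runs on any list shaped like the output of `get_log("performance")`, which makes it easy to try without a browser. This is a sketch under that assumption: `find_search_request_ids` and `make_entry` are hypothetical helpers, and the sample entries, including their URLs and request IDs, are made up for illustration.

```python
import json


def find_search_request_ids(performance_log, marker="SearchTimeline", live_only=True):
    """Return request IDs of network events that mention the marker."""
    request_ids = []
    for packet in performance_log:
        raw = packet.get("message", "")
        if marker not in raw:  # cheap string check before parsing the JSON
            continue
        message = json.loads(raw).get("message", {})
        if not message.get("method", "").startswith("Network."):
            continue
        params = message.get("params", {})
        document_url = params.get("documentURL") or ""
        if live_only and "&f=live" not in document_url:
            continue
        request_id = params.get("requestId")
        if request_id and request_id not in request_ids:
            request_ids.append(request_id)
    return request_ids


def make_entry(method, params):
    """Build one log entry: a dict whose 'message' field is a JSON string."""
    return {"level": "INFO", "message": json.dumps({"message": {"method": method, "params": params}})}


# Made-up entries for illustration only.
sample_log = [
    make_entry("Network.requestWillBeSent", {
        "requestId": "101.1",
        "documentURL": "https://x.com/search?q=python&f=live",
        "request": {"url": "https://x.com/i/api/graphql/EXAMPLE/SearchTimeline"},
    }),
    make_entry("Network.requestWillBeSent", {
        "requestId": "102.1",
        "documentURL": "https://x.com/search?q=python",
        "request": {"url": "https://x.com/i/api/graphql/EXAMPLE/SearchTimeline"},
    }),
    make_entry("Page.loadEventFired", {"timestamp": 1.0}),
]

request_ids = find_search_request_ids(sample_log)
print(request_ids)
```

Only the first entry passes both checks; the second is dropped because its document URL lacks `&f=live`, and the third never mentions the marker.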
Step 7: Extract Data from the Response
# Runs inside the loop from Step 6; body is the response body text for request_id
entries = json.loads(body)['data']['search_by_raw_query']['search_timeline']['timeline']['instructions'][0].get('entries', [])
if not entries:
    continue
for entry in entries:
    # Entries without itemContent are not tweets, so skip them
    item_content = entry['content'].get('itemContent')
    if not item_content:
        continue
    tweet_result = item_content['tweet_results']['result']
    entry_id = entry['entryId']
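The snippet above does not show where `body` comes from. One plausible way, sketched here as an assumption rather than the author's method, is the Chrome DevTools Protocol command `Network.getResponseBody`, which Selenium's Chrome driver exposes through `execute_cdp_cmd`. The function names and the sample body below are hypothetical, and unlike the snippet above the sketch walks every instruction instead of only the first.

```python
import base64
import json


def get_response_body(browser, request_id):
    """Fetch the body of a finished request through the DevTools Protocol."""
    result = browser.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
    body = result.get("body", "")
    if result.get("base64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return body


def extract_tweet_results(body):
    """Return (entry_id, tweet_result) pairs from a SearchTimeline response body."""
    data = json.loads(body)
    timeline = data["data"]["search_by_raw_query"]["search_timeline"]["timeline"]
    tweets = []
    for instruction in timeline.get("instructions", []):
        for entry in instruction.get("entries", []):
            item_content = entry.get("content", {}).get("itemContent")
            if not item_content:
                continue  # not a tweet entry
            result = item_content.get("tweet_results", {}).get("result")
            if result:
                tweets.append((entry["entryId"], result))
    return tweets


# A made-up, heavily trimmed response body for illustration only.
SAMPLE_BODY = json.dumps({"data": {"search_by_raw_query": {"search_timeline": {"timeline": {
    "instructions": [{"entries": [
        {"entryId": "tweet-123", "content": {"itemContent": {"tweet_results": {"result": {"rest_id": "123"}}}}},
        {"entryId": "cursor-bottom-0", "content": {"value": "placeholder"}},
    ]}],
}}}}})

print(extract_tweet_results(SAMPLE_BODY))
```

In a real run, `Network.getResponseBody` can fail for requests whose response has not finished loading or is no longer held by the browser, so the call is usually wrapped in a try/except.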
Step 8: Important Considerations
- Log in to Twitter first, then get your Twitter cookie. See: Learn How to Get Twitter Cookie
- You can also use the API available on Apify
- The full code is available on GitHub
- Join our discussion group
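How the full code applies the cookie is not shown in this guide. A common approach with Selenium is `add_cookie`, sketched below under that assumption. Both helper names are hypothetical, the `.x.com` domain is an assumed default, and the cookie names in the example string are placeholders, not real Twitter cookie names.

```python
def parse_cookie_header(header, domain=".x.com"):
    """Split a 'name=value; name2=value2' string into Selenium cookie dicts."""
    cookies = []
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed pieces
        cookies.append({"name": name, "value": value, "domain": domain})
    return cookies


def add_cookies(browser, cookies):
    """Add cookies to the session.

    Selenium only accepts cookies for the domain of the page that is
    currently loaded, so open a page on that domain before calling this.
    """
    for cookie in cookies:
        browser.add_cookie(cookie)


# Placeholder cookie string for illustration only.
cookies = parse_cookie_header("session=abc123; theme=dark")
print(cookies)
```

After the cookies are added, reloading the page lets the site pick up the logged-in session.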
FAQ: Frequently Asked Questions
- Q: What is Twitter scraping with Python and Selenium?
Twitter scraping is the process of collecting data from Twitter using automated tools like Python and Selenium. These tools allow you to simulate a browser, search for tweets, and gather information without manually searching.
- Q: Why would I want to scrape Twitter data?
Scraping Twitter data can help you gather information for research, track specific topics, or analyze trends. It's useful for students, businesses, and anyone interested in understanding public opinions and discussions.
- Q: What is ChromeDriver and why do I need it?
ChromeDriver is like a translator for Selenium and Google Chrome. It helps Selenium understand how to interact with the Chrome browser. You need it so that Selenium can automate actions like clicking buttons or entering information on Twitter.
- Q: What is a performance log in scraping?
A performance log is like a diary kept by the browser while you scrape. It records the network requests and responses exchanged between the browser that Selenium controls and the Twitter page, so you can see which requests were made and find the ones that carry tweet data.
- Q: Can I run the scraping script without seeing the browser?
Yes, you can run your script in 'headless' mode, where the browser operates in the background, so you won't see it on your screen.
- Q: Are there any tools I need to scrape Twitter?
Yes, you will need Python installed on your computer, along with the Selenium library and ChromeDriver. These tools together allow you to control the web browser and capture the data you want.