Using Selenium with Python on Quest

Selenium is a tool for automating web browsers that is commonly used for web scraping. It works with several web browsers (Chrome, Firefox, …) and programming languages (Java, Python, C#, …). Running a Selenium script that automates a browser requires browser-specific drivers and libraries. Managing these components manually can be cumbersome, which led to the development of Selenium Manager, a browser driver management tool included with all recent Selenium releases. Here, we demonstrate how to install and use Selenium (with Selenium Manager) in a virtual environment to run simple Python web scraping scripts on Quest.
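Selenium Manager has been bundled with Selenium since release 4.6.0, so one quick way to confirm that an environment can rely on it is to check the installed Selenium version. The sketch below does this from Python; the helper names are hypothetical, and the version parsing only handles plain numeric release strings such as 4.21.0.

```python
from importlib.metadata import PackageNotFoundError, version

# Selenium Manager is bundled with Selenium 4.6.0 and later.
MIN_SELENIUM_MANAGER = (4, 6)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string like '4.21.0' into (4, 21, 0).

    Any non-numeric tail inside a component (e.g. '0rc1') is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_selenium_manager(text: str) -> bool:
    """Return True if this Selenium version bundles Selenium Manager."""
    return parse_version(text)[:2] >= MIN_SELENIUM_MANAGER


if __name__ == "__main__":
    try:
        installed = version("selenium")
    except PackageNotFoundError:
        print("selenium is not installed in this environment")
    else:
        print(f"selenium {installed}; Selenium Manager bundled: "
              f"{has_selenium_manager(installed)}")
```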

Note: While Chrome installations are available on Quest, we recommend that users running Selenium through Python use the browser driver versions that Selenium Manager automatically downloads and caches.
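By default, Selenium Manager keeps the drivers (and any browsers) it downloads under ~/.cache/selenium, which lives in your Quest home directory. The sketch below reports that location and how much space it uses. It assumes that the SE_CACHE_PATH environment variable is how a non-default cache location would be configured; check the Selenium Manager documentation before relying on that, and treat the helper names as hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def selenium_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory Selenium Manager is expected to cache into.

    Assumption: SE_CACHE_PATH overrides the default ~/.cache/selenium.
    """
    env = os.environ if env is None else env
    override = env.get("SE_CACHE_PATH")
    if override:
        return Path(override).expanduser()
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".cache" / "selenium"


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files below path (0 if it does not exist)."""
    if not path.is_dir():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


if __name__ == "__main__":
    cache = selenium_cache_dir()
    print(f"Selenium Manager cache: {cache}")
    print(f"Size on disk: {dir_size_bytes(cache) / 1e6:.1f} MB")
```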


Creating and activating a virtual environment on Quest

First, load the mamba module on Quest:

[@quser32 py_selenium_ex]$ module load mamba/23.1.0

Next, create a virtual environment with Python, Selenium, and any other packages you need. The --prefix argument creates the virtual environment in the specified location rather than in the default location (/home/<net_id>/.conda/envs/).

[@quser32 py_selenium_ex]$ mamba create --prefix ./my_selenium_env -c conda-forge python=3.11 selenium matplotlib ipykernel pandas --yes

Once the virtual environment has been created, activate it with the conda activate command. Depending on whether you have initialized your shell to use conda, you may first need to run eval "$(conda shell.bash hook)".

[@quser31 py_selenium_ex]$ eval "$(conda shell.bash hook)"

(base) [@quser31 py_selenium_ex]$ conda activate ./my_selenium_env/

(~/examples/py_selenium_ex/my_selenium_env) [tdm5510@quser31 py_selenium_ex]$ 

You have now created and activated an Anaconda virtual environment containing Python and Selenium on Quest. While this environment is active, the specified Python version and packages are available. To deactivate the environment (and return to the base environment), run conda deactivate. For more information about Anaconda virtual environments on Quest, see this page.
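To confirm that a script is really running inside the activated environment (for example, inside a batch job), you can check which interpreter is in use and whether the packages installed with mamba create can be imported. This is a minimal sketch; the package list simply mirrors the mamba create command above and should be adjusted to match your own environment.

```python
import sys
from importlib.util import find_spec

# Import names of the packages installed with `mamba create` above.
EXPECTED = ["selenium", "matplotlib", "ipykernel", "pandas"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that cannot be imported from this interpreter."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # sys.prefix points at the root of the active environment,
    # e.g. .../py_selenium_ex/my_selenium_env
    print(f"Python {sys.version.split()[0]} from {sys.prefix}")
    missing = missing_packages(EXPECTED)
    print("All packages found." if not missing else f"Missing: {missing}")
```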


A simple example Python script using Selenium and Chrome

To run a Python script that uses Selenium to automate Chrome, include the Chrome options set in the following example script. Without these options, launching the web driver on Quest may fail.

selenium_chrome_ex.py

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

#### optionally include to print some debugging information ####
import logging
logging.basicConfig(level=logging.DEBUG)
################################################################

## include these chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-first-run")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-metrics")
chrome_options.add_argument("--disable-translate")
chrome_options.add_argument("--bwsi")

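## Selenium Manager finds (or downloads) a matching chromedriver when the driver starts
driver = webdriver.Chrome(options=chrome_options)

################################################################

## use Selenium to do something simple
try:
    driver.get("https://www.google.com")
    print(driver.title)
finally:
    ## always close Chrome and chromedriver, even if an error occurs above
    driver.quit()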

print('Done.')

Here, the added options specify the following:

  • --headless : runs Chrome without opening a browser window (a visible window cannot be displayed on a remote machine such as a Quest compute node without graphics forwarding)
  • --no-first-run : skips Chrome’s first-run tasks, which could cause automation to fail
  • --no-sandbox : runs the process in a ‘non-sandboxed’ (less restricted) environment
  • --disable-dev-shm-usage : makes Chrome create its temporary shared memory files in a temporary directory (typically /tmp) instead of the shared memory location /dev/shm (which users may not have access to on Quest)
  • --disable-metrics : prevents Chrome from collecting metrics about these processes
  • --disable-translate : disables Chrome’s translate feature, which may interfere with some automation processes
  • --bwsi : ‘browse without sign-in’ starts a guest session (the user/process will not be prompted to log into Chrome)
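If you write several Selenium scripts, it can help to keep these flags in one place. The sketch below collects the same arguments in a plain list and applies them to a Chrome Options object. The module name quest_chrome.py and the function name are hypothetical; Selenium is imported inside the function so that the flag list can be inspected without launching a browser.

```python
# quest_chrome.py (hypothetical helper module)

# The Chrome arguments recommended above for running Selenium on Quest.
QUEST_CHROME_ARGS = [
    "--headless",
    "--no-first-run",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-metrics",
    "--disable-translate",
    "--bwsi",
]


def quest_chrome_options(extra_args=()):
    """Build a Chrome Options object with the Quest flags plus any extras."""
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in [*QUEST_CHROME_ARGS, *extra_args]:
        options.add_argument(arg)
    return options
```

A script could then start the driver with webdriver.Chrome(options=quest_chrome_options(["--window-size=1920,1080"])), adding any project-specific flags on top of the shared ones.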


After including these Chrome options, this script simply loads Google and prints the title of the page. To automate your own Chrome processes, see Selenium’s documentation or other resources.
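As a slightly larger sketch of what automating your own Chrome processes might look like, the example below loads a page, waits for its body element, collects the href of every link, and normalizes the results with the standard library. The function names are hypothetical; WebDriverWait, expected_conditions, and By are standard Selenium 4 APIs, and the Chrome flags are the ones listed above. Replace the wait condition with a locator for the element your own page actually needs.

```python
from urllib.parse import urldefrag, urljoin


def normalize_links(hrefs, base_url):
    """Resolve relative links against base_url, drop #fragments and
    non-HTTP(S) links, and de-duplicate while keeping the original order."""
    seen = set()
    links = []
    for href in hrefs:
        if not href:
            continue
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if not absolute.startswith(("http://", "https://")):
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def collect_links(url, timeout=10):
    """Load url in headless Chrome and return the normalized links on it."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in ["--headless", "--no-first-run", "--no-sandbox",
                "--disable-dev-shm-usage", "--disable-metrics",
                "--disable-translate", "--bwsi"]:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait (up to `timeout` seconds) until the <body> element is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        anchors = driver.find_elements(By.TAG_NAME, "a")
        hrefs = [a.get_attribute("href") for a in anchors]
        return normalize_links(hrefs, driver.current_url)
    finally:
        driver.quit()
```

For example, collect_links("https://www.example.com") returns a list of absolute URLs found on that page. Keeping normalize_links separate from the browser code makes the parsing logic easy to test without starting Chrome.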


Running your Selenium script on Quest as a batch job

To run the above script (selenium_chrome_ex.py) as a batch job on Quest, use the following submission script.

#!/bin/bash

#SBATCH --account=<allocation_id> ## your allocation 
#SBATCH --partition=<partition> ## e.g. short, normal, long, buyin 
#SBATCH --nodes=1 ## change this if parallelizing over multiple nodes
#SBATCH --ntasks-per-node=1 ## change this if parallelizing over multiple CPUs
#SBATCH --mem=8GB ## change this if necessary
#SBATCH --time=00:25:00 ## change this if necessary
#SBATCH --output=./selen_ex_out.out ## where standard output and error are written
#SBATCH --job-name=selen_ex_job ## job name for your reference 

module purge

module load mamba/23.1.0

eval "$(conda shell.bash hook)"

conda activate ./my_selenium_env/

python ./selenium_chrome_ex.py

conda deactivate
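Slurm starts a batch job in the directory from which sbatch was run, which is why the relative paths ./my_selenium_env/ and ./selenium_chrome_ex.py in the submission script work when you submit from py_selenium_ex. If your script saves scraped results, one option is to build output paths from Slurm's SLURM_SUBMIT_DIR and SLURM_JOB_ID environment variables so that each job writes its own file. The helper below is a hypothetical sketch that falls back to the current directory when run outside Slurm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def job_output_path(filename: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return <submit dir>/<stem>_<job id><suffix> for a Slurm job.

    Outside Slurm, the current directory and the tag 'local' are used.
    """
    env = os.environ if env is None else env
    base = Path(env.get("SLURM_SUBMIT_DIR", os.getcwd()))
    job_id = env.get("SLURM_JOB_ID", "local")
    name = Path(filename)
    return base / f"{name.stem}_{job_id}{name.suffix}"


if __name__ == "__main__":
    # e.g. /home/<net_id>/py_selenium_ex/page_titles_<job id>.csv inside a job
    print(f"Results will be written to {job_output_path('page_titles.csv')}")
```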

For more information about submitting jobs on Quest, see this page. To develop and run Python scripts with Selenium in a Jupyter notebook, create an IPython kernel from your virtual environment following these instructions.
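Creating the kernel usually comes down to running ipykernel install with the environment's own Python interpreter (ipykernel is already included in the mamba create command above). The sketch below builds that command from Python; the kernel name and display name are placeholders, and the linked instructions remain the reference procedure for Quest.

```python
import shlex
import sys


def kernel_install_cmd(name: str, display_name: str) -> list[str]:
    """Command that registers the current interpreter as a Jupyter kernel."""
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user",  # install into your own Jupyter data directory
        "--name", name,
        "--display-name", display_name,
    ]


if __name__ == "__main__":
    # Run with the environment's Python, i.e. after
    # `conda activate ./my_selenium_env/`.
    cmd = kernel_install_cmd("my_selenium_env", "Python (selenium)")
    print(shlex.join(cmd))
```

Run the printed command in a shell, or pass cmd to subprocess.run(cmd, check=True) to register the kernel directly from Python.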


Details
Article ID: 2598
Created: Mon 5/6/24 2:43 PM
Modified: Thu 5/9/24 3:31 PM