Selenium on Heroku

May 11, 2017

I spent a lot of time trying to get a Selenium WebDriver setup in NodeJS that works locally on my Windows machine and also deploys easily to Heroku. I finally got it working using PhantomJS as the headless browser. I was able to get Chrome and Firefox working locally, but not on Heroku. Here are the steps I took to get it working:

1. Install PhantomJS 2.1.1

http://phantomjs.org/download.html

2. Set up a Heroku app with an additional buildpack for PhantomJs:

https://github.com/stomita/heroku-buildpack-phantomjs

3. Set up a simple NodeJs/Express app. I am currently on NodeJS LTS 6.10.3. Selenium-WebDriver will not work on node < 6.

package.json:

{
  "name": "untitled2",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "express": "^4.15.2",
    "selenium-webdriver": "^3.4.0"
  },
  "engines": {
    "node": "6.10.3",
    "npm": "4.5.0"
  }
}
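
The Node version requirement from step 3 can also be checked when the app starts, rather than discovered through a confusing failure later. The following is a minimal sketch, not part of the original setup; MIN_NODE_MAJOR, parseMajor and isSupportedNode are hypothetical names, and the threshold of 6 is taken from the statement above that Selenium-WebDriver will not work on node < 6.

```javascript
// Sketch: fail fast when the runtime is older than the Node version the post requires.
// MIN_NODE_MAJOR, parseMajor and isSupportedNode are hypothetical names.
var MIN_NODE_MAJOR = 6;

function parseMajor(version) {
    // Accepts "6.10.3" or "v6.10.3" and returns the major version as a number.
    return parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function isSupportedNode(version) {
    // An unparseable version yields NaN, which compares as unsupported.
    return parseMajor(version) >= MIN_NODE_MAJOR;
}

if (!isSupportedNode(process.versions.node)) {
    console.error('selenium-webdriver needs Node >= ' + MIN_NODE_MAJOR +
        ', found ' + process.versions.node);
    process.exit(1);
}
```

Placing this at the top of index.js would stop the process before Express starts listening if the wrong Node version is installed.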

4. Expose a simple /test endpoint on your app to do the Selenium-WebDriver "Hello World" example:

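index.js:

// Minimal Express app exposing a /test endpoint that runs the
// Selenium-WebDriver "Hello World" example in PhantomJS.
var webdriver = require('selenium-webdriver');
var express = require('express');
var app = express();

// Heroku supplies PORT; fall back to 14000 for local runs.
var port = process.env.PORT || 14000;
var By = webdriver.By;

app.get('/test', function (req, res) {
    var driver = new webdriver.Builder()
        .forBrowser('phantomjs')
        .build();
    // Commands are queued by the selenium-webdriver 3.x promise manager,
    // so they run in order without explicit chaining.
    driver.get('http://www.google.com/ncr');
    driver.findElement(By.name('q')).sendKeys('webdriver');
    driver.findElement(By.name('btnG')).click();
    // Poll the page title until it matches or 5 seconds pass.
    driver.wait(function() {
        return driver.getTitle().then(function(title) {
            console.log(title);
            return title === 'webdriver - Google Search';
        });
    }, 5000).then(function() {
        res.status(200).send('Done');
    }, function(error) {
        // Report the failure (for example a TimeoutError) with an error status.
        res.status(500).send(error);
    });
    // Queued after the wait above, so the browser closes once the search finishes.
    driver.quit();
});

app.listen(port, function () {
    console.log('Example app listening on port: ', port);
});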
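
The handler above relies on the selenium-webdriver 3.x promise manager to run driver.quit() after the other commands. As a hedged alternative, the cleanup can be made explicit with plain promises so that the browser is closed whether the task succeeds or fails. This is only a sketch: withDriver is a hypothetical helper, and createDriver stands for any function that returns an object with a quit() method, such as a function wrapping the Builder call from index.js.

```javascript
// Hypothetical helper: create a driver, run a task against it, and always quit.
// createDriver is assumed to return an object with a quit() method, for example:
//   function () { return new webdriver.Builder().forBrowser('phantomjs').build(); }
function withDriver(createDriver, task) {
    var driver = createDriver();
    return Promise.resolve()
        .then(function () { return task(driver); })
        .then(function (result) {
            // Success: quit, then pass the task's result through.
            return Promise.resolve(driver.quit()).then(function () { return result; });
        }, function (error) {
            // Failure: still quit, ignore errors from quit itself,
            // then re-throw the original error for the caller to handle.
            return Promise.resolve(driver.quit())
                .catch(function () {})
                .then(function () { throw error; });
        });
}
```

Inside a route, the task would hold the get/findElement/wait calls and return the final promise; the route would then send its response from the promise that withDriver returns.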

Notes:

I copied phantomjs.exe straight into my project root. It should also work if you install PhantomJS and add the install location of phantomjs.exe to your PATH.
I am using a local port of 14000, so you can run the app and go to http://localhost:14000/test to try it out.
The wait has a 5 second timeout. If it times out, the promise is rejected with an error and the response is: {"name":"TimeoutError"}
When pushing to Heroku, you can watch the build output to confirm that the PhantomJS buildpack installed successfully:

remote: -----> Build succeeded!

remote: -----> PhantomJS app detected

remote: -----> Extracting PhantomJS 2.1.1 binaries to /tmp/build_a6395fe7656f5bfcbfc7cfa31d3f8381/vendor/phantomjs

remote: -----> exporting PATH and LIBRARY_PATH
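
Both the local setup and the buildpack depend on the phantomjs binary being discoverable through PATH. A quick way to confirm that from Node is to search the PATH directories yourself. The sketch below is an illustration only; findOnPath and isFile are hypothetical helpers, and it assumes the binary is named phantomjs (phantomjs.exe on Windows).

```javascript
var fs = require('fs');
var path = require('path');

// Return true only when the given path exists and is a regular file.
function isFile(candidate) {
    try {
        return fs.statSync(candidate).isFile();
    } catch (e) {
        return false;
    }
}

// Hypothetical helper: return the first file called `name` (plus one of the
// given extensions) found in the directories listed in `pathValue`, else null.
function findOnPath(name, pathValue, extensions) {
    var dirs = String(pathValue || '').split(path.delimiter).filter(Boolean);
    var exts = extensions || [''];
    for (var i = 0; i < dirs.length; i++) {
        for (var j = 0; j < exts.length; j++) {
            var candidate = path.join(dirs[i], name + exts[j]);
            if (isFile(candidate)) {
                return candidate;
            }
        }
    }
    return null;
}

// On Windows the binary is phantomjs.exe; elsewhere it has no extension.
var binaryExts = process.platform === 'win32' ? ['.exe', ''] : [''];
var found = findOnPath('phantomjs', process.env.PATH, binaryExts);
console.log(found ? 'phantomjs found at ' + found : 'phantomjs not found on PATH');
```

Running this locally and in a Heroku one-off dyno should show whether the binary is visible to the Node process in each environment.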

Comments

Kevin Curry, July 6, 2017 at 9:22 AM
Exactly what I needed to get me started. Thanks!

Kevin Curry, July 6, 2017 at 11:54 AM
I may have written too soon. I get:

/app/node_modules/selenium-webdriver/lib/promise.js:2634
Jul 06 11:17:08 comparity-qa app/web.1: throw error;
Jul 06 11:17:08 comparity-qa app/web.1: ^
Jul 06 11:17:08 comparity-qa app/web.1: Error: Server terminated early with status 2
Jul 06 11:17:08 comparity-qa app/web.1: at Error (native)
Jul 06 11:17:08 comparity-qa app/web.1: at earlyTermination.catch.e (/app/node_modules/selenium-webdriver/remote/index.js:252:52)
Jul 06 11:17:08 comparity-qa app/web.1: at process._tickCallback (internal/process/next_tick.js:103:7)
Jul 06 11:17:08 comparity-qa app/web.1: From: Task: WebDriver.createSession()
Jul 06 11:17:08 comparity-qa app/web.1: at Function.createSession (/app/node_modules/selenium-webdriver/lib/webdriver.js:777:24)
Jul 06 11:17:08 comparity-qa app/web.1: at Function.createSession (/app/node_modules/selenium-webdriver/phantomjs.js:220:55)

Kevin Curry, July 9, 2017 at 7:57 AM
Your blog only talks about running on localhost and installing on Heroku. Did you get it to run on Heroku? If so, how?

Unknown, August 1, 2017 at 1:02 PM
Alex- super informative. I know this is a bit much but hoping you can help me out.
I have the following node route using selenium and chrome driver which is working correctly and returning expected html in the console:

app.get('/google', function (req, res) {
    var driver = new webdriver
        .Builder()
        .forBrowser('chrome')
        .build();

    driver.get('https://www.google.com')
    driver
        .manage()
        .window()
        .setSize(1200, 1024);
    driver.wait(webdriver.until.elementLocated({xpath: '//*[@id="lst-ib"]'}));
    return driver
        .findElement({xpath: '//*[@id="lst-ib"]'})
        .sendKeys('stackoverflow' + webdriver.Key.RETURN)
        .then((html) => {
            return driver
                .findElement({xpath: '//*[@id="rso"]/div[1]/div/div/div/div'})
                .getAttribute("innerHTML")
        })
        .then((result) => {
            console.log(result)
        })
        .then(() => {
            res
                .status(200)
                .send('ok')
        });
});
I have also installed the PhantomJS driver and tested that it's working by returning the page title, and that works. When I use the exact route above and replace chrome with phantomjs, I get no results. There are no errors, just no output in my console. The status and result are never sent to the browser, so it doesn't appear to be stepping through the promise chain.

Any suggestions?

Unknown, March 22, 2018 at 9:49 AM
I've tried to crawl the live score that is rendered by the HLTV server, but I can't do it with PhantomJS. Can you help me?
var express = require('express');
var router = express.Router();
var cheerio = require('cheerio');
var webdriver = require('selenium-webdriver');

// GET todayMatch
router.get('/', function (req, res) {
    const {Builder, until} = require('selenium-webdriver');
    let driver = new webdriver.Builder()
        .withCapabilities(webdriver.Capabilities.phantomjs()
            .set("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"))
        .build();
    const hltv_url = 'https://www.hltv.org/';

    driver.get(hltv_url)
        .then(() => driver.wait(until.titleIs('CS:GO News & Coverage | HLTV.org'), 1000))
        .then(() => driver.executeScript("window.scrollTo(0, document.body.scrollHeight);"))
        .then(() => driver.getPageSource())
        .then((source) => {
            const $ = cheerio.load(source);
            var items = [];
            $('.top-border-hide').find(".hotmatch-box.a-reset").each((_, ele) => {
                items.push($(ele));
            });
            console.log(items[0]);
            console.log(items[0].html());
            console.log(items[0].text());
            //Do whatever you want with the result

            //console.log(item.html());
        })
        .then(() => {
            driver.quit();
        });
    res.render('pages/score_api');
});
module.exports = router;

SPECULARI, April 12, 2018 at 5:50 PM
I don't understand. You said that you "copied phantomjs.exe straight into my project root". Question: which platform's PhantomJS build do I need to download for Heroku (Windows, Mac OS X, Linux 64-bit, etc.)?
Will it work only remotely, with no need to install it on the local computer? Please explain. Thanks!