I want to automate a task which can only be done on a website (with prior login) on my debian server. There is no public API available, so I can't use one.
Is there a way to do so? I thought about a text-based browser or something similar.
Have a look at WWW::Mechanize (examples at http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod). It represents a web page as an object and makes all of its elements accessible via methods.
For example:
$m->get("https://lists.ccs.neu.edu/bin/admindb/$listname");
$m->set_visible( $password );
$m->click;
There are ports for (at least) Ruby and Python, too.
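To give a feel for the "page as an object" idea without Perl, here is a rough sketch using only Python's standard library. It is not how WWW::Mechanize or its ports are implemented; the `FormFieldParser` class, the `fill_form` helper, the sample page and its field names are all made up for illustration. It reads the form fields from a page, keeps hidden defaults, and builds the body you would submit.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of all <input> elements on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # field name -> default value

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            self.fields[name] = attrs.get("value") or ""


def fill_form(html, **values):
    """Return the URL-encoded body for submitting the form in `html`.

    Hidden fields keep their defaults; `values` overrides fields by name.
    """
    parser = FormFieldParser()
    parser.feed(html)
    unknown = set(values) - set(parser.fields)
    if unknown:
        raise KeyError("no such form field(s): " + ", ".join(sorted(unknown)))
    data = dict(parser.fields)
    data.update(values)
    return urlencode(data)


# Hypothetical login page, standing in for a real response body.
SAMPLE_PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="password" name="adminpw">
  <input type="submit" name="go" value="Log in">
</form>
"""

if __name__ == "__main__":
    print(fill_form(SAMPLE_PAGE, adminpw="s3cret"))
```

A real Mechanize-style library additionally handles cookies, redirects, multiple forms and link following, which is why it is usually the more comfortable choice.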
You can run Selenium on a headless installation on your server, e.g. by scripting the actions in Python using pyvirtualdisplay.
pyvirtualdisplay lets you use an Xvfb, Xephyr or Xvnc screen, so you can take screenshots (or take a remote peek to see what is going on).
On Ubuntu 12.04 install:
sudo apt-get install python-pip tightvncserver xtightvncviewer
sudo pip install selenium pyvirtualdisplay
and run the following (this uses the newer Selenium 2 API; the older API is still available as well):
import subprocess

from pyvirtualdisplay import Display
from selenium import webdriver


def browse_it(port=None):
    browser = webdriver.Firefox()
    browser.get('http://unix.stackexchange.com/questions')
    for question in browser.find_elements_by_class_name('question-hyperlink'):
        print(question.text)
    if port:
        print('--------\nconnect using:\n vncviewer '
              'localhost:{}\nand click the xmessage to quit'.format(port))
        # Block until the xmessage window is clicked, keeping the browser open.
        subprocess.call(['xmessage', 'click to quit'])
    browser.quit()


def browse_it_hidden(rfbport=5904):
    # Run the browser inside an Xvnc display listening on rfbport.
    with Display(backend='xvnc', rfbport=str(rfbport)):
        browse_it(rfbport)


if __name__ == '__main__':
    browse_it_hidden()
The xmessage keeps the browser from quitting; in a testing environment you would not want this. You can also call browse_it() directly to test in the foreground.
The results of Selenium's find_element...() methods do not offer things like selecting the parent of an element you just found, something you might expect from an HTML parsing package (I read somewhere that this is on purpose).
These limitations can be a bit of a hassle when you scrape pages you have no control over. When testing your own site, just make sure you generate every element you want to test with an id or a unique class, so it can be selected without hassle.
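If you do need parent information while scraping, one possible workaround is to hand the rendered HTML (Selenium exposes it as `browser.page_source`) to a separate parser. The following is a minimal sketch using only Python's standard `html.parser`; the `ParentTracker` class, the `find_parents` helper and the sample markup are invented for illustration, and it assumes reasonably well-nested HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParentTracker(HTMLParser):
    """Record the parent of every element carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.stack = []    # currently open elements as (tag, attrs) pairs
        self.matches = []  # (element, parent) pairs; parent may be None

    def handle_starttag(self, tag, attrs):
        element = (tag, dict(attrs))
        classes = (element[1].get("class") or "").split()
        if self.wanted_class in classes:
            parent = self.stack[-1] if self.stack else None
            self.matches.append((element, parent))
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_parents(html, wanted_class):
    """Return (element, parent) pairs for elements with `wanted_class`."""
    tracker = ParentTracker(wanted_class)
    tracker.feed(html)
    tracker.close()
    return tracker.matches


# Made-up markup; with Selenium you would pass browser.page_source instead.
SAMPLE = """
<div id="q1" class="summary"><h3><a class="question-hyperlink" href="/q/1">First</a></h3></div>
<div id="q2" class="summary"><h3><a class="question-hyperlink" href="/q/2">Second</a></h3></div>
"""

if __name__ == "__main__":
    for (tag, attrs), parent in find_parents(SAMPLE, "question-hyperlink"):
        print(attrs["href"], "is inside", parent[0] if parent else None)
```

A dedicated HTML parsing package will cope far better with broken markup; the point here is only that the parent lookup can happen outside Selenium.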
You could use either of:
Basically any language that lets you query a networked resource would do...
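As one illustration of that point, Python's standard library can already post a login form and keep the session cookie without any browser. This is only a sketch under assumptions: the URL and the form field names (`username`, `password`) are placeholders, the helpers `make_session` and `build_login_request` are invented here, and a real site may also require hidden tokens or JavaScript, in which case a browser-driving tool is the better fit.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, username, password):
    """Build (but do not send) the POST request for a login form."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    opener, jar = make_session()
    # Placeholder URL: replace with the site's real login endpoint.
    request = build_login_request(
        "https://example.com/login", "alice", "s3cret")
    print(request.get_method(), request.full_url)
    # Sending is left commented out so the sketch runs without a network:
    # with opener.open(request) as response:
    #     print(response.status)
    # Later opener.open(...) calls would reuse the cookies stored in `jar`.
```

Once the login request has been sent through `opener`, every further page fetched with the same opener carries the session cookie, which is usually all a simple automated task needs.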