How to change urllib User-Agent

This article explains how to change the User-Agent of urllib, a module of the standard library of Python.

Table of Contents

  1. The problem
  2. The solution
  3. More examples
  4. External links

The problem

When using Python's urllib module, you may want to change the User-Agent (UA) header that its functions send, for two main reasons:

  1. you want to define your own UA;
  2. you need a valid UA in order to access some sites.

The Python Library Reference says that, by default, the URLopener class sends a User-Agent header of urllib/VVV, where VVV is the urllib version number. In practice the string also carries a Python- prefix, as we can see in the following code:

>>> from urllib import URLopener
>>> URLopener.version
'Python-urllib/1.16'

Several sites (e.g. Google, Wikipedia) reject this User-Agent and return an error message when you try to access their pages using urllib:

>>> from urllib import urlopen
>>> page = urlopen('http://www.google.com/search?q=python')
>>> page.read()
[…]<b>Error</b><H1>Forbidden</H1>Your client does not have permission
to get URL <code>/search?q=python</code> from this server.[…]
>>> page = urlopen('http://en.wikipedia.org/wiki/Python')
>>> page.read()
[…]Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 25 Dec 2007 15:45:20 GMT[…]

The solution

So, how can we change the User-Agent? If we don't want to set the headers through a lower-level module such as httplib, the solution is quite easy. As the Library Reference puts it:

"Applications can define their own User-Agent header by subclassing URLopener or FancyURLopener and setting the class attribute version to an appropriate string value in the subclass definition."

Let's see how it works:

>>> from urllib import FancyURLopener

>>> class MyOpener(FancyURLopener):
...   version = 'My new User-Agent'

We have defined a new class, named MyOpener, with a new UA: 'My new User-Agent'.

>>> MyOpener.version
'My new User-Agent'

However, this is not enough if we want to access Google or Wikipedia. These sites expect a browser-like User-Agent, so we need to set version to a string like the one a real browser sends:

>>> class MyOpener(FancyURLopener):
...   version = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
...              'Gecko/20071127 Firefox/2.0.0.11')

Now we can create a new instance of MyOpener and try again using the .open() method (instead of the urlopen() function):

>>> myopener = MyOpener()
>>> page = myopener.open('http://www.google.com/search?q=python')
>>> page.read()
[…]Results <b>1</b> - <b>10</b> of about <b>81,800,000</b> for <b>python</b>[…]
>>> page = myopener.open('http://en.wikipedia.org/wiki/Python')
>>> page.read()
[…]<h1 class="firstHeading">Python</h1><h3 id="siteSub">From Wikipedia, the free encyclopedia</h3>[…]

Using the methods of MyOpener, such as .open() and .retrieve(), we can fetch the pages we need while sending our own User-Agent instead of the one urllib uses by default.
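
The sessions above are written for Python 2. In Python 3 the module was reorganized and the URL-opening machinery lives in urllib.request (the equivalent classes in Python 2 are in urllib2). The following is a minimal sketch, not the subclassing approach described in this article, of setting the header either on a single Request or on an opener. The USER_AGENT string and the example.com URL are placeholders chosen for illustration, and no request is actually sent:

```python
try:
    from urllib.request import Request, build_opener  # Python 3
except ImportError:
    from urllib2 import Request, build_opener  # Python 2

# Placeholder values, used only for illustration.
USER_AGENT = 'MyApp/1.0'
URL = 'http://www.example.com/'

# Per-request header: only this Request carries the custom User-Agent.
request = Request(URL, headers={'User-Agent': USER_AGENT})

# Per-opener header: requests made through this opener carry it,
# unless a Request already sets its own User-Agent.
opener = build_opener()
opener.addheaders = [('User-Agent', USER_AGENT)]

# Nothing has been sent yet. Request stores header names capitalized,
# so the lookup key is 'User-agent'.
sent_header = request.get_header('User-agent')
```

Calling opener.open(request) would then perform the fetch with the custom header.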

More examples
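
Sites change their filtering rules over time, so it can help to check which User-Agent is really being sent without relying on a third-party server. The following sketch assumes Python 3 (http.server and urllib.request) rather than the Python 2 urllib used above. EchoUserAgent and echoed_user_agent are hypothetical names introduced here: the first is a throwaway local handler that replies with the User-Agent it received, the second sends one request to it and returns the reply.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical handler that replies with the User-Agent it received."""

    def do_GET(self):
        body = self.headers.get('User-Agent', '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def echoed_user_agent(user_agent):
    """Send one request to a throwaway local server and return the
    User-Agent header that the server saw."""
    server = HTTPServer(('127.0.0.1', 0), EchoUserAgent)  # port 0: any free port
    server.timeout = 5  # stop waiting if no request arrives
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        url = 'http://127.0.0.1:%d/' % server.server_port
        opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
        request = Request(url, headers={'User-Agent': user_agent})
        page = opener.open(request, timeout=5)
        try:
            return page.read().decode('utf-8')
        finally:
            page.close()
    finally:
        worker.join()
        server.server_close()
```

For example, echoed_user_agent('My new User-Agent') returns the string the local server received, which should match the one passed in.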

External links

Ezio Melotti - ©2007 - This work is licensed under a Creative Commons BY-NC-SA 3.0 License.