Archive for the ‘django’ Category

Python zipfile speedup tips

Saturday, October 15th, 2011

I have been working on a Django project that requires large zip files to be unzipped.

At first I was just Popen’ing unzip, but that makes it hard to track the progress of extraction, especially with large files.

So I decided to use Python’s zipfile module and override extractall with a progress callback.
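Here is a minimal sketch of what overriding extractall with a progress callback might look like. The subclass name ProgressZipFile and the progress(done, total, name) callback signature are hypothetical choices for illustration. Only ZipFile, infolist() and extract() come from the standard library.

```python
import zipfile


class ProgressZipFile(zipfile.ZipFile):
    """ZipFile whose extractall() reports progress after each member.

    Hypothetical helper: the progress(done, total, name) signature is an
    assumption made for this sketch, not part of the zipfile API.
    """

    def extractall(self, path=None, members=None, pwd=None, progress=None):
        # Default to every member in the archive, like ZipFile.extractall.
        if members is None:
            members = self.infolist()
        total = len(members)
        for done, member in enumerate(members, start=1):
            # extract() accepts either a ZipInfo or a member name.
            self.extract(member, path, pwd)
            if progress is not None:
                name = member.filename if isinstance(member, zipfile.ZipInfo) else member
                # Report members finished so far, the total, and the current name.
                progress(done, total, name)
```

Passing a callback that prints done and total gives a simple per-member progress display. Counting members is coarse, though: one huge file inside the archive will look like a single step.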

However, I was very disappointed with the performance: a major slowdown compared to the unzip binary.

Here is how the unzip binary performed:
time unzip -q /mnt/files/test.zip

real 0m8.880s
user 0m1.560s
sys 0m0.570s

Just under 9 seconds, not bad.

This was my test script:

from zipfile import ZipFile
# test.py: open the archive by filename and extract every member into the current directory
zf = ZipFile("/mnt/files/test.zip")
zf.extractall()

time python test.py

real 6m50.938s
user 0m2.990s
sys 0m1.010s

Nearly 7 minutes! What was going on? I scratched my head and tried different things.
Then I ran strace, and it all became clear.

If you pass a filename to ZipFile, it does open the file in the constructor to read the archive’s directory, but it doesn’t reuse that handle for extraction. Oh no.

Instead, it saves the filename, and for every file it extracts from the archive, it opens the zip file again and then closes it.

On a local filesystem this isn’t a big problem. On a remote CIFS filesystem, however, opening a file is much more expensive, hence the slowdown.

So an easy optimisation is to open the file yourself and pass ZipFile the file object instead of the filename.

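from zipfile import ZipFile
# Open the archive once, in binary mode, and hand ZipFile the file object
# so it reuses this handle for every member instead of reopening by name.
with open("/mnt/files/test.zip", "rb") as f:
    zf = ZipFile(f)
    zf.extractall()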

time python test.py

real 0m10.071s
user 0m2.550s
sys 0m0.690s

Bingo: only about 13% slower than unzip (10.1 s versus 8.9 s).
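To repeat this comparison from inside Python on your own filesystem, a rough timing sketch might look like the following. The helper name time_extractall and its pass_file_object flag are hypothetical. Recent Python 3 releases share a single handle across members instead of reopening the archive, so the gap may well not reproduce there.

```python
import time
import zipfile


def time_extractall(archive_path, dest, pass_file_object):
    """Return wall-clock seconds spent extracting archive_path into dest.

    Hypothetical helper. pass_file_object=False builds ZipFile from the path
    (the slow case described above); True opens the file once and passes
    the file object to ZipFile instead.
    """
    start = time.time()
    if pass_file_object:
        # Open in binary mode ourselves; ZipFile reads through this handle.
        with open(archive_path, "rb") as fp:
            with zipfile.ZipFile(fp) as zf:
                zf.extractall(dest)
    else:
        # Hand ZipFile the filename and let it manage the file itself.
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest)
    return time.time() - start
```

For example, compare time_extractall("/mnt/files/test.zip", "out1", False) with time_extractall("/mnt/files/test.zip", "out2", True), extracting into separate directories so the second run does not overwrite the first run’s output.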

If you are using Python 2.6, another easy optimisation is to use zipfile.py from Python 2.7, which has many optimisations for large files inside the archive.
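Putting the two halves of this post together, here is a minimal sketch of an extraction helper that opens the archive only once and reports byte-based progress. The function name extract_with_progress and the progress(done_bytes, total_bytes) callback are hypothetical. It relies only on ZipFile.infolist(), ZipFile.extract() and ZipInfo.file_size.

```python
import zipfile


def extract_with_progress(archive_path, dest=None, progress=None):
    """Extract every member of archive_path into dest (default: cwd).

    Hypothetical helper: progress(done_bytes, total_bytes) is an assumed
    callback signature, called after each member is written.
    """
    # Open the archive ourselves, once, in binary mode, and pass the file
    # object so ZipFile never needs to reopen the archive by name.
    with open(archive_path, "rb") as fp:
        with zipfile.ZipFile(fp) as zf:
            members = zf.infolist()
            # Uncompressed sizes give a byte-based measure of progress.
            total = sum(m.file_size for m in members)
            done = 0
            for member in members:
                zf.extract(member, dest)
                done += member.file_size
                if progress is not None:
                    progress(done, total)
```

Counting uncompressed bytes rather than members keeps the progress display meaningful when a single large file makes up most of the archive.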

Citrix Xenoscope

Tuesday, April 19th, 2011

Hi All

Just a quick note that if you’re at Citrix Synergy, you’ll get a chance to preview a piece of software my team and I have been developing here in Dublin.

Basically it’s a tool that should help troubleshoot problems with XenServer installations.

Ian will be giving a workshop at Synergy San Francisco.

It’s a really fun and useful product and I hope you like it!



Copyright © 2018 All Rights Reserved.

dmarkey.com