Ticket #2227 (new defect)

Opened 9 years ago

Last modified 9 years ago

RAM and disk efficiency of transforms

Reported by: gracinet Owned by: gracinet
Priority: P2 Milestone: CPS 3.5.7
Component: PortalTransforms Version:
Severity: major Keywords:
Cc:

Description

Took a brief tour of the code while thinking about #1940, so this is to be confirmed.

Anyway, it looks as if PortalTransforms is very RAM-inefficient.

First of all, it seems to expect the whole data to be in memory. If true, this is already wrong, even for data coming from OFS.Image.File, which stores large content as a chain of chunks rather than as a single string.

Then, it creates temporary files by writing them in one chunk before launching the command that does the actual conversion work (libtransforms.commandtransform.initialize_tmpdir).
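To illustrate the alternative, here is a minimal sketch of writing the temporary file in bounded chunks instead of a single write. It assumes the data is available as a readable binary file object and is written for Python 3; `write_tmp_file` and `CHUNK_SIZE` are hypothetical names for illustration, not part of PortalTransforms.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1 << 16  # 64 KiB; arbitrary illustrative value


def write_tmp_file(source, filename, tmpdir=None, chunk_size=CHUNK_SIZE):
    """Copy a readable binary file object into a temp dir, chunk by chunk.

    Returns (tmpdir, path). Hypothetical helper, not the PortalTransforms API.
    """
    if tmpdir is None:
        tmpdir = tempfile.mkdtemp()
    # Keep only the base name: the extension survives, directory parts do not.
    path = os.path.join(tmpdir, os.path.basename(filename))
    with open(path, "wb") as target:
        # copyfileobj reads and writes at most chunk_size bytes at a time,
        # so peak memory use does not depend on the size of the data.
        shutil.copyfileobj(source, target, chunk_size)
    return tmpdir, path
```

With this approach, RAM use is bounded by the chunk size, whatever the size of the document being transformed.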

For data coming from DiskFile or TramlineFile, which is already on disk, this is also a big waste of disk time on file systems that support symbolic links.

Fortunately, data is passed all along together with options. This is how the file name is handled (recall that many programs need the correct extension). We could allow data to be a file object or a file system path, and take the most efficient approach in context (e.g. if the command works from the file system, use symlinks, except on Windows).
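The proposal could be sketched roughly as follows, under the assumption that a path is passed as `str`, raw content as `bytes`, and anything else is a readable binary file object (Python 3). `stage_input` is a hypothetical name for illustration and does not reflect what the branches actually implement.

```python
import os
import shutil
import sys


def stage_input(data, filename, tmpdir):
    """Make `data` available as tmpdir/filename and return that path.

    `data` may be bytes, a file system path (str), or a readable binary
    file object. Hypothetical helper, for illustration only.
    """
    # The command sees the expected file name, hence the right extension.
    target = os.path.join(tmpdir, os.path.basename(filename))
    if isinstance(data, bytes):
        # Legacy case: the whole content is already in memory.
        with open(target, "wb") as out:
            out.write(data)
    elif isinstance(data, str):
        source = os.path.abspath(data)
        if sys.platform == "win32":
            # No symlink on Windows in this sketch: fall back to a copy.
            shutil.copyfile(source, target)
        else:
            # No data is copied at all; only a directory entry is created.
            os.symlink(source, target)
    else:
        # File object: stream it to disk in chunks.
        with open(target, "wb") as out:
            shutil.copyfileobj(data, out)
    return target
```

The symlink branch is what saves disk time for content that already lives on the file system, while the file object branch avoids loading the content into RAM.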

Nowadays we have bug reports and spec documents full of screenshots, PowerPoint presentations weighing over 100 MB and huge PDFs full of scans, and CPSTramline can handle that, so this is not superfluous.

Change History

comment:1 Changed 9 years ago by gracinet

Current status: almost done, in branches of PortalTransforms (main beef) and CPSSchemas (client side). Implementation may be problematic on Windows, but it works on my Debian box. Unit tests still have to be adapted.

With #2224 done, CPSTramline should be able to cope directly (and store large results in its own way).

The downside is that our PortalTransforms has now become a real fork. Submitting the changes upstream is an option (if they like them).

comment:2 Changed 9 years ago by gracinet

  • Milestone changed from CPS 3.5.2 to CPS 3.5.3

This needs to be tested more thoroughly before being released, and we'd like to reach 3.5.2 sooner than that, so it'll stay in branches for now.
