Ticket #2205 (closed feature: fixed)

Opened 9 years ago

Last modified 9 years ago

Generic batch or archive upload system

Reported by: gracinet Owned by: gracinet
Priority: P1 Milestone: CPS 3.5.2
Component: CPS (global) Version: unspecified
Severity: normal Keywords:
Cc:

Description

This is about performing a big upload in one shot and automatically creating documents from it.

Simple use cases:

  • populate image gallery from a zip
  • upload a bunch of attached files

This must stay generic, because some custom projects might need it. We know one in which that would be "populate a recording object from a bunch of mp3's"

We need

  1. big upload handling (zip file is ok). CPSTramline capable, of course.
  2. mapping of an individual file's type (mime) to the type of doc to create
  3. performing of creation
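
The three requirements above can be sketched in plain standard-library Python. This is only an illustration, not CPS code: the mapping table, the document type names ("Image", "Audio", "File") and the function names `doc_type_for` and `plan_creations` are all assumptions made for the example, and actual creation is reduced to building a list of (file name, document type) pairs.

```python
import mimetypes
import zipfile

# Hypothetical mapping from MIME major type to the type of doc to create.
DOC_TYPE_BY_MIME_MAJOR = {"image": "Image", "audio": "Audio"}
DEFAULT_DOC_TYPE = "File"


def doc_type_for(filename):
    """Guess the MIME type from the file name and map it to a doc type."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return DEFAULT_DOC_TYPE
    major = mime.split("/", 1)[0]
    return DOC_TYPE_BY_MIME_MAJOR.get(major, DEFAULT_DOC_TYPE)


def plan_creations(zip_stream):
    """Return (member name, doc type) pairs for every file in the archive."""
    plan = []
    with zipfile.ZipFile(zip_stream) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories do not become documents
            plan.append((info.filename, doc_type_for(info.filename)))
    return plan
```

Keeping the mapping in a separate table is what would let a custom project (the mp3 case, for instance) plug in its own document type without touching the extraction loop.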

Change History

comment:1 Changed 9 years ago by gracinet

There's actually already a mechanism for that in CPSDocument.createFile (author: Dragos Ivan), and I just tested that it still works with an Image Gallery. It's done through a File Field, with a write_process_expr.

A few shortcomings though:

  1. CPSTramline should take care of this one, too. For that, tramline widgets need a new property to avoid permanent storage in tramline repository of the big upload.
  2. the mapping file type -> document type is hardcoded.
  3. again, CPSTramline would need a new method for TramlineFile objects creation from scratch.
  4. the upload is an option after creation, and that's not noticeable enough

Details on each point:

  1. means that this can currently be really harsh on server RAM. But tomorrow, CPSTramline will also come with a progress bar.
  2. should be done by relaying to a dedicated tool
  3. means that it can be hard on ZODB (imagine people uploading tens of archives full of 3 MB images).
  4. is a classical problem: at the time the expression is being called, the proxy is not set yet.

An easy workaround would be to provide a new action, displaying another edit form with the ziparchiveuploader layout only.

As for the site I was thinking of, issues 2 and 3 are a problem (audio files).

comment:2 Changed 9 years ago by gracinet

  • Component changed from CPS (global) to CPSDocument

Issue 2. now known as #2208 (no tool, after all)

comment:3 Changed 9 years ago by gracinet

  • Component changed from CPSDocument to CPS (global)

Point 3 mostly done: config not there yet by default. Also still lacks TramlineImage and Big Image though.

Direct file creation in tramline repository should be very collision safe (write first, choose filename and link second), but CPSTramline is probably less and less Windows-friendly (it's been documented almost from the beginning).

Point 1 done: RAM usage problem for zip file upload and extraction is confirmed as a side-effect of testing point 3. I tried a 20 MB archive to populate a workspace, and that led, after a long wait, to a painful out-of-memory (testing rig is a tight Debian lenny VM). After tramlinisation of the upload widget, it worked and extraction is incredibly fast (felt instantaneous while monitoring the upload itself). Details: there's a new function in CPSUtil.file to get a python file handler from a File object, used by CPSDocument.createFile. In case of TramlineFile or DiskFile (or any OFS.File subclass providing a getFileHandler method), this gives a direct handle to the actual (FS) file. Image resizing in CPSCore (#2204) should also leverage it. The zip file itself is not kept (new Transient Tramline File widget does that).
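
A minimal sketch of the file handle idea, for illustration only. The name `open_file_handle` is hypothetical (it is not the actual CPSUtil.file function), and the fallback assumes the object exposes its whole content as bytes in a `data` attribute, which is a simplification of how OFS.File stores data.

```python
import io


def open_file_handle(file_obj):
    """Return a readable binary file-like object for file_obj.

    Objects providing getFileHandler (as TramlineFile or DiskFile are
    described to do) give direct access to the file on disk, so nothing
    has to be loaded in RAM. Anything else falls back to an in-memory
    copy of its `data` attribute (assumed here to be plain bytes).
    """
    getter = getattr(file_obj, "getFileHandler", None)
    if callable(getter):
        return getter()
    return io.BytesIO(bytes(file_obj.data))
```

The point of the duck-typed check is that the caller (zip extraction, image resizing) does not need to know which storage backs the file.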

Everything pushed in future-3.5.2 branches, and will naturally require extensive real-life testing.

comment:4 Changed 9 years ago by gracinet

First real-life results (viral-prod.com)

  • as expected computeDependantFields machinery works as if this was a normal creation (in that case, extracting ID3 metadata)
  • we lack a disambiguation system (the file name is being used as local id, but we flatten everything)
  • process stops at first name collision (fixed: loop proceeds, first wins)
  • name collision is systematic for zips made on MacOS X because of what's in the __MACOSX subdirectory (fixed: skipping it)
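
The two fixes above (loop proceeds on collision with first wins, and skipping __MACOSX) can be illustrated with a standalone sketch. It is not the CPSDocument code: `iter_importable` is a hypothetical name, and the local id is simply the member's base name, which mimics the flattening mentioned above.

```python
import posixpath
import zipfile


def iter_importable(archive):
    """Yield (local_id, ZipInfo) for the members worth importing.

    The directory structure is flattened: the local id is the base name.
    Members under a __MACOSX directory are skipped, and on a name
    collision the first member wins while the loop keeps going.
    """
    seen = set()
    for info in archive.infolist():
        if info.is_dir():
            continue
        if "__MACOSX" in info.filename.split("/"):
            continue  # resource-fork residue added by MacOS X archivers
        local_id = posixpath.basename(info.filename)
        if local_id in seen:
            continue  # first wins, do not abort the whole import
        seen.add(local_id)
        yield local_id, info
```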

comment:5 Changed 9 years ago by gracinet

Side note: obviously this createFile.createFile should be renamed. Maybe bulkCreate.importZip, with a compatibility alias and a nice deprecation warning of course.

comment:6 Changed 9 years ago by gracinet

  • Status changed from new to closed
  • Resolution set to worksforme

Renaming done, as bulkcreate.import_zip. This is working online for me, time to close.

comment:7 Changed 9 years ago by gracinet

  • Status changed from closed to reopened
  • Resolution worksforme deleted

There's an error in Image Gallery if one tries to import images whose filenames have already been used as document ids (typically, reimporting a zip for update does that).

We should at least use the standard disambiguation system. The best would be to optionally allow overwrites.

comment:8 Changed 9 years ago by gracinet

Update: now the document ids are based on the file name without extension (having the extension in the id used to confuse some user agents), and the standard disambiguation is applied (nothing forces the user to bulk upload on empty containers).
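
A rough sketch of that id computation, for illustration only. The function name `make_doc_id` and the numeric suffix scheme are assumptions; the standard CPS disambiguation may well generate ids differently.

```python
import posixpath


def make_doc_id(filename, existing_ids):
    """Build a document id from a file name, without its extension.

    If the id is already taken in the container, a numeric suffix is
    appended until a free one is found (hypothetical scheme).
    """
    base = posixpath.splitext(posixpath.basename(filename))[0] or "file"
    candidate = base
    counter = 1
    while candidate in existing_ids:
        candidate = "%s_%d" % (base, counter)
        counter += 1
    return candidate
```

For example, reimporting `gallery/photo.jpg` into a container that already holds `photo` would produce `photo_1` under this scheme instead of raising an error.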

comment:9 Changed 9 years ago by gracinet

  • Status changed from reopened to closed
  • Resolution set to fixed

Released
