Ticket #854 (closed defect: fixed)

Opened 14 years ago

Last modified 14 years ago

CPSIO import fails on sites with a lot of documents

Reported by: jmorliaguet Owned by: fguillaume
Priority: P2 Milestone: CPS 3.4.0
Component: CPSIO Version: TRUNK
Severity: normal Keywords: transactions catalog
Cc:

Description

I've encountered a recurring problem when importing sites with more than 1000-1500 documents with CPSIO. Apparently the transaction is commited when the entire import is done and in some cases it results in a large number of write conflict errors which har never resolved and the whole process then starts again.

There are also a lot of catalog errors

2005-07-28T18:37:39 ERROR(200) Catalog uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of ...

I don't know if this is related but disabling subtransactions in the portal catalog removes the errors and the import succeeds.

maybe CPSIO could do a commit every 100 documents, or it could disable the catalog subtransactions during the import process?

Change History

comment:1 Changed 14 years ago by jmorliaguet

  • Summary changed from CPSIO import fails on sites with a lot on documents to CPSIO import fails on sites with a lot of documents

it could be a race condition: if the "overwrite existing documents" is selected then the document being imported is first deleted and recreated right afterwards, so the catalogue tries to uncatalog it, but another object with the same id is there and the catalog cannot remove it from the index.

comment:2 Changed 14 years ago by fguillaume

I'm slowly fixing the catalog errors all over CPS.

Is there any other activity on the site while you're doing the import?

The subtransaction aspect is strange...

comment:3 Changed 14 years ago by jmorliaguet

This is a migration instance that I'm working on so there is no activity whatsoever.

the conflict error that causes the process to start again is on cps3importer:

2005-07-28T23:14:33 INFO(0) ZODB conflict error at /chalmers-en/cps3importer (1 conflicts since startup at 2005-07-28T23:04:11)

with a lot of:

2005-07-28T23:14:13 ERROR(200) Catalog uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /chalmers-en/sections.

just before on the same object uid (/chalmers-en/sections)

there are also catalog error messages on the repository object which should not occur:

2005-07-28T23:33:35 ERROR(200) Catalog uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /chalmers-en/portal_repository/1517891035__0004.

non-related there are a lot of:

2005-07-29T01:09:44 PROBLEM(100) DataModel.__init__ Field 'photo' is in schema common but also in schema flexible_content.

due to the fact that there is a 'photo' widget in flexible_content with a '?' field.

comment:4 Changed 14 years ago by fguillaume

The photo fields is fixed in svn since yesterday.

The uncataloging problems on multi-language docs is being fixed right now.

The one on the repository object is worrying, please debug it to check when something is added there by adding at the start of Catalog.py::catalogObject the lines:

        if uid.endswith('__0001'):
            print 'cataloging', uid
            import traceback
            from zExceptions.ExceptionFormatter import format_exception
            traceback.format_exception = format_exception
            traceback.print_stack()

and check what appears during a bin/runzope.

comment:5 Changed 14 years ago by jmorliaguet

the indexing of portal_repository objects comes from the 'CPSIO/branches/rspivak-cpsio-branch' branch I'm using.

in: import_modules/CPS3Importer.py

that uses:

folder.getEditableContent().edit(**kw)

instead of:

folder.getEditableContent().edit(proxy=folder, **kw)

but even in the trunk there is another edit() without proxy parameter passed line 2027:

--
2023-                val = File(id, title, data_file, content_type=content_type)
2024-            else:
2025-                val = eval(field_el.text)
2026-            kw[field_el.get('id')] = val
2027:        doc.edit(**kw)
2028:        # set Language manually as doing it through edit() does not do it
2029-        if kw.has_key('Language'):
2030-            doc.setLanguage(kw['Language'])
2031-        self.buildDocumentWorkflowHistory(doc_id, document_el)
2032-

comment:6 Changed 14 years ago by fguillaume

  • Owner changed from trac to fguillaume
  • Status changed from new to assigned

On svn trunk, doc.edit() without proxy isn't a problem anymore (at lease it doesn't reindex spurious repository objects).

comment:7 Changed 14 years ago by jmorliaguet

now I get:

    * <FSPythonScript at /cps5/cps3importer>
      Line 17
    * Module Products.CPSIO.import_modules.CPS3Importer, line 231, in importFile
    * Module Products.CPSIO.import_modules.CPS3Importer, line 1219, in importFile
    * Module Products.CPSIO.import_modules.CPS3Importer, line 1313, in buildFolder
    * Module Products.CPSDocument.CPSDocument, line 151, in edit
    * Module Products.CPSDocument.FlexibleTypeInformation, line 1015, in editObject
    * Module Products.CPSDocument.FlexibleTypeInformation, line 869, in _commitDM
    * Module Products.CPSDocument.FlexibleTypeInformation, line 877, in _notifyModification
    * Module Products.CPSCore.EventServiceTool, line 256, in notify
    * Module Products.CPSCore.ProxyTool, line 781, in notify_proxy
    * Module Products.CPSCore.ProxyTool, line 557, in _reindexProxiesForObject
    * Module Products.CPSCore.EventServiceTool, line 256, in notify
    * Module Products.CPSPortlets.PortletsTool, line 1153, in notify_event
    * Module Products.CPSPortlets.PortletsTool, line 1168, in listPortletsInterestedInEvent
    * Module Products.CPSPortlets.PortletsTool, line 108, in listAllPortlets
    * Module Products.ZCatalog.Lazy, line 158, in __getitem__
    * Module Products.ZCatalog.Catalog, line 147, in __getitem__
    * Module ZODB.Connection, line 599, in setstate
    * Module ZODB.Connection, line 628, in _is_invalidated

ReadConflictError: database read conflict error (oid 0x01bc87, class BTrees.IOBTree.IOBucket)

using for BBB with 2.7:

Index: PatchCMFCoreCMFCatalogAware.py
===================================================================
--- PatchCMFCoreCMFCatalogAware.py      (revision 25311)
+++ PatchCMFCoreCMFCatalogAware.py      (working copy)
@@ -14,7 +14,7 @@
 # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
 # 02111-1307, USA.
 #
-# $Id:$
+# $Id$
 """Patch CMFCore CMFCatalogAware
 
 - Make reindexObjectSecurity correctly recurse in the presence of
@@ -40,7 +40,15 @@
                     continue
 
                 # Get the object
-                ob = brain._unrestrictedGetObject()
+                if hasattr(aq_base(brain), '_unrestrictedGetObject'):
+                    ob = brain._unrestrictedGetObject()
+                else:
+                    # BBB older Zope
+                    ob = self.unrestrictedTraverse(brain_path, None)
+                if ob is None:
+                    LOG('reindexObjectSecurity', PROBLEM,
+                        "Cannot get %s from catalog" % brain_path)
+                    continue
                 s = getattr(ob, '_p_changed', 0)
                 # Recatalog with the same catalog uid.
                 catalog.catalog_object(ob, brain_path,

comment:8 Changed 14 years ago by jmorliaguet

It could be related to the event service. I'm deactivating 'portal_cpsportlets' from the event subscribers.

comment:9 Changed 14 years ago by jmorliaguet

I've added the

   traceback.print_stack()

lines to Catalog.py

the repository objects never get cataloged, however some method (probably event-related) tries to uncatalog them.

------
2005-08-24T15:10:22 ERROR(200) Catalog uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /cps/portal_repository/1571603903__0001. 
------

which explains why they never get uncataloged...

comment:10 Changed 14 years ago by fguillaume

You could add a print_stack() in uncatalogObject to see what tries to uncatalog them.

comment:11 Changed 14 years ago by jmorliaguet

there it goes ...

uncataloging /chalmers-sv/portal_repository/1358268432__0004
  File "/usr/lib/zope-2.7.6/lib/python/ZServer/PubCore/ZServerPublisher.py", lin                                     e 23, in __init__
    response=response)
  File "/usr/lib/zope-2.7.6/lib/python/ZPublisher/Publish.py", line 384, in publ                                     ish_module
    environ, debug, request, response)
  File "/usr/lib/zope-2.7.6/lib/python/ZPublisher/Publish.py", line 175, in publ                                     ish_module_standard
    response = publish(request, module_name, after_list, debug=debug)
  File "/home/jmo/zope27-cps3/Products/Localizer/__init__.py", line 60, in new_p                                     ublish
    x = Publish.old_publish(request, module_name, after_list, debug)
  File "/usr/lib/zope-2.7.6/lib/python/ZPublisher/Publish.py", line 101, in publ                                     ish
    request, bind=1)
  File "/usr/lib/zope-2.7.6/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/usr/lib/zope-2.7.6/lib/python/ZPublisher/Publish.py", line 39, in call_                                     object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/home/jmo/zope27-cps3/Products/CMFCore/FSPythonScript.py", line 108, in                                      __call__
    return Script.__call__(self, *args, **kw)
  File "/usr/lib/zope-2.7.6/lib/python/Shared/DC/Scripts/Bindings.py", line 306,                                      in __call__
    return self._bindAndExec(args, kw, None)
  File "/usr/lib/zope-2.7.6/lib/python/Shared/DC/Scripts/Bindings.py", line 343,                                      in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/jmo/zope27-cps3/Products/CMFCore/FSPythonScript.py", line 163, in                                      _exec
    result = f(*args, **kw)
  File "Script (Python)", line 17, in cps3importer
  File "/home/jmo/zope27-cps3/Products/CPSIO/import_modules/CPS3Importer.py", li                                     ne 236, in importFile
    DocumentImporter(self.portal, doc_file, self.dir_name, ido).importFile()
  File "/home/jmo/zope27-cps3/Products/CPSIO/import_modules/CPS3Importer.py", li                                     ne 2032, in importFile
    self.buildRepositoryDocument(document_el)
  File "/home/jmo/zope27-cps3/Products/CPSIO/import_modules/CPS3Importer.py", li                                     ne 2051, in buildRepositoryDocument
    self.repository.manage_delObjects([doc_id])
  File "/usr/lib/zope-2.7.6/lib/python/OFS/ObjectManager.py", line 461, in manag                                     e_delObjects
    self._delObject(id)
  File "/home/jmo/zope27-cps3/Products/CPSCore/PatchBTreeFolder2.py", line 34, i                                     n _delObject
    object.manage_beforeDelete(object, self)
  File "/home/jmo/zope27-cps3/Products/CPSCore/EventServicePatches.py", line 108                                     , in manage_beforeDelete
    self._cps_old_manage_beforeDelete(*args, **kw)
  File "/home/jmo/zope27-cps3/Products/CMFCore/CMFCatalogAware.py", line 210, in                                      manage_beforeDelete
    self.unindexObject()
  File "/home/jmo/zope27-cps3/Products/CMFCore/CMFCatalogAware.py", line 59, in                                      unindexObject
    catalog.unindexObject(self)
  File "/home/jmo/zope27-cps3/Products/CPSCore/PatchCatalogTool.py", line 246, i                                     n cat_unindexObject
    self.uncatalog_object(default_uid)
  File "/usr/lib/zope-2.7.6/lib/python/Products/ZCatalog/ZCatalog.py", line 558,                                      in uncatalog_object
    self._catalog.uncatalogObject(uid)
  File "/usr/lib/zope-2.7.6/lib/python/Products/ZCatalog/Catalog.py", line 429,                                      in uncatalogObject
    traceback.print_stack()

comment:12 Changed 14 years ago by fguillaume

The uncatalogObject messages (caused by the fact that the object (File or Image) was actually never cataloged) is fixed by [29728].

I'm not sure if there's anything left to this bug.

comment:13 Changed 14 years ago by gracinet

Note also that there are transaction commits in the code (probably added since then):

From method import_modules.CPS3Importer.DocumentImporter.importFile:

            if i % SUBTRANSACTION_COMMIT == 0:
                get_transaction().commit(1)
                LOG("CPS3 DocumentImporter:", DEBUG,
                    "subtransaction commit - %dth element" % i)

comment:14 Changed 14 years ago by fguillaume

  • Status changed from assigned to closed
  • Resolution set to fixed
Note: See TracTickets for help on using tickets.