Ticket #2186 (closed defect: fixed)

Opened 9 years ago

Last modified 9 years ago

Default system sanitizer uses latin1 encoding

Reported by: gracinet Owned by: madarche
Priority: P2 Milestone: CPS 3.5.1
Component: CPSSchemas Version: 3.5.1rc1
Severity: blocker Keywords:


The Text Widget can run a command-line sanitizer; the command is stored in the xhtml_sanitize_system property. Its default value (a class attribute) invokes tidy with a set of options that hardcode latin1 as both the input and the output encoding:

xhtml_sanitize_system = 'tidy -indent -wrap 80 --input-encoding latin1 --output-encoding latin1 --force-output yes --clean yes --drop-font-tags yes --drop-proprietary-attributes yes --show-body-only yes --write-back yes --output-xhtml yes --show-errors 0 --show-warnings no --hide-comments no %s'

The property should support a pseudo-variable that gets replaced with the correct encoding.
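A minimal sketch of such a pseudo-variable, assuming a Python %-style named placeholder. The placeholder names `%(encoding)s` and `%(path)s` and the helper `build_sanitize_command` are hypothetical, not the actual CPSSchemas syntax; `%(path)s` replaces the original positional `%s` because Python's % operator cannot mix positional and named placeholders. The option list is shortened for readability.

```python
import shlex

# Hypothetical template: '%(encoding)s' is the pseudo-variable asked for in
# this ticket, '%(path)s' stands in for the original positional '%s'.
XHTML_SANITIZE_SYSTEM = (
    "tidy -indent -wrap 80 "
    "--input-encoding %(encoding)s --output-encoding %(encoding)s "
    "--force-output yes --clean yes --show-body-only yes "
    "--write-back yes --output-xhtml yes %(path)s"
)


def build_sanitize_command(template, encoding, path):
    """Substitute the encoding and the shell-quoted file path into the template."""
    return template % {'encoding': encoding, 'path': shlex.quote(path)}


cmd = build_sanitize_command(XHTML_SANITIZE_SYSTEM, 'utf8', '/tmp/body.html')
print(cmd)
```

The value passed as `encoding` must be a name tidy understands (for instance 'utf8' rather than 'utf-8'), so a site charset may need translating before substitution.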

Note that tidy does not know about iso-8859-15, so a custom project running with that charset and using tidy would need to change the value to a fixed latin1. Currently, it is not possible to use a charset other than utf-8 (because of ZPublisher's :utf8:ustring).

Change History

comment:1 Changed 9 years ago by gracinet

  • Severity changed from normal to blocker

Forgot to mention that a UnicodeEncodeError is raised before the property is actually used.

comment:2 Changed 9 years ago by gracinet

  • Status changed from new to closed
  • Resolution set to fixed

Done and pushed.

Finally, backwards compatibility for tidy users is ensured: I had to introduce a translation table anyway, because tidy does not understand 'utf-8' either and insists on 'utf8'. The table also performs the 'iso-8859-15' -> 'latin1' conversion.
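A minimal sketch of such a translation table, based only on the two mappings described above. The names `TIDY_ENCODINGS` and `tidy_encoding` are hypothetical, and passing unknown charset names through unchanged is an assumption, not necessarily what the pushed fix does.

```python
# Hypothetical table: maps site charset names to names tidy accepts.
TIDY_ENCODINGS = {
    'utf-8': 'utf8',          # tidy wants 'utf8', not 'utf-8'
    'iso-8859-15': 'latin1',  # tidy does not know iso-8859-15
}


def tidy_encoding(charset):
    """Return the tidy name for a charset.

    Lookup is case-insensitive; names missing from the table are
    returned lowercased but otherwise unchanged (an assumption).
    """
    key = charset.strip().lower()
    return TIDY_ENCODINGS.get(key, key)


print(tidy_encoding('UTF-8'))        # utf8
print(tidy_encoding('ISO-8859-15'))  # latin1
print(tidy_encoding('latin1'))       # latin1
```

The iso-8859-15 -> latin1 fallback is lossy: the two charsets differ in eight code points (the euro sign among them), so those characters are not preserved.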
