Ticket #1904 (closed defect: fixed)

Opened 12 years ago

Last modified 9 years ago

CPS RSS-1.0.0 | UnicodeEncodeError: 'latin-1' codec can't encode characters ...

Reported by: tracguest Owned by: madarche
Priority: P2 Milestone: CPS 3.5.1
Component: CPSRSS Version: TRUNK
Severity: critical Keywords:
Cc: r.mahoney@…

Description

Environment:

CPS_RSS-1.0.0

CPS-3.4.6

feedparser.py 3.2 (default) & 4.1 (latest)

SunOS proliant 5.10 Generic_118855-19 i86pc i386 i86pc (Solaris 10) & SunOS z12522AA 5.11 snv_62 i86pc i386 i86pc (OpenSolaris)

Issue:

Refreshing the following Japanese feed in the RSS Tool:

 http://blogs.dion.ne.jp/sanskrit/index.rdf

Results in:

UnicodeEncodeError: 'latin-1' codec can't encode characters ...

[see the full error log attached as cps-latin-error.log]

Subsequently the RSS Tool becomes completely unusable. (The only way I could manage to get the RSS Tool running again was to reinstall the whole CPS site from backup.)
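
For readers unfamiliar with this class of error, the following is a minimal sketch of how the message reported above arises in general. It is not CPS or CPSRSS code: the sample title and the helper names `encode_latin1_strict` and `encode_latin1_safe` are hypothetical, and the sketch is written for a current Python 3 interpreter rather than the Python 2.4 used at the time. It only shows that a strict latin-1 encode of Japanese text fails, and two common ways to avoid the failure.

```python
# Illustrative only: this is not taken from CPS or CPSRSS.
# Hypothetical Japanese feed title used as sample data.
title = "\u30b5\u30f3\u30b9\u30af\u30ea\u30c3\u30c8"


def encode_latin1_strict(text):
    """Return the error message of a strict latin-1 encode, or None on success."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def encode_latin1_safe(text):
    """Encode to latin-1, replacing unencodable characters with XML character references."""
    return text.encode("latin-1", errors="xmlcharrefreplace")


# Strict encoding fails because none of these characters exist in latin-1.
error_message = encode_latin1_strict(title)

# Option 1: keep latin-1 output but fall back to numeric character references.
safe_bytes = encode_latin1_safe(title)

# Option 2: use an encoding that covers all of Unicode.
utf8_bytes = title.encode("utf-8")
```

Which of the two options is appropriate depends on what the consuming code expects; the sketch makes no claim about how the tool was eventually fixed.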

The problematic feed should render as follows (using SPIP):

Indica et Buddhica - Tabulae :: Kataoka, Kei  http://tabulae.indica-et-buddhica.org/rubrique.php3?id_rubrique=261

I'm not sure whether this issue with Japanese characters is related to the incorrect rendering of Latin diacritics in the following feed. Many of these diacritics are commonly used in Romanised Sanskrit transliteration, e.g. a, u and i with macron, S with acute, n with under-dot, etc.:

 http://www.informaworld.com/ampp/rss~content=t713405669

Incorrect (using CPS RSS):

Indica et Buddhica - Recently Published issues of Asian Philosophy  http://indica-et-buddhica.org/sections/tabulae/periodica/a/asian-philosophy/asp-recently-published

Correct (using SPIP):

Indica et Buddhica - Tabulae :: Asian Philosophy - Recently Published  http://tabulae.indica-et-buddhica.org/rubrique.php3?id_rubrique=238
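
As a rough way to check whether the two symptoms could share a cause, the sketch below tests whether the transliteration characters listed above can be represented in latin-1 at all. It is an illustration under assumptions, not a diagnosis: the helper name `fits_latin1` is hypothetical, the code points are my reading of the characters described in the report, and nothing here inspects what CPS RSS actually does with the feed.

```python
import unicodedata

# Characters described in the report: a, u and i with macron,
# S with acute, n with dot below (code points are an assumption).
samples = ["\u0101", "\u016b", "\u012b", "\u015a", "\u1e47"]


def fits_latin1(ch):
    """Return True if the character can be encoded in latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Pair each character's Unicode name with whether latin-1 can hold it.
report = [(unicodedata.name(ch), fits_latin1(ch)) for ch in samples]

for name, ok in report:
    print(name, "->", "latin-1" if ok else "not in latin-1")
```

None of the sampled characters falls within latin-1, so a latin-1 assumption somewhere in the pipeline would be consistent with both the Japanese failure and the mangled diacritics, although the ticket itself does not confirm this.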

I'd be very happy to receive any thoughts on how these issues might be resolved.

Kind regards,

Richard MAHONEY

-- Richard MAHONEY | internet:  http://indica-et-buddhica.org/

Attachments

cps-latin-error.log (4.9 KB) - added by tracguest 12 years ago.
CPS RSS error log
jp-feed-cps.png (96.0 KB) - added by gracinet 9 years ago.
Japanese feed being rendered in a default CPS instance

Change History

Changed 12 years ago by tracguest

CPS RSS error log

comment:1 follow-up: ↓ 3 Changed 12 years ago by madarche

  • Owner changed from trac to madarche
  • Status changed from new to assigned

I confirm the reproducibility of the reported bug.

Is this bug a regression? Was this bug happening on your portal before you switched to CPS 3.4.6 or is it simply the first time you have tried to use Japanese feeds in CPS?

comment:2 Changed 12 years ago by madarche

The problem doesn't come from feedparser 3.2 (default). The following command line works fine without any error:

$ python2.4 feedparser.py http://blogs.dion.ne.jp/sanskrit/index.rdf

comment:3 in reply to: ↑ 1 Changed 12 years ago by tracguest

Replying to madarche:

> I confirm the reproducibility of the reported bug.
>
> Is this bug a regression? Was this bug happening on your portal before you switched to CPS 3.4.6 or is it simply the first time you have tried to use Japanese feeds in CPS?

I've only tried Japanese feeds with 3.4.6. Unfortunately, neither my test server nor my production server still holds an instance of 3.4.5.

The incorrect rendering of Latin diacriticals did -- if I recall correctly -- occur with the previous version of CPS RSS under 3.4.5.

-- Richard MAHONEY

comment:4 Changed 12 years ago by madarche

  • Priority changed from P1 to P2

comment:5 Changed 9 years ago by gracinet

  • Milestone changed from CPS 3.5.2 to CPS 3.5.1

Part of unicodegeddon

Changed 9 years ago by gracinet

Japanese feed being rendered in a default CPS instance

comment:6 Changed 9 years ago by gracinet

  • Status changed from assigned to closed
  • Resolution set to fixed

The problem with the tool is exactly #2194. Since #2185, the rendering itself works; see the attached screenshot.

These two fixes will be in CPS 3.5.1 rc2.
