[WM] Strange behavior with EtText 2.3

Eric Pement pemente.northpark.edu
Fri May 10 08:06:02 IST 2002


Replying to Herbert Liechti, who on 10 May 2002 wrote:

> Hello
> 
> I have a strange behavior with EtText 2.3.
[...]
 
> The text2html method cuts away two characters at the end of the
> Subject line.

   If it's any help, I can confirm that this is happening. I run 
ActivePerl under Win98, and here are my results. The problem is 
connected to the presence of "Subject:" on the input line 
(simulating the header of an e-mail message). The following console 
output demonstrates the problem:

---BEGIN console output---
[1] c:\text-ettext-2.3>>type test_one.txt
Subject: This is a test

And that's all there is

[1] c:\text-ettext-2.3>>perl ettext2html test_one.txt
<p><em>Subject: </em>This is a te <br />
</p><p>And that's all there is
</p>
[1] c:\text-ettext-2.3>>type test_two.txt
Subject This is a test

And that's all there is

[1] c:\text-ettext-2.3>>perl ettext2html test_two.txt
<p>Subject This is a test
</p><p>And that's all there is
</p>
[1] c:\text-ettext-2.3>>

---END console output-----

   I am a little annoyed that there is no trailing newline at the end 
of the console output, but maybe that's intentional.

   At any rate, I tested the behavior under both EtText v2.2 and 
v2.3, and the problem occurs in both. The truncation of the last two 
characters in the Subject: line is caused by the Lists.pm module of 
EtText. The source of the error is on lines 123-126, right here:

123: s/^((?:From| To| Cc| Date| Subject| Return-Path| Delivered-To|
124: Received| Sender| Message-Id| Bounces-To| Errors-To|
125: Reply-To| MIME-Version| Content-Type):\s)
126: (\S.+)([^<+].+)$/<em>$1<\/em>$2 <br \/>/ix)

Let me try to simplify the match substitution (keeping the \s, because 
a literal space in the pattern would be ignored under /x):

  s/(Subject:\s)(\S.+)([^<+].+)$/<em>$1<\/em>$2 <br \/>/ix
    $1          $2    $3

   You will notice that $3 gets thrown away in the substitution. 
However, $3 must match at least two characters, and both are deleted: 
one for the character class [^<+] and at least one for the .+ 
expression, which matches one or more characters. The only reason you 
don't have more characters missing at the end is that the greedy .+ 
in $2 grabs as much as it can and leaves $3 only that two-character 
minimum. For "This is a test", $2 is "This is a te" and $3 is "st".
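
   To make the backtracking visible, here is a minimal sketch that 
transliterates the pattern from lines 123-126 into Python's re module. 
This is only an approximation of what Lists.pm does: Perl's /ix 
becomes re.IGNORECASE | re.VERBOSE, and the names HEADER, show_groups 
and convert are mine, not EtText's. For this particular pattern I 
expect the two regex engines to behave the same way.

```python
import re

# Python transliteration of the substitution quoted from Lists.pm.
# Assumption: re.IGNORECASE | re.VERBOSE stands in for Perl's /ix, so the
# spaces and line breaks inside the pattern are ignored here as well.
HEADER = re.compile(
    r"""^((?:From| To| Cc| Date| Subject| Return-Path| Delivered-To|
    Received| Sender| Message-Id| Bounces-To| Errors-To|
    Reply-To| MIME-Version| Content-Type):\s)
    (\S.+)([^<+].+)$""",
    re.IGNORECASE | re.VERBOSE,
)


def show_groups(line):
    """Return the three captured groups, or None if the line is not matched."""
    m = HEADER.match(line)
    return m.groups() if m else None


def convert(line):
    """Apply the same replacement as the Perl code: group 3 is not emitted."""
    return HEADER.sub(r"<em>\1</em>\2 <br />", line)


if __name__ == "__main__":
    # The greedy .+ in group 2 backs off just far enough to leave group 3
    # its mandatory two characters.
    print(show_groups("Subject: This is a test"))
    print(convert("Subject: This is a test"))
    # Without the colon the header pattern does not match at all.
    print(convert("Subject This is a test"))
```

   The first call shows "st" landing in the third group, which is 
exactly the text missing from the ettext2html output above; the line 
without the colon passes through unchanged.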

   Now the truth is, I really don't know enough about e-mail headers 
to know why these things at the end of the line are being thrown 
away. What I do know is that your sample file was being misread as 
the header to an e-mail message, and for some reason certain strings 
at the end of an e-mail Subject: line are being intentionally 
discarded.

   It would be nice if the source code explained why $3 is being 
discarded, or what sort of extraneous strings are usually found in $3 
(especially since the author is already using the /x switch in the 
s/// command, which allows comments inside the pattern!). The code is 
too sparsely commented for my taste, and it would take me hours to 
figure out what the invisible issues are. However, this is the source 
of the problem. I will leave it to better minds than mine to explain 
the solution.

Kind regards,

--
Eric Pement - pemente at northpark.edu




