[WM] Strange behavior with EtText 2.3
Eric Pement
pemente at northpark.edu
Fri May 10 08:06:02 IST 2002
Replying to Herbert Liechti, who on 10 May 2002 wrote:
> Hello
>
> I have a strange behavior with EtText 2.3.
[...]
> The text2html method cuts away two characters at the end of the
> Subject line.
If it's any help, I can confirm that this is happening. I run
ActivePerl under Win98, and here are my results. The problem is
connected to the presence of the "Subject:" on the input line
(simulating the header of an e-mail message). The following console
output demonstrates the problem:
---BEGIN console output---
[1] c:\text-ettext-2.3>>type test_one.txt
Subject: This is a test
And that's all there is
[1] c:\text-ettext-2.3>>perl ettext2html test_one.txt
<p><em>Subject: </em>This is a te <br />
</p><p>And that's all there is
</p>
[1] c:\text-ettext-2.3>>type test_two.txt
Subject This is a test
And that's all there is
[1] c:\text-ettext-2.3>>perl ettext2html test_two.txt
<p>Subject This is a test
</p><p>And that's all there is
</p>
[1] c:\text-ettext-2.3>>
---END console output-----
I am a little annoyed that there is no trailing newline at the end
of the console output, but maybe that's intentional.
At any rate, I tested the behavior under both EtText v2.2 and
v2.3, and the same problem persists. The truncation of the last two
characters in the Subject: line is caused by the Lists.pm module of
EtText. The source of the error is on lines 123-126, right here:
123: s/^((?:From| To| Cc| Date| Subject| Return-Path| Delivered-To|
124: Received| Sender| Message-Id| Bounces-To| Errors-To|
125: Reply-To| MIME-Version| Content-Type):\s)
126: (\S.+)([^<+].+)$/<em>$1<\/em>$2 <br \/>/ix)
Let me try to simplify the match substitution:
s/(Subject:\s)(\S.+)([^<+].+)$/<em>$1<\/em>$2 <br \/>/ix
  $1          $2    $3
You will notice that $3 gets thrown away in the substitution.
However, $3 must match at least two characters, and both are deleted:
one character from the negated class [^<+] (anything other than "<" or
"+") and at least one more from the .+ expression. The only reason
more characters are not missing at the end is that the greedy +
quantifier in $2 grabs everything else and gives back only those two.
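To make the capture groups visible, here is a small standalone sketch.
It is not code from EtText: it uses only the simplified Subject-only
pattern, and the names header_groups and convert are my own, made up
for illustration.

```perl
use strict;
use warnings;

# Simplified form of the Lists.pm pattern, keeping only the Subject header.
# \s is used instead of a literal space because /x ignores literal whitespace.
my $re = qr/^(Subject:\s)(\S.+)([^<+].+)$/ix;

# Return the three captured groups, or an empty list if the line does not match.
sub header_groups {
    my ($line) = @_;
    return $line =~ $re ? ($1, $2, $3) : ();
}

# Apply the same replacement text as the original s///, which never uses $3.
sub convert {
    my ($line) = @_;
    $line =~ s/$re/<em>$1<\/em>$2 <br \/>/;
    return $line;
}

for my $line ("Subject: This is a test", "Subject This is a test") {
    my @g = header_groups($line);
    print @g ? "groups: [$g[0]] [$g[1]] [$g[2]]\n" : "no match\n";
    print "output: ", convert($line), "\n";
}
```

For the first line this prints the groups as [Subject: ], [This is a te]
and [st], so the "st" that lands in $3 is exactly what goes missing from
the output. The second line has no colon, does not match, and passes
through unchanged.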
Now the truth is, I really don't know enough about e-mail headers
to know why these things at the end of the line are being thrown
away. What I do know is that your sample file was being misread as
the header to an e-mail message, and for some reason certain strings
at the end of an e-mail Subject: line are being intentionally
discarded.
It would be nice if the source code explained why $3 is being
discarded, or what sort of extraneous strings are usually found in $3,
especially since the author is already using the /x switch on the
s/// command, which allows comments inside the pattern. The code is
too sparsely commented for my taste, and it would take me hours to
figure out what the invisible issues are. However, this is the source
of the problem. I will leave it to better minds than mine to explain
the solution.
Kind regards,
--
Eric Pement - pemente at northpark.edu