[WM] [Fwd: Improper conversion from &amp; to &]

jlm17 jlm17 at lucent.com
Fri Apr 30 20:37:48 IST 2004


OK, I found where this is happening: it is in the final HTML cleanup, which uses HTML::Parser.
From the HTML::Parser documentation:

$p->attr_encoded
$p->attr_encoded( $bool )
      By default, the "attr" and @attr argspecs will have general entities
      for attribute values decoded.  Enabling this attribute leaves
      entities alone.

So in the new() function of HTMLCleaner.pm I added this line:

$self->attr_encoded(1);

I'm not sure what the best place for this really is.
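For reference, here is a minimal standalone sketch (outside WebMake, and not the HTMLCleaner.pm code itself) of what attr_encoded changes, using the href from the dud example further down. collect_hrefs is a hypothetical helper made up for this test; only the HTML::Parser calls (new with api_version 3, start_h, attr_encoded, parse, eof) come from the module's documented API.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: parse $html and return the href of every <a> tag,
# with attr_encoded switched on or off according to $keep_entities.
sub collect_hrefs {
    my ($html, $keep_entities) = @_;
    my @hrefs;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                push @hrefs, $attr->{href}
                    if $tagname eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    # 0 (the default): entities in attribute values are decoded.
    # 1: entities such as &amp; are left alone.
    $p->attr_encoded($keep_entities);
    $p->parse($html);
    $p->eof;
    return @hrefs;
}

my $html = '<a href="http://nowhere.com/nofile.pl?&amp;htqdb">test</a>';
my ($decoded) = collect_hrefs($html, 0);
my ($encoded) = collect_hrefs($html, 1);
print "default:         $decoded\n";    # ...nofile.pl?&htqdb
print "attr_encoded(1): $encoded\n";    # ...nofile.pl?&amp;htqdb
```

With the default setting the &amp; comes back as a bare &, which is exactly what ends up in the generated page; with attr_encoded(1) the value is passed through untouched.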

Let me know if you want a diff or patch or the actual file with the one line added.
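Assuming HTMLCleaner.pm rebuilds tags from the decoded attribute values (not confirmed here), the other possible fix would be to re-escape the values on output, which matches the htmlhelp.com advice quoted below to always write &amp; in place of &. This is only a sketch using HTML::Entities::encode_entities; build_start_tag is a hypothetical helper, not WebMake code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: rebuild a start tag from decoded attribute values,
# escaping <, >, & and " so a bare & in a URL is written out as &amp;.
sub build_start_tag {
    my ($tagname, $attr, $attrseq) = @_;
    my $out = "<$tagname";
    for my $name (@$attrseq) {
        # In non-void context encode_entities returns an escaped copy.
        my $value = encode_entities($attr->{$name}, '<>&"');
        $out .= qq{ $name="$value"};
    }
    return "$out>";
}

my $tag = build_start_tag(
    'a',
    { href => 'foo.cgi?chapter=1&section=2' },
    ['href'],
);
print "$tag\n";    # <a href="foo.cgi?chapter=1&amp;section=2">
```

Setting attr_encoded(1) avoids the decode/re-encode round trip entirely, so it is the smaller change if the cleaner never needs the decoded values.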

jlm17 wrote:
> I didn't look at the EtText code; I'm not using EtText. That made me
> realize I'm not putting format="text/html" in my <content> tags, so
> maybe it was defaulting to "text/et". So I made a small example with
> format="text/html" set, and it still converts &amp; to &. For
> reference, here is what I am doing:
> 
> <webmake>
> <content name="dud" format="text/html">
> <html><body>
> <a href="http://nowhere.com/nofile.pl?&amp;htqdb">test</a>
> </body></html>
> </content>
> <out name="dud" file="dud2.html">
>   ${dud}
> </out>
> </webmake>
> 
> I'm running webmake through the perl debugger. I'll let you know if I 
> find something.
> 
> Robert Echlin wrote:
> 
>> Thanks for that information, jlm.
>> I will watch for that.
>>
>> Did you check in the EtText code?
>>
>> Robert
>>
>> jlm17 wrote:
>>
>>> It appears that webmake is converting my &amp; entities into &. This 
>>> is actually incorrect behavior:
>>>
>>>> Ampersands (&'s) in URLs
>>>>
>>>> Another common error occurs when including a URL which contains an 
>>>> ampersand ("&"):
>>>>
>>>> <!-- This is invalid! --> <a href="foo.cgi?chapter=1&section=2">...</a>
>>>>
>>>> This example generates an error for "unknown entity section" because
>>>> the "&" is assumed to begin an entity. In many cases, browsers will
>>>> recover safely from the error, but the example used here will cause
>>>> the link to fail in Netscape 3.x (but not other versions of Netscape)
>>>> since it will assume that the author intended to write &sect;ion,
>>>> which is equivalent to §ion.
>>>>
>>>> To avoid problems with both validators and browsers, always use
>>>> &amp; in place of &:
>>>>
>>>> <a href="foo.cgi?chapter=1&amp;section=2">...</a>
>>>
>>> The above is from http://www.htmlhelp.com/tools/validator/problems.html
>>>
>>> So far I have not been able to find where in the webmake code this
>>> is actually happening. Everywhere I see &amp; in the code, it is
>>> converting & TO &amp; rather than from it.
>>>
>>> _______________________________________________
>>> Webmake-talk mailing list
>>> Webmake-talk at taint.org
>>> http://webmake.taint.org/mailman/listinfo/webmake-talk
>>>
>>