comp.lang.ada
* Those annoying HMTL entities from Google Groups
@ 2012-07-15 16:13 Simon Wright
  2012-07-17 10:41 ` Stephen Leake
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-15 16:13 UTC


You know how, of late, there have been a lot of HTML entities (for
example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
from people who're using Google Groups? Well, I haven't worked out how
to translate them while reading, but if you're using Emacs you should be
able to translate them while replying using this - probably rubbish
- Elisp (which I haven't tidied up):

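   (defun replace-html-entities-region (start end)
     "Replace “&lt;” by “<”, etc. This works on the current region."
     (interactive "r")
     (save-restriction
       (narrow-to-region start end)
       (goto-char (point-min))
       (while (re-search-forward "&\\([^&;]*\\);" nil t)
         ;; FIXEDCASE and LITERAL are both t, so the replacement is
         ;; inserted verbatim: no case conversion, no `\' handling.
         (replace-match (replace--entity (match-string 1)) t t))))

   (defun replace--entity (e)
     "Return the text for entity name E, given without “&” and “;”."
     ;; `string-match' would clobber the match data that the caller
     ;; still needs for `replace-match', so preserve it.
     (save-match-data
       (cond
        ((equal e "amp") "&")
        ((equal e "apos") "'")
        ((equal e "gt") ">")
        ((equal e "lt") "<")
        ((equal e "quot") "\"")
        ;; Decimal character reference, e.g. "#39".
        ((string-match "\\`#\\([0-9]+\\)\\'" e)
         (char-to-string (string-to-number (match-string 1 e))))
        ;; Hexadecimal character reference, e.g. "#x27".
        ((string-match "\\`#[xX]\\([0-9a-fA-F]+\\)\\'" e)
         (char-to-string (string-to-number (match-string 1 e) 16)))
        ;; Anything else (including an empty name) is left unchanged.
        (t (concat "&" e ";")))))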

In the reply buffer,

C-x h M-x replace-html-entities-region

(you may need to do this more than once!)
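
The repeat is needed for double-encoded text such as &amp;quot;: one
pass turns the &amp; into &, leaving &quot; behind for the next pass.
Here is a minimal sketch of a helper that repeats a region command until
the text stops changing. The name call-on-region-until-stable is made up
for this illustration, and it assumes the command accepts a marker as
its END argument (narrow-to-region does).

```elisp
(defun call-on-region-until-stable (fn start end)
  "Call FN on the region START..END until the text stops changing.
FN is called with two arguments, the region bounds."
  ;; A marker keeps track of the end as replacements shrink the region.
  (let ((end-marker (copy-marker end))
        (previous nil)
        (current (buffer-substring-no-properties start end)))
    (while (not (equal previous current))
      (setq previous current)
      (funcall fn start end-marker)
      (setq current (buffer-substring-no-properties start end-marker)))
    ;; Detach the marker so it no longer has to be updated.
    (set-marker end-marker nil)))
```

With the function above loaded, something like

   (call-on-region-until-stable #'replace-html-entities-region
                                (point-min) (point-max))

should do all the passes in one go. It compares the region text rather
than the buffer's modification state, because a pass that rewrites an
unknown entity with itself still counts as a modification.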




* Re: Those annoying HMTL entities from Google Groups
  2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
@ 2012-07-17 10:41 ` Stephen Leake
  2012-07-17 13:13   ` Simon Wright
  2012-07-20 16:15 ` Adam Beneschan
  2012-07-21  0:06 ` Jerry
  2 siblings, 1 reply; 8+ messages in thread
From: Stephen Leake @ 2012-07-17 10:41 UTC


Simon Wright <simon@pushface.org> writes:

> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups? Well, I haven't worked out how
> to translate them while reading, 

There's already a package for this; html2text. 

I've enhanced it for use at work, where I read Outlook generated email
with Emacs:

(require 'html2text)
(add-to-list 'html2text-replace-list (cons "&#146;" "'"))
(add-to-list 'html2text-replace-list (cons "&#39;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8211;" "-"))
(add-to-list 'html2text-replace-list (cons "&#8216;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8217;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8220;" "\""))
(add-to-list 'html2text-replace-list (cons "&#8221;" "\""))
(add-to-list 'html2text-replace-list (cons "&#8230;" "..."))

(add-to-list 'html2text-remove-tag-list "sup")

(setq html2text-remove-tag-list (delete "br" html2text-remove-tag-list))
(add-to-list 'html2text-remove-tag-list "style")
(add-to-list 'html2text-remove-tag-list "span")

(defun html2text-clean-newline (p1 p2 p3 p4)
  (html2text-delete-tags p1 p2 p3 p4)
  (newline))

(add-to-list 'html2text-format-tag-list
	     (cons "o:p" 'html2text-clean-newline))

(add-to-list 'html2text-format-tag-list
	     (cons "br" 'html2text-clean-newline))

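(defun html2text-delete-comment ()
  "Delete every \"<!\" construct up to the first following \">\"."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<!" (point-max) t)
      (let ((beg (match-beginning 0)))
        ;; Delete only when the closing delimiter exists; passing nil
        ;; to `delete-region' would signal an error.
        (when (re-search-forward ">" (point-max) t)
          (delete-region beg (point)))))))

(defun html2text-delete-xml ()
  "Delete every \"<xml>\" ... \"</xml>\" block."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<xml>" (point-max) t)
      (let ((beg (match-beginning 0)))
        ;; Skip an unterminated block instead of signalling an error.
        (when (re-search-forward "</xml>" (point-max) t)
          (delete-region beg (point)))))))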

(defun html-clean ()
  (interactive)
  (html2text)
  (html2text-delete-comment)
  (html2text-delete-xml))


In a buffer with html: M-x html-clean

Do that before replying.

-- 
-- Stephe




* Re: Those annoying HMTL entities from Google Groups
  2012-07-17 10:41 ` Stephen Leake
@ 2012-07-17 13:13   ` Simon Wright
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-17 13:13 UTC


Stephen Leake <stephen_leake@stephe-leake.org> writes:

> Simon Wright <simon@pushface.org> writes:
>
>> You know how, of late, there have been a lot of HTML entities (for
>> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
>> from people who're using Google Groups? Well, I haven't worked out how
>> to translate them while reading, 
>
> There's already a package for this; html2text. 

There usually is! thanks ...




* Re: Those annoying HMTL entities from Google Groups
  2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
  2012-07-17 10:41 ` Stephen Leake
@ 2012-07-20 16:15 ` Adam Beneschan
  2012-07-21  0:06 ` Jerry
  2 siblings, 0 replies; 8+ messages in thread
From: Adam Beneschan @ 2012-07-20 16:15 UTC


On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
> You know how, of late, there have been a lot of HTML entities (for
> example, &amp;quot;, &amp;#39;, &amp;gt; for &quot;, &#39;, and 
> &gt;respectively) in postings
> from people who&#39;re using Google Groups? 

For what it's worth (likely, 0.0), I submitted a bug report to them, and I'd be shocked if they didn't have hundreds or thousands of the same report.  I'm also somewhat shocked that they haven't done anything about this--even just by backing out their latest update--since this is such a serious problem.  

I dunno.  I'm beginning to think that Google has abandoned their famous corporate motto.  (Well, maybe that's too harsh, since it appears that they've kept the majority of it.  2/3 of it, to be precise.)

                            -- Adam




* Re: Those annoying HMTL entities from Google Groups
  2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
  2012-07-17 10:41 ` Stephen Leake
  2012-07-20 16:15 ` Adam Beneschan
@ 2012-07-21  0:06 ` Jerry
  2012-07-21  0:18   ` Adam Beneschan
                     ` (2 more replies)
  2 siblings, 3 replies; 8+ messages in thread
From: Jerry @ 2012-07-21  0:06 UTC


On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
> You know how, of late, there have been a lot of HTML entities (for
> example, &amp;quot;, &amp;#39;, &amp;gt; for &quot;, &#39;, and &gt; respectively) in postings
> from people who&#39;re using Google Groups?

That would include me. :-/ My ISP, CenturyLink, formerly Qwest Communications, which is a major U.S. telecom company and what was one of the "mini-Bell" spinoffs a number of years ago, does not provide usenet. Really. Thus I use Google Groups and its rudimentary newsreader. And Google is now forcing users to a new system and it might be the new system which is causing the pain.

Jerry




* Re: Those annoying HMTL entities from Google Groups
  2012-07-21  0:06 ` Jerry
@ 2012-07-21  0:18   ` Adam Beneschan
  2012-07-21  9:52   ` Manuel Gomez
  2012-07-21 15:32   ` Simon Wright
  2 siblings, 0 replies; 8+ messages in thread
From: Adam Beneschan @ 2012-07-21  0:18 UTC


On Friday, July 20, 2012 5:06:48 PM UTC-7, Jerry wrote:

> That would include me. :-/ My ISP, CenturyLink, formerly Qwest Communications, which is a major U.S. telecom company and what was one of the "mini-Bell" spinoffs a number of years ago, does not provide usenet. Really. Thus I use Google Groups and its rudimentary newsreader. And Google is now forcing users to a new system and it might be the new system which is causing the pain.

I don't know what you mean by a "new system"...  There has been a "New Google Groups" around for many months now, but this bug was just introduced recently.  The quoting was working fine until a week or two ago.  So I don't think the "new system" is the cause of the pain; I think they just plain screwed up.  Probably because they didn't write their software in Ada. :)

                           -- Adam






* Re: Those annoying HMTL entities from Google Groups
  2012-07-21  0:06 ` Jerry
  2012-07-21  0:18   ` Adam Beneschan
@ 2012-07-21  9:52   ` Manuel Gomez
  2012-07-21 15:32   ` Simon Wright
  2 siblings, 0 replies; 8+ messages in thread
From: Manuel Gomez @ 2012-07-21  9:52 UTC


On 21/07/12 02:06, Jerry wrote:
> On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
>> You know how, of late, there have been a lot of HTML entities (for
>> example, &amp;quot;, &amp;#39;, &amp;gt; for &quot;, &#39;, and
>> &gt; respectively) in postings from people who&#39;re using Google
>> Groups?
>
> That would include me. :-/ My ISP, CenturyLink, formerly Qwest
> Communications, which is a major U.S. telecom company and what was
> one of the "mini-Bell" spinoffs a number of years ago, does not
> provide usenet. Really. Thus I use Google Groups and its rudimentary
> newsreader. And Google is now forcing users to a new system and it
> might be the new system which is causing the pain.
>
> Jerry
>

The time of news servers provided by Internet access providers seems
to have passed. But you can still use a free news server; I use
news.aioe.org without problems to access this newsgroup.

Regards.

Manuel




* Re: Those annoying HMTL entities from Google Groups
  2012-07-21  0:06 ` Jerry
  2012-07-21  0:18   ` Adam Beneschan
  2012-07-21  9:52   ` Manuel Gomez
@ 2012-07-21 15:32   ` Simon Wright
  2 siblings, 0 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-21 15:32 UTC


Jerry <lanceboyle@qwest.net> writes:

> My ISP, CenturyLink, formerly Qwest Communications, which is a major
> U.S. telecom company and what was one of the "mini-Bell" spinoffs a
> number of years ago, does not provide usenet

I'm using news.eternal-september.org. Text-only.



