* Those annoying HMTL entities from Google Groups
@ 2012-07-15 16:13 Simon Wright
2012-07-17 10:41 ` Stephen Leake
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-15 16:13 UTC
You know how, of late, there have been a lot of HTML entities (for
example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
from people who're using Google Groups? Well, I haven't worked out how
to translate them while reading, but if you're using Emacs you should be
able to translate them while replying using this - probably rubbish
- Elisp (which I haven't tidied up):
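(defun replace-html-entities-region (start end)
  "Replace “&lt;” by “<”, etc.  This works on the current region."
  (interactive "r")
  (save-restriction
    (narrow-to-region start end)
    (goto-char (point-min))
    (while (re-search-forward "&\\([^&;]*\\);" nil t)
      (let ((e (match-string 1)))
        ;; FIXEDCASE and LITERAL are both t, so the replacement text is
        ;; inserted exactly as returned (a decoded backslash is safe).
        (replace-match (replace--entity e) t t)))))

(defun replace--entity (e)
  "Return the text for entity name E, or E re-wrapped if it is unknown."
  (cond
   ((equal e "amp") "&")
   ((equal e "apos") "'")
   ((equal e "gt") ">")
   ((equal e "lt") "<")
   ((equal e "quot") "\"")
   ;; Decimal numeric reference such as "#39".  `string-match-p' leaves
   ;; the caller's match data intact and copes with an empty E.
   ((string-match-p "\\`#[0-9]+\\'" e)
    (char-to-string (string-to-number (substring e 1))))
   (t (concat "&" e ";"))))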
In the reply buffer,
C-x h M-x replace-html-entities-region
(you may need to do this more than once!)
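The "more than once" step is needed when a posting has been encoded twice, so that a quote arrives as &amp;quot;. A minimal sketch of automating the repetition follows, assuming it is acceptable to decode until nothing changes. The names my-decode-entities-once and my-decode-entities-fully are hypothetical and are not part of the code above; the sketch works on strings so that it is self-contained.

```elisp
;; Hypothetical helpers, not from the posting above: decode a handful of
;; entities, then repeat until the text stops changing.
(defun my-decode-entities-once (string)
  "Return STRING with common HTML entities decoded by one level."
  (replace-regexp-in-string
   "&\\(amp\\|apos\\|gt\\|lt\\|quot\\|#[0-9]+\\);"
   (lambda (m)
     ;; M is the whole match, e.g. "&gt;"; strip the "&" and ";".
     (let ((e (substring m 1 -1)))
       (cond ((equal e "amp") "&")
             ((equal e "apos") "'")
             ((equal e "gt") ">")
             ((equal e "lt") "<")
             ((equal e "quot") "\"")
             ;; Decimal numeric reference; leave it alone if the number
             ;; is not a valid character code.
             (t (let ((n (string-to-number (substring e 1))))
                  (if (characterp n) (char-to-string n) m))))))
   string t t))

(defun my-decode-entities-fully (string)
  "Apply `my-decode-entities-once' to STRING until nothing changes."
  (let ((prev nil))
    (while (not (equal prev string))
      (setq prev string
            string (my-decode-entities-once string)))
    string))
```

For example, (my-decode-entities-fully "&amp;gt;") returns ">". The trade-off is that text which was meant to show a literal entity is decoded as well, which may be why doing it by hand, one pass at a time, is the safer default.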
* Re: Those annoying HMTL entities from Google Groups
2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
@ 2012-07-17 10:41 ` Stephen Leake
2012-07-17 13:13 ` Simon Wright
2012-07-20 16:15 ` Adam Beneschan
2012-07-21 0:06 ` Jerry
2 siblings, 1 reply; 8+ messages in thread
From: Stephen Leake @ 2012-07-17 10:41 UTC
Simon Wright <simon@pushface.org> writes:
> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups? Well, I haven't worked out how
> to translate them while reading,
There's already a package for this; html2text.
I've enhanced it for use at work, where I read Outlook generated email
with Emacs:
(require 'html2text)
(add-to-list 'html2text-replace-list (cons "&rsquo;" "'"))
(add-to-list 'html2text-replace-list (cons "&#39;" "'"))
(add-to-list 'html2text-replace-list (cons "&ndash;" "-"))
(add-to-list 'html2text-replace-list (cons "&#8216;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8217;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8220;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8221;" "'"))
(add-to-list 'html2text-replace-list (cons "&apos;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8230;" "..."))
(add-to-list 'html2text-replace-list (cons "&#8211;" "-"))
(add-to-list 'html2text-remove-tag-list "sup")
(setq html2text-remove-tag-list (delete "br" html2text-remove-tag-list))
(add-to-list 'html2text-remove-tag-list "style")
(add-to-list 'html2text-remove-tag-list "span")
(defun html2text-clean-newline (p1 p2 p3 p4)
  (html2text-delete-tags p1 p2 p3 p4)
  (newline))
(add-to-list 'html2text-format-tag-list
             (cons "o:p" 'html2text-clean-newline))
(add-to-list 'html2text-format-tag-list
             (cons "br" 'html2text-clean-newline))
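(defun html2text-delete-comment ()
  "Delete every <!...> construct in the current buffer."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<!" nil t)
      (let ((start (match-beginning 0)))
        ;; Delete only when a closing ">" exists; an unterminated "<!"
        ;; is left alone instead of signalling an error.
        (when (re-search-forward ">" nil t)
          (delete-region start (point)))))))
(defun html2text-delete-xml ()
  "Delete every <xml>...</xml> block in the current buffer."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<xml>" nil t)
      (let ((start (match-beginning 0)))
        ;; As above, skip an <xml> that has no closing tag.
        (when (re-search-forward "</xml>" nil t)
          (delete-region start (point)))))))
(defun html-clean ()
  "Run `html2text', then strip leftover comments and XML blocks."
  (interactive)
  (html2text)
  (html2text-delete-comment)
  (html2text-delete-xml))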
In a buffer with html: M-x html-clean
Do that before replying.
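For readers without html2text to hand, the idea behind a replace list can be sketched in a few lines: an alist of (ENTITY . REPLACEMENT) pairs applied literally across the buffer. This is only an illustration of the technique under that assumption; it is not html2text's own code, and my-entity-alist and my-replace-entities-in-buffer are hypothetical names.

```elisp
;; Hypothetical names throughout; this mimics the idea of a replace
;; list and does not use or reproduce html2text itself.
(defvar my-entity-alist
  '(("&rsquo;" . "'")
    ("&lsquo;" . "'")
    ("&ndash;" . "-")
    ("&hellip;" . "..."))
  "Alist mapping literal entity strings to plain-text replacements.")

(defun my-replace-entities-in-buffer (&optional alist)
  "Replace each key of ALIST (default `my-entity-alist') in the buffer."
  (save-excursion
    (dolist (pair (or alist my-entity-alist))
      (goto-char (point-min))
      ;; `search-forward' matches the key literally, so the entity
      ;; needs no regexp quoting.
      (while (search-forward (car pair) nil t)
        (replace-match (cdr pair) t t)))))
```

Calling (my-replace-entities-in-buffer) in a buffer containing It&rsquo;s turns that text into It's. Adding a pair to the alist is then the counterpart of the add-to-list lines above.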
--
-- Stephe
* Re: Those annoying HMTL entities from Google Groups
2012-07-17 10:41 ` Stephen Leake
@ 2012-07-17 13:13 ` Simon Wright
0 siblings, 0 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-17 13:13 UTC
Stephen Leake <stephen_leake@stephe-leake.org> writes:
> Simon Wright <simon@pushface.org> writes:
>
>> You know how, of late, there have been a lot of HTML entities (for
>> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
>> from people who're using Google Groups? Well, I haven't worked out how
>> to translate them while reading,
>
> There's already a package for this; html2text.
There usually is! thanks ...
* Re: Those annoying HMTL entities from Google Groups
2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
2012-07-17 10:41 ` Stephen Leake
@ 2012-07-20 16:15 ` Adam Beneschan
2012-07-21 0:06 ` Jerry
2 siblings, 0 replies; 8+ messages in thread
From: Adam Beneschan @ 2012-07-20 16:15 UTC
On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups?
For what it's worth (likely, 0.0), I submitted a bug report to them, and I'd be shocked if they didn't have hundreds or thousands of the same report. I'm also somewhat shocked that they haven't done anything about this--even just by backing out their latest update--since this is such a serious problem.
I dunno. I'm beginning to think that Google has abandoned their famous corporate motto. (Well, maybe that's too harsh, since it appears that they've kept the majority of it. 2/3 of it, to be precise.)
-- Adam
* Re: Those annoying HMTL entities from Google Groups
2012-07-15 16:13 Those annoying HMTL entities from Google Groups Simon Wright
2012-07-17 10:41 ` Stephen Leake
2012-07-20 16:15 ` Adam Beneschan
@ 2012-07-21 0:06 ` Jerry
2012-07-21 0:18 ` Adam Beneschan
` (2 more replies)
2 siblings, 3 replies; 8+ messages in thread
From: Jerry @ 2012-07-21 0:06 UTC
On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups?
That would include me. :-/ My ISP, CenturyLink, formerly Qwest Communications, which is a major U.S. telecom company and what was one of the "mini-Bell" spinoffs a number of years ago, does not provide usenet. Really. Thus I use Google Groups and its rudimentary newsreader. And Google is now forcing users to a new system and it might be the new system which is causing the pain.
Jerry
* Re: Those annoying HMTL entities from Google Groups
2012-07-21 0:06 ` Jerry
@ 2012-07-21 0:18 ` Adam Beneschan
2012-07-21 9:52 ` Manuel Gomez
2012-07-21 15:32 ` Simon Wright
2 siblings, 0 replies; 8+ messages in thread
From: Adam Beneschan @ 2012-07-21 0:18 UTC
On Friday, July 20, 2012 5:06:48 PM UTC-7, Jerry wrote:
> That would include me. :-/ My ISP, CenturyLink, formerly Qwest Communications, which is a major U.S. telecom company and what was one of the "mini-Bell" spinoffs a number of years ago, does not provide usenet. Really. Thus I use Google Groups and its rudimentary newsreader. And Google is now forcing users to a new system and it might be the new system which is causing the pain.
I don't know what you mean by a "new system"... There has been a "New Google Groups" around for many months now, but this bug was just introduced recently. The quoting was working fine until a week or two ago. So I don't think the "new system" is the cause of the pain; I think they just plain screwed up. Probably because they didn't write their software in Ada. :)
-- Adam
* Re: Those annoying HMTL entities from Google Groups
2012-07-21 0:06 ` Jerry
2012-07-21 0:18 ` Adam Beneschan
@ 2012-07-21 9:52 ` Manuel Gomez
2012-07-21 15:32 ` Simon Wright
2 siblings, 0 replies; 8+ messages in thread
From: Manuel Gomez @ 2012-07-21 9:52 UTC
On 21/07/12 02:06, Jerry wrote:
> On Sunday, July 15, 2012 9:13:05 AM UTC-7, Simon Wright wrote:
>> You know how, of late, there have been a lot of HTML entities (for
>> example, &quot;, &#39;, &gt; for ", ', and > respectively) in
>> postings from people who're using Google Groups?
>
> That would include me. :-/ My ISP, CenturyLink, formerly Qwest
> Communications, which is a major U.S. telecom company and what was
> one of the "mini-Bell" spinoffs a number of years ago, does not
> provide usenet. Really. Thus I use Google Groups and its rudimentary
> newsreader. And Google is now forcing users to a new system and it
> might be the new system which is causing the pain.
>
> Jerry
>
The time of news servers provided by Internet access providers seems
to have passed. But you can still use a free news server; I use
news.aioe.org without problem for accessing this newsgroup.
Regards.
Manuel
* Re: Those annoying HMTL entities from Google Groups
2012-07-21 0:06 ` Jerry
2012-07-21 0:18 ` Adam Beneschan
2012-07-21 9:52 ` Manuel Gomez
@ 2012-07-21 15:32 ` Simon Wright
2 siblings, 0 replies; 8+ messages in thread
From: Simon Wright @ 2012-07-21 15:32 UTC
Jerry <lanceboyle@qwest.net> writes:
> My ISP, CenturyLink, formerly Qwest Communications, which is a major
> U.S. telecom company and what was one of the "mini-Bell" spinoffs a
> number of years ago, does not provide usenet
I'm using news.eternal-september.org. Text-only.