From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: Those annoying HMTL entities from Google Groups
Date: Tue, 17 Jul 2012 06:41:34 -0400
Message-ID: <85liiiy8ip.fsf@stephe-leake.org>
In-Reply-To: m2vchpuhny.fsf@nidhoggr.home
Simon Wright <simon@pushface.org> writes:
> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups? Well, I haven't worked out how
> to translate them while reading,
There's already a package for this: html2text.
I've enhanced it for use at work, where I read Outlook-generated email
with Emacs:
(require 'html2text)
(add-to-list 'html2text-replace-list (cons "&#8217;" "'"))
(add-to-list 'html2text-replace-list (cons "&#39;" "'"))
(add-to-list 'html2text-replace-list (cons "&#8211;" "-"))
(add-to-list 'html2text-replace-list (cons "&lsquo;" "'"))
(add-to-list 'html2text-replace-list (cons "&rsquo;" "'"))
(add-to-list 'html2text-replace-list (cons "&ldquo;" "'"))
(add-to-list 'html2text-replace-list (cons "&rdquo;" "'"))
(add-to-list 'html2text-replace-list (cons "&apos;" "'"))
(add-to-list 'html2text-replace-list (cons "&hellip;" "..."))
(add-to-list 'html2text-replace-list (cons "&ndash;" "-"))
(add-to-list 'html2text-remove-tag-list "sup")
(setq html2text-remove-tag-list (delete "br" html2text-remove-tag-list))
(add-to-list 'html2text-remove-tag-list "style")
(add-to-list 'html2text-remove-tag-list "span")
(defun html2text-clean-newline (p1 p2 p3 p4)
  (html2text-delete-tags p1 p2 p3 p4)
  (newline))
(add-to-list 'html2text-format-tag-list
             (cons "o:p" 'html2text-clean-newline))
(add-to-list 'html2text-format-tag-list
             (cons "br" 'html2text-clean-newline))
(defun html2text-delete-comment ()
  "Delete everything from each \"<!\" through the next \">\"."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<!" (point-max) t)
      (delete-region (match-beginning 0)
                     (re-search-forward ">" (point-max) t)))))
(defun html2text-delete-xml ()
  "Delete each <xml>...</xml> block."
  (interactive)
  (let ((buffer-read-only))
    (goto-char (point-min))
    (while (re-search-forward "<xml>" (point-max) t)
      (delete-region (match-beginning 0)
                     (re-search-forward "</xml>" (point-max) t)))))
(defun html-clean ()
  "Run html2text, then strip leftover comments and <xml> blocks."
  (interactive)
  (html2text)
  (html2text-delete-comment)
  (html2text-delete-xml))
In a buffer containing HTML, run M-x html-clean.
Do that before replying.
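For just the Google Groups entities, the replace-list idea boils down to a
literal search-and-replace over the buffer. Here is a minimal sketch of that
idea which does not depend on html2text; the name my-replace-entities is
hypothetical, and the entity list is only the three examples quoted above:

```elisp
;; Hypothetical helper, not part of html2text: a standalone sketch of
;; replacing HTML entities with plain characters in the current buffer.
(defun my-replace-entities (pairs)
  "Replace entities in the current buffer.
PAIRS is a list of (ENTITY . TEXT) conses; each ENTITY is matched literally."
  (dolist (pair pairs)
    (goto-char (point-min))
    (while (search-forward (car pair) nil t)
      ;; FIXEDCASE and LITERAL are both t, so TEXT is inserted verbatim.
      (replace-match (cdr pair) t t))))

;; Try it on a scratch string without touching any real buffer.
(with-temp-buffer
  (insert "It&#39;s &quot;fine&quot; &gt; ok")
  (my-replace-entities '(("&#39;" . "'")
                         ("&quot;" . "\"")
                         ("&gt;" . ">")))
  (buffer-string))
;; => "It's \"fine\" > ok"
```

Calling (my-replace-entities ...) in a message buffer before replying would
have the same effect on that buffer's text.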
--
-- Stephe