comp.lang.ada
From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: Those annoying HTML entities from Google Groups
Date: Tue, 17 Jul 2012 06:41:34 -0400
Message-ID: <85liiiy8ip.fsf@stephe-leake.org>
In-Reply-To: m2vchpuhny.fsf@nidhoggr.home

Simon Wright <simon@pushface.org> writes:

> You know how, of late, there have been a lot of HTML entities (for
> example, &quot;, &#39;, &gt; for ", ', and > respectively) in postings
> from people who're using Google Groups? Well, I haven't worked out how
> to translate them while reading, 

There's already an Emacs package for this: html2text.

I've extended it for use at work, where I read Outlook-generated email
in Emacs:

(require 'html2text)

;; Map numeric character references (mostly "smart" punctuation from
;; Outlook and Google Groups) to plain ASCII.
(add-to-list 'html2text-replace-list (cons "&#39;" "'"))     ; apostrophe
(add-to-list 'html2text-replace-list (cons "&#146;" "'"))    ; Windows-1252 right quote
(add-to-list 'html2text-replace-list (cons "&#8211;" "-"))   ; en dash
(add-to-list 'html2text-replace-list (cons "&#8216;" "'"))   ; left single quote
(add-to-list 'html2text-replace-list (cons "&#8217;" "'"))   ; right single quote
(add-to-list 'html2text-replace-list (cons "&#8220;" "\""))  ; left double quote
(add-to-list 'html2text-replace-list (cons "&#8221;" "\""))  ; right double quote
(add-to-list 'html2text-replace-list (cons "&#8230;" "...")) ; ellipsis

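;; Remove <sup> tags.
(add-to-list 'html2text-remove-tag-list "sup")

;; <br> is handled by html2text-format-tag-list below, so take it off
;; the remove list; also remove <style> and <span> tags.
(setq html2text-remove-tag-list (delete "br" html2text-remove-tag-list))
(add-to-list 'html2text-remove-tag-list "style")
(add-to-list 'html2text-remove-tag-list "span")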

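;; Format function for html2text-format-tag-list: delete the tag, then
;; insert a newline in its place.
(defun html2text-clean-newline (p1 p2 p3 p4)
  (html2text-delete-tags p1 p2 p3 p4)
  (newline))

;; Outlook's <o:p> paragraph marker and <br> both become line breaks.
(add-to-list 'html2text-format-tag-list
	     (cons "o:p" 'html2text-clean-newline))

(add-to-list 'html2text-format-tag-list
	     (cons "br" 'html2text-clean-newline))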

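;; Delete every "<!...>" construct (comments, DOCTYPE) up to the first
;; following ">".
(defun html2text-delete-comment ()
  (interactive)
  (let ((buffer-read-only nil)
        start)
    (goto-char (point-min))
    (while (search-forward "<!" nil t)
      (setq start (match-beginning 0))
      ;; Stop at an unterminated "<!" instead of calling delete-region on nil.
      (if (search-forward ">" nil t)
          (delete-region start (point))
        (goto-char (point-max))))))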

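;; Delete each <xml> ... </xml> block, as found in Outlook-generated HTML.
(defun html2text-delete-xml ()
  (interactive)
  (let ((buffer-read-only nil)
        start)
    (goto-char (point-min))
    (while (search-forward "<xml>" nil t)
      (setq start (match-beginning 0))
      ;; Stop at an unterminated <xml> instead of calling delete-region on nil.
      (if (search-forward "</xml>" nil t)
          (delete-region start (point))
        (goto-char (point-max))))))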

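;; Convert the current buffer's HTML to text, then remove the leftover
;; comments and <xml> blocks.
(defun html-clean ()
  (interactive)
  (html2text)
  (html2text-delete-comment)
  (html2text-delete-xml))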


In a buffer containing HTML, run M-x html-clean.

Do that before replying.
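As far as I know, html2text-replace-list is a plain alist of (ENTITY . REPLACEMENT) strings that html2text substitutes throughout the buffer. Here's a minimal sketch of the same table-driven substitution applied to a string, which is handy for checking a table before adding entries to it. The names my-entity-table and my-replace-entities are hypothetical and not part of html2text:

```elisp
;; Hypothetical table (not part of html2text), mirroring the entries above.
;; "&amp;" is deliberately last, so "&amp;gt;" becomes the text "&gt;"
;; rather than being decoded twice into ">".
(defvar my-entity-table
  '(("&#39;" . "'")
    ("&#146;" . "'")
    ("&#8211;" . "-")
    ("&#8216;" . "'")
    ("&#8217;" . "'")
    ("&#8220;" . "\"")
    ("&#8221;" . "\"")
    ("&#8230;" . "...")
    ("&quot;" . "\"")
    ("&gt;" . ">")
    ("&lt;" . "<")
    ("&amp;" . "&"))
  "Hypothetical alist of (ENTITY . REPLACEMENT) pairs, applied in order.")

(defun my-replace-entities (text)
  "Return TEXT with each entity in `my-entity-table' replaced, in order."
  (let ((result text))
    (dolist (pair my-entity-table result)
      ;; regexp-quote matches the entity literally; FIXEDCASE and LITERAL
      ;; are t so the replacement is inserted verbatim.
      (setq result (replace-regexp-in-string
                    (regexp-quote (car pair)) (cdr pair) result t t)))))
```

For example, (my-replace-entities "don&#39;t") returns "don't", and (my-replace-entities "x &gt; 1 &amp;&amp; y") returns "x > 1 && y".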

-- 
-- Stephe


