Tool for saving emails as PDF + attachments

JujuLand · 16/10/2016, 15:20

Having had to save emails from mbox files as PDF (text mode) and extract their attachments, I ended up modifying the Python script 'mbox-extract-attachments' by Pablo Castellano, which dates from 2012 and only did the extraction.

Since copy/paste seems to scramble the indentation, which is disastrous for a Python script, I am posting the link to the archive here.

Version: 1.50 of 13/11/2016
Main features:

- Decoding of mails, subjects and recipients (To and Cc) encoded in ISO-8859-1 or UTF-8, as well as HTML.
- Mails saved as PDF, with the mail subject in the filename; identical subjects are handled so that one PDF never overwrites another (example: Mail_sujet_du_mail.pdf.1).
- The PDF also lists the attachments, as clickable links or not.
- Attachments saved, with handling of attachments that share the same name.
- Attachments and mails are saved in the same folder.
- Clickable mailto links for From, To and Cc.
- Messages are displayed in colour for better readability.
- The user name in the target folder path can be replaced, so that processing can run on a computer other than the destination one, or so that a PDF can be generated for a different user (see the README in the archive for how to use this parameter).
- A configuration file holding the default parameters:
  - whether or not to use the mbox file path parameter
  - whether or not to use the destination path parameter
  - whether or not to generate clickable links
  - the type of processing to run
  - whether or not to create the target folder automatically
  - removal of accents from the target folder name
  - the path to the mbox file
  - the destination path for the processing
  - the user name to use in the links
  - the destination path for the logs
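The duplicate-name handling described above (Mail_sujet_du_mail.pdf.1, and the same idea for attachments) amounts to appending an increasing numeric suffix until a free name is found. A minimal Python 3 sketch, assuming only the standard library; `unique_name` is a hypothetical helper name, not a function from the archive, although the attachment loop in the script below uses the same suffix scheme:

```python
import os
import tempfile


def unique_name(path):
    """Return path unchanged if it is free, else path.1, path.2, ... (first unused)."""
    candidate = path
    n = 1
    # Append an increasing numeric suffix until no existing file matches.
    while os.path.exists(candidate):
        candidate = "%s.%d" % (path, n)
        n += 1
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        base = os.path.join(folder, "Mail_sujet_du_mail.pdf")
        for _ in range(3):
            name = unique_name(base)
            open(name, "w").close()  # create it so the next call has to skip it
            print(os.path.basename(name))
```

Run as a script, it prints Mail_sujet_du_mail.pdf, then Mail_sujet_du_mail.pdf.1, then Mail_sujet_du_mail.pdf.2. Checking `os.path.exists` before writing leaves a small race window if two processes write into the same folder; for a single-user archiving run this is usually acceptable.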

Example config file, named .mbox2pdf and located in the user's home directory:

auto_mbox=1
auto_target=1
no_accent=1
auto_create=1
link_att=1
treat=tout
mbox_path=/home/alain/.thunderbird/crtothve.default/Mail/Local Folders/Archives.sbd/
target_path=/home/alain/Bureau/Archives_GADEL/split/
move_user=gadel
log_path=/home/alain/Bureau/logs/
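The script itself reads this file line by line with `re.search`/`re.sub`, and `mbox_path` (resp. `target_path`) is only taken into account if `auto_mbox=1` (resp. `auto_target=1`) has already been read, which is one reason the line order matters. As a hypothetical alternative (not what the archive does), parsing everything into a dictionary first and interpreting the values afterwards removes that ordering constraint. A Python 3 sketch; `parse_mbox2pdf_config` is a made-up name:

```python
def parse_mbox2pdf_config(text):
    """Parse the key=value lines of a .mbox2pdf-style file into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue  # empty or malformed lines are simply skipped
        # Split on the first "=" only, so the value is kept intact even if it
        # contains spaces (e.g. "Local Folders") or further "=" characters.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """auto_mbox=1
auto_target=1
treat=tout
mbox_path=/home/alain/.thunderbird/crtothve.default/Mail/Local Folders/Archives.sbd/
"""

if __name__ == "__main__":
    for key, value in parse_mbox2pdf_config(SAMPLE).items():
        print(key, "=", value)
```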

Notes on this file:
- The file may be absent.
- The order of the lines is fixed and must be the one shown in the example.
- Any line may be empty or absent, but lines 1 to 4 must always come before the paths (lines 7 and 8).
- The values of treat are: pdf / att / tout (lower case; tout = all, i.e. pdf + att).
- mbox_path, target_path and log_path must end with a slash (/).
- move_user generates output for another user (the user name is replaced in the links).
- auto_create creates the target folder if needed, so it no longer has to be created beforehand.
- no_accent strips accents from the target directory name so that the links in the PDFs are clickable.
- Only the value of treat can be changed on the command line.
- If a line, or the whole file, is missing, the default values are:
  - auto_mbox => 0
  - auto_target => 0
  - no_accent => 0
  - auto_create => 0
  - link_att => 0
  - treat => att
  - mbox_path => ""
  - target_path => ""
  - move_user => ""
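Put together, these rules read as "defaults, then file values, then the optional command-line treat". A minimal Python 3 sketch under that reading; `effective_settings`, `DEFAULTS` and `VALID_TREAT` are hypothetical names, and raising `ValueError` on a bad treat value or a missing trailing slash is an assumption (the script itself does not validate these). Values stay strings, as in the script, which compares them against "0"/"1":

```python
DEFAULTS = {
    "auto_mbox": "0",
    "auto_target": "0",
    "no_accent": "0",
    "auto_create": "0",
    "link_att": "0",
    "treat": "att",
    "mbox_path": "",
    "target_path": "",
    "move_user": "",
}
VALID_TREAT = ("pdf", "att", "tout")


def effective_settings(file_values, treat_arg=None):
    """Merge defaults, file values and an optional command-line treat value."""
    settings = dict(DEFAULTS)
    settings.update(file_values)
    # The paths are only used when their auto_* switch is set to 1.
    if settings["auto_mbox"] != "1":
        settings["mbox_path"] = ""
    if settings["auto_target"] != "1":
        settings["target_path"] = ""
    # treat is the only setting the command line may override.
    if treat_arg is not None:
        settings["treat"] = treat_arg
    if settings["treat"] not in VALID_TREAT:
        raise ValueError("treat must be pdf, att or tout, got %r" % settings["treat"])
    # log_path has no documented default, so it is only checked when present.
    for key in ("mbox_path", "target_path", "log_path"):
        path = settings.get(key, "")
        if path and not path.endswith("/"):
            raise ValueError("%s must end with '/': %r" % (key, path))
    return settings


if __name__ == "__main__":
    print(effective_settings({}))
    print(effective_settings({"auto_mbox": "1", "mbox_path": "/tmp/mail/"}, "pdf"))
```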

Script source:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# mbox-extract-attachments.py - Extract attachments from mbox files - 16/March/2012
# Copyright (C) 2012 Pablo Castellano <pablo@anche.no>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#

# Notes (RFC 1341):
# The use of a Content-Type of multipart in a body part within another multipart entity is explicitly allowed.
#     In such cases, for obvious reasons, care must be taken to ensure that each nested multipart entity must
#     use a different boundary delimiter. See Appendix C for an example of nested multipart entities. 
# The use of the multipart Content-Type with only a single body part may be useful in certain contexts, and is
#     explicitly permitted. 
# The only mandatory parameter for the multipart Content-Type is the boundary parameter, which consists of 1 to 70
#     characters from a set of characters known to be very robust through email gateways, and NOT ending with white
#     space. (If a boundary appears to end with white space, the white space must be presumed to have been added by
#     a gateway, and should be deleted.) It is formally specified by the following BNF

# Related RFCs: 2047, 2044, 1522


__author__ = "Pablo Castellano <pablo@anche.no>"
__license__ = "GNU GPLv3+"
__version__ = 1.2
__date__ = "12/04/2012"
__extend_author__ = "Alain Aupeix <alain.aupeix@wanadoo.fr>"
__extended__ = 1.50
__extend_date__ = "13/11/2016"

import mailbox
import base64
import os
import sys
import email
import subprocess
import string
from string import upper
import re
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.styles import ParagraphStyle
from reportlab.pdfbase import pdfmetrics

# Normal
gBla = '\033[1;30m'
gRed = '\033[1;31m'
gGre = '\033[1;32m'
gYel = '\033[1;33m'
gBlu = '\033[1;34m'
gMag = '\033[1;35m'
gCya = '\033[1;36m'
gWhi = '\033[1;37m'
# No colour (reset)
noColor = '\033[0m'


BLACKLIST = ('signature.asc', 'message-footer.txt', 'smime.p7s')
VERBOSE = 1

attachments = 0 #Count extracted attachment
skipped = 0

###################################################################################
def Join_line(texte):
###################################################################################
	texte=re.sub("\r","",texte)
	texte=re.sub("\n","",texte)
	texte=re.sub("\?= =\?UTF-8\?Q\?","",texte)
	texte=re.sub("\?= =\?utf-8\?q\?","",texte)
	texte=re.sub("\?= =\?utf-8\?Q\?","",texte)
	texte=re.sub("\?= =\?UTF-8\?B\?","",texte)
	texte=re.sub("\?= =\?utf-8\?b\?","",texte)
	texte=re.sub("\?= =\?utf-8\?B\?","",texte)
	texte=re.sub("\?= =\?ISO-8859-1\?Q\?","",texte)
	texte=re.sub("\?= =\?iso-8859-1\?q\?","",texte)
	texte=re.sub("\?= =\?iso-8859-1\?Q\?","",texte)
	texte=re.sub("\?= =\?ISO-8859-1\?B\?","",texte)
	texte=re.sub("\?= =\?iso-8859-1\?b\?","",texte)
	texte=re.sub("\?= =\?iso-8859-1\?B\?","",texte)
	texte=re.sub("\?= =\?ISO-8859-15\?Q\?","",texte) 
	texte=re.sub("\?= =\?iso-8859-15\?Q\?","",texte) 
	texte=re.sub("\?= =\?ISO-8859-15\?q\?","",texte) 
	texte=re.sub("\?= =\?iso-8859-15\?q\?","",texte) 
	texte=re.sub("\?= =\?windows-1252\?Q\?","",texte)
	texte=re.sub("\?= =\?Windows-1252\?Q\?","",texte)
	texte=re.sub("\?= =\?windows-1256\?Q\?","",texte)
	texte=re.sub("\?= =\?windows-1258\?Q\?","",texte)

	return texte        

###################################################################################
def Clean_codage(texte):
###################################################################################
	texte=re.sub("\n  ","",texte)
	texte=re.sub("\n ","",texte)

	texte=re.sub("=\?utf-8\?q\?","",texte)
	texte=re.sub("=\?utf-8\?Q\?","",texte)
	texte=re.sub("=\?UTF-8\?Q\?","",texte)
	texte=re.sub("=\?iso-8859-1\?q\?","",texte)
	texte=re.sub("=\?iso-8859-1\?Q\?","",texte)
	texte=re.sub("=\?ISO-8859-1\?Q\?","",texte)
	texte=re.sub("=\?ISO-8859-15\?Q\?","",texte) 
	texte=re.sub("=\?Windows-1252\?Q\?","",texte)
	texte=re.sub("=\?windows-1252\?Q\?","",texte)
	texte=re.sub("=\?windows-1256\?Q\?","",texte)
	texte=re.sub("=\?Windows-1256\?Q\?","",texte)
	texte=re.sub("=\?windows-1258\?Q\?","",texte)
	texte=re.sub("=\?Windows-1258\?Q\?","",texte)

	texte=re.sub("=C3=80","A",texte)
	texte=re.sub("=C3=81","A",texte)
	texte=re.sub("=C3=82","A",texte)
	texte=re.sub("=C3=83","A",texte)
	texte=re.sub("=C3=84","A",texte)
	texte=re.sub("=C3=85","A",texte)
	texte=re.sub("=C3=86","Ae",texte)
	texte=re.sub("=C3=87","C",texte)
	texte=re.sub("=C3=88","E",texte)
	texte=re.sub("=C3=89","E",texte)
	texte=re.sub("=C3=8a","E",texte)
	texte=re.sub("=C3=8b","E",texte)
	texte=re.sub("=C3=8C","I",texte)
	texte=re.sub("=C3=8D","I",texte)
	texte=re.sub("=C3=8E","I",texte)
	texte=re.sub("=C3=8F","I",texte)
	texte=re.sub("=C3=91","N",texte)
	texte=re.sub("=C3=92","O",texte)
	texte=re.sub("=C3=93","O",texte)
	texte=re.sub("=C3=94","O",texte)
	texte=re.sub("=C3=95","O",texte)
	texte=re.sub("=C3=96","O",texte)
	texte=re.sub("=C3=99","U",texte)
	texte=re.sub("=C3=9A","U",texte)
	texte=re.sub("=C3=9B","U",texte)
	texte=re.sub("=C3=9C","U",texte)
	texte=re.sub("=C3=9D","Y",texte)
	texte=re.sub("=C3=A0","a",texte)
	texte=re.sub("=c3=a0","a",texte)
	texte=re.sub("=C3=A1","a",texte)
	texte=re.sub("=C3=A2","a",texte)
	texte=re.sub("=C3=A3","a",texte)
	texte=re.sub("=C3=A4","a",texte)
	texte=re.sub("=C3=A5","a",texte)
	texte=re.sub("=C3=A7","c",texte)
	texte=re.sub("=C3=A8","e",texte)
	texte=re.sub("=C3=A9","e",texte)
	texte=re.sub("=c3=a9","e",texte)
	texte=re.sub("=C3=AA","e",texte)
	texte=re.sub("=C3=AB","e",texte)
	texte=re.sub("=C3=AC","i",texte)
	texte=re.sub("=C3=AD","i",texte)
	texte=re.sub("=C3=AE","i",texte)
	texte=re.sub("=C3=AF","i",texte)
	texte=re.sub("=C3=B1","n",texte)
	texte=re.sub("=C3=B2","o",texte)
	texte=re.sub("=C3=B3","o",texte)
	texte=re.sub("=C3=B4","o",texte)
	texte=re.sub("=C3=B5","o",texte)
	texte=re.sub("=C3=B6","o",texte)
	texte=re.sub("=C3=B9","u",texte)
	texte=re.sub("=C3=BA","u",texte)
	texte=re.sub("=C3=BB","u",texte)
	texte=re.sub("=C3=BC","u",texte)
	texte=re.sub("=C3=BD","y",texte)
	texte=re.sub("=C3=BF","y",texte)

	texte=re.sub("=0A"," ",texte)
	texte=re.sub("=20"," ",texte)
	texte=re.sub("=21"," ",texte)
	texte=re.sub("=22"," ",texte)
	texte=re.sub("=26"," ",texte)
	texte=re.sub("=27"," ",texte)
	texte=re.sub("=28","[",texte)
	texte=re.sub("=29","]",texte)
	texte=re.sub("=2D",".",texte)
	texte=re.sub("=2E"," ",texte)
	texte=re.sub("=3A"," ",texte)
	texte=re.sub("=3B"," ",texte)
	texte=re.sub("=3D"," ",texte)
	texte=re.sub("=3E"," ",texte)
	texte=re.sub("=3F"," ",texte)
	texte=re.sub("=5D"," ",texte)
	texte=re.sub("=5F"," ",texte)
	texte=re.sub("=92"," ",texte) 
	texte=re.sub("=B0","]",texte) 
	texte=re.sub("=AB","[",texte) 
	texte=re.sub("=BB","]",texte) 
	texte=re.sub("=A0"," ",texte) 
	texte=re.sub("=AC"," ",texte) 
	texte=re.sub("=C0","A",texte)
	texte=re.sub("=C1","A",texte)
	texte=re.sub("=C2","A",texte)
	texte=re.sub("=C3","A",texte)
	texte=re.sub("=C4","A",texte)
	texte=re.sub("=C5","A",texte)
	texte=re.sub("=C7","C",texte)
	texte=re.sub("=C8","E",texte)
	texte=re.sub("=C9","E",texte)
	texte=re.sub("=CA","E",texte)
	texte=re.sub("=CB","E",texte)
	texte=re.sub("=CC","I",texte)
	texte=re.sub("=CD","I",texte)
	texte=re.sub("=CE","I",texte)
	texte=re.sub("=CF","I",texte)
	texte=re.sub("=D1","N",texte)
	texte=re.sub("=D2","O",texte)
	texte=re.sub("=D3","O",texte)
	texte=re.sub("=D4","O",texte)
	texte=re.sub("=D5","O",texte)
	texte=re.sub("=D6","O",texte)
	texte=re.sub("=D9","U",texte)
	texte=re.sub("=DA","U",texte)
	texte=re.sub("=DB","U",texte)
	texte=re.sub("=DC","U",texte)
	texte=re.sub("=DD","Y",texte)
	texte=re.sub("=E0","a",texte)
	texte=re.sub("=E1","a",texte)
	texte=re.sub("=E2","a",texte)
	texte=re.sub("=E3","a",texte)
	texte=re.sub("=E4","a",texte)
	texte=re.sub("=E5","a",texte)
	texte=re.sub("=E7","c",texte)
	texte=re.sub("=E8","e",texte)
	texte=re.sub("=E9","e",texte)
	texte=re.sub("=EA","e",texte)
	texte=re.sub("=EB","e",texte)
	texte=re.sub("=EC","i",texte)
	texte=re.sub("=ED","i",texte)
	texte=re.sub("=EE","i",texte)
	texte=re.sub("=EF","i",texte)
	texte=re.sub("=F1","n",texte)
	texte=re.sub("=F2","o",texte)
	texte=re.sub("=F3","o",texte)
	texte=re.sub("=F4","o",texte)
	texte=re.sub("=F5","o",texte)
	texte=re.sub("=F6","o",texte)
	texte=re.sub("=F9","u",texte)
	texte=re.sub("=FA","u",texte)
	texte=re.sub("=FB","u",texte)
	texte=re.sub("=FC","u",texte)
	texte=re.sub("=FD","y",texte)
	texte=re.sub("=FF","y",texte)
	texte=re.sub("=3F","?",texte)
	texte=re.sub("=3A","_",texte)
	texte=re.sub("=2C",",",texte)
	texte=re.sub("=2F","",texte)
	texte=re.sub("\?=","",texte)

	if texte.find("?Q?") != -1 :
		if re.search("@",texte) is not None and ( re.search("\"",texte) is not None or re.search("'",texte) is not None ):
			texte=re.sub("\"","",texte)
			texte=re.sub("'","",texte)
	if element == "subject" :
		texte=re.sub("/","_",texte)
		texte=re.sub(":","_",texte)
		texte=re.sub("&eacute","é",texte)
		texte=re.sub(" utf-8 Q","",texte)
           
	texte=re.sub("\xc2\x80","euro",texte)
	texte=re.sub("\xc2\x92","'",texte)
	texte=re.sub("\xc2\x96","_",texte)
	texte=re.sub("\xc2\x9c","oe",texte)

	return texte        

###################################################################################
def decode_ansi(texte):
###################################################################################
	texte=re.sub("\x80","",texte)
	texte=re.sub("\x85","",texte)
	texte=re.sub("\x94",'"',texte)
	texte=re.sub("\x93",'"',texte)
	texte=re.sub("\x9c",'oe',texte)
	texte=re.sub("\x99",'',texte)
	texte=re.sub("\xa0","",texte)
	texte=re.sub("\xa6","",texte)
	texte=re.sub("\xb0","'",texte)
	texte=re.sub("\xb9","'",texte)
	texte=re.sub("\xa0","",texte)
	texte=re.sub("\xab","'",texte)
	texte=re.sub("\xb4","",texte)
	texte=re.sub("\xb8","",texte)
	texte=re.sub("\xbb","'",texte)
	texte=re.sub("\xbf","",texte)

	texte=re.sub("\x80","A",texte)
	texte=re.sub("\xc0","A",texte)
	texte=re.sub("\xc1","A",texte)
	texte=re.sub("\xc2","A",texte)
	texte=re.sub("\xc3","A",texte)
	texte=re.sub("\xc4","A",texte)
	texte=re.sub("\xc5","A",texte)
	texte=re.sub("\xc7","C",texte)
	texte=re.sub("\xc8","E",texte)
	texte=re.sub("\xc9","E",texte)
	texte=re.sub("\xca","E",texte)
	texte=re.sub("\xcb","E",texte)
	texte=re.sub("\xcc","I",texte)
	texte=re.sub("\xcd","I",texte)
	texte=re.sub("\xce","I",texte)
	texte=re.sub("\xcf","I",texte)
	texte=re.sub("\xd1","N",texte)
	texte=re.sub("\xd2","O",texte)
	texte=re.sub("\xd3","O",texte)
	texte=re.sub("\xd4","O",texte)
	texte=re.sub("\xd5","O",texte)
	texte=re.sub("\xd6","O",texte)
	texte=re.sub("\xd9","U",texte)
	texte=re.sub("\xda","U",texte)
	texte=re.sub("\xdb","U",texte)
	texte=re.sub("\xdc","U",texte)
	texte=re.sub("\xdd","Y",texte)
	texte=re.sub("\xe0","a",texte)
	texte=re.sub("\xe1","a",texte)
	texte=re.sub("\xe2","a",texte)
	texte=re.sub("\xe3","a",texte)
	texte=re.sub("\xe4","a",texte)
	texte=re.sub("\xe5","a",texte)
	texte=re.sub("\xe7","c",texte)
	texte=re.sub("\xe8","e",texte)
	texte=re.sub("\xe9","e",texte)
	texte=re.sub("\xea","e",texte)
	texte=re.sub("\xeb","e",texte)
	texte=re.sub("\xec","i",texte)
	texte=re.sub("\xed","i",texte)
	texte=re.sub("\xee","i",texte)
	texte=re.sub("\xef","i",texte)
	texte=re.sub("\xf1","n",texte)
	texte=re.sub("\xf2","o",texte)
	texte=re.sub("\xf3","o",texte)
	texte=re.sub("\xf4","o",texte)
	texte=re.sub("\xf5","o",texte)
	texte=re.sub("\xf6","o",texte)
	texte=re.sub("\xf9","u",texte)
	texte=re.sub("\xfa","u",texte)
	texte=re.sub("\xfb","u",texte)
	texte=re.sub("\xfc","u",texte)
	texte=re.sub("\xfd","y",texte)
	texte=re.sub("\xff","y",texte)

	return texte

###################################################################################
def decode_utf8(texte):
###################################################################################
	texte=re.sub("\xc3\x80","A",texte)
	texte=re.sub("\xc3\x81","A",texte)
	texte=re.sub("\xc3\x82","A",texte)
	texte=re.sub("\xc3\x83","A",texte)
	texte=re.sub("\xc3\x84","A",texte)
	texte=re.sub("\xc3\x85","A",texte)
	texte=re.sub("\xc3\x86","A",texte)
	texte=re.sub("\xc3\x87","C",texte)
	texte=re.sub("\xc3\x88","E",texte)
	texte=re.sub("\xc3\x89","E",texte)
	texte=re.sub("\xc3\x8a","E",texte)
	texte=re.sub("\xc3\x8b","E",texte)
	texte=re.sub("\xc3\x8c","I",texte)
	texte=re.sub("\xc3\x8d","I",texte)
	texte=re.sub("\xc3\x8e","I",texte)
	texte=re.sub("\xc3\x8f","I",texte)
	texte=re.sub("\xc3\x91","N",texte)
	texte=re.sub("\xc3\x92","O",texte)
	texte=re.sub("\xc3\x93","O",texte)
	texte=re.sub("\xc3\x94","O",texte)
	texte=re.sub("\xc3\x95","O",texte)
	texte=re.sub("\xc3\x96","O",texte)
	texte=re.sub("\xc3\x99","U",texte)
	texte=re.sub("\xc3\x9a","U",texte)
	texte=re.sub("\xc3\x9b","U",texte)
	texte=re.sub("\xc3\x9c","U",texte)
	texte=re.sub("\xc3\x9d","Y",texte)
	texte=re.sub("\xc3\xa0","a",texte)
	texte=re.sub("\xc3\xa1","a",texte)
	texte=re.sub("\xc3\xa2","a",texte)
	texte=re.sub("\xc3\xa3","a",texte)
	texte=re.sub("\xc3\xa4","a",texte)
	texte=re.sub("\xc3\xa5","a",texte)
	texte=re.sub("\xc3\xa7","c",texte)
	texte=re.sub("\xc3\xa8","e",texte)
	texte=re.sub("\xc3\xa9","e",texte)
	texte=re.sub("\xc3\xaa","e",texte)
	texte=re.sub("\xc3\xab","e",texte)
	texte=re.sub("\xc3\xac","i",texte)
	texte=re.sub("\xc3\xad","i",texte)
	texte=re.sub("\xc3\xae","i",texte)
	texte=re.sub("\xc3\xaf","i",texte)
	texte=re.sub("\xc3\xb0","]",texte)
	texte=re.sub("\xc3\xb1","n",texte)
	texte=re.sub("\xc3\xb2","o",texte)
	texte=re.sub("\xc3\xb3","o",texte)
	texte=re.sub("\xc3\xb4","o",texte)
	texte=re.sub("\xc3\xb5","o",texte)
	texte=re.sub("\xc3\xb6","o",texte)
	texte=re.sub("\xc3\xb9","u",texte)
	texte=re.sub("\xc3\xba","u",texte)
	texte=re.sub("\xc3\xbb","u",texte)
	texte=re.sub("\xc3\xbc","u",texte)
	texte=re.sub("\xc3\xbd","y",texte)
	texte=re.sub("\xc3\xbf","y",texte)

	texte=re.sub("A\xc2\xa8","e",texte)
	texte=re.sub("A\xc2\xa9","e",texte)
	texte=re.sub("A\xc2\xaa","e",texte)
	texte=re.sub("A\xc2\xab","e",texte)
#	texte=re.sub("\xc2\x80","euro",texte)

	return texte

###################################################################################
def noAccent(texte):
###################################################################################
	texte=re.sub("À","A",texte)
	texte=re.sub("Á","A",texte)
	texte=re.sub("Â","A",texte)
	texte=re.sub("Ã","A",texte)
	texte=re.sub("Ä","A",texte)
	texte=re.sub("Å","A",texte)
	texte=re.sub("Ç","C",texte)
	texte=re.sub("È","E",texte)
	texte=re.sub("É","E",texte)
	texte=re.sub("Ê","E",texte)
	texte=re.sub("Ë","E",texte)
	texte=re.sub("Ì","I",texte)
	texte=re.sub("Í","I",texte)
	texte=re.sub("Î","I",texte)
	texte=re.sub("Ï","I",texte)
	texte=re.sub("Ñ","N",texte)
	texte=re.sub("Ò","O",texte)
	texte=re.sub("Ó","O",texte)
	texte=re.sub("Ô","O",texte)
	texte=re.sub("Õ","O",texte)
	texte=re.sub("Ö","O",texte)
	texte=re.sub("Ù","U",texte)
	texte=re.sub("Ú","U",texte)
	texte=re.sub("Û","U",texte)
	texte=re.sub("Ü","U",texte)
	texte=re.sub("Ý","Y",texte)
	texte=re.sub("à","a",texte)
	texte=re.sub("á","a",texte)
	texte=re.sub("â","a",texte)
	texte=re.sub("ã","a",texte)
	texte=re.sub("ä","a",texte)
	texte=re.sub("å","a",texte)
	texte=re.sub("ç","c",texte)
	texte=re.sub("è","e",texte)
	texte=re.sub("é","e",texte)
	texte=re.sub("ê","e",texte)
	texte=re.sub("ë","e",texte)
	texte=re.sub("ì","i",texte)
	texte=re.sub("í","i",texte)
	texte=re.sub("î","i",texte)
	texte=re.sub("ï","i",texte)
	texte=re.sub("ñ","n",texte)
	texte=re.sub("ò","o",texte)
	texte=re.sub("ó","o",texte)
	texte=re.sub("ô","o",texte)
	texte=re.sub("õ","o",texte)
	texte=re.sub("ö","o",texte)
	texte=re.sub("ù","u",texte)
	texte=re.sub("ú","u",texte)
	texte=re.sub("û","u",texte)
	texte=re.sub("ü","u",texte)
	texte=re.sub("ý","y",texte)
	texte=re.sub("ÿ","y",texte)

	return texte

# Search for filename or find recursively if it's multipart
###################################################################################
def extract_attachment(payload):
###################################################################################
	global attachments, skipped
	filename = payload.get_filename()
	if filename is not None:
		if treat == "att" or treat == "tout" :
			print "\n%sPièce(s) jointe(s) trouvée(s)!%s" %(gGre, noColor)
		if filename.find('=?') != -1:
			ll = email.header.decode_header(filename)
			filename = ""
			for l in ll:
				filename = filename + l[0]

		if filename in BLACKLIST:
			skipped = skipped + 1
			if (VERBOSE >= 1) and (treat == "att" or treat == "tout") :
				print "%sNon traitée %s%s%s (liste noire)%s\n" %(gCya, gYel, filename, gCya, noColor)
			return

		# The attachment filename may not be specified??
		#	if filename is None:
		#		filename = "unknown_%d_%d.txt" %(i, p)
		content = payload.as_string()
		# Skip headers, go to the content
		fh = content.find('\n\n')
		content = content[fh:]

		# if it's base64....
		if payload.get('Content-Transfer-Encoding') == 'base64':
			content = base64.decodestring(content)
		# quoted-printable
		# what else? ...

		# Clean up the attachment filename
		filename=re.sub("\n","",filename)
		filename=re.sub("\r","",filename)
		filename=noAccent(filename)
		filename=decode_utf8(filename)
		filename=decode_ansi(filename)
		filename=re.sub("\x92","_",filename)
		filename=re.sub("\xb0","]",filename)
		filename=re.sub("\xb4","'",filename)

		if treat == "pdf" or treat == "tout" :
			origine=directory+"/"+filename
			cleaned=noAccent(directory+"/"+filename)
			if treat == "pdf":
				print "Pièce jointe : %s%s%s (%s%d%s octets)%s" %(gYel, filename, gGre, gYel, len(content), gGre, noColor)
			if link_att == "1" and treat == "tout" and origine == cleaned :
				if move_user != "" :
					path=re.sub(os.environ["USER"],move_user, directory)
				else:
					path=directory
				cline="Pièce jointe (cliquable): "+'<link href="file://'+path+"/"+filename+'">'+filename+"</link>  ("+str(len(content))+" octets)"
			else :
				cline="Pièce jointe : "+filename+" ("+str(len(content))+" octets)"
			body.append(Spacer(0, cm * .4))
			body.append(Paragraph("----------------------------------------------------------------------------------------------------------------------------------------", bold))
			body.append(Paragraph(cline,bold))
		if treat == "att" or treat == "tout" :
			filename=re.sub("\n","",filename)
			print "%sExtraction de %s%s%s (%s%d%s octets)%s" %(gGre, gYel, filename, gGre, gYel, len(content), gGre, noColor)
			n = 1
			orig_filename = filename
			while os.path.exists(filename):
				filename = orig_filename + "." + str(n)
				n = n+1
			try:
				with open(filename, "wb") as fp:
					fp.write(content)
			except IOError:
				print "%sAbandon, %sErreur d'entrée-sortie!!!%s" %(gRed, gCya, noColor)
				sys.exit(2)
		
		attachments = attachments + 1
	else:
		if payload.is_multipart():
			for payl in payload.get_payload():
				extract_attachment(payl)

### Main ##########################################################################

name=sys.argv[0]+"/"
prog=name.split('/')
progname=prog[len(prog)-2]
print "\n%s%s %s%s%s" %(gMag, progname, gRed, __version__, noColor)
print "%sExtraire les pièces jointes des fichiers mbox%s" %(gGre, noColor)
print "%sCopyright (C) 2012 %s%s" %(gBlu, __author__, noColor)
print "%sVersion étendue: %s%s%s" %(gMag, gRed, __extended__, noColor)
print "%sPièces jointes + export des mails en pdf%s" %(gGre, noColor)
print "%sCopyright (C) 2016 %s%s" %(gBlu, __extend_author__, noColor)
print

if len(sys.argv) < 3 or len(sys.argv) > 4:
	print "%sUsage: %s%s %s<fichier> <destination> %s[%spdf%s|%satt%s|%stout%s]%s" %(gMag, gYel, progname, gGre, gMag, gYel, gMag, gYel, gMag, gRed, gMag, noColor)
	print
	sys.exit(0)

filename = sys.argv[1]
directory = sys.argv[2]
auto_mbox = "0"
auto_target = "0"
link_att = "0"
treat="att"
mbox_path=""
target_path=""
move_user=""
no_accent="0"
auto_create="0"

# Config file
if os.path.exists(os.environ['HOME']+"/.mbox2pdf") :
	config=os.environ['HOME']+"/.mbox2pdf"
	monfichier=open(config,"r")	
	params=monfichier.read()
	configs=params.split("\n")
	for config in configs :
		if re.search("auto_mbox=",config) is not None :
			auto_mbox=re.sub("auto_mbox=","",config)
		if re.search("auto_target=",config) is not None :
			auto_target=re.sub("auto_target=","",config)
		if re.search("link_att=",config) is not None :
			link_att=re.sub("link_att=","",config)
		if re.search("treat=",config) is not None :
			treat=re.sub("treat=","",config)
		if re.search("auto_create=",config) is not None :
			auto_create=re.sub("auto_create=","",config)
		if re.search("no_accent=",config) is not None :
			no_accent=re.sub("no_accent=","",config)
		if re.search("mbox_path=",config) is not None and auto_mbox == "1" :
			mbox_path=re.sub("mbox_path=","",config)
		if re.search("target_path=",config) is not None and auto_target == "1" :
			target_path=re.sub("target_path=","",config)
		if re.search("move_user=",config) is not None :
			move_user=re.sub("move_user=","",config)

# Check the mbox directory
if not os.path.exists(mbox_path) :        # or os.path.isdir(mbox_path):
	print "%sRépertoire inexistant: %s%s%s\n" %(gYel, gRed, mbox_path, noColor)
	sys.exit(1)
else :
	directory=filename
	filename=mbox_path+filename

# Check that the mbox file exists
if not os.path.exists(filename):
	print "%sFichier inexistant: %s%s%s\n" %(gYel, gRed, filename, noColor) 
	sys.exit(1)
else :
	print "%sFichier mbox: %s%s%s" %(gGre, gYel, filename, noColor)

# Check the target directory
if not os.path.exists(target_path) and auto_create != "1" : # or not os.path.isdir(target_path):
	print "%sRépertoire inexistant: %s%s%s\n" %(gYel, gRed, target_path, noColor)
	sys.exit(1)

# Clean up the directory name
if no_accent == "1" :
	directory=target_path+noAccent(directory)
else :
	directory=target_path+directory

# Create the directory
if not os.path.exists(directory) and auto_create == "1" :
	os.mkdir(directory)	
elif not os.path.exists(directory) :
	print "%sRépertoire inexistant: %s%s%s\n" %(gYel, gRed, directory, noColor)
	sys.exit(1)
else :
	print "%sDossier: %s%s%s" %(gGre, gYel,directory, noColor)

if len(sys.argv) == 4:
	treat = sys.argv[3]

os.chdir(directory)
mb = mailbox.mbox(filename)
nmes = len(mb)

# Paragraph styles
normal = ParagraphStyle(
	name='Normal',
	fontName='Helvetica',
	fontSize=9,
)
bold = ParagraphStyle(
	name='Normal',
	fontName='Helvetica-Bold',
	fontSize=9,
)

# Italic: 'Helvetica' is the upright face, the oblique one is 'Helvetica-Oblique'
italic = ParagraphStyle(
	name='Italic',
	fontName='Helvetica-Oblique',
	fontSize=9,
)

# Process the mbox file
for i in range(len(mb)):
	if (VERBOSE >= 1):
		print "--------------\n%sAnalyse du message numéro :%s%d%s"  %(gWhi, gCya, i, noColor)

	mes = mb.get_message(i)
	em = email.message_from_string(mes.as_string())

	element="subject"
	subject = em.get('Subject')
	if subject is None  or len(subject) == 0 :
		sujet="[sans sujet]"
	else:
		sujet=Join_line(subject)
		debut=""
		fin=""
		if re.search("\?Q\?",subject.upper()) is not None :
			subject=Clean_codage(subject)
		bus=subject.split("=?")
		if len(bus) > 1:
			if re.search("=\?",bus[0]) is None :
				debut=bus[0]
				nbus=""
				for l in subject:
					if l[0] == "=" or len(nbus) > 0:
						nbus=nbus+l[0]
				subject=nbus
				nbus2=nbus.split("?=")
				if len(nbus2) == 1:
					subject=subject+"?="
		bus=subject.split("?=")
		if len(bus) > 1 :
			fin=bus[1]
			subject=bus[0]+"?="
		subject=re.sub(":","_",subject)
		subject=re.sub("=E2=2C=AC","euros",subject)
		subject=re.sub("=E2=82=AC","",subject)
		subject=re.sub("=C3=A9","e",subject)

		if re.search("\?=",subject) is not None :
			if re.search("\?UTF-8\?",subject.upper()) is not None :
				code="utf-8"
			else:
				code="iso-8859"
			ll = email.header.decode_header(subject)
			sujet = ""
			keep=0
			for l in ll:
				sujet=l[0]
				break
			if code == "utf-8":
				sujet=decode_utf8(sujet)
			else :
				sujet=decode_ansi(sujet)
			if debut != "" :
				sujet=debut+sujet
			if fin != "" :
				if re.search("=\?",fin) is not None:
					milieu=""
					fin0=""
					for l in fin:
						if l[0] == "=" or len(fin0) > 0:
							fin0=fin0+l[0]
						else :
							milieu=milieu+l[0]
					isfin=fin.split("=?")
					if len(isfin[0]) == 0:
						fin1=fin0
						fin0=milieu
						milieu=fin1
					if len(isfin) > 1:
						fin0=fin0+"?="
						fin0=re.sub("\?=\?=","?=",fin0)
					sujet=sujet+milieu
					if re.search("\?=",fin0) is not None :
						if re.search("\?B\?",fin0) is not None :
							ll = email.header.decode_header(fin0)
							fin=""
							keep=0
							for l in ll:
								fin=l[0]
								break
							if code == "utf-8":
								fin=decode_utf8(fin)
							else :
								fin=decode_ansi(fin)
							fin0=fin
						else :
							fin0=Clean_codage(fin0)
					sujet=sujet+fin0
				else:
					sujet=sujet+fin
		else :
			sujet=subject
	sujet=re.sub("_"," ",sujet)
	# Collapse runs of spaces and tabs into a single space
	sujet=re.sub("[ \t]+"," ",sujet)
	sujet=noAccent(sujet)
	sujet=re.sub("/"," ",sujet)
	sujet=re.sub(":"," ",sujet)
	sujet=re.sub("\"","",sujet)
	sujet=re.sub("=5B","[",sujet)

	sujet=decode_utf8(sujet)
	sujet=decode_ansi(sujet)	

	sujet=re.sub("\x92"," ",sujet)
	sujet=re.sub("\xb0","]",sujet)
	if re.search("=\?",sujet) is not None:
		sujet=Clean_codage(sujet)

	element=""

	em_from = em.get('From')
	if em_from is None or len(em_from) == 0 :
		cfrom = '[Expéditeur inconnu]'
	else:
		em_from=Join_line(em_from)
		debut=""
		fin=""
		milieu=""
		em_from1=""
		em_from2=""
		cfrom1=""
		cfrom2=""
		if re.search("\?Q\?",em_from.upper()) is not None :
			em_from=Clean_codage(em_from)
		bus=em_from.split("=?")
		if len(bus) > 1:
			nbus=""
			for l in em_from:
				if l[0] == "<" or len(fin) > 0:
					fin=fin+l[0]
				else :
					nbus=nbus+l[0]
			em_from0=nbus
			bus2=em_from0.split("=?")
			if len(bus2) > 1:
				debut=bus2[0]
				em_from1="=?"+bus2[1]+"?="
				if len(bus) > 2:
					milieu=bus[2]
					if len(bus) > 3:
						bus3=bus[3].split("=?")
						em_from2="=?"+bus3[0]+"?="
		else :
			em_from1=em_from
		if re.search("\?",em_from) is not None :
			if re.search("\?UTF-8\?",em_from.upper()) is not None :
				code="utf-8"
			else :
				code="iso-8859"

			ll = email.header.decode_header(em_from1)
			cfrom1 = ""
	                keep=0
			for l in ll:
				cfrom1=l[0]
				break
			if code == "utf-8":
				cfrom1=decode_utf8(cfrom1)
			else :
				cfrom1=decode_ansi(cfrom1)
			if em_from2 != "" :
				ll = email.header.decode_header(em_from2)
				cfrom2 = ""
		                keep=0
				for l in ll:
					cfrom2=l[0]
					break
				if code == "utf-8":
					cfrom2=decode_utf8(cfrom2)
				else :
					cfrom2=decode_ansi(cfrom2)
			cfrom1=re.sub("\?","",cfrom1)
			cfrom2=re.sub("\?","",cfrom2)
			cfrom1=re.sub("=","",cfrom1)
			cfrom2=re.sub("=","",cfrom2)
			cfrom=debut+cfrom1+milieu+cfrom2+fin
			if re.search(" <", cfrom) is None :
				cfrom=re.sub("<"," <",cfrom)
		else :
			cfrom=em_from
			if re.search("<",cfrom) is None:
				cfrom="<"+cfrom+">"
	cfrom=re.sub("\"","",cfrom)
	cfrom=re.sub("'","",cfrom)
	cfrom=re.sub("<","[",cfrom)
	cfrom=re.sub(">","]",cfrom)
	cfrom=re.sub("=","",cfrom)

	tfrom=cfrom.split("[")
	if len(tfrom) == 1:
		tmail=tfrom[0]
	else:
		tmail=tfrom[1]
	umail=tmail.split("]")
	mfrom=umail[0]
	dfrom='<a href="mailto:'+mfrom+'"><font color="blue">'+cfrom+'</font></a>'

	em_to = em.get('To')
	if em_to is None or len(em_to) == 0  :
		em_to = '[Destinataire inconnu]'
	else:
		em_to=re.sub("\r","",em_to)
		em_to=re.sub("\n","",em_to)
		em_to=re.sub("<","[",em_to)
		em_to=re.sub(">","]",em_to)
		em_to=re.sub('"',"",em_to)

	em_Cc = em.get('Cc')
	if em_Cc is None or len(em_Cc) == 0  :
		cCc = 'None'
	else:
		cCc=re.sub("\r","",em_Cc)
		cCc=re.sub("\n","",cCc)
		cCc=re.sub("<","[",cCc)
		cCc=re.sub(">","]",cCc)
		cCc=re.sub('"',"",cCc)

	em_date = em.get('Date')
	if em_date is None or len(em_date) == 0  :
		cdate = '[Date inconnue]'
        else:
        	if re.search("=\?",em_date) is not None :
			if re.search("\?UTF-8\?",em_date.upper()) is not None :
				code="utf-8"
			else:
				code="iso-8859"
			ll = email.header.decode_header(em_date)
			cdate = ""
			keep=0
			for l in ll:
				cdate=l[0]
				break
			if code == "utf-8":
				cdate=decode_utf8(cdate)
			else :
				cdate=decode_ansi(cdate)
                else :
			cdate=re.sub("\xc2\x92","'",em_date)

	# Build the PDF version of the mail
	if treat == "pdf" or treat == "tout" :
		em_type = em.get('Content-Type')
        	em_boundary = re.sub('\n','',em_type)
        	em_boundary = re.sub('multipart/alternative; boundary=','',em_boundary)
	        em_boundary = re.sub('multipart/mixed; boundary=','',em_boundary)
	        em_boundary = re.sub('"','',em_boundary)

	        psujet=re.sub("_"," ",sujet)
		psujet=re.sub("   "," ",psujet)
		psujet=re.sub("  "," ",psujet)
		print "Sujet:     %s%s%s" %(gBlu, psujet, noColor)
		print "De:        %s%s%s" %(gGre, cfrom, noColor)
		print "Date:      %s%s%s" %(gYel, cdate, noColor)

		pour=""
		em_to=Join_line(em_to)
		lcto=em_to.split(', ')
		for ecto in lcto :
			debut=""
			fin=""
			address=""
			milieu=""
			ecto0=""
			scto1=""
			scto2=""
			cto1=""
			cto2=""
			mcto=""
			if em_to == '[Destinataire inconnu]':
				pour=em_to
				dispour=em_to
				break
			if em_to == 'undisclosed-recipients:;':
				pour='[Destinataire(s) masqué(s)]'
				dispour='[Destinataire(s) masqué(s)]'
				break
			if em_to == 'destinataires inconnus:;':
				pour='[Destinataire(s) masqué(s)]'
				dispour='[Destinataire(s) masqué(s)]'
				break
			if len(ecto) > 0 :
				ncto=""
				for l in ecto:
					if l[0] == "[" or len(address) > 0:
						address=address+l[0]
					else :
						ncto=ncto+l[0]
				if address == "":
					ecto=ncto+" ["+ncto+"]"
					address="["+ncto+"]"
					ncto=""
				if re.search("\?Q\?",ncto.upper()) is not None :
					ncto=Clean_codage(ncto)
				if re.search("==",ncto) is not None:
					ncto=re.sub("==","qp",ncto)
				if re.search("=\?=",ncto) is not None:
					ncto=re.sub("=\?=","ù\?=",ncto)
				scto=ncto.split("=?")
				if len(scto) > 1:
					for ecto0 in scto :
						if re.search("qp",ecto0) is not None:
							fin0=ecto0.split("?=")
							if len(fin0) > 1:
								fin=fin0[1]+"ù"
							else:
								fin=""
							ecto0=fin0[0]+"==?="
							ecto0=re.sub(" =","=",ecto0)
						if ecto0 != "":
							if re.search("\?=",ecto0) is None and re.search("=",ecto0) is None:
								debut=debut+ecto0
							else :
								if re.search("= ",ecto0) is not None:
									ecto1=ecto0.split(" ")
									fin=" "+ecto1[1]
									ecto0=re.sub(fin,"",ecto0)
								ecto0=re.sub(" ","",ecto0)
								if re.search("\=",ecto0) is not None and re.search("\?=",ecto0) is None:
									ecto0=ecto0+"?="
								ecto0="=?"+ecto0
								if re.search("\?UTF-8\?",ecto0.upper()) is not None :
									code="utf-8"
								else:
									code="iso-8859"
								if re.search("\?Q\?",ecto0.upper()) is None :
									ll = email.header.decode_header(ecto0)
									ecto1 = ""
									keep=0
									for l in ll:
										ecto1=l[0]
										break
									if code == "utf-8":
										ecto1=decode_utf8(ecto1)
									else :
										ecto1=decode_ansi(ecto1)
									ecto3=ecto1.split("\n")
	                                                                ecto1=ecto3[0] 
									ecto2=ecto1.split(" ")
									ecto=""
									rg=0
									if ecto2[0] == "":
										for ecto3 in ecto2 :
											if rg > 0 :
												ecto=ecto+ecto3
											rg=rg+1
									else :
										ecto=ecto1
								else:
									ecto=Clean_codage(ecto0)
								fin=re.sub(" ù","",fin)
								ecto=debut+ecto+fin+" "+address
				else:
					ecto=ncto+address
				ecto=re.sub("'","",ecto)
				scto=address.split("[")
				if len(scto) == 1:
					mcto=scto[0]
				else:
					tmail=scto[1]
					umail=tmail.split("]")
					mcto=umail[0]
				dpour=ecto
				ecto='<a href="mailto:'+mcto+'"><font color="blue">'+ecto+"</link>"
				if pour == "" :
					pour=ecto
					dispour=dpour
				else :
					pour=pour+", "+ecto
					dispour=dispour+", "+dpour
		pour=re.sub("  "," ",pour)
		dispour=re.sub("  "," ",dispour)	
		print "Pour:      %s%s%s" %(gCya, dispour, noColor)

		if cCc == "None" :
                  cCc="None"
                else :
			copie=""
			lcCc=cCc.split(', ')
			for ecCc in lcCc :
				debut=""
				fin=""
				address=""
				milieu=""
				ecCc0=""
				scCc1=""
				scCc2=""
				cCc1=""
				cCc2=""
				mcCc=""
				if len(ecCc) > 0 :
					ecCc=Join_line(ecCc)
					ncCc=""
					for l in ecCc:
						if l[0] == "[" or len(address) > 0:
							address=address+l[0]
						else :
							ncCc=ncCc+l[0]
					if address == "":
						ecCc=ncCc+" ["+ncCc+"]"
						address="["+ncCc+"]"
						ncCc=""
					if re.search("\?Q\?",ncCc.upper()) is not None :
						ncCc=Clean_codage(ncCc)
					if re.search("==",ncCc) is not None:
						ncCc=re.sub("==","qp",ncCc)
					elif re.search("=\?=",ncCc) is not None:
						ncCc=re.sub("=\?=","q|p",ncCc)
					scCc=ncCc.split("=?")
					if len(scCc) > 1:
						for ecCc0 in scCc :
							if re.search("qp",ecCc0) is not None:
								fin0=ecCc0.split("?=")
								if len(fin0) > 1:
									fin=fin0[1]+"ù"
								else:
									fin=""
								ecCc0=fin0[0]+"?="
							elif re.search("q\|p",ecCc0) is not None:
								fin0=ecCc0.split("?=")
								if len(fin0) > 1:
									fin=fin0[1]+"ù"
								else:
									fin=""
								ecCc0=fin0[0]+"=?="
								ecCc0=re.sub(" =","=",ecCc0)
							if ecCc0 != "":
								if re.search("\?=",ecCc0) is None and re.search("=",ecCc0) is None:
									debut=debut+ecCc0
								else :
									ecCc0=re.sub(" ","",ecCc0)
									if re.search("\=",ecCc0) is not None and re.search("\?=",ecCc0) is None:
										ecCc0=ecCc0+"?="
									ecCc0="=?"+ecCc0
									ecCc0=re.sub("qp","==",ecCc0)
									ecCc0=re.sub("q\|p","=\?=",ecCc0)
									if re.search("\?UTF-8\?",ecCc0.upper()) is not None :
										code="utf-8"
									else:
										code="iso-8859"
									ll = email.header.decode_header(ecCc0)
									ecCc1 = ""
									keep=0
									for l in ll:
										ecCc1=l[0]
										break
									if code == "utf-8":
										ecCc1=decode_utf8(ecCc1)
									else :
										ecCc1=decode_ansi(ecCc1)
									ecCc3=ecCc1.split("\n")
									ecCc1=ecCc3[0]
									ecCc2=ecCc1.split(" ")
									ecCc=""
									rg=0
									
									if ecCc2[0] == "":
										for ecCc3 in ecCc2 :
											if rg > 0 :
												ecCc=ecCc+ecCc3
											rg=rg+1
									else :
										ecCc=ecCc1
									fin=re.sub(" ù","",fin)
									ecCc=debut+ecCc+fin+" "+address
									ecCc=re.sub("\xa9","",ecCc)
									ecCc=re.sub("\r","",ecCc)
									ecCc=re.sub("\n","",ecCc)
									ecCc=re.sub('\"',"",ecCc)
									ecCc=re.sub("'","",ecCc)
									ecCc=re.sub('"',"",ecCc)
					else:
						ecCc=ncCc+address
					ecCc=re.sub("'","",ecCc)
					scCc=address.split("[")
					if len(scCc) == 1:
						mcCc=scCc[0]
					else:
						tmail=scCc[1]
						umail=tmail.split("]")
						mcCc=umail[0]
					dcopie=ecCc
					ecCc='<a href="mailto:'+mcCc+'"><font color="blue">'+ecCc+"</link>"
					if copie == "" :
						copie=ecCc
						dispcopy=dcopie
					else :
						copie=copie+", "+ecCc
						dispcopy=dispcopy+", "+dcopie
			copie=re.sub("  "," ",copie)
			dispcopy=re.sub("  "," ",dispcopy)	
			print "Copie:     %s%s%s" %(gBlu, dispcopy, noColor)

		psujet="Sujet:     "+sujet
	        pfrom= "De:      "+dfrom
	        pdate= "Date:    "+cdate
	        pto=   "Pour:    "+pour
		if cCc != "None":
			pCc=   "Copie:   "+copie
	        body = []
		body.append(Paragraph(psujet, bold))
		body.append(Paragraph(pfrom, bold))
		body.append(Paragraph(pdate, bold))
	 	body.append(Paragraph(pto, bold))
		if cCc != "None" :
		 	body.append(Paragraph(pCc, bold))
		body.append(Spacer(0, cm * .08))
	 	body.append(Paragraph("----------------------------------------------------------------------------------------------------------------------------------------", bold))
		body.append(Spacer(0, cm * .4))
	     	em_body = mes.as_string()
                if sujet == "[sans sujet]" :
                   sujet="sans_sujet"
		sujet="Mail_"+sujet+".pdf"
		if len(sujet) > 155 :
			ll=""
			for lettre in sujet :
				ll=ll+lettre[0]
				if len(ll) > 154:
					sujet=ll
					break
		orig_sujet = sujet
                n=1
		while os.path.exists(sujet):
			sujet = orig_sujet + "." + str(n)
			n = n+1
		try:
			orig_sujet=sujet
		except IOError:
			print "%sAbandon, %sErreur d'entrée-sortie!!!%s" %(gRed, gCya, noColor)
			sys.exit(2)
		docpdf = SimpleDocTemplate(sujet, pagesize = A4)
		paragraphs = em_body.split("\n")
	        ensuite=0
	        precedent=1
	        if re.search("text/plain",em_type) is not None:
			if re.search("UTF-8",em_type) is None:
				code="iso8859"
			else :
				code="utf8"
			for para in paragraphs:
		                if re.search("Content-Disposition",para) is not None and ensuite == 0 :
					ensuite=3
			                continue
				elif re.search("Content-Transfer-Encoding",para) is not None and ensuite == 0 :
					ensuite=2
					cte=8
			                continue
	                	elif re.search("--",para) is not None and re.search("-- Message",para) is None and ensuite > 1 :
		        	        ensuite=1
	                        	break
	        	        elif ensuite == 2 and re.search("Pour : ",para) is not None :
		        	        ensuite=3
        	        	elif ensuite == 3 :
#					if code == "utf8" :
#						para=decode_utf8(para)
#					else :
#						para=decode_ansi(para)
					para=noAccent(para)
					longueur = len(para)
					if longueur > 0 or precedent > 0 :
	                                        if para == "**" :
							continue
						para = re.sub("\*\*\*"," ",para)
						body.append(Paragraph(para, normal))
						if re.search("Pièce jointe",para) is None :
							body.append(Spacer(0, cm * .08))
		elif re.search("multipart/",em_type) is not None :
			if re.search("\.",em_boundary) is not None :
				att=em_boundary.partition('.')
				att_boundary=att[0]+"."+att[1]
			else :
				att_boundary=em_boundary
			for para in paragraphs:
		                if re.search("Content-Type\: text\/plain",para) is not None and ensuite == 0 :
					ensuite=1
			                continue
				elif re.search(em_boundary,para) is not None and ensuite == 2 :
		        	        ensuite=1
		                        break
				elif re.search(att_boundary,para) is not None and ensuite == 2 :
		        	        ensuite=1
		                        break
				elif re.search("cid:",para) is not None and ensuite == 2 :
		        	        ensuite=1
		                        break
				elif re.search("> ",para) is not None and ensuite == 2 :
		        	        ensuite=1
		                        break
		                elif re.search("Content-",para) is None and re.search(em_boundary,para) is not None and ensuite == 1:
		        	        ensuite=1
		                        continue
				elif re.search("Content-",para) is None and ensuite == 1 :
					ensuite=2
			                continue
	        	        elif ensuite == 2 :
					para=Clean_codage(para)
					longueur = len(para)
					if longueur > 0 or precedent > 0 :
	                                        if para == "**" :
							continue
						para = re.sub("\*\*\*"," ",para)
						if re.search("href",para) is not None:
							print para
						body.append(Paragraph(para, normal))
						body.append(Spacer(0, cm * .08))
		else :
			html_body=""
			for para in paragraphs:
				if re.search("<html>",para) is not None and ensuite == 0 :
					ensuite=1
					html_body=para+"\n"
	        	        elif ensuite == 1 :
					html_body=html_body+para+"\n"
			monfichier=open("html.html","w")
			monfichier.write(html_body)
			monfichier.close()
			text_body=subprocess.check_output(["html2text", "html.html"])
			paragraphs = text_body.split("\n")
			for para in paragraphs:
				body.append(Paragraph(para, normal))
				body.append(Spacer(0, cm * .08))

	filename = mes.get_filename()
	# Can it have a filename while being multipart???
	if em.is_multipart():
		for payl in em.get_payload():
			extract_attachment(payl)
	else:
		extract_attachment(em)

	if treat == "pdf" or treat == "tout" :
		docpdf.build(body)

## Summary
print "\n--------------"
print "%sTotal des mails traités:%s %d%s" %(gYel, gCya, len(mb), noColor)
if treat == "att" or treat == "tout" :
	print "%sTotal des pièces jointes extraites:%s %d%s" %(gYel, gCya, attachments, noColor)
	print "%sTotal des pièces jointes non traitées:%s %d%s" %(gYel, gCya, skipped, noColor)
print
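The naming rules used above (subject turned into `Mail_<subject>.pdf`, cut at 155 characters, plus a `.1`, `.2`... suffix when the name is already taken) can be isolated in two small helpers. This is a Python 3 sketch, not code from the archive; `unique_path` and `pdf_name` are hypothetical names. Unlike the script, it shortens the subject rather than the whole file name, so the `.pdf` extension is never cut off.

```python
import os
from typing import Callable


def unique_path(path: str, exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return `path` if it is free, otherwise the first free `path.1`, `path.2`, ..."""
    candidate = path
    n = 1
    while exists(candidate):
        candidate = f"{path}.{n}"
        n += 1
    return candidate


def pdf_name(subject: str, max_len: int = 155) -> str:
    """Build 'Mail_<subject>.pdf' with at most `max_len` characters.

    Only the subject is shortened, so '.pdf' always survives.
    """
    if subject == "[sans sujet]":
        subject = "sans_sujet"
    room = max_len - len("Mail_") - len(".pdf")
    return "Mail_" + subject[:room] + ".pdf"
```

The `exists` parameter defaults to a real file-system check but can be swapped for any predicate, which keeps the suffix logic easy to test.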

Running the script:

mbox2pdf-extract-attachments 1.2
Extraire les pièces jointes des fichiers mbox
Copyright (C) 2012 Pablo Castellano <mail>
Version étendue: 1.32
Pièces jointes + export des mails en pdf
Copyright (C) 2016 Alain Aupeix <mail>

Usage: mbox2pdf-extract-attachments <fichier> <destination> [pdf|att|tout]

When the configuration file is present, only the processing type can be set on the command line.

Example run:

mbox2pdf-extract-attachments 1.2
Extraire les pièces jointes des fichiers mbox
Copyright (C) 2012 Pablo Castellano
Version étendue: 1.32
Pièces jointes + export des mails en pdf
Copyright (C) 2016 Alain Aupeix

Fichier mbox: /home/alain/.thunderbird/crtothve.default/Mail/Local Folders/Archives.sbd/mboxfile
Dossier: /home/alain/Bureau/Archives_MAIL/split/mboxfile
--------------
Analyse du message numéro :0
Sujet:     CLIMAT
De:        Liliane Expéditeur [lili.expediteur@provider.fr]
Date:     Tue, 17 Dec 2013 08:18:38 +0100
Pour:     Le Destinataire [le.destinataire@provider.fr]
Cc   :     [autre@provider.fr]

Mail_CLIMAT.pdf
--------------
Analyse du message numéro :1
Sujet:     [sans sujet]
De:        Liliane Expéditeur [lili.expediteur@provider.fr]
Date:      Wed, 27 Nov 2013 14:18:12 +0100
Pour:      Le Destinataire [le.destinataire@provider.fr]
--------------
...
Analyse du message numéro :9
Sujet:     [sans sujet]
De:        Liliane Expéditeur [lili.expediteur@provider.fr]
Date:      Sat, 25 Jan 2014 15:42:12 +0100
Pour:      autre.testinataire@autre.fr

Pièce(s) jointe(s) trouvée(s)!
Extraction de Les chaufferies bois du Lot.doc (30720 octets)
--------------
Analyse du message numéro :10
Sujet:     notes
De:        Liliane Expéditeur [lili.expediteur@provider.fr]
Date:      Mon, 10 Feb 2014 13:02:59 +0100
Pour:      Le Destinataire [le.destinataire@provider.fr]
...
--------------
Total des mails traités: 20
Total des pièces jointes extraites: 16
Total des pièces jointes non traitées: 0

Notes on processing:

- if the destination path contains an accented character anywhere, or if treat = pdf, the PDF will contain no links to the attachments. Only treat = tout generates links.
- if the given destination is not the final one, i.e. the folders are moved afterwards, the links are still created but will then point to the wrong place.
- I tried creating links without the path, since the attachments are in the same directory, but Python doesn't like that.
- the generated links open in evince, but not in Acrobat, which wants the name without file://. Since we're on Linux, I favor evince.
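Accented paths might also be handled by percent-encoding them instead of renaming the folder: in a `file://` URI, `é` becomes `%C3%A9` and a space becomes `%20`. The Python 3 sketch below builds such a link with `urllib.parse.quote`. `attachment_link` is a hypothetical helper, and whether evince (or Acrobat) resolves the encoded URI from a ReportLab-generated PDF is an untested assumption.

```python
import os
from urllib.parse import quote
from xml.sax.saxutils import escape


def attachment_link(target_dir: str, filename: str, label: str) -> str:
    """Return an <a href="file://..."> link to an attachment saved in target_dir.

    quote() encodes the path as UTF-8 and percent-escapes everything except '/',
    so accented folder or file names no longer end up raw in the URI.
    """
    full_path = os.path.join(target_dir, filename)
    if not os.path.isabs(full_path):
        raise ValueError("file:// links need an absolute path: " + full_path)
    uri = "file://" + quote(full_path)
    # escape() protects &, < and > in the visible label (paragraph markup is XML-like)
    return '<a href="%s">%s</a>' % (uri, escape(label))
```

This does not solve the moved-folder case: the link still holds the absolute path at generation time.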

Processing loop:

To automate processing of several mboxes located in the same folder, I wrote a small bash script that relies on the ~/.mbox2pdf file and also generates one log file per mbox. These logs go into the folder configured in .mbox2pdf.

#!/bin/bash

if [ -f ~/.mbox2pdf ]; then
   mbox_path=$(grep "mbox_path=" ~/.mbox2pdf | sed 's|mbox_path=||')
   log_path=$(grep "log_path=" ~/.mbox2pdf | sed 's|log_path=||')
   # log_path is expected to end with "/", since log names are built as ${log_path}${mbox}.log
   mkdir -p "${log_path}"
   rm -f "${log_path}"*.log
   # List the mboxes (ls -A skips . and ..), ignoring Thunderbird .msf index files;
   # reading line by line keeps names that contain spaces intact
   ls -A1 "${mbox_path}" | grep -v '\.msf$' | while IFS= read -r mbox; do
      mbox2pdf-extract-attachments "$mbox" "$mbox" | tee "${log_path}${mbox}.log"
      # Remove the ANSI colour codes (ESC[...m) from the log in a single pass
      sed -i 's/\x1b\[[0-9;]*m//g' "${log_path}${mbox}.log"
   done
else
    echo "Fichier config inexistant !!!"
fi
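The colour codes that the loop strips from the logs with `sed` can also be handled on the Python side. Below is a Python 3 sketch with one regular expression, plus the alternative of emitting colours only when the output is a terminal. `strip_colors` and `colors_enabled` are hypothetical names, not functions from the script.

```python
import re
import sys

# SGR sequences such as ESC[1;31m (bright red) or ESC[0m (reset)
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_colors(text: str) -> str:
    """Remove ANSI colour codes from a log line or a whole log."""
    return ANSI_SGR.sub("", text)


def colors_enabled(stream=sys.stdout) -> bool:
    """True only when writing to a terminal (False for a pipe or a file)."""
    return stream.isatty()
```

The trade-off of `colors_enabled`: when the script is run through `| tee`, its output is a pipe, so the console loses the colours as well as the log.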

Statistics script:

To track down possible crashes and compute statistics for mbox2pdf, I wrote a small script named mbox2pdf_stat.

#!/bin/bash

if [ -f ~/.mbox2pdf ]; then
   target_path=$(grep "target_path=" ~/.mbox2pdf | sed 's|target_path=||')
   log_path=$(grep "log_path=" ~/.mbox2pdf | sed 's|log_path=||')
   cd "$log_path" || exit 1
   # Start both reports from scratch (mbox2pdf_to_decode.txt used to grow on every run)
   rm -f mbox2pdf_stat.csv mbox2pdf_to_decode.txt
   for i in *.log
   do
      j=$(grep 'Total des mails traités' "$i")
      k=$(grep 'Total des pièces jointes extraites' "$i")
      echo "$i : $j : $k" >> mbox2pdf_stat.csv
      {
         echo "==============="
         echo "$i"
         echo "==============="
         # Lines still showing a charset marker (utf-8, UTF-8, iso-, ISO-): probably undecoded text
         grep -i -e "utf-8" -e "iso-" "$i"
      } >> mbox2pdf_to_decode.txt
   done
   {
      echo "============================"
      echo "Noms de fichiers incorrects:"
      echo "============================"
      # File names still containing the end of an encoded word "?="
      ls -1 "${target_path}"* | grep '?='
      echo "============================"
   } >> mbox2pdf_to_decode.txt

   xdg-open mbox2pdf_stat.csv
   xdg-open mbox2pdf_to_decode.txt
else
    echo "Fichier config inexistant !!!"
fi
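The totals that mbox2pdf_stat greps for can also be collected in Python 3, which makes a missing summary easy to spot: a log without the "Total des mails traités" line most likely comes from a run that crashed before the end. This is a sketch only. It assumes the logs are UTF-8 and already free of colour codes, and `log_totals` and `write_stats` are hypothetical names.

```python
import csv
import re
from pathlib import Path

TOTAL_MAILS = re.compile(r"Total des mails traités:\s*(\d+)")
TOTAL_ATTACHMENTS = re.compile(r"Total des pièces jointes extraites:\s*(\d+)")


def log_totals(text: str):
    """Return (mails, attachments) found in a log; None where a line is missing.

    The attachment total is only printed for treat=att or treat=tout, so None
    there is expected for pdf-only runs.
    """
    mails = TOTAL_MAILS.search(text)
    attachments = TOTAL_ATTACHMENTS.search(text)
    return (int(mails.group(1)) if mails else None,
            int(attachments.group(1)) if attachments else None)


def write_stats(log_dir: str, csv_name: str = "mbox2pdf_stat.csv") -> None:
    """Write one CSV row per *.log file: name, mails, attachments, status."""
    folder = Path(log_dir)
    with open(folder / csv_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["log", "mails", "attachments", "status"])
        for log in sorted(folder.glob("*.log")):
            text = log.read_text(encoding="utf-8", errors="replace")
            mails, attachments = log_totals(text)
            status = "ok" if mails is not None else "no summary (crash?)"
            writer.writerow([log.name, mails, attachments, status])
```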

That's it ...

If you have any remarks or changes to suggest, don't hesitate ...

Cheers

Last edited by JujuLand (13/11/2016, 17:09)

JujuLand · 17/10/2016, 11:31

Since I have no solution for links containing accents in the PDFs, I added a parameter to the config file that transforms the target folder name by removing its accents. In particular, this gives links that work.

Also, so that the target folder no longer has to be created by hand when it does not exist, a parameter was added to the file to create it automatically.
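Both options can be sketched in Python 3 as follows. The sketch assumes that only the last component of the target path is cleaned, because renaming a parent folder such as `Bureau` would point somewhere else. `strip_accents` and `prepare_target` are hypothetical names. The NFKD decomposition turns `é` into `e` but leaves characters with no decomposition, such as `œ`, unchanged.

```python
import os
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (é -> e + combining acute) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def prepare_target(path: str, clean_accents: bool, auto_create: bool) -> str:
    """Apply the two config options to the target folder and return the final path."""
    parent, name = os.path.split(os.path.normpath(path))
    if clean_accents:
        path = os.path.join(parent, strip_accents(name))
    if not os.path.isdir(path):
        if not auto_create:
            raise FileNotFoundError("target folder does not exist: " + path)
        os.makedirs(path)
    return path
```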

What remains is a check on later runs, so that the attachments and the PDF are not re-created when the PDF already exists and the mail dates are the same.
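One possible way to implement this check is sketched below in Python 3. Nothing like it exists in the script yet, so treat it as a sketch under stated assumptions. After building a PDF, set its modification time to the mail's `Date:` header. On a later run, skip the mail when the PDF exists and its modification time still matches. `mail_timestamp`, `already_exported` and `stamp_export` are hypothetical names.

```python
import os
from email.utils import parsedate_to_datetime
from typing import Optional


def mail_timestamp(date_header: str) -> Optional[float]:
    """POSIX timestamp of a Date: header, or None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(date_header).timestamp()
    except (TypeError, ValueError, IndexError):
        # TypeError on older Pythons, ValueError since 3.10 for unparsable dates
        return None


def already_exported(pdf_path: str, date_header: str) -> bool:
    """True if pdf_path exists and carries the mail's date as its modification time."""
    ts = mail_timestamp(date_header)
    if ts is None or not os.path.exists(pdf_path):
        return False
    return int(os.path.getmtime(pdf_path)) == int(ts)


def stamp_export(pdf_path: str, date_header: str) -> None:
    """Call right after building the PDF: set its access/modification time to the mail's date."""
    ts = mail_timestamp(date_header)
    if ts is not None:
        os.utime(pdf_path, (ts, ts))
```

Copying the folders without preserving timestamps would defeat this check. Because identical subjects get `.1`, `.2` suffixes, each mail also has to be compared against its own file.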

To be continued ...

The source in the first post, as well as the comments, have been updated ...

Cheers

Last edited by JujuLand (17/10/2016, 12:57)

JujuLand · 29/10/2016, 23:09

In the first tests on a single mbox file, I had managed to get this working correctly.
A run over a set of mboxes (in a loop) went far worse ...

I reworked the To and Cc loops, and I think that this time it's OK.
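The standard library can do most of the To/Cc work. `email.utils.getaddresses` splits the header only on the commas that really separate addresses: a quoted name like `"Dupont, Jean"` stays whole, unlike with `split(', ')`. `email.header.decode_header` plus `make_header` then decode the `=?UTF-8?Q?...?=` and `=?ISO-8859-1?B?...?=` words. The sketch is Python 3, its function names are hypothetical, and it has not been checked against the mboxes mentioned here.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_mime(value: str) -> str:
    """Decode RFC 2047 encoded words (=?UTF-8?Q?...?=, =?ISO-8859-1?B?...?=) into text."""
    return str(make_header(decode_header(value)))


def recipients(header_value):
    """Split a To:/Cc: header into (decoded display name, address) pairs."""
    if not header_value:
        return []
    # Split first, decode after: in well-formed headers, encoded words carry no bare comma
    return [(decode_mime(name) if name else "", addr)
            for name, addr in getaddresses([header_value])]


def display(pairs) -> str:
    """Format like the script's console output: 'Name [address], [address2]'."""
    return ", ".join(f"{name} [{addr}]" if name else f"[{addr}]" for name, addr in pairs)
```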

But I still have a small problem. I want to transform a string, and I'm struggling with it a bit:

Transform this (the single or double quotes and the \ are of course part of the string)
"'\"GUICHARD Olivier (Chef de l'Unité) - DDT 76/SDPDD/PT\"'"
into this:
GUICHARD Olivier (Chef de l'Unité) - DDT 76/SDPDD/PT

If anyone has a solution... The problem is of course the \ ...

Thanks
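A possible answer, assuming a backslash is never meaningful in a display name: remove every backslash first, then strip single quotes, double quotes and spaces from both ends only. This keeps the apostrophe of `l'Unité`. It is written in Python 3 with a hypothetical `clean_display_name`; the same two string methods (`replace`, `strip`) also exist in Python 2. In source code, the backslash itself has to be written `"\\"`.

```python
def clean_display_name(value: str) -> str:
    """Drop backslash escapes, then strip any mix of quotes and spaces from both ends.

    Only the ends are stripped, so the apostrophe inside "l'Unité" survives.
    """
    return value.replace("\\", "").strip("\"' ")
```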

Updated version (1.35) in the first post.
Since copying/pasting into the post seems to have scrambled the indentation, which is disastrous for Python source, I'm putting the link to the archive at the top of the first post.

Cheers

Last edited by JujuLand (30/10/2016, 09:33)

JujuLand · 01/11/2016, 21:10

After a few tests, I noticed some anomalies in the decoding, so I reworked it and eliminated a few problems.

In tests covering 233 mbox files, containing 2776 mails and 5181 attachments, I no longer get any errors.
So this script can now be considered usable ...

Added: a bash script to automate processing of mboxes located in the same folder, a statistics tool, and a README.

All of this is in an archive available from the first post.

That's it ...
Cheers

JujuLand · 13/11/2016, 15:14

Version 1.50 of 13/11/2016
- A few small fixes to encoding handling.
- Added links for From, To and Cc
- Renamed the scripts:
  - mbox2pdf-extract-attachments => mbox2pdf
  - mbox2pdf => mbox2pdf-multi
  - mbox2pdf_stat => mbox2pdf-stat
- Improved mbox2pdf-stat

Still to do:
- Simplify the code by moving the decoding into a function
- Improve processing of the mail body (nested mails)
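For nested mails, one direction is to let `Message.walk()` do the recursion. It visits every part, including the parts of a forwarded mail attached as `message/rfc822`, so boundaries no longer need to be tracked by hand. The Python 3 sketch below collects the decoded `text/plain` parts. `text_parts` is a hypothetical name, and falling back to ISO-8859-1 when no charset is declared is an assumption.

```python
from email.message import Message


def text_parts(msg: Message) -> list:
    """Decoded text of every inline text/plain part, nested messages included."""
    texts = []
    for part in msg.walk():  # depth-first, in document order
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # a .txt attachment, saved as a file rather than shown in the body
        payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "iso-8859-1"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            texts.append(payload.decode("iso-8859-1"))
    return texts
```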

All of this is in an archive available from the first post.

Cheers

Ubuntu-fr
