NAME
    *MIME::Mini* - Minimal code to parse/create mbox files and mail messages

SYNOPSIS
        use MIME::Mini ':all';
     or:
        use MIME::Mini qw(
            formail mail2str mail2multipart mail2singlepart mail2mbox
            insert_header append_header replace_header delete_header
            insert_part append_part replace_part delete_part
            header headers header_names param mimetype encoding filename
            body message parts newparam newmail
        );

        # Parse mbox file, doing something with each mail message
        formail(sub { <> }, sub { my $mail = shift; ...; });

        # Create an email with text/plain, image/png, and message/rfc822 attachments
        my $mail = newmail
        (
            To => 'you@there.com',
            From => 'me@here.com',
            Subject => 'test',
            parts =>
            [
                newmail(body => "hi\n"),
                newmail(body => $png, type => 'image/png', filename => 'hi.png'),
                newmail(message => newmail(qw(To to@you From from@me body hi)))
            ]
        );
        print mail2str($mail);

DESCRIPTION
    *MIME::Mini* is a collection of functions that parse and produce mailbox
    files and individual mail messages.

    It started out as *minimail*, a non-module version compact enough to cut
    and paste directly into perl scripts that don't want to require
    non-standard perl modules. *MIME::Mini* is for people who prefer a CPAN
    module.

    It is intended to be yet another alternative to *MIME-tools*.
    *MIME-tools* does things that this code doesn't (such as uuencode and
    binhex decoding). And *MIME::Mini* does things that *MIME-tools* doesn't,
    such as reading and writing mailbox files correctly (repairing
    incorrectly formatted ones along the way), and transparently unravelling
    "winmail.dat" attachments (aka *MS-TNEF*).

    *MIME::Mini* is much smaller (about 3% of the size of *MIME-tools* and
    the other modules it requires, and about 20% of the size of *MIME-Lite*,
    which doesn't parse), and so takes much less time during program start
    up.

FUNCTIONS
    formail(sub { <> }, sub { $mail = shift })
        Parses a mailbox or a mail message.
        Calls the first function argument to retrieve input lines and calls
        the second function argument with every mail message found.
        Terminates when the first function returns undef or when the second
        function returns false. Quoted "From_" lines are unquoted.

    mail2str($mail)
        Returns a string version of a mail message. If the mail message
        includes a mailbox header, lines in the body starting with "From_"
        are quoted and the string result is guaranteed to end with a blank
        line. This means that mailbox files with blank lines missing between
        mail messages, or with unquoted "From_" lines, will be automatically
        repaired with the code below. (Incidentally, malformed nested
        multipart body parts are also repaired.)

            formail(sub { <> }, sub { print mail2str(shift) });

    mail2multipart($mail)
        Converts a singlepart mail message into a multipart mail message
        with a single body part (i.e. the body of the original mail
        message). Returns the mail message. Does nothing to mail messages
        that are already multipart mail messages.

    mail2singlepart($mail)
        Converts a multipart mail message with a single body part into a
        singlepart mail message whose body is the original body part.
        Returns the mail message. Does nothing to mail messages that are
        already singlepart mail messages or multipart mail messages with
        multiple parts. Acts recursively.

    mail2mbox($mail)
        Converts a mail message into a mailbox item. Does nothing to mail
        messages that are already mailbox items. This affects the result of
        *mail2str()*.

    insert_header($mail, $header[, $language[, $charset]])
        Inserts a new mail header before any existing mail headers. If the
        header contains non-ascii characters, it will be encoded in
        accordance with RFC2047. If the *$language* and *$charset*
        parameters are not supplied, *$language* defaults to "en" and
        *$charset* defaults to "iso-8859-1" if possible, "utf-8" otherwise.

    append_header($mail, $header[, $language[, $charset]])
        Appends a new mail header after any existing mail headers.
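
        As an illustration of the RFC2047 encoding mentioned for
        *insert_header()* and *append_header()*, the sketch below uses
        perl's core Encode module rather than *MIME::Mini* itself, so the
        exact encoded form may differ from what this module emits. The
        subject text is a made-up value.

```perl
use strict;
use warnings;
use Encode qw(encode decode);   # core module; provides the MIME-Header encoding

# Hypothetical header value containing one non-ASCII character (e-acute).
my $subject = "Caf\x{e9} menu";

# Encode into RFC2047 encoded-words (UTF-8 charset, base64 variant).
my $encoded = encode('MIME-Header', $subject);

# Decode the encoded-words back into perl characters.
my $decoded = decode('MIME-Header', $encoded);

# The encoded form is plain ASCII and therefore safe in a mail header.
print "Subject: $encoded\n";
```

        The decoded string compares equal to the original, which is the
        same round trip that *header()* performs when reading an encoded
        header back.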
    replace_header($mail, $header[, $language[, $charset]])
        Replaces all instances of a mail header with a new mail header.

    delete_header($mail, $header[, $recurse])
        Deletes all headers that match the *$header* pattern. If the
        *$recurse* parameter is provided and non-zero, matching headers in
        internal body parts will also be deleted.

    insert_part($mail, $part, $index)
        Inserts the given body part at the given index. The *$part*
        parameter must have been produced by *formail()* or *newmail()*.
        The *$mail* parameter must already be a multipart mail message.

    append_part($mail, $part)
        Appends the given body part.

    replace_part($mail, $part, $index)
        Replaces the body part at the given index with the given body part.

    delete_part($mail, $index)
        Deletes the body part at the given index.

    header($mail, $header)
        Returns a list of values of headers with the given name. RFC2822
        comments are removed. If any of the values contain RFC2047 encoded
        words (i.e. "=?charset?[qb]?...?="), they are decoded, and the bytes
        in the given charset (e.g., "us-ascii", "iso-8859-*", "utf-8") are
        then decoded into "characters" (i.e., unicode codepoints). The
        values are also unfolded. If this is not what you want, use
        $mail->{header} or $mail->{headers} directly.

    headers($mail)
        Returns a list of all complete headers with decoding and unfolding
        performed as with *header()*.

    header_names($mail)
        Returns a list of the names of headers present in the given mail
        message.

    param($mail, $header, $param)
        Returns the value of the given parameter of the given MIME header of
        the given mail message. *header()* is used for RFC2047 decoding. If
        the parameter has been split or encoded in accordance with RFC2231
        (i.e. "param1*0="a"; param1*1="b"; param2*="charset'lang'%63""), it
        is decoded (if "us-ascii" or "iso-8859-*" or "utf-8") and
        reassembled.

    mimetype($mail, $parent)
        Returns the declared or default mimetype of the given mail message
        or body part. Returns "application/octet-stream" when the encoding
        is invalid.
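
        To picture the RFC2231 reassembly that *param()* is described as
        performing, here is a minimal standalone sketch. It is not taken
        from *MIME::Mini*: the helper name "rfc2231_param" and the sample
        header values are hypothetical, and it handles only the simple
        cases shown.

```perl
use strict;
use warnings;
use Encode qw(decode);   # core module, used to turn charset bytes into characters

# Hypothetical helper (not a MIME::Mini function): return parameter $name
# from header value $value, joining RFC2231 continuations (name*0, name*1, ...)
# and decoding percent-encoded segments (name*=charset'lang'%XX...).
sub rfc2231_param {
    my ($value, $name) = @_;
    my (%segment, $charset);
    # name, optional *N section number, optional trailing * (encoded), value
    while ($value =~ /;\s*\Q$name\E(?:\*(\d+))?(\*)?\s*=\s*("(?:[^"\\]|\\.)*"|[^;\s]*)/gi) {
        my ($n, $is_encoded, $v) = ($1 // 0, $2, $3);
        # Strip surrounding quotes and backslash escapes, if any.
        if ($v =~ /^"(.*)"$/s) { ($v = $1) =~ s/\\(.)/$1/gs }
        if ($is_encoded) {
            # The first encoded segment starts with charset'language'.
            if ($n == 0 && $v =~ s/^([^']*)'[^']*'//) { $charset = $1 }
            $v =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # percent-decode to bytes
        }
        $segment{$n} = $v;
    }
    return unless %segment;
    # Join the segments in numeric order, then decode if a charset was given.
    my $bytes = join '', map { $segment{$_} } sort { $a <=> $b } keys %segment;
    return $charset ? decode($charset, $bytes) : $bytes;
}

my $split   = q{attachment; filename*0="long-"; filename*1="name.txt"};
my $encoded = q{attachment; filename*=utf-8'en'caf%C3%A9.txt};

binmode STDOUT, ':encoding(UTF-8)';
print rfc2231_param($split,   'filename'), "\n";   # long-name.txt
print rfc2231_param($encoded, 'filename'), "\n";   # cafe with e-acute, then .txt
```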
    encoding($mail)
        Returns the declared or implied encoding of the given mail message
        or body part.

    filename($part)
        Returns the RFC2183 filename of the given body part. Uses *param()*
        to perform any decoding that might be necessary. Also removes any
        directory component of the filename and replaces any unfriendly
        characters with dash characters.

    body($mail)
        Returns the decoded body of the given mail message or body part.
        Must not be called on a multipart mail message or a mail message
        whose mimetype is "message/rfc822".

    message($mail)
        Returns the message inside the given mail message whose mimetype is
        "message/rfc822". Must not be called on a multipart message or a
        mail message whose mimetype is not "message/rfc822".

    parts($mail[, $parts])
        When no *$parts* parameter is given, returns a reference to an array
        of body parts in the given multipart message. When the *$parts*
        parameter is given, it is a reference to an array of body parts, and
        it will replace the existing body parts. Must not be called on a
        singlepart mail message.

    newparam($name, $value[, $language[, $charset]])
        Creates a MIME header parameter, possibly split and encoded in
        accordance with RFC2231. Returns a string that looks like
        "; name=value" which can be used as part of the *$header* argument
        in functions like *append_header()* and as part of any header value
        in the function *newmail()*. If the value contains non-ascii
        characters, and the *$language* and *$charset* parameters are not
        supplied, they default to "en" and "utf-8" or "iso-8859-1",
        respectively.

    newmail(...)
        Creates a new mail message based on the given arguments (which take
        the form of a hash). It is not necessary to supply all information.
        Anything that needs to be added will be added automatically. The
        important parameters are:

            [A-Z]*      - Arbitrary mail headers: e.g. From To Subject
            type        - Content-Type: e.g. image/png
            charset     - Content-Type's charset parameter: e.g. iso-8859-1
            encoding    - Content-Transfer-Encoding: e.g. base64
            filename    - Content-Disposition's filename parameter
            body        - body of the message (don't use with parts or message)
            parts       - array-ref of parts (don't use with body or message)
            message     - body of message/rfc822 message (don't use with body or parts)
            mbox        - Mbox From_ header

        Supplying *body* implies "text/plain". Supplying *parts* implies
        "multipart/mixed". Supplying *message* implies "message/rfc822".

        Default *disposition* is "inline" for "text/*" and "message/rfc822",
        or "attachment" for all other types.

        The default *charset* is "us-ascii" when *body* contains only ASCII
        bytes. Otherwise, it is "utf-8" when *body* is a valid UTF-8 byte
        sequence. Otherwise, it is your local (non-utf8) charset, or
        "iso-8859-1".

        Default *encoding* is determined from the type and nature of the
        mail message and its data. You shouldn't have to supply *encoding*
        unless you want to create messages with "8bit" encoding.

        If the mail message really is a mail message, and not just a body
        part, "Date", "MIME-Version" and "Message-ID" headers are
        automatically included if they have not been supplied by the caller.

        Less important parameters are:

            disposition - Content-Disposition: i.e. inline or attachment
            created     - Content-Disposition's creation-date parameter
            modified    - Content-Disposition's modification-date parameter
            read        - Content-Disposition's read-date parameter
            size        - Content-Disposition's size parameter
            description - Content-Description
            language    - Content-Language
            duration    - Content-Duration
            location    - Content-Location
            base        - Content-Base
            features    - Content-Features
            alternative - Content-Alternative
            id          - Content-ID
            md5         - Content-MD5

        Note: If you supply "filename" but not "body" (or "message" or
        "parts"), and the filename refers to a readable file, then the
        following parameters will be determined automatically: "body",
        "modified", "read", "size". The rest of the less important
        parameters are just shortcuts for standard MIME headers. There is no
        support beyond that for any of them.
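
        The default *charset* rule described above for *newmail()* can be
        sketched in a few lines of plain perl. This is only an illustration
        of the stated rule, not the module's own code: the helper name
        "default_charset" is hypothetical, and the sketch assumes no local
        charset is known, so it falls straight back to "iso-8859-1".

```perl
use strict;
use warnings;

# Hypothetical helper (not a MIME::Mini function): choose a charset label
# for a string of raw body bytes, following the rule stated for newmail().
sub default_charset {
    my ($bytes) = @_;
    # Only ASCII bytes present: the most conservative label applies.
    return 'us-ascii' unless $bytes =~ /[^\x00-\x7f]/;
    # utf8::decode() works in place and returns false for malformed UTF-8,
    # so it is applied to a copy purely as a validity check.
    my $copy = $bytes;
    return 'utf-8' if utf8::decode($copy);
    # Assumption: no locale lookup in this sketch.
    return 'iso-8859-1';
}

print default_charset("hello\n"), "\n";         # us-ascii
print default_charset("caf\xc3\xa9\n"), "\n";   # utf-8
print default_charset("caf\xe9\n"), "\n";       # iso-8859-1
```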
STRUCTURE
    A mail message (or body part) is a hash containing some of the following
    entries:

        mbox               - mailbox From_ header
        warn               - parser errors in the form: X-Warning: ...
        headers            - arrayref of mail headers in order of appearance
        header             - hashref by name of arrayrefs of mail headers
        body               - text of singlepart mail message
        mime_type          - mimetype of the mail message or body part
        mime_parts         - arrayref of mail messages (body parts)
        mime_message       - message of a message/rfc822 mail message
        mime_boundary      - boundary for a multipart mail message
        mime_preamble      - any text before the first multipart boundary
        mime_epilogue      - any text after the last multipart boundary
        mime_prev_boundary - saved boundary of message after mail2singlepart
        mime_prev_preamble - saved preamble of message after mail2singlepart
        mime_prev_epilogue - saved epilogue of message after mail2singlepart

    Note that *body*, *mime_parts* and *mime_message* are mutually exclusive
    and that *mime_type* only exists when *mime_parts* or *mime_message*
    exist.

EXAMPLES
    Parsing example: Repair mailbox files

        formail(sub { <> }, sub { print mail2str(shift) });

    Building example: A mail message with attachments

        print mail2str(newmail(
            To => 'you@there.com',
            From => 'me@here.com',
            Subject => 'test',
            parts =>
            [
                newmail(body => "hi\n"),
                newmail(body => $png, type => 'image/png', filename => 'hi.png'),
                newmail(message => newmail(qw(To to@you From from@me body hi)))
            ]));

CAVEAT
    The *header()* and *headers()* functions automatically decode RFC2047
    encoded headers. This is an attempt to satisfy the following requirement
    in RFC2047:

        The program must be able to display the unencoded text if the
        character set is "US-ASCII". For the ISO-8859-* character sets,
        the mail reading program must at least be able to display the
        characters which are also in the ASCII set.

    Rather than discarding "iso-8859-*" characters that are not also
    "us-ascii", *header()* and *headers()* decode them to "characters"
    (unicode codepoints) in perl's internal string format. This is arguably
    more useful, but knowledge of the original character set is lost.
    Hopefully, that isn't important. However, actually displaying these
    characters will require the client application to encode the headers
    appropriately for the local system.

    The original, encoded headers can be accessed directly via
    "$mail->{headers}" which is a reference to an array of raw encoded
    headers.

SEE ALSO
    RFC2822, RFC2045, RFC2046, RFC2047, RFC2231, RFC2183 (also RFC3282,
    RFC3066, RFC2424, RFC2557, RFC2110, RFC3297, RFC2912, RFC2533, RFC1864,
    RFC2387, RFC2076).

    The mailbox format used is the mboxrd format described in
    "http://www.qmail.org/man/man5/mbox.html".

AUTHOR
    20230510 raf <raf@raf.org>

COPYRIGHT AND LICENSE
    Copyright (C) 2005-2007, 2023 raf <raf@raf.org>

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.