Module commit_encoding


Git commit encoding labels (encoding header, i18n.commitEncoding) mapped to codecs.

Git’s ISO-8859-1 is strict Latin-1; encoding_rs maps that label to Windows-1252, so we handle Latin-1 separately.
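The difference matters for bytes 0x80..=0x9F: strict Latin-1 maps each byte to the code point with the same value, while Windows-1252 assigns printable characters there (0x80 becomes U+20AC, the euro sign). The following is a minimal sketch of a strict Latin-1 codec using only the standard library. The function names are hypothetical and are not part of this module's API.

```rust
/// Strict ISO-8859-1 decoding: byte 0xNN becomes code point U+00NN.
/// (Hypothetical helper for illustration; not this module's API.)
fn decode_latin1_sketch(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Strict ISO-8859-1 encoding: returns None if any character lies
/// above U+00FF and therefore has no Latin-1 representation.
fn encode_latin1_sketch(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let n = c as u32;
            if n <= 0xFF { Some(n as u8) } else { None }
        })
        .collect()
}
```

Under this sketch, the byte 0x80 decodes to the control character U+0080 rather than the euro sign, and the euro sign cannot be encoded at all.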

Functions

commit_message_unicode_for_display
Unicode commit message body for display (for example, format-patch).
decode_bytes
Decode bytes using Git’s encoding name, or lossy UTF-8 if unknown.
decode_rfc2047_mailbox_from_line
Decode =?charset?q?...?= encoded-words in an email display name (before <).
encode_header_text
Encode a single header field (author/committer line) without adding a trailing newline.
encode_unicode
Encode unicode for storage in a commit message body using Git’s encoding name.
ensure_body_trailing_newline
Git stores the commit message body with a trailing newline when non-empty.
finalize_stored_commit_message
Prepare a commit message for storage per i18n.commitEncoding (or equivalent).
find_invalid_utf8
Find the offset of the first byte that is not part of a strictly valid UTF-8 sequence, mirroring Git’s find_invalid_utf8 (commit.c).
identity_raw_for_serialized_commit
Raw author / committer header payloads for a new commit object.
is_known_encoding
Whether label names an encoding Git can decode (ISO-8859-1 or any encoding resolvable via resolve). Unknown names (for example, the label non-utf-8 used in a test) return false, matching Git’s logmsg_reencode no-op fallback.
is_strict_utf8
Whether buf is strictly valid UTF-8 per Git’s rules (see find_invalid_utf8).
reencode_utf8_to_label
Re-encode unicode from UTF-8 into output_label; returns None if the label is unsupported.
resolve
Resolve an encoding label the way Git uses it in config and commit objects.
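
As a rough illustration of the idea behind find_invalid_utf8 and is_strict_utf8, the sketch below reports the offset of the first byte that the Rust standard library's UTF-8 validator rejects. It is an approximation under stated assumptions: the names and return types are hypothetical, and the real functions mirror Git's find_invalid_utf8 (commit.c), whose rules may be stricter than the standard library's in some cases. This sketch does not attempt to reproduce those differences.

```rust
/// Offset of the first byte that is not part of a valid UTF-8 sequence,
/// or None if the whole buffer is valid.
/// (Hypothetical helper; approximates the check with std's validator.)
fn first_invalid_utf8_offset_sketch(buf: &[u8]) -> Option<usize> {
    match std::str::from_utf8(buf) {
        Ok(_) => None,
        // valid_up_to() is the length of the longest valid prefix,
        // which is also the index of the first offending byte.
        Err(e) => Some(e.valid_up_to()),
    }
}

/// True when the buffer contains no invalid sequence (hypothetical helper).
fn is_valid_utf8_sketch(buf: &[u8]) -> bool {
    first_invalid_utf8_offset_sketch(buf).is_none()
}
```

Overlong encodings, surrogate code points, and truncated sequences are all rejected by the standard library validator, so the sketch flags them at the offset where the bad sequence starts.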