Large XMP metadata corrupted in JPEG-to-JPEG conversion


Post by chrisdolan »

I've discovered that ImageMagick mildly corrupts JPEG files that
contain large quantities of XMP metadata. This problem only occurs for
JPEG files with more than 65502 bytes of XMP -- that is, more than can
fit in one JPEG segment. The symptom is that when you transform jpg ->
jpg (resizing the image, for example), the XMP metadata is still
present in the output file but is unreadable. The reasons
for this bug are clear but the solution is less obvious.

Background:

XMP is an XML+RDF metadata syntax. To embed XMP inside a JPEG file you
add an "APP1" segment where the first bytes of the segment must be
"http://ns.adobe.com/xap/1.0/\0". That leaves 65502 bytes remaining in
the 2**16 byte segment for the actual metadata. Some Photoshop/PDF/PS
documents have larger metadata than this, often due to a large
"<photoshop:DocumentAncestors>" list. Adobe works around the 64K
limitation by adding an XML attribute like the following to the main XMP
APP1 segment:

xmpNote:HasExtendedXMP="938840E3982212480AF51FFC62367B37"

Then they add one or more additional JPEG segments that start with
"http://ns.adobe.com/xmp/extension/\0" and the key
("938840E3982212480AF51FFC62367B37" in this example). Those segments
concatenated together form the full XMP metadata XML. This technique
is documented in section 1.1.3 of
http://wwwimages.adobe.com/content/dam/ ... nPart3.pdf
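
As a rough illustration of the layout described above, here is a minimal Python sketch that walks the segments of a JPEG byte string and labels each APP1 payload by its prefix. The names `iter_app1_payloads` and `classify` are made up for this post (they are not ImageMagick functions), and the parser assumes a well-formed header with no fill bytes or length-less markers before the scan data.

```python
import struct

XMP_MAIN = b"http://ns.adobe.com/xap/1.0/\x00"
XMP_EXT = b"http://ns.adobe.com/xmp/extension/\x00"


def iter_app1_payloads(data):
    """Yield the payload of every APP1 segment found before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments
            break
        # The 16-bit big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1:  # APP1
            yield data[pos + 4:pos + 2 + length]
        pos += 2 + length


def classify(payload):
    """Label an APP1 payload by the prefix it starts with."""
    if payload.startswith(XMP_MAIN):
        return "main-xmp"
    if payload.startswith(XMP_EXT):
        return "extended-xmp"
    if payload.startswith(b"Exif\x00\x00"):
        return "exif"
    return "unknown"
```

Running `[classify(p) for p in iter_app1_payloads(open(path, "rb").read())]` on a file should list the main XMP segment followed by one or more extended segments when the metadata is stored the way Adobe describes.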


Root cause:

ImageMagick reads JPEG XMP metadata via ReadProfile() in jpeg.c. In
that method, all APP1 segments that start with "http:" are
concatenated together regardless of the actual prefix and remembered
as a single "XMP" profile. Then to write that XMP back out to JPEG,
the WriteProfile() method prefixes a single
"http://ns.adobe.com/xap/1.0/\0" to the front of that entire XMP
profile, splits it into blocks of length 65533 and writes each out as
an APP1 segment. This works great for XMP that's small enough to fit
in one segment, but the result is that the second segment and beyond
all lack any "http:..." prefix and thus are not recognizable as XMP
segments by ImageMagick or other tools. The JPEG image data is
unaffected, as is any EXIF metadata, but the XMP metadata is
effectively lost.
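
To make the failure concrete, here is a small Python simulation of the write behaviour as I describe it above. It is not the jpeg.c code; `write_like_described` is a hypothetical stand-in that only mimics "prefix once, then split into blocks of 65533".

```python
XMP_MAIN = b"http://ns.adobe.com/xap/1.0/\x00"
MAX_PAYLOAD = 65533  # block length quoted above


def write_like_described(profile):
    """Prefix the whole profile once, then cut it into fixed-size blocks."""
    blob = XMP_MAIN + profile
    return [blob[i:i + MAX_PAYLOAD] for i in range(0, len(blob), MAX_PAYLOAD)]


small = write_like_described(b"x" * 1000)
large = write_like_described(b"x" * 100000)

print(len(small), [b.startswith(b"http:") for b in small])  # 1 [True]
print(len(large), [b.startswith(b"http:") for b in large])  # 2 [True, False]
```

The second block of the large profile starts in the middle of the XML, so a reader that looks for an "http:" prefix has no way to recognize it.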


Solution ideas:

I don't know how to fix this but I have a few partial ideas. One
possibility is to remember the "http..." prefix for each block of
metadata, so instead of XMP being a single long string it becomes a
map of prefix-to-string where the key might be
"http://ns.adobe.com/xap/1.0/" or
"http://ns.adobe.com/xmp/extension/". Then the WriteProfile() method
could be fixed to add the correct prefix to each segment instead of
just the first segment. Alternatively, the prefixes could be saved
right in the XMP profile string so they don't need to be
re-concatenated on output. But that wouldn't work right with non-JPEG
output. Yet another approach might be a detailed transformation
between "extension" XMP and "xap" XMP when reading from or writing to
JPEG. This last option is probably the best approach but would also be
the hardest to implement.
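
Here is a rough Python sketch of the first idea (remembering the prefix for each block), just to show the shape of it. `read_xmp_parts` and `write_xmp_parts` are hypothetical names, not ImageMagick API. I use an ordered list of (prefix, body) pairs rather than a true map, on the assumption that several extension segments can share the same prefix and their order matters.

```python
XMP_MAIN = b"http://ns.adobe.com/xap/1.0/\x00"
XMP_EXT = b"http://ns.adobe.com/xmp/extension/\x00"
MAX_PAYLOAD = 65533  # block length quoted above


def read_xmp_parts(app1_payloads):
    """Keep each XMP segment separate, remembering which prefix it had."""
    parts = []
    for payload in app1_payloads:
        for prefix in (XMP_MAIN, XMP_EXT):
            if payload.startswith(prefix):
                parts.append((prefix, payload[len(prefix):]))
                break
    return parts


def write_xmp_parts(parts):
    """Re-emit every part as its own APP1 payload with its original prefix."""
    out = []
    for prefix, body in parts:
        payload = prefix + body
        if len(payload) > MAX_PAYLOAD:
            raise ValueError("part too large for one APP1 segment")
        out.append(payload)
    return out
```

This only round-trips segments that were already split correctly on input. It does nothing for non-JPEG output or for metadata that grows during processing, which is why the full transformation is probably still needed.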

Re: Large XMP metadata corrupted in JPEG-to-JPEG conversion

Post by chrisdolan »

An example JPEG that demonstrates this issue is
https://github.com/spite/android-lens-b ... /table.jpg
which I found from this similar discussion: http://dev.exiv2.org/boards/3/topics/1631

If you run `convert table.jpg -resize 100x100 table.small.jpg` on that file, then the output still has a big "GImage:Data" attribute in the XML but ExifTool can no longer see it. Running another convert on that output then truncates the XMP down to 64K, which demonstrates that ImageMagick also cannot see the extended XMP that it wrote.