[74][png-mng-misc] ANG draft 7 (with multi-layer frames)
From: Adam M. Costello - 2007-05-02 06:36:52

Since both APNG and anIM allow frames to be constructed by compositing
multiple images, and since none of the implementors here seem the
slightest bit put off by this, but instead seem enthusiastic about the
potential compression gains, I've added this capability to ANG. I've
expressed it in a slightly different way, however, distinguishing
between layers and frames, for reasons given in the Rationale section.
ANG-7 is more like a midpoint between APNG and anIM. I've tried to
keep the most important advantages of both: automatic fallback to a
single frame in browsers, and a clear distinction between still images
and animations as indicated by filename extensions and media types. I
am still hopeful that the PNG folks and the Mozilla folks can meet
each other halfway, rather than go off in divergent directions.

AMC

Animated Network Graphics (ANG), draft 7 (2007-May-01-Tue)
Adam M. Costello    [75]http://www.nicemice.net/amc/

Changes from draft 6

Introduced the concepts of layer and substrate, to allow frames to be
constructed by compositing multiple images, as in APNG and anIM, with
the goal of improving compression. This required the addition of a
section explaining alpha-over-alpha compositing, expansion of the
remarks about frame numbering, and the addition of remarks about the
lossiness of frames contrasted with the losslessness of layers.

Moved the discussion of the rejected media type out of the Rationale
section and into an editorial comment that would not be included in a
final draft.

Acknowledgements

Several good ideas have been taken from the PNG mailing list
(currently png-mng-misc@...).

Contents

    Goals
    Relationship to PNG
    Datastream tagging
    Conceptual model
    Datastream format
    Rationale

Goals

1) Capabilities comparable to animated GIF, plus the added features of
PNG (like 24-bit color and alpha).

2) Automatic fallback to PNG in existing web browsers, using the <img>
tag, showing an author-selected single frame instead of the animation.

3) Respect for the PNG specification and existing PNG applications and
users, to the extent possible given goal 2.

4) Simplicity.

5) Compression at least as good as in animated GIF, or even better if
possible in a simple format. The compression need not rival that of a
complex format like MNG.

Relationship to PNG

ANG is not PNG, but it is deliberately very similar. PNG contains a
single still image, whereas ANG contains both a still image and an
animation. The ANG datastream format is identical to the PNG
datastream format (including the signature) except that an ANG
datastream must contain an ahDR chunk before IDAT and must contain an
adAT chunk after IDAT, whereas a PNG datastream must not contain these
chunks, because the PNG specification prohibits multiple images in a
PNG datastream.

The still image in an ANG serves two purposes:

1) A fallback for applications or display technologies (like paper)
that do not support animation.

2) A source image to be used (optionally) in the animation, in
addition to the montage in adAT.

Unlike PNG and GIF, ANG is not a fully streamable format. ANG encoders
cannot produce ANG in a streaming fashion because the frame data is
contained in a single chunk. Whether ANG decoders can consume ANG in a
streaming fashion depends on how encoders choose to lay out the data.
There is typically a trade-off between streamability and compression.
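
Since an ANG datastream is byte-for-byte a PNG datastream plus the two
extra chunks, the placement rule (ahDR before IDAT, adAT after IDAT)
can be checked with a generic chunk walk. A minimal sketch in Python,
relying only on the PNG chunk layout (4-byte big-endian length, 4-byte
type code, data, 4-byte CRC); the function name is illustrative, not
part of this draft:

    import struct

    PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

    def chunk_types(data):
        """Yield the chunk type codes of a PNG or ANG datastream,
        in the order they appear."""
        if data[:8] != PNG_SIGNATURE:
            raise ValueError('not a PNG/ANG datastream')
        pos = 8
        while pos < len(data):
            length, ctype = struct.unpack('>I4s', data[pos:pos + 8])
            yield ctype.decode('latin-1')
            pos += 8 + length + 4  # length/type header, data, CRC

For a minimal ANG this yields IHDR, ahDR, IDAT, adAT, IEND; for the
corresponding PNG, the same sequence without ahDR and adAT.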
Datastream tagging

Because both PNG and ANG use the same signature, it is important that
ANG be tagged correctly. Its media type is "video/x-ang". The media
type "image/png" must not be used for ANG, because ANG is not PNG, and
because a video is not an image. [[ If/when this media type is
registered, the "x-" prefix will be removed. ]]

The recommended file extension for ANG is ".ang". The extension ".png"
should never be used for ANG, because it is important that users be
able to easily distinguish PNG and ANG, so that they are not surprised
if a PNG viewer does not show the animation in an ANG, or if a PNG
editor drops the animation from an ANG.

The deliberate similarity between the ANG and PNG formats facilitates
incremental deployment of ANG, with automatic fallback to PNG. When
ANG-unaware PNG-aware applications are fed an ANG datastream, they
will misinterpret the ANG as a PNG, ignore the unrecognized ahDR and
adAT chunks, and display the still image. This is potentially
confusing to users, but hopefully the media type and the filename will
mitigate that hazard. For example, this HTML inline image will display
as a still-image PNG in most ANG-unaware web browsers, even though the
URL ends in ".ang" and the HTTP server tags it as "video/x-ang":

    <img src="[76]http://example/foo.ang">

[[ I have verified this for Firefox 1.5 & 2.0, IE 6 & 7, Safari,
Konqueror 3.1, and Opera Mini. To help people test other browsers, I
have created [77]http://www.nicemice.net/amc/test/ang.html, which
contains three instances of the same inline PNG image, one served as
"image/png; x-anim=1", one served as "video/x-ang", and one served as
"application/octet-stream". The first of those types might, in theory,
be more likely than "video/x-ang" to facilitate automatic fallback,
because RFCs 2045 and 2046 require that unrecognized media type
parameters be ignored. However, "video/x-ang" works just fine in
practice, so the param-hackery is not needed. ]]

The distinct media types for ANG and PNG allow greater control over
the fallback, if desired. For example, if you wanted ANG-unaware web
browsers to fall back to animated GIF rather than still PNG, you could
do something like this, using the HTML object element, whose content
acts as the fallback:

    <object data="http://example/foo.ang" type="video/x-ang">
      <img src="http://example/foo.gif">
    </object>

ANG-aware decoders should use the following logic to determine whether
a datastream beginning with the PNG signature is PNG or ANG:

1) If a media type is available, trust it.

2) Otherwise, if a filename is available, and it ends with ".png" or
".ang" (or any capitalization thereof), trust it.

3) Otherwise, if ahDR is present, assume ANG.

4) Otherwise, assume PNG.

Since ahDR and adAT are invalid in PNG, they are errors if encountered
in a PNG datastream. Decoders that recognize them should treat them
like any erroneous ancillary chunks: ignore them, and notify the user
if appropriate. For this particular error, the notification could
perhaps suggest that the user rename or re-tag the file if possible.
Of course decoders that do not recognize these chunks will just ignore
them.
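
That decision procedure is mechanical. Here is a sketch of it in
Python, reusing chunk_types from the earlier sketch (the function and
parameter names are illustrative, not part of this draft):

    def looks_like_ang(data, media_type=None, filename=None):
        """Apply the four PNG-or-ANG rules in order; True means ANG."""
        if media_type is not None:  # rule 1: trust the media type
            base = media_type.split(';')[0].strip().lower()
            return base == 'video/x-ang'
        if filename is not None:    # rule 2: trust the file extension
            lower = filename.lower()
            if lower.endswith('.png'):
                return False
            if lower.endswith('.ang'):
                return True
        # rules 3 and 4: ahDR present means ANG, otherwise assume PNG
        return 'ahDR' in chunk_types(data)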
Conceptual model

An ANG datastream encodes a still image, just like PNG, and also
encodes an animation, which is a sequence of images, called frames,
all the same width and height as the still image, which are to be
displayed consecutively in the same place, each for a nonzero duration
indicated in the datastream (but interactive applications should allow
the user to pause or jump to the next frame at any time).

Each frame of the animation is the result of stacking zero or more
constituent images, called layers, in front of a default image, called
the substrate. The substrate has the same width, height, and position
as the frames, and is uniformly filled with a single pixel value. The
layers can be smaller than a frame, and they always lie completely
within the frame boundary. Each layer is a positioned and clipped copy
of either the still image or a second image called the montage. The
layers and the substrate do not necessarily hide what lies behind
them, because pixels can be transparent or partially transparent.

The pixel value used to fill the substrate is different for different
color types:

    For color type 3 (indexed-color), it is palette index 0.

    For color types 0 and 2 (non-indexed without alpha), it is the
    value of the tRNS chunk if present, otherwise all zeros (black).

    For color types 4 and 6 (non-indexed with alpha), it is all zeros
    (fully transparent black).

All meta-data (chunk fields) that apply to the still image in IDAT
also apply to the montage in adAT, with only three exceptions: the
width, height, and interlace method in IHDR do not apply to the
montage. The montage has its own width, height, and interlace method,
given in ahDR. All meta-data that apply to each pixel of the still
image and the montage (like color type, bit depth, significant bits,
palette, color space, physical size) also apply to each pixel of the
layers and the substrate.

The still image and montage are represented losslessly in an ANG
datastream. Since the layers are simply positioned and clipped copies
of those images, they are also represented losslessly. The frames,
however, are represented as compositions of images, which can be
lossy. For datastreams with alpha channels (color types 4 and 6), the
composition involves gamma decoding and alpha blending (and perhaps
gamma re-encoding), which are subject to floating-point round-off
errors and slight differences in implementation. For datastreams
without alpha channels (color types 0, 2, and 3), the composition
involves only simple pixel replacement, and the frames are lossless.

Frame and layer numbering

Frame and layer numbering is specified here for consistency among
applications that allow users to refer to particular frames and
layers. The frame and layer numbers are not used inside ANG
datastreams.

The frames of the animation are numbered starting with 1. Frame 0
refers to the still image, which unlike the animation frames does not
have the substrate underlying it. An animation can request to be
played more than once, but this does not affect the frame count.

The layers of the animation are numbered starting with 3. Layer 0
refers to the still image. Layer 1 refers to the montage. Layer 2
refers to the substrate. A frame can inherit the layers of the
previous frame and add new layers in front, but in this case only the
new layers get new layer numbers; the inherited layers keep their
original layer numbers. Layers within a frame can also be numbered
relative to the frame, starting with 0 for the back-most layer. For
example, if frame 2 is composed of layers 5 and 6, and frame 3
inherits the layers from frame 2, then frame-2-layer-0 equals
frame-3-layer-0 equals layer 5.

Datastream format

See the PNG specification for all aspects of the ANG datastream format
except the ahDR and adAT chunks, which are specified here. [[ If/when
these chunks are registered, the second letter of each will be
capitalized. ]]

ahDR must appear exactly once, before IDAT. It contains:

    num_frames (4 bytes, unsigned)
        The number of frame specifiers in adAT.

    ticks_per_second (4 bytes, unsigned)
        Defines the time unit for frame durations. If this is zero,
        all frame durations are infinite.

    num_plays (4 bytes, unsigned)
        Number of times to play the animation. Zero means infinity.

    montage_width (4 bytes, unsigned)
        Width of the montage in adAT, in pixels. Not zero.

    montage_height (4 bytes, unsigned)
        Height of the montage in adAT, in pixels. Not zero.

    montage_interlace_method (1 byte)
        Interlace method used by the montage in adAT.

    still_image_used (1 byte, boolean)
        Must be 0 or 1. If 0, the still image is not used in the
        animation and need not be decoded in order to display the
        animation, and the layer specifiers in adAT do not include
        from_still_image fields. If 1, the still image may be used in
        the animation, and each layer specifier in adAT includes a
        from_still_image field.
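
The ahDR fields above total 22 bytes, all in PNG's big-endian byte
order, so decoding the chunk data is a single unpack. A sketch (the
names are mine, for illustration only):

    import struct
    from collections import namedtuple

    AHDR = namedtuple('AHDR', 'num_frames ticks_per_second num_plays '
                              'montage_width montage_height '
                              'montage_interlace_method still_image_used')

    def parse_ahdr(chunk_data):
        """Decode ahDR chunk data: five 4-byte unsigned integers
        followed by two 1-byte fields, big-endian, 22 bytes in all."""
        if len(chunk_data) != 22:
            raise ValueError('ahDR must be exactly 22 bytes')
        fields = AHDR._make(struct.unpack('>5I2B', chunk_data))
        if fields.still_image_used not in (0, 1):
            raise ValueError('still_image_used must be 0 or 1')
        return fields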
adAT must appear exactly once, after IDAT. It contains a compressed
stream (using the compression method indicated in IHDR), which
contains a sequence of num_frames frame specifiers, immediately
followed by a montage.

Each frame specifier contains:

    frame_duration (4 bytes, unsigned)
        Duration of the frame, in ticks. Zero means infinity.

    keep_prior_layers (1 byte, boolean)
        Must be 0 or 1. If 0, this frame has the layers indicated by
        its own layer specifiers and no others. If 1, the layers
        indicated by this frame's layer specifiers are added (in
        front) to the stack of layers inherited from the previous
        frame. For the first frame (frame 1), the inherited stack is
        empty (has zero layers). Even if the animation loops, the
        first frame does not inherit layers from the last frame of the
        previous loop.

    num_layers (1 byte, unsigned)
        The number of layer specifiers for this frame.

    layer_specifiers (num_layers * (24 + still_image_used) bytes)
        A sequence of num_layers layer specifiers, in order from back
        to front.

Each layer specifier contains:

    from_still_image (0 or 1 byte, boolean)
        This field appears if and only if the still_image_used field
        of ahDR is 1. If from_still_image is 0 or absent, the montage
        is the source image for this layer. If from_still_image is
        present and 1, the still image is the source image for this
        layer. No other values are allowed.

    shift_left (4 bytes, signed)
    shift_up (4 bytes, signed)
    clip_left (4 bytes, signed)
    clip_top (4 bytes, signed)
    clip_width (4 bytes, unsigned)
    clip_height (4 bytes, unsigned)
        The layer is derived from the source image as follows.
        Starting with the source image positioned with its upper-left
        corner aligned with the upper-left corner of the frame, the
        source image is shifted shift_left pixels to the left and
        shift_up pixels upward, then it is clipped to both the clip
        boundaries and the frame boundaries, where clip_left and
        clip_top are the offsets (in pixels) from the upper-left
        corner of the frame to the upper-left corner of the clip
        rectangle, and clip_width and clip_height are the dimensions
        (in pixels) of the clip rectangle. Note that some of these
        fields are signed and can be negative.

The width and height of the layer are the width and height of the
overlap of the frame rectangle and the clip rectangle (see the sketch
below). If the two do not overlap, then the width and height of the
layer are zero, which is not a problem for displaying the animation,
but is an error when extracting layers as PNG datastreams, because a
PNG image is required to have nonzero width and height. Therefore
encoders should not specify zero-area layers (which are pointless
anyway).
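
In other words, the layer rectangle is the intersection of the clip
rectangle with the frame rectangle, and the shift fields determine
which source pixels fall inside it: since the source image is shifted
left and up, the frame pixel at (x, y) comes from source pixel
(x + shift_left, y + shift_up). A sketch of the geometry (the function
name is mine; 'spec' is assumed to carry the six fields under the
names used in this draft):

    def layer_rect(spec, frame_width, frame_height):
        """Return (left, top, width, height) of the layer within the
        frame: the overlap of the clip rectangle and the frame
        rectangle."""
        left = max(spec.clip_left, 0)
        top = max(spec.clip_top, 0)
        right = min(spec.clip_left + spec.clip_width, frame_width)
        bottom = min(spec.clip_top + spec.clip_height, frame_height)
        if right <= left or bottom <= top:
            return (0, 0, 0, 0)  # zero-area layer (discouraged above)
        # A frame pixel (x, y) inside this rectangle shows source
        # pixel (x + spec.shift_left, y + spec.shift_up).
        return (left, top, right - left, bottom - top)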
Immediately following the sequence of frame specifiers is:

    montage (bytes)
        Filtered scanlines.

The montage is formatted exactly like the uncompressed contents of
IDAT chunk data, except that it uses the width, height, and interlace
method indicated in ahDR rather than the ones in IHDR.

Typically the best compression is obtained when the montage is very
wide and not very tall, with similar layers adjacent; however, this
layout forces the decoder to decode the entire montage before it can
display even the first frame. If the montage is taller and narrower,
and earlier layers appear closer to the top, the decoder has more
opportunity to display as it decodes, but the compression is likely to
suffer.

Notes on layer composition

To display a substrate and layers in front of a background, there are
two approaches. One way is to composite the substrate over the
background, then composite the back-most layer over the result, and so
on, performing the compositing as described in the PNG specification.
Another way is to composite the substrate and the layers first (from
back to front) to yield the frame image, then composite the frame
image over the background. The second approach allows the frame image
to be exported, or cached for re-use in case the background changes.

The second approach can involve compositing over a not-fully-opaque
image, but the PNG specification does not say how to do that. For
images without alpha channels, it is trivial: just keep the front
pixel or the back pixel, depending on whether the front pixel is
transparent. For images with an alpha channel, it can be done as
follows:

1. Normalize all the alpha samples to the range [0,1].

2. Gamma-decode all the non-alpha samples (or undo the more
sophisticated transfer function indicated by sRGB or iCCP) to yield
samples that are proportional to light intensity.

3. Multiply every non-alpha sample by the alpha sample from the same
pixel (that is, convert to premultiplied form).

4. Store the substrate in an output buffer.

5. For each layer (in order from back to front), for each pixel in the
output buffer:

   5a. Let A be the alpha sample of the layer pixel.

   5b. For each channel (including the alpha channel), let
       output_sample = output_sample * (1 - A) + layer_sample

At this point the output buffer contains a non-gamma-encoded
premultiplied frame image ready to be composited over a background. If
the frame image is to be exported, it may be desirable to perform
additional steps:

6. Divide each non-alpha sample by the alpha sample of the same pixel
(that is, convert to non-premultiplied form). (Where the alpha sample
is zero, the non-alpha samples are necessarily zero as well and can
simply be left alone.)

7. Gamma-encode all the non-alpha samples (or encode them using a more
sophisticated transfer function).

Gamma encoding and premultiplication do not commute, so the order of
steps 2, 3, 6, and 7 is significant.
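
The loop in step 5 is the standard premultiplied "over" operation. A
minimal sketch of steps 1 through 5 using numpy, assuming RGBA samples
already normalized to [0,1] and each layer already expanded to frame
size (fully transparent outside its rectangle); a plain power curve
stands in for the real transfer function from gAMA, sRGB, or iCCP:

    import numpy as np

    def compose_frame(substrate, layers, gamma=2.2):
        """Compose one frame. 'substrate' and each layer are float
        arrays of shape (height, width, 4), RGBA, samples in [0, 1].
        Returns a linear-light, premultiplied RGBA frame image ready
        to be composited over a background (steps 6 and 7 are not
        applied here)."""
        def linear_premultiplied(img):
            out = img.copy()
            out[..., :3] **= gamma         # step 2: undo gamma encoding
            out[..., :3] *= out[..., 3:4]  # step 3: premultiply by alpha
            return out

        out = linear_premultiplied(substrate)  # step 4
        for layer in layers:                   # step 5, back to front
            lp = linear_premultiplied(layer)
            alpha = lp[..., 3:4]
            out = out * (1.0 - alpha) + lp     # step 5b, all channels
        return out

Because both operands are premultiplied, the same formula in step 5b
serves for the color channels and the alpha channel alike, which is
why alpha needs no special case.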
Rationale

Putting all the frame data in one chunk avoids the complication of how
to deal with reordering by ANG-unaware PNG editors. The cost is
encoder streamability, but that capability of animated GIF is not used
in practice. If encoder streamability is needed, MNG is available.

Separating the concepts of layer and frame, rather than speaking of
"zero-duration frames", is more consistent with existing animation
terminology. It also helps clarify what is and is not lossless, and it
avoids tempting decoder implementors to momentarily display
partially-constructed "frames" that were never meant to be displayed.

The use of alpha composition within the animation (between layers),
rather than just between the animation and the external background,
adds complication that is not strictly necessary, because the encoder
could precompute the composed frames and include them in the montage.
On the other hand, it can improve compression for animations that can
be modeled as sprites moving over each other (possibly with
semi-transparent regions and anti-aliased edges), and the general
alpha-over-alpha compositing is not much different from the
alpha-over-opaque-background compositing that PNG decoders already
know.

The substrate concept avoids awkward specifications of how to
composite an image that lies partly over something and partly over
nothing. It avoids questions of what the frame dimensions are if the
bounding box of all the layers is smaller than the frame. It ensures
that every frame has the same dimensions, which is what people expect.
Finally, it lets ANG preserve a property of PNG: the background can
show through only if an alpha channel or tRNS is present.

The interlace method of the montage is independent of the interlace
method of the still image because interlacing is less useful for
animations.

End of draft.
References

74. file:///p/png-mng/mailman/message/2439203/
75. http://www.nicemice.net/amc/
76. http://example/foo.ang
77. http://www.nicemice.net/amc/test/ang.html