[guardian-dev] sanitizing PNGs

Michael Rogers michael at briarproject.org
Thu Mar 29 05:24:23 EDT 2018


That doesn't look easy, unfortunately. The class seems to be designed to
work in three stages:

1. Load the EXIF data from a file or input stream
2. Modify the EXIF data
3. Write the modified image to an output stream by reading the input a
second time and replacing the EXIF segment

We can't skip to stage 3 because it depends on state that was
initialised during stage 1. Even if we don't care about the original
EXIF data, some of the state seems like it would be vital, such as the
byte order and colour space.

Maybe we could use a giant BufferedInputStream, with a buffer big enough
to hold the whole image, and mark()/reset() it to read the stream twice?
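
Something like this is what I have in mind. It's only a rough sketch: the
names TwoPassRead, rewindable, countBytes and maxBytes are made up for
illustration, and the two counting passes stand in for "load the EXIF data"
and "write the modified image". It assumes we know an upper bound on the
image size, and it keeps the whole image in memory.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class TwoPassRead {

    // Wrap the source and mark its start. The buffer may grow up to
    // maxBytes so that reset() can rewind; reading more than maxBytes
    // invalidates the mark and reset() will then throw an IOException.
    static BufferedInputStream rewindable(InputStream source, int maxBytes) {
        BufferedInputStream in = new BufferedInputStream(source);
        in.mark(maxBytes);
        return in;
    }

    // Placeholder for a real pass over the image: reads to end of stream
    // and returns the number of bytes seen.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] image = new byte[100_000]; // stand-in for image data
        // One byte of headroom so that hitting end of stream on the
        // first pass does not push us past the mark limit.
        BufferedInputStream in =
                rewindable(new ByteArrayInputStream(image), image.length + 1);
        int first = countBytes(in);  // pass 1: would load the EXIF data
        in.reset();                  // rewind to the marked position
        int second = countBytes(in); // pass 2: would write the output
        System.out.println(first + " " + second);
    }
}
```

The obvious cost is memory: the buffer ends up holding the entire image, so
we'd want a sane cap on maxBytes.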

Cheers,
Michael

On 28/03/18 19:43, Hans-Christoph Steiner wrote:
> 
> Ah cool!  It would be awesome to have the EXIF stripping work on a
> stream, rather than a file.
> 
> .hc
> 
> Michael Rogers:
>> Fantastic!
>>
>> The code is just a single file with minimal Android dependencies, so I
>> made a quick (untested) Java port:
>>
>> https://code.briarproject.org/akwizgran/metadata
>>
>> Cheers,
>> Michael
>>
>> On 26/03/18 22:32, Hans-Christoph Steiner wrote:
>>>
>>> Turns out Google released an Android Support library that makes it
>>> trivial to strip EXIF from JPEGs and some RAW formats:
>>> https://android-developers.googleblog.com/2016/12/introducing-the-exifinterface-support-library.html
>>>
>>> I found it via this app in F-Droid:
>>> https://gitlab.com/juanitobananas/scrambled-exif
>>>
>>> This is all it does:
>>> ExifInterface exifInterface = new ExifInterface(imagePath);
>>> for (String attribute : getExifAttributes()) {
>>>   if (exifInterface.getAttribute(attribute) != null) {
>>>     exifInterface.setAttribute(attribute, null);
>>>   }
>>> }
>>> // Write the cleared attributes back to the file
>>> exifInterface.saveAttributes();
>>>
>>> .hc
>>>
>>> Michael Rogers:
>>>> Please feel free to use it, I place it in the public domain. I'll have a
>>>> look at JPEGs next time I'm procrastinating. ;-)
>>>>
>>>> (By the way, after sending I noticed a bug: if the file ends with a
>>>> truncated ancillary chunk, I think the cleaner will loop forever trying
>>>> to skip to the end of the chunk. Should be easy to fix though.)
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> On 13/12/17 13:02, Hans-Christoph Steiner wrote:
>>>>>
>>>>> That's awesome!  Feeling inspired to also strip JPEGs? :-)  I think
>>>>> they're easier.  There is jhead, exiftool, and ObscuraCam's JNI code for
>>>>> examples.  Can we use this under the GPLv3?
>>>>>
>>>>> .hc
>>>>>
>>>>> Michael Rogers:
>>>>>> Hi Hans-Christoph,
>>>>>>
>>>>>> I hacked this together based on the PNG specification, which
>>>>>> distinguishes between ancillary chunks that can be removed without
>>>>>> affecting the image data, and critical chunks that can't. It's been
>>>>>> tested on exactly two PNGs so far. :-)
>>>>>>
>>>>>> http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html
>>>>>>
>>>>>> Cheers,
>>>>>> Michael
>>>>>>
>>>>>> On 12/12/17 16:25, Hans-Christoph Steiner wrote:
>>>>>>>
>>>>>>> pyexiftool is just a wrapper for exiftool.  exiftool looks great, but
>>>>>>> for my use case, I only need to strip all metadata.  It would be much
>>>>>>> easier if that was in pure Python and pure Java.  Perl is a no-go on
>>>>>>> Android.
>>>>>>>
>>>>>>> It was dead simple to strip EXIF from JPEG in Python:
>>>>>>>
>>>>>>>         from PIL import Image
>>>>>>>         # Open in binary mode; copy only the pixel data to a new image
>>>>>>>         with open(inpath, 'rb') as fp:
>>>>>>>             in_image = Image.open(fp)
>>>>>>>             data = list(in_image.getdata())
>>>>>>>             out_image = Image.new(in_image.mode, in_image.size)
>>>>>>>         out_image.putdata(data)
>>>>>>>         out_image.save(outpath)
>>>>>>>
>>>>>>> But that broke some PNGs, and the rest were larger in size.
>>>>>>>
>>>>>>> .hc
>>>>>>>
>>>>>>> Rick Valenzuela:
>>>>>>>> oh, you may already know this, but the previous code keeps a copy of the
>>>>>>>> file and metadata. if you want it gone with no copies, you have to add a
>>>>>>>> switch to overwrite, e.g.:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> with exiftool.ExifTool() as et:
>>>>>>>>     et.execute(b'-all=', b'-overwrite_original', b'some.png')
>>>>>>>> ```
>>>>>>>>
>>>>>>>> On 12/12/2017 23:45, Rick Valenzuela wrote:
>>>>>>>>> heh, nice --  I just found this:
>>>>>>>>>
>>>>>>>>> https://github.com/smarnach/pyexiftool
>>>>>>>>>
>>>>>>>>> Tried it out and it worked great:
>>>>>>>>> ```
>>>>>>>>> with exiftool.ExifTool() as et:
>>>>>>>>>     et.execute(b'-all=', b'some.png')
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> On 12/12/2017 19:53, Hans-Christoph Steiner wrote:
>>>>>>>>>>
>>>>>>>>>> Ah, cool, I thought exiftool only worked with JPEGs.  It seems to work
>>>>>>>>>> with just about every image format.  Now the open question is how to
>>>>>>>>>> strip all PNG metadata with Python and Java.
>>>>>>>>>>
>>>>>>>>>> .hc
>>>>>>>>>>
>>>>>>>>>> Rick Valenzuela:
>>>>>>>>>>> does exiftool do what you need?
>>>>>>>>>>>
>>>>>>>>>>> `exiftool -all= <some.PNG>`
>>>>>>>>>>>
>>>>>>>>>>> On 11/12/2017 17:57, Hans-Christoph Steiner wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone know any tools for sanitizing PNGs without touching the
>>>>>>>>>>>> compressed image data?  With JPEG it is easy to strip out EXIF with
>>>>>>>>>>>> python-pil or many other tools. I haven't found a simple, clean approach
>>>>>>>>>>>> in Python for PNGs.
>>>>>>>>>>>>
>>>>>>>>>>>> .hc
>>>>>>>>>>>>