iText and PDF/UA development

interview

↧

Using PDF to solve healthcare's unstructured data problem

August 9, 2014, 6:37 am

If there’s one major challenge to single out in healthcare IT today, it is putting the growing volume of big data to use. While consumer IT has made big advances in the past decade in getting a handle on data by marking up content, indexing it, and annotating it for use, enterprise IT, and healthcare IT in particular, still needs to catch up on making data actionable.

A typical healthcare office handles tens of thousands of documents for patient records and for legal, finance, and billing processes. In pharma and biotech, a typical FDA drug review process involves multiple stages of trials, testing, applications, marketing, and manufacturing for the new drug, all requiring a mind-blowing amount of paperwork. In all these cases, either the collected data is not timely or relevant, or it is hard to access, archive for the future, or bring into compliance with legal standards.

This article provides insights into how using the Portable Document Format (PDF) and accompanying tools within healthcare organizations can be a powerful way to help solve the unstructured data challenge, speed up processes, and reduce the costs for document handling.

We will explain why PDF, with its ability to contain data structure and interactivity, is the perfect document format for meeting the archiving, accessibility, and compliance requirements of the healthcare industry. We will also examine the building blocks of a solution that helps create such compliant PDF documents, and take a deep dive into ways to organize and structure PDFs.

Blog Post Type:

Insights and thoughts

↧

ZUGFeRD Tutorial: examples for chapter 2

August 27, 2015, 3:03 am

These are some examples that were written in the context of Chapter 2 of the tutorial ZUGFeRD: The Future of Invoicing.

C2E1_SimplePdf.java creates a simple "Quick brown fox jumps over the lazy dog" PDF with some images, but without any structure. This results in a regular PDF.
C2E2_TaggedPdf.java uses the same code as the first example, but now we ask iText to introduce structure. This results in a Tagged PDF.
C2E3_PdfA3b.java adapts the first example, so that it conforms to the PDF/A-3 standard, level B (for Basic). The resulting PDF is not a Tagged PDF.
C2E4_PdfA3a.java adapts the third example, so that it conforms to the PDF/A-3 standard, level A (for Accessibility). The resulting PDF is a Tagged PDF.

Files:

C2E1_SimplePdf.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import sandbox.WrapToTest;

/**
 * Creates a simple PDF with images and text.
 */
@WrapToTest
public class C2E1_SimplePdf {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox1.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";

    /**
     * Creates a simple PDF with images and text.
     * @param args no arguments needed.
     * @throws IOException
     * @throws DocumentException
     */
    static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E1_SimplePdf().createPdf(DEST);
    }

    /**
     * Creates a simple PDF with images and text
     * @param dest  the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        document.open();
        Paragraph p = new Paragraph();
        p.setFont(new Font(Font.FontFamily.HELVETICA, 20));
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E2_TaggedPdf.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a Tagged PDF with images and text.
 */
public class C2E2_TaggedPdf {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox2.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";

    /**
     * Creates a tagged PDF with images and text.
     * @param args  no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E2_TaggedPdf().createPdf(DEST);
    }

    /**
     * Creates a tagged PDF with images and text.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //==========
        document.open();
        Paragraph p = new Paragraph();
        p.setFont(new Font(Font.FontFamily.HELVETICA, 20));
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E3_PdfA3b.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a PDF that conforms with PDF/A-3 Level B.
 */
@WrapToTest
public class C2E3_PdfA3b {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox3.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A path to a color profile. */
    public static final String ICC = "resources/data/sRGB_CS_profile.icm";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates a PDF that conforms with PDF/A-3 Level B.
     * @param args  No arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E3_PdfA3b().createPdf(DEST);
    }

    /**
     * Creates a PDF that conforms with PDF/A-3 Level B.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        //PDF/A-3b
        //Create PdfAWriter with the required conformance level
        PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(dest), PdfAConformanceLevel.PDF_A_3B);
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //Create XMP metadata
        writer.createXmpMetadata();
        //====================
        document.open();
        //PDF/A-3b
        //Set output intents
        ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(ICC));
        writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
        //===================
        Paragraph p = new Paragraph();
        //PDF/A-3b
        //Embed font
        p.setFont(FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20));
        //=============
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E4_PdfA3a.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a PDF that conforms with PDF/A-3 Level A.
 */
@WrapToTest
public class C2E4_PdfA3a {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox4.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A path to a color profile. */
    public static final String ICC = "resources/data/sRGB_CS_profile.icm";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates a PDF that conforms with PDF/A-3 Level A.
     * @param args  no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E4_PdfA3a().createPdf(DEST);
    }

    /**
     * Creates a PDF that conforms with PDF/A-3 Level A.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        //PDF/A-3a
        //Create PdfAWriter with the required conformance level
        PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(dest), PdfAConformanceLevel.PDF_A_3A);
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //====================
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        document.addLanguage("en-US");
        document.addTitle("Some title");
        writer.createXmpMetadata();
        //=====================
        document.open();
        //PDF/A-3b
        //Set output intents
        ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(ICC));
        writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
        //===================
        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        p.setFont(FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20));
        //==================
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
        //==============
        p.add(c);
        p.add(new Chunk(" jumps over the lazy "));
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
        //==================
        p.add(c);
        document.add(p);
        document.close();
    }
}

File name	Raw URL	Updated
C2E1_SimplePdf.java	C2E1_SimplePdf.java	2015-08-27 12:02 pm
C2E2_TaggedPdf.java	C2E2_TaggedPdf.java	2015-08-27 12:02 pm
C2E3_PdfA3b.java	C2E3_PdfA3b.java	2015-08-27 12:02 pm
C2E4_PdfA3a.java	C2E4_PdfA3a.java	2015-08-27 12:02 pm

Resources:

File name	Raw URL	Updated
dog.bmp	dog.bmp	2015-08-27 12:05 pm
fox.bmp	fox.bmp	2015-08-27 12:05 pm
sRGB_CS_profile.icm	sRGB_CS_profile.icm	2015-08-27 12:06 pm
FreeSans.ttf	FreeSans.ttf	2015-08-27 12:07 pm

Results:

File name	Raw URL	Updated
cmp_quickbrownfox1.pdf	cmp_quickbrownfox1.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox2.pdf	cmp_quickbrownfox2.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox3.pdf	cmp_quickbrownfox3.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox4.pdf	cmp_quickbrownfox4.pdf	2015-08-27 12:19 pm

Tags:

PDF/A-3

PDF/A

↧

How can I generate a PDF/UA compatible PDF with iText?

October 11, 2015, 1:36 am

We have a number of dynamically generated PDFs on our site that were created using iText 2.1.7. However, we also have a large number of users that have disabilities and use screen readers, like JAWS, to render our PDFs. We use the setTagged() method to tag the PDFs, but some elements of the PDF appear out of order. Some even become more jumbled after calling setTagged()!

I read about PDF/UA in a 2013 interview about iText with Bruno Lowagie, and this seems like something that might help with our problem. However, I have not been able to find a good example of how to generate a PDF/UA document. Can you provide an example?

Posted on StackOverflow on Jan 29, 2015 by k-den

Please take a look at the PdfUA example. It explains step by step what is needed to be compliant with PDF/UA. A similar example was presented at the iText Summit in 2014 and at JavaOne. Watch the iText Summit video tutorial.

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document(PageSize.A4.rotate());
    PdfWriter writer =
        PdfWriter.getInstance(document, new FileOutputStream(dest));
    writer.setPdfVersion(PdfWriter.VERSION_1_7);
    //TAGGED PDF
    //Make document tagged
    writer.setTagged();
    //===============
    //PDF/UA
    //Set document metadata
    writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
    document.addLanguage("en-US");
    document.addTitle("English pangram");
    writer.createXmpMetadata();
    //=====================
    document.open(); 
    Paragraph p = new Paragraph();
    //PDF/UA
    //Embed font
    Font font =
        FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20);
    p.setFont(font);
    //==================
    Chunk c = new Chunk("The quick brown ");
    p.add(c);
    Image i = Image.getInstance(FOX);
    c = new Chunk(i, 0, -24);
    //PDF/UA
    //Set alt text
    c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
    //==============
    p.add(c);
    p.add(new Chunk(" jumps over the lazy "));
    i = Image.getInstance(DOG);
    c = new Chunk(i, 0, -24);
    //PDF/UA
    //Set alt text
    c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
    //==================
    p.add(c);
    document.add(p);
    p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n", font);
    document.add(p);
    List list = new List(true);
    list.add(new ListItem("quick", font));
    list.add(new ListItem("brown", font));
    list.add(new ListItem("fox", font));
    list.add(new ListItem("jumps", font));
    list.add(new ListItem("over", font));
    list.add(new ListItem("the", font));
    list.add(new ListItem("lazy", font));
    list.add(new ListItem("dog", font));
    document.add(list);
    document.close();
}

You make the document tagged with the setTagged() method, but that's not sufficient. You also need to set document metadata: the document title needs to be displayed and you need to indicate the language used in the document. XMP metadata is mandatory.

Furthermore, you need to embed all fonts. When you have images, you need an alternate description. In the example, we replace the words "dog" and "fox" with images. To make sure that these images are "read out loud" correctly, we need to use the setAccessibleAttribute() method.
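The steps named so far can be kept as a small checklist while you port your own code. The sketch below is only an illustration: UaRequirement and UaChecklist are hypothetical names, not iText classes, and the list covers only the steps mentioned in this answer, not the full PDF/UA standard.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical checklist of the steps named in this answer; not an iText API. */
enum UaRequirement {
    TAGGED("call setTagged() on the writer"),
    DISPLAY_DOC_TITLE("set the DisplayDocTitle viewer preference"),
    LANGUAGE("declare the document language, for example en-US"),
    TITLE("add a document title"),
    XMP_METADATA("create XMP metadata"),
    FONTS_EMBEDDED("embed every font"),
    IMAGE_ALT_TEXT("give every image an alternate description");

    final String hint;

    UaRequirement(String hint) {
        this.hint = hint;
    }
}

class UaChecklist {
    /** Returns the requirements that are not in the 'done' set. */
    static Set<UaRequirement> missing(Set<UaRequirement> done) {
        Set<UaRequirement> rest = EnumSet.allOf(UaRequirement.class);
        rest.removeAll(done);
        return rest;
    }

    public static void main(String[] args) {
        // A document that was only tagged, as in the question.
        Set<UaRequirement> done = EnumSet.of(UaRequirement.TAGGED);
        for (UaRequirement r : missing(done)) {
            System.out.println(r + ": " + r.hint);
        }
    }
}
```

This does not inspect a PDF; it only tracks which steps your generating code has already taken.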

At the end of the example, I added a numbered list. In another question, you claim that the list is not read out loud correctly by JAWS. If you check the PDF file created with the above example, more specifically pdfua.pdf, you'll discover that JAWS reads the document as expected, with the numbers and the text in the right order.

The reason why "it doesn't work" when you try this is simple: you are using a version of iText that is 3 years older than the PDF/UA standard. Also, in the version you are using, you are responsible for creating the tag structure at the lowest PDF level when you use the setTagged() method. In more recent versions, iText takes care of this at a high level. You need the latest iText version to achieve what you want.

Category:

Getting Started

↧

Creating a simple PDF/UA document

October 11, 2015, 7:03 am

This example was written in answer to the question How can I generate a PDF/UA compatible PDF with iText?

Files:

PdfUA.java

/**
 * Example written by Bruno Lowagie in answer to:
 * http://stackoverflow.com/questions/28222277/how-can-i-generate-a-pdf-ua-compatible-pdf-with-itext
 */
package sandbox.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.List;
import com.itextpdf.text.ListItem;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates an accessible PDF with images and text.
 */
@WrapToTest
public class PdfUA {

    /** The resulting PDF. */
    public static final String DEST = "results/pdfa/pdfua.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates an accessible PDF with images and text.
     * @param args no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    static public void main(String args[]) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new PdfUA().createPdf(DEST);
    }

    /**
     * Creates an accessible PDF with images and text.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        document.addLanguage("en-US");
        document.addTitle("English pangram");
        writer.createXmpMetadata();
        //=====================
        document.open();
        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        Font font = FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20);
        p.setFont(font);
        //==================
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
        //==============
        p.add(c);
        p.add(new Chunk(" jumps over the lazy "));
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
        //==================
        p.add(c);
        document.add(p);
        p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n", font);
        document.add(p);
        List list = new List(true);
        list.add(new ListItem("quick", font));
        list.add(new ListItem("brown", font));
        list.add(new ListItem("fox", font));
        list.add(new ListItem("jumps", font));
        list.add(new ListItem("over", font));
        list.add(new ListItem("the", font));
        list.add(new ListItem("lazy", font));
        list.add(new ListItem("dog", font));
        document.add(list);
        document.close();
    }
}

File name	Raw URL	Updated
PdfUA.java	PdfUA.java	2015-10-11 4:03 pm

Results:

File name	Raw URL	Updated
cmp_pdfua.pdf	cmp_pdfua.pdf	2015-10-11 4:04 pm

↧

Tagged PDF: Adding Alt to the Structure Tree

December 2, 2015, 7:51 am

Files:

/*
 * This example was written in answer to the following question:
 * http://stackoverflow.com/questions/34036200
 */
package sandbox.pdfa;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfString;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import sandbox.WrapToTest;

@WrapToTest
public class AddAltTags {

    public static final String SRC = "resources/pdfs/no_alt_attribute.pdf";
    public static final String DEST = "results/pdfa/added_alt_attributes.pdf";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new AddAltTags().manipulatePdf(SRC, DEST);
    }

    public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfDictionary catalog = reader.getCatalog();
        PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);
        manipulate(structTreeRoot);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.close();
    }

    public void manipulate(PdfDictionary element) {
        if (element == null)
            return;
        if (PdfName.FIGURE.equals(element.get(PdfName.S))) {
            element.put(PdfName.ALT, new PdfString("Figure without an Alt description"));
        }
        PdfArray kids = element.getAsArray(PdfName.K);
        if (kids == null) return;
        for (int i = 0; i < kids.size(); i++)
            manipulate(kids.getAsDict(i));
    }
}

File name	Raw URL	Updated
AddAltTags.java	AddAltTags.java	2015-12-02 4:51 pm

Resources:

File name	Raw URL	Updated
no_alt_attribute.pdf	no_alt_attribute.pdf	2015-12-02 4:55 pm

Results:

File name	Raw URL	Updated
cmp_added_alt_attributes.pdf	cmp_added_alt_attributes.pdf	2015-12-02 4:57 pm

Tags:

images

Manipulating existing PDFs

↧

How to add alternative text for an image in Tagged PDF?

December 2, 2015, 8:23 am

I know that iText can generate tagged PDF documents from scratch, but is it possible to insert alternative text for images in an existing tagged PDF without changing anything else? I need to implement this feature in a program without using GUI applications such as Adobe Acrobat Pro.

Posted on StackOverflow on Dec 2, 2015 by tsforsure

Please take a look at the AddAltTags example.

In this example, we take a PDF with images of a fox and a dog where the Alt keys are missing: no_alt_attribute.pdf

Structure element without /Alt key

Code can't recognize a fox or a dog, so we create a new document with Alt attributes saying "Figure without an Alt description": added_alt_attributes.pdf

Structure element with /Alt key

We add this description by walking through the structure tree, looking for structural elements marked as /Figure elements:

public void manipulatePdf(String src, String dest)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary structTreeRoot =
        catalog.getAsDict(PdfName.STRUCTTREEROOT);
    manipulate(structTreeRoot);
    PdfStamper stamper = new PdfStamper(
        reader, new FileOutputStream(dest));
    stamper.close();
}

public void manipulate(PdfDictionary element) {
    if (element == null)
        return;
    if (PdfName.FIGURE.equals(element.get(PdfName.S))) {
        element.put(PdfName.ALT,
            new PdfString("Figure without an Alt description"));
    }
    PdfArray kids = element.getAsArray(PdfName.K);
    if (kids == null) return;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}

You can easily port this Java example to C#:

Get the root dictionary from the PdfReader object,
Get the root of the structure tree (a dictionary),
Loop over all the kids of every branch of that tree,
When a leaf is a figure, add an /Alt entry.

Once this is done, use PdfStamper to save the altered file.
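If you want to try the traversal logic before wiring it to a PDF library in either language, the walk can be sketched with plain collections. In the sketch below, maps stand in for PdfDictionary and lists for PdfArray; the class AltWalkSketch and its helpers are hypothetical and use no iText calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Stand-in for the structure-tree walk: maps play the role of
 * PdfDictionary and lists the role of PdfArray. No iText classes are used.
 */
class AltWalkSketch {

    /** Builds a structure element with type S and kid list K. */
    static Map<String, Object> element(String type, List<Object> kids) {
        Map<String, Object> e = new HashMap<>();
        e.put("S", type);
        e.put("K", kids);
        return e;
    }

    /** Adds an Alt entry to every Figure element and returns how many were changed. */
    static int addAlt(Map<String, Object> element, String alt) {
        if (element == null) {
            return 0;
        }
        int count = 0;
        if ("Figure".equals(element.get("S"))) {
            element.put("Alt", alt);
            count++;
        }
        Object kids = element.get("K");
        if (kids instanceof List<?>) {
            for (Object kid : (List<?>) kids) {
                // Kids that are not dictionaries (such as integer ids) are skipped.
                if (kid instanceof Map<?, ?>) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> child = (Map<String, Object>) kid;
                    count += addAlt(child, alt);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Object> fox = element("Figure", Arrays.<Object>asList(0));
        Map<String, Object> dog = element("Figure", Arrays.<Object>asList(1));
        Map<String, Object> para = element("P", Arrays.<Object>asList(fox, dog));
        Map<String, Object> root = element("Document", Arrays.<Object>asList(para));
        System.out.println(addAlt(root, "Figure without an Alt description"));
    }
}
```

Returning a count is an addition for illustration; it makes it easy to check that the walk reached every figure.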

Tags:

Manipulating existing PDFs (iText 7)

images

↧

How to add alternative text for an image in Tagged PDF?

May 31, 2016, 2:17 am

Posted on StackOverflow on Dec 2, 2015 by tsforsure

Please take a look at the AddAltTags example.

In this example, we take a PDF with images of a fox and a dog where the Alt keys are missing: no_alt_attribute.pdf

Code can't recognize a fox or a dog, so we create a new document with Alt attributes saying "Figure without an Alt description": added_alt_attributes.pdf

We add this description by walking through the structure tree, looking for structural elements marked as /Figure elements:

public void manipulatePdf(String src, String dest) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
    PdfDictionary catalog = pdfDoc.getCatalog().getPdfObject();
    PdfDictionary structTreeRoot = catalog.getAsDictionary(PdfName.StructTreeRoot);
    manipulate(structTreeRoot);
    pdfDoc.close();
}

public void manipulate(PdfDictionary element) {
    if (element == null) {
        return;
    }
    if (PdfName.Figure.equals(element.get(PdfName.S))) {
        element.put(PdfName.Alt, new PdfString("Figure without an Alt description"));
    }
    PdfArray kids = element.getAsArray(PdfName.K);
    if (kids == null) {
        return;
    }
    for (int i = 0; i < kids.size(); i++) {
        manipulate(kids.getAsDictionary(i));
    }
}

You can easily port this Java example to C#:

Get the root dictionary from the PdfDocument object,
Get the root of the structure tree (a dictionary),
Loop over all the kids of every branch of that tree,
When a leaf is a figure, add an /Alt entry.
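One caveat when porting, offered as general PDF background rather than something this answer covers: the /K entry of a structure element is not always an array. It can also be a single dictionary or an integer, and the method above stops descending when getAsArray() returns null. The sketch below shows one way to handle all three shapes with an explicit stack. It uses plain maps and lists as stand-ins for the PDF objects, and the class KidsSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in walk that accepts K as a list, a single map, or an integer. */
class KidsSketch {

    /** Adds Alt to every Figure element reachable from root; returns the count. */
    static int tagFigures(Map<String, Object> root, String alt) {
        int tagged = 0;
        Deque<Object> pending = new ArrayDeque<>();
        if (root != null) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Object node = pending.pop();
            if (node instanceof List<?>) {
                // An array of kids: queue each entry for inspection.
                for (Object kid : (List<?>) node) {
                    if (kid != null) {
                        pending.push(kid);
                    }
                }
            } else if (node instanceof Map<?, ?>) {
                @SuppressWarnings("unchecked")
                Map<String, Object> element = (Map<String, Object>) node;
                if ("Figure".equals(element.get("S"))) {
                    element.put("Alt", alt);
                    tagged++;
                }
                // K may be a list, a single map, or an integer; queue it as-is.
                Object kids = element.get("K");
                if (kids != null) {
                    pending.push(kids);
                }
            }
            // Integers (marked-content ids) are ignored.
        }
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, Object> fox = new HashMap<>();
        fox.put("S", "Figure");
        fox.put("K", 0);
        Map<String, Object> para = new HashMap<>();
        para.put("S", "P");
        para.put("K", fox); // a single dictionary instead of an array
        Map<String, Object> root = new HashMap<>();
        root.put("K", Arrays.asList(para));
        System.out.println(tagFigures(root, "Figure without an Alt description"));
    }
}
```

The explicit stack also avoids deep recursion on large structure trees; whether that matters depends on your documents.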

Click this link if you want to see how to answer this question in iText 5.

↧

Tagged PDF: Adding Alt to the Structure Tree

May 31, 2016, 2:17 am

Files:

Getting started (iText 7)

/*
 
    This file is part of the iText (R) project.
    Copyright (c) 1998-2016 iText Group NV
 
*/ 
/*
 * This example was written in answer to the following question:
 * http://stackoverflow.com/questions/34036200
 */
package com.itextpdf.samples.sandbox.pdfa;

import com.itextpdf.kernel.pdf.*;
import com.itextpdf.samples.GenericTest;
import com.itextpdf.test.annotations.type.SampleTest;
import org.junit.experimental.categories.Category;

import java.io.File;
import java.io.IOException;

@Category(SampleTest.class)
public class AddAltTags extends GenericTest {
    public static final String DEST = "./target/test/resources/sandbox/pdfa/add_alt_tags.pdf";
    public static final String SRC = "./src/test/resources/pdfs/no_alt_attribute.pdf";

    public static void main(String[] args) throws Exception {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new AddAltTags().manipulatePdf(DEST);
    }

    public void manipulatePdf(String dest) throws IOException {
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
        PdfDictionary catalog = pdfDoc.getCatalog().getPdfObject();
        PdfDictionary structTreeRoot = catalog.getAsDictionary(PdfName.StructTreeRoot);
        manipulate(structTreeRoot);
        pdfDoc.close();
    }

    public void manipulate(PdfDictionary element) {
        if (element == null) {
            return;
        }
        if (PdfName.Figure.equals(element.get(PdfName.S))) {
            element.put(PdfName.Alt, new PdfString("Figure without an Alt description"));
        }
        PdfArray kids = element.getAsArray(PdfName.K);
        if (kids == null) {
            return;
        }
        for (int i = 0; i < kids.size(); i++) {
            manipulate(kids.getAsDictionary(i));
        }
    }
}

File name	Raw URL	Updated
AddAltTags.java	AddAltTags.java	2016-08-04 2:02 pm

Resources:

File name	Raw URL	Updated
no_alt_attribute.pdf	no_alt_attribute.pdf	2016-08-04 2:03 pm

Results:

File name	Raw URL	Updated
cmp_add_alt_tags.pdf	cmp_add_alt_tags.pdf	2016-08-04 2:04 pm

↧

Creating a simple PDF/UA document

May 31, 2016, 2:18 am

This example was written in answer to the question How can I generate a PDF/UA compatible PDF with iText?

Files:

PdfUA.java

/*

    This file is part of the iText (R) project.
    Copyright (c) 1998-2016 iText Group NV

*/
/**
 * Example written by Bruno Lowagie in answer to:
 * http://stackoverflow.com/questions/28222277/how-can-i-generate-a-pdf-ua-compatible-pdf-with-itext
 */
package com.itextpdf.samples.sandbox.pdfa;

import com.itextpdf.io.font.PdfEncodings;
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.*;
import com.itextpdf.kernel.xmp.XMPException;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.*;
import com.itextpdf.samples.GenericTest;
import com.itextpdf.test.annotations.type.SampleTest;
import org.junit.experimental.categories.Category;

import java.io.File;
import java.io.IOException;

@Category(SampleTest.class)
public class PdfUA extends GenericTest {
    public static final String DEST = "./target/test/resources/sandbox/pdfa/pdf_ua.pdf";
    public static final String DOG = "./src/test/resources/img/dog.bmp";
    public static final String FONT = "./src/test/resources/font/FreeSans.ttf";
    public static final String FOX = "./src/test/resources/img/fox.bmp";

    public static void main(String[] args) throws Exception {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new PdfUA().manipulatePdf(DEST);
    }

    public void manipulatePdf(String dest) throws IOException, XMPException {
        PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest, new WriterProperties().setPdfVersion(PdfVersion.PDF_1_7)));
        Document document = new Document(pdfDoc, new PageSize(PageSize.A4).rotate());
        //TAGGED PDF
        //Make document tagged
        pdfDoc.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
        pdfDoc.getCatalog().setLang(new PdfString("en-US"));
        PdfDocumentInfo info = pdfDoc.getDocumentInfo();
        info.setTitle("English pangram");
        //=====================
        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
        p.setFont(font);
        //==================
        Text c = new Text("The quick brown ");
        p.add(c);
        Image i = new Image(ImageDataFactory.create(FOX));
        //PDF/UA
        //Set alt text
        i.getAccessibilityProperties().setAlternateDescription("Fox");
        //==============
        p.add(i);
        p.add(" jumps over the lazy ");
        i = new Image(ImageDataFactory.create(DOG));
        //PDF/UA
        //Set alt text
        i.getAccessibilityProperties().setAlternateDescription("Dog");
        //==================
        p.add(i);
        document.add(p);
        p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n").setFont(font).setFontSize(20);
        document.add(p);
        List list = new List();
        list.add((ListItem) new ListItem("quick").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("brown").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("fox").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("jumps").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("over").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("the").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("lazy").setFont(font).setFontSize(20));
        list.add((ListItem) new ListItem("dog").setFont(font).setFontSize(20));
        document.add(list);
        document.close();
    }
}

File name	Raw URL	Updated
PdfUA.java	PdfUA.java	2016-08-11 10:07 am

Results:

File name	Raw URL	Updated
cmp_pdf_ua.pdf	cmp_pdf_ua.pdf	2016-08-11 10:10 am


↧

How can I generate a PDF/UA compatible PDF with iText?

May 31, 2016, 2:18 am

≫ Next: Chapter 7: Creating PDF/UA and PDF/A documents

≪ Previous: Creating a simple PDF/UA document

Posted on StackOverflow on Jan 29, 2015 by k-den

public void manipulatePdf(String dest) throws IOException, XMPException {
    PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest, new WriterProperties().setPdfVersion(PdfVersion.PDF_1_7)));
    Document document = new Document(pdfDoc, new PageSize(PageSize.A4).rotate());
    //TAGGED PDF
    //Make document tagged
    pdfDoc.setTagged();
    //===============
    //PDF/UA
    //Set document metadata

    pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
    pdfDoc.getCatalog().setLang(new PdfString("en-US"));
    PdfDocumentInfo info = pdfDoc.getDocumentInfo();
    info.setTitle("English pangram");
    //=====================

    Paragraph p = new Paragraph();
    //PDF/UA
    //Embed font
    PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
    p.setFont(font);
    //==================
    Text c = new Text("The quick brown ");
    p.add(c);
    Image i = new Image(ImageDataFactory.create(FOX));
    //PDF/UA
    //Set alt text
    i.getAccessibilityProperties().setAlternateDescription("Fox");
    //==============
    p.add(i);
    p.add(" jumps over the lazy ");
    i = new Image(ImageDataFactory.create(DOG));
    //PDF/UA
    //Set alt text
    i.getAccessibilityProperties().setAlternateDescription("Dog");
    //==================
    p.add(i);
    document.add(p);
    p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n").setFont(font).setFontSize(20);
    document.add(p);
    List list = new List();
    list.add((ListItem) new ListItem("quick").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("brown").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("fox").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("jumps").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("over").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("the").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("lazy").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("dog").setFont(font).setFontSize(20));
    document.add(list);
    document.close();
}

The code above uses iText 7; an answer to this question for iText 5 is also available.


↧

Chapter 7: Creating PDF/UA and PDF/A documents

April 9, 2016, 9:48 am

≫ Next: ZUGFeRD Tutorial: examples for chapter 2

≪ Previous: How can I generate a PDF/UA compatible PDF with iText?

In chapters 1 to 4, we created PDF documents using iText 7. In chapters 5 and 6, we manipulated and reused existing PDF documents. All the PDFs we dealt with in those chapters complied with ISO 32000, the core standard for PDF. ISO 32000 isn't the only ISO standard for PDF; there are many sub-standards that were created for specific purposes. In this chapter, we'll highlight two:

ISO 14289 is better known as PDF/UA. UA stands for Universal Accessibility. PDFs that comply with the PDF/UA standard can be consumed by anyone, including people who are blind or visually impaired.
ISO 19005 is better known as PDF/A. A stands for Archiving. The goal of this standard is the long-term preservation of digital documents.

In this chapter, we'll learn more about PDF/A and PDF/UA by creating a series of PDF/A and PDF/UA files.

Creating accessible PDF documents

Before we start with a PDF/UA example, let's take a closer look at the problem we want to solve. In chapter 1, we created a document that included images. In the sentence "Quick brown fox jumps over the lazy dog", we replaced the words "fox" and "dog" by images representing a fox and a dog. When this file is read out loud, a machine doesn't know that the first image represents a fox and that the second image represents a dog, hence the file will be read as "Quick brown jumps over the lazy."

In an ordinary PDF, content is painted to a canvas. We might use high-level objects such as List and Table, but once the PDF is created, there is no structure left. A list is a sequence of lines and a text snippet in a list item doesn't know that it's part of a list. A table is just a bunch of lines and text added at absolute positions on a page. A text snippet in a table doesn't know it belongs to a cell in a specific column and a specific row.

Unless we make the PDF a tagged PDF, the document doesn't contain any semantic structure, and without semantic structure the PDF isn't accessible. To be accessible, the document needs to distinguish which part of a page is actual content and which part is an artifact that isn't part of the actual content (e.g. a header or a page number). A line of text needs to know if it's a title, if it's part of a paragraph, and so on. We can add all of this information to the page by creating a structure tree and by defining content as marked content. This sounds complex, but if you use iText 7's high-level objects, it's sufficient to call the setTagged() method. Once a PdfDocument is defined as a tagged document, the structure we introduce by using objects such as List, Table and Paragraph is reflected in the Tagged PDF.
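To make the idea of a structure tree more concrete, here is a minimal sketch in plain Java. It is a toy model, not iText code: the Node class, the readAloud() method and the pangram() helper are names invented for this illustration. It assumes a tree in which a paragraph holds two text spans and two figures, and shows that a figure can only be voiced when it carries an alternate description.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a tagged structure tree; none of these types are iText classes. */
public class StructureTreeSketch {

    /** A structure element with a role ("Document", "P", "Span", "Figure"). */
    static final class Node {
        final String role;
        final String text; // text content, or null when the node has none
        final String alt;  // alternate description, or null when missing
        final List<Node> kids = new ArrayList<>();

        Node(String role, String text, String alt) {
            this.role = role;
            this.text = text;
            this.alt = alt;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Walks the tree depth-first and returns what could be read out loud. */
    static String readAloud(Node node) {
        StringBuilder spoken = new StringBuilder();
        collect(node, spoken);
        return spoken.toString();
    }

    private static void collect(Node node, StringBuilder spoken) {
        if ("Figure".equals(node.role)) {
            // An image can only be voiced through its alternate description.
            if (node.alt != null) {
                spoken.append(node.alt);
            }
        } else if (node.text != null) {
            spoken.append(node.text);
        }
        for (Node kid : node.kids) {
            collect(kid, spoken);
        }
    }

    /** Builds the pangram as Document > P > (Span, Figure, Span, Figure). */
    static Node pangram(String foxAlt, String dogAlt) {
        Node paragraph = new Node("P", null, null)
                .add(new Node("Span", "The quick brown ", null))
                .add(new Node("Figure", null, foxAlt))
                .add(new Node("Span", " jumps over the lazy ", null))
                .add(new Node("Figure", null, dogAlt));
        return new Node("Document", null, null).add(paragraph);
    }

    public static void main(String[] args) {
        // Without alt text the figures stay silent.
        System.out.println(readAloud(pangram(null, null)));
        // With alt text the sentence is complete.
        System.out.println(readAloud(pangram("Fox", "Dog")));
    }
}
```

With the descriptions "Fox" and "Dog" the traversal yields the full sentence; with null descriptions the two nouns are simply missing. A real screen reader works on the structure tree stored in the PDF, which iText builds once the document is tagged.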

Tagging is only one of the requirements for an accessible PDF. The QuickBrownFox_PDFUA example will help us understand the other requirements.

PdfDocument pdf = new PdfDocument(new PdfWriter(dest, new WriterProperties().addXmpMetadata()));
Document document = new Document(pdf);
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
//PDF/UA: Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
//PDF/UA: Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

We create a PdfDocument and a Document, but this time we tell the PdfWriter to automatically add XMP metadata using the addXmpMetadata() method of WriterProperties. In PDF/UA, it is mandatory to store the document's metadata in the PDF as XML as well. This XML may not be compressed: processors that don't "understand" PDF must be able to detect this XMP metadata and process it. An XMP stream is created automatically based on the entries in the Info dictionary, a PDF object that holds data such as the title of the document. In addition to this requirement, we make sure that we comply with PDF/UA by introducing some extra features:

We tell the PdfDocument that we're going to create Tagged PDF (line 4).
We add a language specifier. In our case, the document knows that the main language used in this document is American English (line 5).
We change the viewer preferences so that the title of the document is always displayed in the top bar of the PDF viewer (lines 6-7). Obviously, this implies that we add a title to the metadata of the document (lines 8-9).
All fonts need to be embedded (line 11). There are some other requirements relating to fonts, but it would lead us too far right now to discuss these in detail.
All the content needs to be tagged. When an image is encountered, we need to provide a description of that image using alt text (lines 17 and 22).
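The requirements listed above can be summarized as a small checklist. The sketch below is plain Java and doesn't call iText; the Settings class, its fields and the missing() method are hypothetical names chosen for this illustration. It only covers the items mentioned in this section, so an empty result means "nothing obvious was forgotten", not "this file is PDF/UA compliant".

```java
import java.util.ArrayList;
import java.util.List;

/** Checklist sketch for the requirements named in the text; not an iText API. */
public class PdfUaChecklistSketch {

    /** Hypothetical summary of the choices made while creating a document. */
    static final class Settings {
        boolean xmpMetadata;      // XMP metadata stream was added
        boolean tagged;           // document was made a Tagged PDF
        String language;          // e.g. "en-US", or null when not set
        boolean displayDocTitle;  // viewer preference to show the title
        String title;             // title in the document metadata
        boolean fontsEmbedded;    // every font used is embedded
        int images;               // number of images in the content
        int imagesWithAltText;    // number of images that have alt text
    }

    /** Returns a description of every requirement that is not met. */
    static List<String> missing(Settings s) {
        List<String> problems = new ArrayList<>();
        if (!s.xmpMetadata) problems.add("no XMP metadata");
        if (!s.tagged) problems.add("document is not tagged");
        if (isBlank(s.language)) problems.add("no document language");
        if (!s.displayDocTitle) problems.add("title is not displayed by the viewer");
        if (isBlank(s.title)) problems.add("no title in the metadata");
        if (!s.fontsEmbedded) problems.add("fonts are not embedded");
        if (s.imagesWithAltText < s.images) problems.add("image without alt text");
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** The choices made in the quick brown fox example of this section. */
    static Settings quickBrownFox() {
        Settings s = new Settings();
        s.xmpMetadata = true;
        s.tagged = true;
        s.language = "en-US";
        s.displayDocTitle = true;
        s.title = "iText7 PDF/UA example";
        s.fontsEmbedded = true;
        s.images = 2;
        s.imagesWithAltText = 2;
        return s;
    }

    public static void main(String[] args) {
        Settings s = quickBrownFox();
        System.out.println(missing(s)); // nothing is reported

        s.language = null;              // forget the language specifier
        System.out.println(missing(s)); // the language is reported
    }
}
```

A checklist like this can catch omissions early, but it can't judge whether an alt text actually describes its image; that part still requires a human reviewer.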

We have now created a PDF/UA document. When we look at the resulting page in Figure 7.1, we don't see much difference, but if we open the Tags panel, we see that the document has a specific structure.

Figure 7.1: a PDF/UA document and its structure

We see that the <Document> consists of a <P>aragraph that is composed of four parts, two <Span>s and two <Figure>s. We'll create a more complex PDF/UA document later in this chapter, but let's take a look at what makes PDF/A special first.

Creating PDFs for long-term preservation, part 1

Part 1 of ISO 19005 was released in 2005. It was defined as a subset of version 1.4 of Adobe's PDF specification (which, at that time, wasn't an ISO standard yet). ISO 19005-1 introduced a series of obligations and restrictions:

The document needs to be self-contained: all fonts need to be embedded; external movie, sound or other binary files are not allowed.
The document needs to contain metadata in the eXtensible Metadata Platform (XMP) format: ISO 16684 (XMP) describes how to embed XML metadata into a binary file, so that software that doesn't know how to interpret the binary data format can still extract the file's metadata.
Functionality that isn't future-proof isn't allowed: the PDF can't contain any JavaScript and may not be encrypted.

ISO 19005-1:2005 (PDF/A-1) defined two conformance levels:

Level B ("basic"): ensures that the visual appearance of a document will be preserved for the long term.
Level A ("accessible"): ensures that the visual appearance of a document will be preserved for the long term, but also introduces structural and semantic properties. The PDF needs to be a Tagged PDF.

The QuickBrownFox_PDFA_1b example shows how we can create a "Quick brown fox" PDF that complies with PDF/A-1b.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();

The first thing that catches the eye is that we are no longer using a PdfDocument instance. Instead, we create a PdfADocument instance. The PdfADocument constructor needs a PdfWriter as its first parameter, but also a conformance level (in this case PdfAConformanceLevel.PDF_A_1B) and a PdfOutputIntent. This output intent tells the document how to interpret the colors that will be used in the document. In line 8, we make sure that the font we're using is embedded.

Looking at the PDF shown in Figure 7.2, we see a blue ribbon with the text "This file claims compliance with the PDF/A standard and has been opened read-only to prevent modification." Allow me to explain two things about this sentence:

This doesn't mean that the PDF is, in effect, compliant with the PDF/A standard. It only claims it is. To be sure, you need to open the Standards panel in Adobe Acrobat. When you click on the "Verify Conformance" link, Acrobat will verify if the document is what it claims to be. In this case, we read "Status: verification succeeded"; we have successfully created a document complying with PDF/A-1B.
The document has been opened read-only, not because you are not allowed to modify it (PDF/A is not a way to protect a PDF against modification), but Adobe Acrobat presents it as read-only because any modification might change the PDF into a PDF that is no longer compliant to the PDF/A standard. It's not trivial to update a PDF/A without breaking its PDF/A status.

Let's adapt our example, and create a PDF/A-1 level A document with the QuickBrownFox_PDFA_1a example.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
//Setting some required parameters
pdf.setTagged();
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
//Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
//Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

We've changed PdfAConformanceLevel.PDF_A_1B into PdfAConformanceLevel.PDF_A_1A in line 3. We've made the PdfADocument a Tagged PDF (line 8) and we've added some alt text for the images. Figure 7.3 is somewhat confusing.

When we look at the Standards panel, we see that the document thinks it conforms to PDF/A-1A and to PDF/UA-1. We don't have a "Verify Conformance" link, so we have to use Preflight. Preflight informs us that there were "No problems found" when executing the "Verify compliance with PDF/A-1a" profile. We can't verify the PDF/UA compliance because PDF/UA involves some requirements that can't be verified by a machine. For instance: a machine wouldn't notice if we switched the description of the image of the fox with the description of the image of the dog. That would make the document inaccessible as the document would spread false information to people depending on screen-readers. In any case, we know that our document doesn't comply to the PDF/UA standard because we omitted a number of essential elements (such as the language).

From the start, it was determined that approved parts of ISO 19005 could never become invalid. New, subsequent parts would only define new, useful features. That's what happened when part 2 and part 3 were created.

Creating PDFs for long-term preservation, part 2 and 3

ISO 19005-2:2011 (PDF/A-2) was introduced to have a PDF/A standard that was based on the ISO standard (ISO 32000-1) instead of on Adobe's PDF specification. PDF/A-2 also adds a handful of features that were introduced in PDF 1.5, 1.6 and 1.7:

Useful additions include: support for JPEG2000, Collections, object-level XMP, and optional content.
Useful improvements include: better support for transparency, comment types and annotations, and digital signatures.

PDF/A-2 also defines an extra level besides Level A and Level B:

Level U ("Unicode"): ensures that the visual appearance of a document will be preserved for the long term, and that all text is stored in UNICODE.

ISO 19005-3:2012 (PDF/A-3) was an almost identical copy of PDF/A-2. There was only one difference with PDF/A-2: in PDF/A-3, attachments don't need to be PDF/A. You can attach any file to a PDF/A-3, for instance: an XLS file containing calculations of which the results are used in the document, the original Word document that was used to create the PDF document, and so on. The document itself needs to conform to all the obligations and restrictions of the PDF/A specification, but these obligations and restrictions do not apply to its attachments.
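The relation between the parts and the conformance levels described so far can be written down as a few rules. The following plain Java sketch encodes only what is stated in this chapter; the class, the Level enum and the method names are invented for this illustration and are not part of iText.

```java
/** Sketch of the PDF/A parts and levels described in the text; not an iText API. */
public class PdfAConformanceSketch {

    /** Conformance levels: A ("accessible"), B ("basic"), U ("Unicode"). */
    enum Level { A, B, U }

    /** Parts 1 to 3 exist; level U was only introduced in part 2. */
    static boolean isDefined(int part, Level level) {
        if (part < 1 || part > 3) {
            return false;
        }
        return level != Level.U || part >= 2;
    }

    /** Only level A requires the document to be a Tagged PDF. */
    static boolean requiresTaggedPdf(Level level) {
        return level == Level.A;
    }

    /** Only part 3 accepts attachments that are not PDF/A themselves. */
    static boolean allowsNonPdfAAttachments(int part) {
        return part == 3;
    }

    public static void main(String[] args) {
        System.out.println(isDefined(1, Level.U));        // false
        System.out.println(isDefined(2, Level.U));        // true
        System.out.println(requiresTaggedPdf(Level.A));   // true
        System.out.println(allowsNonPdfAAttachments(2));  // false
        System.out.println(allowsNonPdfAAttachments(3));  // true
    }
}
```

Read this way, picking PDF/A-3 is the only option when a file that isn't a PDF/A document, such as a spreadsheet or a CSV file, has to travel along with the PDF.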

In the UnitedStates_PDFA_3a example, we'll create a document that complies with PDF/UA as well as with PDF/A-3A. We choose PDF/A-3 because we're going to add the CSV file that was used as the source for creating the PDF.

PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/A-3 example");
//Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
    "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);
//Embed fonts
PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Create content
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
//Close document
document.close();

Let's examine the different parts of this example.

Line 1-5: We create a PdfADocument (PdfAConformanceLevel.PDF_A_3A) and a Document.
Line 7: Making the PDF a Tagged PDF is a requirement for PDF/UA as well as for PDF/A-3A.
Line 8-12: Setting the language, the document title and the viewer preference to display the title is a requirement for PDF/UA.
Line 14-20: We add a file attachment using specific parameters that are required for PDF/A-3A.
Line 26-27: We embed the fonts which is a requirement for PDF/UA as well as for PDF/A.
Line 28-38: We've seen this code before in the UnitedStates example in chapter 1 (including the process() method).
Line 40: We close the document.

Figure 7.4 demonstrates how using the Table class with Cell objects added as header cells, and Cell objects added as normal cells, resulted in a structure tree that makes the PDF document accessible.

When we open the Attachments panel as shown in Figure 7.5, we see our original united_states.csv file that we can easily extract from the PDF.

Figure 7.5: a PDF/A-3 level A document and its attachment

The examples in this chapter taught us that PDF/UA or PDF/A documents involve extra requirements when compared to ordinary PDFs. "Can we use iText to convert an existing PDF to a PDF/UA or PDF/A document" is a question that is posted frequently on mailing-lists or user forums. I hope that this chapter explains that iText can't do this automatically.

If you have a document that has a picture of a fox and a dog, iText can't add any missing alt text for those images, because iText can't see that fox nor that dog. iText only sees pixels, it can't interpret the image.
If you are using a font that isn't embedded, iText doesn't know what that font looks like. If you don't provide the corresponding font program, iText can never embed that font.

These are only two examples of many that explain why converting an ordinary PDF to PDF/A or PDF/UA isn't trivial. It's very easy to change the PDF so that it shows a blue bar saying that the document complies with PDF/A, but that doesn't mean that claim is true.

We also need to pay attention when we merge existing PDF/A documents.

Merging PDF/A documents

When merging PDF/A documents, it's very important that every single document that you are adding to PdfMerger is already a PDF/A document. You can't mix PDF/A documents and ordinary PDF documents into one single PDF and hope the result will be a PDF/A document. The same is true for mixing a PDF/A level A document with a PDF/A level B document. One has a structure tree, the other hasn't; you can't expect the resulting PDF to be a PDF/A level A document.
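The rule in the previous paragraph can be expressed as a simple precondition. The sketch below is plain Java with invented names (Conformance, mayClaim); it is not an iText API. It deliberately ignores the part number (PDF/A-1, -2 or -3) and level U, and it only tests a necessary condition: passing the check doesn't prove that the merged file is valid.

```java
import java.util.Arrays;
import java.util.List;

/** Precondition sketch for merging PDF/A files; not an iText API. */
public class MergeConformanceSketch {

    /** Declared in increasing order of strictness; NONE is an ordinary PDF. */
    enum Conformance { NONE, LEVEL_B, LEVEL_A }

    /**
     * Returns true when every source is at least as strict as the target.
     * One ordinary PDF rules out any PDF/A claim; one level B source
     * (no structure tree) rules out a level A claim.
     */
    static boolean mayClaim(Conformance target, List<Conformance> sources) {
        for (Conformance source : sources) {
            // Enums compare by declaration order: NONE < LEVEL_B < LEVEL_A.
            if (source.compareTo(target) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Conformance> twoLevelA =
                Arrays.asList(Conformance.LEVEL_A, Conformance.LEVEL_A);
        List<Conformance> mixedLevels =
                Arrays.asList(Conformance.LEVEL_A, Conformance.LEVEL_B);
        List<Conformance> withOrdinaryPdf =
                Arrays.asList(Conformance.LEVEL_B, Conformance.NONE);

        System.out.println(mayClaim(Conformance.LEVEL_A, twoLevelA));       // true
        System.out.println(mayClaim(Conformance.LEVEL_A, mixedLevels));     // false
        System.out.println(mayClaim(Conformance.LEVEL_B, withOrdinaryPdf)); // false
    }
}
```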

Figure 7.6 shows how we merged the two PDF/A level A documents we created in the previous sections.

Figure 7.6: merging 2 PDF/A level A documents

When we look at the structure of the tags, we see that the <P>aragraph is now followed by a <Table>. The MergePDFADocuments example shows how it's done.

PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/A-1a example");
//Create PdfMerger instance
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.addPages(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.addPages(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
//Merge
merger.merge();
//Close the documents
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();

This example is assembled using parts of two examples we've already seen before:

Lines 1 to 11 are almost identical to the first part of the UnitedStates_PDFA_3a example we've used in the previous section, except that we now use PdfAConformanceLevel.PDF_A_1A and that we don't need a Document object.
Lines 12 to 25 are identical to the last part of the 88th_Oscar_Combine example of the previous chapter. Note that we use a PdfDocument instance instead of a PdfADocument; the PdfADocument will check if the source documents comply.

There's a lot more to be said about PDF/UA and PDF/A, and even about other sub-standards. For instance: there's a German standard for invoicing called ZUGFeRD that is built on top of PDF/A-3, but let's save that for another tutorial.

Summary

In this chapter, we've discovered that there's more to PDF than meets the eye. We've learned how to introduce structure into our documents so that they are accessible for the blind and the visually impaired. We've also made sure that our PDFs were self-contained, for instance by embedding fonts, so that our documents can be archived for the long term.

We'll need several other tutorials to explore the functionality introduced in this tutorial in more depth, but these seven chapters should already give you a good impression of what you can do with iText 7.

Tags:

Java

PDF/A

Getting Started (iText 5)

↧

ZUGFeRD Tutorial: examples for chapter 2

August 27, 2015, 3:03 am

≫ Next: How can I generate a PDF/UA compatible PDF with iText?

≪ Previous: Chapter 7: Creating PDF/UA and PDF/A documents

These are some examples that were written in the context of Chapter 2 of the tutorial ZUGFeRD: The Future of Invoicing.

C2E1_SimplePdf.java creates a simple "Quick brown fox jumps over the lazy dog" PDF with some images, but without any structure. This results in a regular PDF.
C2E2_TaggedPdf.java uses the same code as the first example, but now we ask iText to introduce structure. This results in a Tagged PDF.
C2E3_PdfA3b.java adapts the first example, so that it conforms to the PDF/A-3 standard, level B (for Basic). The resulting PDF is not a Tagged PDF.
C2E4_PdfA3a.java adapts the third example, so that it conforms to the PDF/A-3 standard, level A (for Accessibility). The resulting PDF is a Tagged PDF.

Files:

C2E1_SimplePdf.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import sandbox.WrapToTest;

/**
 * Creates a simple PDF with images and text.
 */
@WrapToTest
public class C2E1_SimplePdf {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox1.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";

    /**
     * Creates a simple PDF with images and text.
     * @param args no arguments needed.
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E1_SimplePdf().createPdf(DEST);
    }

    /**
     * Creates a simple PDF with images and text
     * @param dest  the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        document.open();
        Paragraph p = new Paragraph();
        p.setFont(new Font(Font.FontFamily.HELVETICA, 20));
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E2_TaggedPdf.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a Tagged PDF with images and text.
 */
public class C2E2_TaggedPdf {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox2.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";

    /**
     * Creates a tagged PDF with images and text.
     * @param args  no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E2_TaggedPdf().createPdf(DEST);
    }

    /**
     * Creates a tagged PDF with images and text.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //==========
        document.open();
        Paragraph p = new Paragraph();
        p.setFont(new Font(Font.FontFamily.HELVETICA, 20));
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E3_PdfA3b.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a PDF that conforms with PDF/A-3 Level B.
 */
@WrapToTest
public class C2E3_PdfA3b {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox3.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A path to a color profile. */
    public static final String ICC = "resources/data/sRGB_CS_profile.icm";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates a PDF that conforms with PDF/A-3 Level B.
     * @param args  No arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E3_PdfA3b().createPdf(DEST);
    }

    /**
     * Creates a PDF that conforms with PDF/A-3 Level B.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        //PDF/A-3b
        //Create PdfAWriter with the required conformance level
        PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(dest), PdfAConformanceLevel.PDF_A_3B);
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //Create XMP metadata
        writer.createXmpMetadata();
        //====================
        document.open();
        //PDF/A-3b
        //Set output intents
        ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(ICC));
        writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
        //===================
        Paragraph p = new Paragraph();
        //PDF/A-3b
        //Embed font
        p.setFont(FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20));
        //=============
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        p.add(c);
        c = new Chunk(" jumps over the lazy ");
        p.add(c);
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        p.add(c);
        document.add(p);
        document.close();
    }
}

C2E4_PdfA3a.java

/*
 * This code sample was written in the context of the tutorial:
 * ZUGFeRD: The future of Invoicing
 */
package zugferd.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates a PDF that conforms with PDF/A-3 Level A.
 */
@WrapToTest
public class C2E4_PdfA3a {

    /** The resulting PDF. */
    public static final String DEST = "results/zugferd/pdfa/quickbrownfox4.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A path to a color profile. */
    public static final String ICC = "resources/data/sRGB_CS_profile.icm";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates a PDF that conforms with PDF/A-3 Level A.
     * @param args  no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new C2E4_PdfA3a().createPdf(DEST);
    }

    /**
     * Creates a PDF that conforms with PDF/A-3 Level A.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        //PDF/A-3a
        //Create PdfAWriter with the required conformance level
        PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(dest), PdfAConformanceLevel.PDF_A_3A);
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //====================
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        document.addLanguage("en-US");
        document.addTitle("Some title");
        writer.createXmpMetadata();
        //=====================
        document.open();
        //PDF/A-3b
        //Set output intents
        ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(ICC));
        writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
        //===================
        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        p.setFont(FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20));
        //==================
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
        //==============
        p.add(c);
        p.add(new Chunk(" jumps over the lazy "));
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
        //==================
        p.add(c);
        document.add(p);
        document.close();
    }
}

File name	Raw URL	Updated
C2E1_SimplePdf.java	C2E1_SimplePdf.java	2015-08-27 12:02 pm
C2E2_TaggedPdf.java	C2E2_TaggedPdf.java	2015-08-27 12:02 pm
C2E3_PdfA3b.java	C2E3_PdfA3b.java	2015-08-27 12:02 pm
C2E4_PdfA3a.java	C2E4_PdfA3a.java	2015-08-27 12:02 pm

Resources:

File name	Raw URL	Updated
dog.bmp	dog.bmp	2015-08-27 12:05 pm
fox.bmp	fox.bmp	2015-08-27 12:05 pm
sRGB_CS_profile.icm	sRGB_CS_profile.icm	2015-08-27 12:06 pm
FreeSans.ttf	FreeSans.ttf	2015-08-27 12:07 pm

Results:

File name	Raw URL	Updated
cmp_quickbrownfox1.pdf	cmp_quickbrownfox1.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox2.pdf	cmp_quickbrownfox2.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox3.pdf	cmp_quickbrownfox3.pdf	2015-08-27 12:19 pm
cmp_quickbrownfox4.pdf	cmp_quickbrownfox4.pdf	2015-08-27 12:19 pm

↧

How can I generate a PDF/UA compatible PDF with iText?

October 11, 2015, 1:36 am

≫ Next: Creating a simple PDF/UA document

≪ Previous: ZUGFeRD Tutorial: examples for chapter 2

Posted on StackOverflow on Jan 29, 2015 by k-den

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document(PageSize.A4.rotate());
    PdfWriter writer =
        PdfWriter.getInstance(document, new FileOutputStream(dest));
    writer.setPdfVersion(PdfWriter.VERSION_1_7);
    //TAGGED PDF
    //Make document tagged
    writer.setTagged();
    //===============
    //PDF/UA
    //Set document metadata
    writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
    document.addLanguage("en-US");
    document.addTitle("English pangram");
    writer.createXmpMetadata();
    //=====================
    document.open();
    Paragraph p = new Paragraph();
    //PDF/UA
    //Embed font
    Font font =
        FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20);
    p.setFont(font);
    //==================
    Chunk c = new Chunk("The quick brown ");
    p.add(c);
    Image i = Image.getInstance(FOX);
    c = new Chunk(i, 0, -24);
    //PDF/UA
    //Set alt text
    c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
    //==============
    p.add(c);
    p.add(new Chunk(" jumps over the lazy "));
    i = Image.getInstance(DOG);
    c = new Chunk(i, 0, -24);
    //PDF/UA
    //Set alt text
    c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
    //==================
    p.add(c);
    document.add(p);
    p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n", font);
    document.add(p);
    List list = new List(true);
    list.add(new ListItem("quick", font));
    list.add(new ListItem("brown", font));
    list.add(new ListItem("fox", font));
    list.add(new ListItem("jumps", font));
    list.add(new ListItem("over", font));
    list.add(new ListItem("the", font));
    list.add(new ListItem("lazy", font));
    list.add(new ListItem("dog", font));
    document.add(list);
    document.close();
}

↧

Creating a simple PDF/UA document

October 11, 2015, 7:03 am

≫ Next: Tagged PDF: Adding Alt to the Structure Tree

≪ Previous: How can I generate a PDF/UA compatible PDF with iText?

This example was written in answer to the question How can I generate a PDF/UA compatible PDF with iText?

Files:

PdfUA.java

/**
 * Example written by Bruno Lowagie in answer to:
 * http://stackoverflow.com/questions/28222277/how-can-i-generate-a-pdf-ua-compatible-pdf-with-itext
 */
package sandbox.pdfa;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.List;
import com.itextpdf.text.ListItem;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import sandbox.WrapToTest;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Creates an accessible PDF with images and text.
 */
@WrapToTest
public class PdfUA {

    /** The resulting PDF. */
    public static final String DEST = "results/pdfa/pdfua.pdf";
    /** An image resource. */
    public static final String FOX = "resources/images/fox.bmp";
    /** An image resource. */
    public static final String DOG = "resources/images/dog.bmp";
    /** A font that will be embedded. */
    public static final String FONT = "resources/fonts/FreeSans.ttf";

    /**
     * Creates an accessible PDF with images and text.
     * @param args no arguments needed
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new PdfUA().createPdf(DEST);
    }

    /**
     * Creates an accessible PDF with images and text.
     * @param dest  the path to the resulting PDF
     * @throws IOException
     * @throws DocumentException
     */
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        writer.setPdfVersion(PdfWriter.VERSION_1_7);
        //TAGGED PDF
        //Make document tagged
        writer.setTagged();
        //===============
        //PDF/UA
        //Set document metadata
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        document.addLanguage("en-US");
        document.addTitle("English pangram");
        writer.createXmpMetadata();
        //=====================
        document.open();
        Paragraph p = new Paragraph();
        //PDF/UA
        //Embed font
        Font font = FontFactory.getFont(FONT, BaseFont.WINANSI, BaseFont.EMBEDDED, 20);
        p.setFont(font);
        //==================
        Chunk c = new Chunk("The quick brown ");
        p.add(c);
        Image i = Image.getInstance(FOX);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Fox"));
        //==============
        p.add(c);
        p.add(new Chunk(" jumps over the lazy "));
        i = Image.getInstance(DOG);
        c = new Chunk(i, 0, -24);
        //PDF/UA
        //Set alt text
        c.setAccessibleAttribute(PdfName.ALT, new PdfString("Dog"));
        //==================
        p.add(c);
        document.add(p);
        p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n", font);
        document.add(p);
        List list = new List(true);
        list.add(new ListItem("quick", font));
        list.add(new ListItem("brown", font));
        list.add(new ListItem("fox", font));
        list.add(new ListItem("jumps", font));
        list.add(new ListItem("over", font));
        list.add(new ListItem("the", font));
        list.add(new ListItem("lazy", font));
        list.add(new ListItem("dog", font));
        document.add(list);
        document.close();
    }
}

File name	Raw URL	Updated
PdfUA.java	PdfUA.java	2015-10-11 4:03 pm

Results:

File name	Raw URL	Updated
cmp_pdfua.pdf	cmp_pdfua.pdf	2015-10-11 4:04 pm

↧

Tagged PDF: Adding Alt to the Structure Tree

December 2, 2015, 7:51 am

≫ Next: How to add alternative text for an image in Tagged PDF?

≪ Previous: Creating a simple PDF/UA document

Files:

Manipulating existing PDFs (iText 5)

/*
 * This example was written in answer to the following question:
 * http://stackoverflow.com/questions/34036200
 */
package sandbox.pdfa;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfString;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import sandbox.WrapToTest;

@WrapToTest
public class AddAltTags {

    public static final String SRC = "resources/pdfs/no_alt_attribute.pdf";
    public static final String DEST = "results/pdfa/added_alt_attributes.pdf";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new AddAltTags().manipulatePdf(SRC, DEST);
    }

    public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfDictionary catalog = reader.getCatalog();
        PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);
        manipulate(structTreeRoot);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.close();
    }

    public void manipulate(PdfDictionary element) {
        if (element == null)
            return;
        if (PdfName.FIGURE.equals(element.get(PdfName.S))) {
            element.put(PdfName.ALT, new PdfString("Figure without an Alt description"));
        }
        PdfArray kids = element.getAsArray(PdfName.K);
        if (kids == null)
            return;
        for (int i = 0; i < kids.size(); i++)
            manipulate(kids.getAsDict(i));
    }
}

File name	Raw URL	Updated
AddAltTags.java	AddAltTags.java	2015-12-02 4:51 pm

Resources:

File name	Raw URL	Updated
no_alt_attribute.pdf	no_alt_attribute.pdf	2015-12-02 4:55 pm

Results:

File name	Raw URL	Updated
cmp_added_alt_attributes.pdf	cmp_added_alt_attributes.pdf	2015-12-02 4:57 pm

↧

How to add alternative text for an image in Tagged PDF?

December 2, 2015, 8:23 am

≫ Next: Chapter 7: Creating PDF/UA and PDF/A documents

≪ Previous: Tagged PDF: Adding Alt to the Structure Tree

Posted on StackOverflow on Dec 2, 2015 by tsforsure

Please take a look at the AddAltTags example.

In this example, we take a PDF with images of a fox and a dog where the Alt keys are missing: no_alt_attribute.pdf

Code can't recognize a fox or a dog, so we create a new document with Alt attributes saying "Figure without an Alt description": added_alt_attributes.pdf

We add this description by walking through the structure tree, looking for structural elements marked as /Figure elements:

public void manipulatePdf(String src, String dest)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary structTreeRoot =
        catalog.getAsDict(PdfName.STRUCTTREEROOT);
    manipulate(structTreeRoot);
    PdfStamper stamper = new PdfStamper(
        reader, new FileOutputStream(dest));
    stamper.close();
}

public void manipulate(PdfDictionary element) {
    if (element == null)
        return;
    if (PdfName.FIGURE.equals(element.get(PdfName.S))) {
        element.put(PdfName.ALT,
            new PdfString("Figure without an Alt description"));
    }
    PdfArray kids = element.getAsArray(PdfName.K);
    if (kids == null) return;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}

You can easily port this Java example to C#:

Get the root dictionary from the PdfReader object,
Get the root of the structure tree (a dictionary),
Loop over all the kids of every branch of that tree,
When a leaf is a figure, add an /Alt entry.

Once this is done, use PdfStamper to save the altered file.
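
The traversal itself doesn't depend on iText, which may help when porting it. Below is a minimal sketch of the same recursion, assuming the structure tree is modelled with plain Map and List objects. The StructTreeSketch class, its element() helper and the string keys "S", "K" and "Alt" are hypothetical stand-ins for PdfDictionary, PdfArray and PdfName; they are not iText API. Unlike the example above, this variant leaves an existing Alt entry untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StructTreeSketch {

    /** Creates a structure element: "S" holds its type, "K" its kids. */
    public static Map<String, Object> element(String type, Object... kids) {
        Map<String, Object> e = new HashMap<>();
        e.put("S", type);
        e.put("K", new ArrayList<>(Arrays.asList(kids)));
        return e;
    }

    /**
     * Recursively visits every element and adds an "Alt" entry to each
     * Figure that lacks one. Returns the number of figures changed.
     */
    @SuppressWarnings("unchecked")
    public static int addMissingAlt(Object node, String altText) {
        if (!(node instanceof Map)) {
            return 0; // not a dictionary, for example a marked-content id
        }
        Map<String, Object> element = (Map<String, Object>) node;
        int changed = 0;
        if ("Figure".equals(element.get("S")) && !element.containsKey("Alt")) {
            element.put("Alt", altText);
            changed++;
        }
        Object kids = element.get("K");
        if (kids instanceof List) {
            for (Object kid : (List<Object>) kids) {
                changed += addMissingAlt(kid, altText);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Object> fox = element("Figure", 0);
        Map<String, Object> dog = element("Figure", 1);
        dog.put("Alt", "Dog"); // already described, so it is left alone
        Map<String, Object> root = element("Document", element("P", fox, dog));
        int changed = addMissingAlt(root, "Figure without an Alt description");
        System.out.println(changed + " figure(s) updated; fox Alt = " + fox.get("Alt"));
    }
}
```

Returning a count is a design choice of this sketch: it makes it easy to report how many figures still need a real, human-written description.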

↧

Chapter 7: Creating PDF/UA and PDF/A documents

April 9, 2016, 9:48 am

≫ Next: How to add alternative text for an image in Tagged PDF?

≪ Previous: How to add alternative text for an image in Tagged PDF?

ISO 14289 is better known as PDF/UA. UA stands for Universal Accessibility. PDFs that comply with the PDF/UA standard can be consumed by anyone, including people who are blind or visually impaired.
ISO 19005 is better known as PDF/A. A stands for Archiving. The goal of this standard is the long-term preservation of digital documents.

In this chapter, we'll learn more about PDF/A and PDF/UA by creating a series of PDF/A and PDF/UA files.

Creating accessible PDF documents

This is only one requirement to make a PDF accessible. The QuickBrownFox_PDFUA example will help us understand the other requirements.

PdfDocument pdf = new PdfDocument(new PdfWriter(dest, new WriterProperties().addXmpMetadata()));
Document document = new Document(pdf);
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
        new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
//PDF/UA: Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
//PDF/UA: Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

We tell the PdfDocument that we're going to create Tagged PDF (line 4),
We add a language specifier. In our case, the document knows that the main language used in this document is American English (line 5).
We change the viewer preferences so that the title of the document is always displayed in the top bar of the PDF viewer (line 6-7). Obviously, this implies that we add a title to the metadata of the document (line 8-9).
All fonts need to be embedded (line 11). There are some other requirements relating to fonts, but it would lead us too far right now to discuss these in detail.
All the content needs to be tagged. When an image is encountered, we need to provide a description of that image using alt text (line 17 and line 22).
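
The five points above can be kept as a small pre-flight checklist. The following is only a sketch in plain Java: the PdfUaChecklist class and its Settings holder are hypothetical names invented for illustration, not iText classes, and the check covers just the requirements listed here rather than the full PDF/UA standard.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfUaChecklist {

    /** Hypothetical summary of how a document was set up; not an iText class. */
    public static class Settings {
        public boolean tagged;
        public String language;
        public boolean displayDocTitle;
        public String title;
        public boolean fontsEmbedded;
        public int images;
        public int imagesWithAltText;
    }

    /** Returns one message per requirement from the list above that is not met. */
    public static List<String> missing(Settings s) {
        List<String> problems = new ArrayList<>();
        if (!s.tagged) {
            problems.add("document is not tagged");
        }
        if (s.language == null || s.language.isEmpty()) {
            problems.add("no document language set");
        }
        if (!s.displayDocTitle) {
            problems.add("viewer is not told to display the document title");
        }
        if (s.title == null || s.title.isEmpty()) {
            problems.add("no document title in the metadata");
        }
        if (!s.fontsEmbedded) {
            problems.add("fonts are not embedded");
        }
        if (s.imagesWithAltText < s.images) {
            problems.add((s.images - s.imagesWithAltText) + " image(s) without alt text");
        }
        return problems;
    }

    public static void main(String[] args) {
        Settings s = new Settings();
        s.tagged = true;
        s.language = "en-US";
        s.displayDocTitle = true;
        s.title = "iText7 PDF/UA example";
        s.fontsEmbedded = true;
        s.images = 2;
        s.imagesWithAltText = 1; // one image still lacks a description
        for (String problem : missing(s)) {
            System.out.println(problem);
        }
    }
}
```

A list like this is no substitute for a real PDF/UA checker; it only helps to catch an omitted setter before the document is generated.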

Creating PDFs for long-term preservation, part 1

The document needs to be self-contained: all fonts need to be embedded; external movie, sound or other binary files are not allowed.
The document needs to contain metadata in the eXtensible Metadata Platform (XMP) format: ISO 16684 (XMP) describes how to embed XML metadata into a binary file, so that software that doesn't know how to interpret the binary data format can still extract the file's metadata.
Functionality that isn't future-proof isn't allowed: the PDF can't contain any JavaScript and may not be encrypted.

ISO 19005-1:2005 (PDF/A-1) defined two conformance levels:

Level B ("basic"): ensures that the visual appearance of a document will be preserved for the long term.
Level A ("accessible"): ensures that the visual appearance of a document will be preserved for the long term, but also introduces structural and semantic properties. The PDF needs to be a Tagged PDF.

The QuickBrownFox_PDFA_1b example shows how we can create a "Quick brown fox" PDF that complies with PDF/A-1b.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org","sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();

This doesn't mean that the PDF is, in effect, compliant with the PDF/A standard. It only claims it is. To be sure, you need to open the Standards panel in Adobe Acrobat. When you click on the "Verify Conformance" link, Acrobat will verify if the document is what it claims to be. In this case, we read "Status: verification succeeded"; we have successfully created a document complying with PDF/A-1B.
Adobe Acrobat opens the document read-only. This isn't because you aren't allowed to modify it (PDF/A is not a way to protect a PDF against modification); Acrobat presents it as read-only because any modification might turn it into a PDF that no longer complies with the PDF/A standard. It's not trivial to update a PDF/A file without breaking its PDF/A status.

Let's adapt our example, and create a PDF/A-1 level A document with the QuickBrownFox_PDFA_1a example.

//Initialize PDFA document with output intent
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org","sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
//Setting some required parameters
pdf.setTagged();
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
//Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
//Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

Creating PDFs for long-term preservation, part 2 and 3

Useful additions include: support for JPEG2000, Collections, object-level XMP, and optional content.
Useful improvements include: better support for transparency, comment types and annotations, and digital signatures.

PDF/A-2 also defines an extra level besides Level A and Level B:

Level U ("Unicode"): ensures that the visual appearance of a document will be preserved for the long term, and that all text is stored in Unicode.
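
The combinations of part and level described so far can be captured in a few lines. This is a sketch only: PdfAConformance is a hypothetical helper class, not part of iText, and it handles parts 1 to 3 only. It relies on two facts: PDF/A-1 has no Level U, and a PDF/A file identifies itself in its XMP metadata through the pdfaid:part and pdfaid:conformance properties.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfAConformance {

    private static final Pattern NAME =
            Pattern.compile("PDF/A-([123])([ABU])", Pattern.CASE_INSENSITIVE);

    public final int part;
    public final char level;

    private PdfAConformance(int part, char level) {
        this.part = part;
        this.level = level;
    }

    /** Parses names such as "PDF/A-1b" or "PDF/A-3A". */
    public static PdfAConformance parse(String name) {
        Matcher m = NAME.matcher(name.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF/A-1, -2 or -3 name: " + name);
        }
        int part = Integer.parseInt(m.group(1));
        char level = Character.toUpperCase(m.group(2).charAt(0));
        if (part == 1 && level == 'U') {
            throw new IllegalArgumentException("PDF/A-1 only defines Level A and Level B");
        }
        return new PdfAConformance(part, level);
    }

    /** Level A is the level that requires a Tagged PDF. */
    public boolean requiresTaggedPdf() {
        return level == 'A';
    }

    /** The two XMP properties that carry the conformance claim. */
    public String xmpIdentification() {
        return "<pdfaid:part>" + part + "</pdfaid:part>"
                + "<pdfaid:conformance>" + level + "</pdfaid:conformance>";
    }

    public static void main(String[] args) {
        PdfAConformance c = parse("PDF/A-3a");
        System.out.println(c.xmpIdentification());
        System.out.println("Tagged PDF required: " + c.requiresTaggedPdf());
    }
}
```

Keep in mind that these XMP properties only express a claim; whether the file actually complies still has to be verified with a validator.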

PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
        new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/A-3 example");
//Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
    "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);
//Embed fonts
PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Create content
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
//Close document
document.close();

Let's examine the different parts of this example.

Line 1-5: We create a PdfADocument (PdfAConformanceLevel.PDF_A_3A) and a Document.
Line 7: Making the PDF a Tagged PDF is a requirement for PDF/UA as well as for PDF/A-3A.
Line 8-12: Setting the language, the document title and the viewer preference to display the title is a requirement for PDF/UA.
Line 14-20: We add a file attachment using specific parameters that are required for PDF/A-3A.
Line 26-27: We embed the fonts which is a requirement for PDF/UA as well as for PDF/A.
Line 28-38: We've seen this code before in the UnitedStates example in chapter 1 (including the process() method).
Line 40: We close the document.

When we open the Attachments panel as shown in Figure 7.5, we see our original united_states.csv file that we can easily extract from the PDF.

If you have a document that has a picture of a fox and a dog, iText can't add any missing alt text for those images, because iText can't see that fox nor that dog. iText only sees pixels, it can't interpret the image.
If you are using a font that isn't embedded, iText doesn't know what that font looks like. If you don't provide the corresponding font program, iText can never embed that font.
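
Because a missing font program can't be repaired afterwards, it can be worth checking that every required resource is present before any PDF is written. The sketch below uses only the Java standard library; ResourceCheck and unreadable() are hypothetical names, and the two paths in main() are the font and color profile paths used in the iText 5 samples earlier on this page.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ResourceCheck {

    /** Returns the paths that are not readable regular files. */
    public static List<String> unreadable(String... paths) {
        List<String> missing = new ArrayList<>();
        for (String path : paths) {
            Path p = Paths.get(path);
            if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = unreadable(
                "resources/fonts/FreeSans.ttf",
                "resources/data/sRGB_CS_profile.icm");
        if (missing.isEmpty()) {
            System.out.println("All resources found.");
        } else {
            System.out.println("Cannot create a compliant PDF without: " + missing);
        }
    }
}
```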

We also need to pay attention when we merge existing PDF/A documents.

Merging PDF/A documents

Figure 7.6 shows how we merged the two PDF/A level A documents we created in the previous sections.

When we look at the structure of the tags, we see that the <P>aragraph is now followed by a <Table>. The MergePDFADocuments example shows how it's done.

PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
        new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/A-1a example");
//Create PdfMerger instance
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.addPages(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.addPages(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
//Merge
merger.merge();
//Close the documents
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();

This example is assembled using parts of two examples we've already seen before:

Lines 1 to 11 are almost identical to the first part of the UnitedStates_PDFA_3a example we've used in the previous section, except that we now use PdfAConformanceLevel.PDF_A_1A and that we don't need a Document object.
Lines 12 to 25 are identical to the last part of the 88th_Oscar_Combine example of the previous chapter. Note that we open the source files as plain PdfDocument instances rather than PdfADocument instances; the destination PdfADocument will check whether the source documents comply.

Summary

Tags:

Java

PDF/A

Manipulating existing PDFs (iText 7)

↧

How to add alternative text for an image in Tagged PDF?

May 31, 2016, 2:17 am

≫ Next: Tagged PDF: Adding Alt to the Structure Tree

≪ Previous: Chapter 7: Creating PDF/UA and PDF/A documents

Posted on StackOverflow on Dec 2, 2015 by tsforsure

Please take a look at the AddAltTags example.

In this example, we take a PDF with images of a fox and a dog where the Alt keys are missing: no_alt_attribute.pdf

Code can't recognize a fox or a dog, so we create a new document with Alt attributes saying "Figure without an Alt description": added_alt_attributes.pdf

We add this description by walking through the structure tree, looking for structural elements marked as /Figure elements:

public void manipulatePdf(String src, String dest) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
    PdfDictionary catalog = pdfDoc.getCatalog().getPdfObject();
    PdfDictionary structTreeRoot = catalog.getAsDictionary(PdfName.StructTreeRoot);
    manipulate(structTreeRoot);
    pdfDoc.close();
}

public void manipulate(PdfDictionary element) {
    if (element == null) {
        return;
    }
    if (PdfName.Figure.equals(element.get(PdfName.S))) {
        element.put(PdfName.Alt, new PdfString("Figure without an Alt description"));
    }
    PdfArray kids = element.getAsArray(PdfName.K);
    if (kids == null) {
        return;
    }
    for (int i = 0; i < kids.size(); i++) {
        manipulate(kids.getAsDictionary(i));
    }
}

You can easily port this Java example to C#:

Get the root dictionary from the PdfDocument object,
Get the root of the structure tree (a dictionary),
Loop over all the kids of every branch of that tree,
When a leaf is a figure, add an /Alt entry.

An iText 5 version of this answer, using PdfReader and PdfStamper, is also available.

↧

Tagged PDF: Adding Alt to the Structure Tree

May 31, 2016, 2:17 am

≫ Next: Creating a simple PDF/UA document

≪ Previous: How to add alternative text for an image in Tagged PDF?

Files: