Class XMLProfiler

java.lang.Object
org.apache.tika.parser.xml.XMLProfiler
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class XMLProfiler extends Object implements org.apache.tika.parser.Parser

This parser profiles XML. It captures the root entity, and records entity URIs (namespaces) and entity local names in parallel arrays.
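To illustrate what such a profile contains, here is a minimal sketch that uses only the JDK's SAX API. It is not Tika's implementation: `XmlProfileSketch` and `Profile` are hypothetical names, and this page does not state how XMLProfiler deduplicates or formats its values, so the sketch simply records every start element it sees.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlProfileSketch {

    /** Hypothetical result holder; not a Tika class. */
    public static final class Profile {
        public String rootEntity;
        // Parallel lists: entityUris.get(i) is the namespace of entityLocalNames.get(i).
        public final List<String> entityUris = new ArrayList<>();
        public final List<String> entityLocalNames = new ArrayList<>();
    }

    public static Profile profile(InputStream xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is required to receive URIs and local names.
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        Profile profile = new Profile();
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The first element encountered is the document root.
                if (profile.rootEntity == null) {
                    profile.rootEntity = localName;
                }
                profile.entityUris.add(uri);
                profile.entityLocalNames.add(localName);
            }
        });
        return profile;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a:root xmlns:a=\"urn:example:a\" xmlns:b=\"urn:example:b\">"
                + "<b:child/><a:leaf/></a:root>";
        Profile p = profile(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(p.rootEntity);        // prints "root"
        System.out.println(p.entityUris);        // prints "[urn:example:a, urn:example:b, urn:example:a]"
        System.out.println(p.entityLocalNames);  // prints "[root, child, leaf]"
    }
}
```

Keeping the two lists index-aligned is what "parallel arrays" means here: position i in one list describes the same element as position i in the other.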

This parser is not part of the default set of parsers and must be enabled via a Tika config:

<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.xml.XMLProfiler"/>
  </parsers>
</properties>

This parser was initially designed to profile XMP and XFA in PDFs. Extracting other types of XML, or XMP in other file formats, would require further work. If you need this, please open a ticket.

  • Field Details

    • ROOT_ENTITY

      public static org.apache.tika.metadata.Property ROOT_ENTITY
    • ENTITY_URIS

      public static org.apache.tika.metadata.Property ENTITY_URIS
    • ENTITY_LOCAL_NAMES

      public static org.apache.tika.metadata.Property ENTITY_LOCAL_NAMES
  • Constructor Details

    • XMLProfiler

      public XMLProfiler()
  • Method Details

    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
      Specified by:
      getSupportedTypes in interface org.apache.tika.parser.Parser
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Specified by:
      parse in interface org.apache.tika.parser.Parser
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException