Package org.apache.tika.parser.xml
Class XMLProfiler
java.lang.Object
org.apache.tika.parser.xml.XMLProfiler
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
This parser enables profiling of XML. It captures the root entity, as well as the entity URIs (namespaces) and the entity local names, in parallel arrays.
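To make the parallel-array layout concrete, here is a JDK-only sketch of the same idea built on a namespace-aware SAX handler. It is not Tika's implementation: `XmlProfileSketch` and `Profile` are hypothetical names, and the sketch assumes that the root entity is the local name of the first element and that each distinct (namespace URI, local name) pair is recorded once, in first-seen order. The real parser's choices may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of XML profiling; not Tika's XMLProfiler code. */
public class XmlProfileSketch {

    /** Root local name plus parallel lists: index i of entityUris pairs with index i of entityLocalNames. */
    public static final class Profile {
        public String rootEntity;
        public final List<String> entityUris = new ArrayList<>();
        public final List<String> entityLocalNames = new ArrayList<>();
    }

    public static Profile profile(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required so startElement receives uri and localName
        SAXParser parser = factory.newSAXParser();
        Profile result = new Profile();
        // Distinct (uri, localName) keys, kept in first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (result.rootEntity == null) {
                        result.rootEntity = localName; // the first element seen is the root
                    }
                    // A NUL separator cannot appear in XML names or namespace URIs.
                    if (seen.add(uri + '\u0000' + localName)) {
                        result.entityUris.add(uri);
                        result.entityLocalNames.add(localName);
                    }
                }
            });
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
            + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>"
            + "</x:xmpmeta>";
        Profile p = profile(xml);
        System.out.println(p.rootEntity);       // xmpmeta
        System.out.println(p.entityUris);       // [adobe:ns:meta/, http://www.w3.org/1999/02/22-rdf-syntax-ns#]
        System.out.println(p.entityLocalNames); // [xmpmeta, RDF]
    }
}
```

Elements without a namespace show up with an empty-string URI, which keeps the two lists the same length.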
This parser is not part of the default set of parsers and must be "turned on" via a Tika config:
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.xml.XMLProfiler"/>
  </parsers>
</properties>
This parser was initially designed to profile XMP and XFA in PDFs. Extracting other types of XML, or XMP from other file formats, would require further work; please open a ticket.
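Because XMLProfiler only runs when a config such as the one above registers it, it can help to confirm that a config file really declares the class before loading it. The sketch below does that with the JDK's DOM parser alone; `TikaConfigCheck` and `declaredParserClasses` are hypothetical names, and the check only looks at `<parser>` elements whose direct parent is `<parsers>`. Loading the file into Tika itself (for example through the `tika.config` system property that the no-argument `TikaConfig` constructor consults) needs Tika on the classpath and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists parser classes declared in a tika-config XML string. */
public class TikaConfigCheck {

    public static List<String> declaredParserClasses(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        List<String> classes = new ArrayList<>();
        NodeList parsers = doc.getElementsByTagName("parser"); // all <parser> elements, document order
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            // Skip <parser> elements nested inside another <parser>.
            if ("parsers".equals(parser.getParentNode().getNodeName())) {
                classes.add(parser.getAttribute("class"));
            }
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        String config = "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
            + "<parser class=\"org.apache.tika.parser.xml.XMLProfiler\"/>"
            + "</parsers></properties>";
        List<String> classes = declaredParserClasses(config);
        System.out.println(classes.contains("org.apache.tika.parser.xml.XMLProfiler")); // true
    }
}
```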
Field Summary
Fields
Modifier and Type                          Field
static org.apache.tika.metadata.Property   ENTITY_LOCAL_NAMES
static org.apache.tika.metadata.Property   ENTITY_URIS
static org.apache.tika.metadata.Property   ROOT_ENTITY
Constructor Summary
Constructors
XMLProfiler()
Method Summary
Modifier and Type                      Method
Set<org.apache.tika.mime.MediaType>    getSupportedTypes(org.apache.tika.parser.ParseContext context)
void                                   parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
Field Details
ROOT_ENTITY
public static org.apache.tika.metadata.Property ROOT_ENTITY
ENTITY_URIS
public static org.apache.tika.metadata.Property ENTITY_URIS
ENTITY_LOCAL_NAMES
public static org.apache.tika.metadata.Property ENTITY_LOCAL_NAMES
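ENTITY_URIS and ENTITY_LOCAL_NAMES are described above as parallel arrays, so index i of one is meant to pair with index i of the other. A minimal sketch of recombining them into qualified names, assuming the values have already been read as `String[]` (for example with `Metadata.getValues(XMLProfiler.ENTITY_URIS)`); `ParallelArrays` and `zip` are hypothetical names, and the arrays in `main` are made-up sample values, not real parser output.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

/** Hypothetical helper for pairing the two parallel metadata arrays. */
public class ParallelArrays {

    /** Zips URI and local-name arrays index by index; rejects arrays of different length. */
    public static List<QName> zip(String[] uris, String[] localNames) {
        if (uris.length != localNames.length) {
            throw new IllegalArgumentException(
                "parallel arrays differ in length: " + uris.length + " vs " + localNames.length);
        }
        List<QName> names = new ArrayList<>();
        for (int i = 0; i < uris.length; i++) {
            names.add(new QName(uris[i], localNames[i]));
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample values; with Tika these would come from Metadata.getValues(...) (assumption).
        String[] uris = {"adobe:ns:meta/", ""};
        String[] locals = {"xmpmeta", "note"};
        for (QName name : zip(uris, locals)) {
            // QName.toString() gives {uri}local, or just local when the URI is empty.
            System.out.println(name); // {adobe:ns:meta/}xmpmeta, then note
        }
    }
}
```

Using `javax.xml.namespace.QName` keeps the URI and local name together as one value, which avoids carrying two index-aligned arrays through later code.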
Constructor Details
XMLProfiler
public XMLProfiler()
Method Details
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
- Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
parse
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException