Class PackageParser

java.lang.Object
  org.apache.tika.parser.AbstractEncodingDetectorParser
    org.apache.tika.parser.pkg.PackageParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class PackageParser extends org.apache.tika.parser.AbstractEncodingDetectorParser
Parser for various packaging formats. Each package entry is written to the XHTML event stream as a <div class="package-entry"> element that contains the entry name (if available) as an <h1> element, followed by the full structured body content of the parsed entry.
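To make the output layout concrete, here is a minimal JDK-only sketch that walks a ZIP archive with java.util.zip and emits one <div class="package-entry"> per file entry, with the entry name in an <h1>. It is an illustration of the layout described above, not Tika's implementation: the class and method names (PackageEntryLayout, render, sampleZip) are hypothetical, and entry bodies are treated as plain text wrapped in a <p>, whereas PackageParser emits the full structured XHTML produced by parsing each entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class PackageEntryLayout {
    // Illustrative only: mimics the package-entry layout, not Tika's parser.
    // Each file entry becomes <div class="package-entry"><h1>name</h1>body</div>.
    static String render(byte[] zipBytes) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zin = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no body content
                }
                // Reads only the current entry's bytes (Java 9+ readAllBytes).
                String body = new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                out.append("<div class=\"package-entry\">")
                   .append("<h1>").append(escape(entry.getName())).append("</h1>")
                   .append("<p>").append(escape(body)).append("</p>")
                   .append("</div>\n");
            }
        }
        return out.toString();
    }

    // Minimal XML escaping for element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a one-entry archive in memory so the example is self-contained.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos, StandardCharsets.UTF_8)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(render(sampleZip()));
    }
}
```

Running main prints a single entry: <div class="package-entry"><h1>a.txt</h1><p>hello</p></div>.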

Users must have the JCE Unlimited Strength jars installed for encryption to work with 7Z files (see COMPRESS-299 and TIKA-1521). If the jars are not installed, an IOException is thrown, possibly wrapped in a TikaException.
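Encrypted 7z archives use AES-256, so one way to check the JCE requirement up front is to ask the JVM for its maximum permitted AES key length via the standard javax.crypto.Cipher.getMaxAllowedKeyLength method. The sketch below assumes that a limit of at least 256 bits is sufficient for 7z decryption; the class name JceCheck and the helper supportsAes256 are hypothetical. On recent JDKs the unlimited-strength policy is typically enabled by default, in which case the check simply passes.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class JceCheck {
    // Returns true if the JVM's crypto policy permits AES keys of 256 bits or more.
    // Under the old "limited" policy this value is 128, which is too small for 7z's AES-256.
    static boolean supportsAes256() throws NoSuchAlgorithmException {
        return Cipher.getMaxAllowedKeyLength("AES") >= 256;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        if (supportsAes256()) {
            System.out.println("Unlimited-strength AES is available");
        } else {
            System.out.println("AES limited to 128 bits: install the JCE Unlimited Strength jars");
        }
    }
}
```

Running this check at startup turns a later IOException during 7z parsing into an explicit configuration message.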

See Also:
  • Constructor Details

    • PackageParser

      public PackageParser()
    • PackageParser

      public PackageParser(org.apache.tika.detect.EncodingDetector encodingDetector)
  • Method Details

    • handleEntryMetadata

      protected static org.apache.tika.metadata.Metadata handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException, IOException, org.apache.tika.exception.TikaException
      Throws:
      SAXException
      IOException
      org.apache.tika.exception.TikaException
    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException
    • setDetectCharsetsInEntryNames

      @Field public void setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames)
      Sets whether to run the default charset detector against entry names in ZIP files. The default is true.
      Parameters:
      detectCharsetsInEntryNames - true to run charset detection on entry names; false to skip it
    • isDetectCharsetsInEntryNames

      public boolean isDetectCharsetsInEntryNames()
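The detectCharsetsInEntryNames option matters because ZIP entry names are raw bytes: the ZIP specification says they are UTF-8 only when general-purpose bit 11 is set, and many archivers instead write names in a local legacy charset. The JDK-only sketch below illustrates the problem with a deliberately naive heuristic (accept the bytes as UTF-8 if they decode cleanly, otherwise fall back to a caller-supplied charset). This is not Tika's charset detector, and EntryNameDecoding and decodeEntryName are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameDecoding {
    // Naive heuristic, NOT Tika's detector: strict UTF-8 first, then a fallback charset.
    static String decodeEntryName(byte[] raw, Charset fallback) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Bytes are not valid UTF-8, so assume the legacy charset instead.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        String name = "caf\u00e9.txt";
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = name.getBytes(StandardCharsets.ISO_8859_1);

        // Both decode back to the original name.
        System.out.println(decodeEntryName(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(decodeEntryName(latin1Bytes, StandardCharsets.ISO_8859_1));

        // Decoding with the wrong fixed charset produces mojibake instead.
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

A real detector weighs byte statistics across many charsets; the heuristic here only shows why a fixed choice of charset can garble entry names.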