Class XSSFExcelExtractorDecorator

java.lang.Object
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
All Implemented Interfaces:
OOXMLExtractor
Direct Known Subclasses:
XSSFBExcelExtractorDecorator

public class XSSFExcelExtractorDecorator extends AbstractOOXMLExtractor
  • Field Details

    • hfHelper

      protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
      Allows access to headers and footers from raw XML strings.
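As an illustration of what reading a header or footer from a raw string involves, the sketch below splits an Excel header/footer string into its left, center, and right sections using the `&L`, `&C`, and `&R` markers. It uses only the JDK and is not POI's `HeaderFooterHelper` implementation; the class name `HeaderFooterSketch`, the method `sections`, and the choice to treat text before any marker as the center section are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderFooterSketch {

    /**
     * Splits a raw header/footer string into left (L), center (C), and right (R)
     * sections. Other formatting codes (such as &P) are left in the text untouched.
     */
    public static Map<Character, String> sections(String raw) {
        Map<Character, StringBuilder> parts = new LinkedHashMap<>();
        for (char key : new char[] {'L', 'C', 'R'}) {
            parts.put(key, new StringBuilder());
        }
        char current = 'C'; // assumption: text before any marker belongs to the center
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '&' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == 'L' || next == 'C' || next == 'R') {
                    current = next; // section marker: switch the target section
                    i++;
                    continue;
                }
                if (next == '&') {
                    parts.get(current).append('&'); // "&&" is an escaped ampersand
                    i++;
                    continue;
                }
            }
            parts.get(current).append(c);
        }
        Map<Character, String> out = new LinkedHashMap<>();
        parts.forEach((k, v) -> out.put(k, v.toString()));
        return out;
    }

    public static void main(String[] args) {
        Map<Character, String> s = sections("&LConfidential&CQ3 Report&RPage &P");
        System.out.println(s.get('L') + " | " + s.get('C') + " | " + s.get('R'));
    }
}
```

In real code the POI helper should be preferred; the sketch only shows the shape of the raw string that the helper reads.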
    • formatter

      protected final org.apache.poi.ss.usermodel.DataFormatter formatter
    • sheetParts

      protected final List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
    • metadata

      protected org.apache.tika.metadata.Metadata metadata
    • parseContext

      protected org.apache.tika.parser.ParseContext parseContext
  • Constructor Details

    • XSSFExcelExtractorDecorator

      public XSSFExcelExtractorDecorator(org.apache.tika.parser.ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
  • Method Details

    • configureExtractor

      protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor, Locale locale)
    • getXHTML

      public void getXHTML(ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws SAXException, org.apache.xmlbeans.XmlException, IOException, org.apache.tika.exception.TikaException
      Description copied from interface: OOXMLExtractor
      Parses the document into a sequence of XHTML SAX events sent to the given content handler.
      Specified by:
      getXHTML in interface OOXMLExtractor
      Overrides:
      getXHTML in class AbstractOOXMLExtractor
      Throws:
      SAXException
      org.apache.xmlbeans.XmlException
      IOException
      org.apache.tika.exception.TikaException
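To make "a sequence of XHTML SAX events" concrete, here is a minimal JDK-only sketch of a producer that reports one sheet to a `ContentHandler` as a heading followed by a table. It is not Tika's implementation: `SaxEventsSketch`, `emitSheet`, and `Collector` are hypothetical names, and the exact element layout is an assumption chosen for illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Hypothetical emitter: reports one sheet as a div holding a heading and a table. */
    public static void emitSheet(ContentHandler handler, String name, String[][] rows)
            throws SAXException {
        handler.startDocument();
        handler.startElement(XHTML, "div", "div", new AttributesImpl());
        element(handler, "h1", name);
        handler.startElement(XHTML, "table", "table", new AttributesImpl());
        for (String[] row : rows) {
            handler.startElement(XHTML, "tr", "tr", new AttributesImpl());
            for (String cell : row) {
                element(handler, "td", cell);
            }
            handler.endElement(XHTML, "tr", "tr");
        }
        handler.endElement(XHTML, "table", "table");
        handler.endElement(XHTML, "div", "div");
        handler.endDocument();
    }

    /** Emits start tag, character data, and end tag for a simple text element. */
    private static void element(ContentHandler handler, String tag, String text)
            throws SAXException {
        handler.startElement(XHTML, tag, tag, new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, tag, tag);
    }

    /** Receives the events; counts elements and collects the character data. */
    public static class Collector extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws SAXException {
        Collector c = new Collector();
        emitSheet(c, "Sheet1", new String[][] {{"a", "b"}, {"c", "d"}});
        System.out.println(c.elements + " elements, text=" + c.text);
    }
}
```

Because the consumer only sees events, the same producer can feed a text collector, an XML serializer, or an indexing handler without change.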
    • buildXHTML

      protected void buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException, org.apache.xmlbeans.XmlException, IOException
      Description copied from class: AbstractOOXMLExtractor
      Populates the XHTMLContentHandler object received as parameter.
      Specified by:
      buildXHTML in class AbstractOOXMLExtractor
      Throws:
      SAXException
      org.apache.xmlbeans.XmlException
      IOException
      See Also:
      • XSSFExcelExtractor.getText()
    • addDrawingHyperLinks

      protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
    • extractHyperLinks

      protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException
      Throws:
      SAXException
    • extractHeaderFooter

      protected void extractHeaderFooter(String hf, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException
      Throws:
      SAXException
    • processShapes

      protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException
      Throws:
      SAXException
    • processSheet

      public void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler, org.apache.poi.xssf.model.Comments comments, org.apache.poi.xssf.model.StylesTable styles, org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException
      Throws:
      IOException
      SAXException
    • getMainDocumentParts

      protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts() throws org.apache.tika.exception.TikaException
      In Excel files, the sheets contain the embedded objects, and the sheet drawings contain the images.
      Specified by:
      getMainDocumentParts in class AbstractOOXMLExtractor
      Throws:
      org.apache.tika.exception.TikaException
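The following JDK-only sketch illustrates the idea of collecting both sheet parts and drawing parts from a workbook package. It treats the `.xlsx` file as a plain ZIP and matches entries by their conventional names (`xl/worksheets/sheet…`, `xl/drawings/drawing…`). This is an assumption made for illustration: a real OPC library resolves parts through package relationships rather than by name, and `SheetPartsSketch`, `sheetAndDrawingParts`, and `fakePackage` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SheetPartsSketch {

    /** Lists entries that look like worksheet or drawing parts, by conventional name. */
    public static List<String> sheetAndDrawingParts(InputStream xlsx) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                String name = e.getName();
                if (name.startsWith("xl/worksheets/sheet")
                        || name.startsWith("xl/drawings/drawing")) {
                    parts.add(name);
                }
            }
        }
        return parts;
    }

    /** Builds a tiny in-memory ZIP with empty entries, standing in for a real workbook. */
    public static byte[] fakePackage(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pkg = fakePackage(
                "xl/workbook.xml",
                "xl/worksheets/sheet1.xml",
                "xl/worksheets/_rels/sheet1.xml.rels",
                "xl/drawings/drawing1.xml");
        System.out.println(sheetAndDrawingParts(new ByteArrayInputStream(pkg)));
    }
}
```

Relationship files under `_rels` are deliberately excluded by the prefix match, since they describe links between parts rather than content.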