Class HadoopDataInputStream
- java.lang.Object
-
- java.io.InputStream
-
- org.apache.flink.core.fs.FSDataInputStream
-
- org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream
-
- All Implemented Interfaces:
Closeable,AutoCloseable,org.apache.flink.core.fs.ByteBufferReadable
public final class HadoopDataInputStream extends org.apache.flink.core.fs.FSDataInputStream implements org.apache.flink.core.fs.ByteBufferReadableConcrete implementation of theFSDataInputStreamfor Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).
-
-
Field Summary
Fields Modifier and Type Field Description static intMIN_SKIP_BYTESMinimum amount of bytes to skip forward before we issue a seek instead of discarding read.
-
Constructor Summary
Constructors Constructor Description HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)Creates a new data input stream from the given Hadoop input stream.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description intavailable()voidclose()voidforceSeek(long seekPos)Positions the stream to the given location.org.apache.hadoop.fs.FSDataInputStreamgetHadoopInputStream()Gets the wrapped Hadoop input stream.longgetPos()intread()intread(byte[] buffer, int offset, int length)intread(long position, ByteBuffer byteBuffer)intread(ByteBuffer byteBuffer)voidseek(long seekPos)longskip(long n)voidskipFully(long bytes)Skips over a given amount of bytes in the stream.-
Methods inherited from class java.io.InputStream
mark, markSupported, nullInputStream, read, readAllBytes, readNBytes, readNBytes, reset, transferTo
-
-
-
-
Field Detail
-
MIN_SKIP_BYTES
public static final int MIN_SKIP_BYTES
Minimum amount of bytes to skip forward before we issue a seek instead of discarding read.The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g. in reading meta data), that would hurt the most with frequent seeks.
The optimal value depends on the DFS implementation and configuration plus the underlying filesystem. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the amounts of bytes the can be consumed sequentially within the seektime. Unfortunately, seektime is not constant and devices, OS, and DFS potentially also use read buffers and read-ahead.
- See Also:
- Constant Field Values
-
-
Method Detail
-
seek
public void seek(long seekPos) throws IOException- Specified by:
seekin classorg.apache.flink.core.fs.FSDataInputStream- Throws:
IOException
-
getPos
public long getPos() throws IOException- Specified by:
getPosin classorg.apache.flink.core.fs.FSDataInputStream- Throws:
IOException
-
read
public int read() throws IOException- Specified by:
readin classInputStream- Throws:
IOException
-
close
public void close() throws IOException- Specified by:
closein interfaceAutoCloseable- Specified by:
closein interfaceCloseable- Overrides:
closein classInputStream- Throws:
IOException
-
read
public int read(@Nonnull byte[] buffer, int offset, int length) throws IOException- Overrides:
readin classInputStream- Throws:
IOException
-
available
public int available() throws IOException- Overrides:
availablein classInputStream- Throws:
IOException
-
skip
public long skip(long n) throws IOException- Overrides:
skipin classInputStream- Throws:
IOException
-
getHadoopInputStream
public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()
Gets the wrapped Hadoop input stream.- Returns:
- The wrapped Hadoop input stream.
-
forceSeek
public void forceSeek(long seekPos) throws IOExceptionPositions the stream to the given location. In contrast toseek(long), this method will always issue a "seek" command to the dfs and may not replace it byskip(long)for small seeks.Notice that the underlying DFS implementation can still decide to do skip instead of seek.
- Parameters:
seekPos- the position to seek to.- Throws:
IOException
-
skipFully
public void skipFully(long bytes) throws IOExceptionSkips over a given amount of bytes in the stream.- Parameters:
bytes- the number of bytes to skip.- Throws:
IOException
-
read
public int read(ByteBuffer byteBuffer) throws IOException
- Specified by:
readin interfaceorg.apache.flink.core.fs.ByteBufferReadable- Throws:
IOException
-
read
public int read(long position, ByteBuffer byteBuffer) throws IOException- Specified by:
readin interfaceorg.apache.flink.core.fs.ByteBufferReadable- Throws:
IOException
-
-