Class StreamingFileSink<IN>
- java.lang.Object
-
- org.apache.flink.api.common.functions.AbstractRichFunction
-
- org.apache.flink.streaming.api.functions.sink.legacy.RichSinkFunction<IN>
-
- org.apache.flink.streaming.api.functions.sink.filesystem.legacy.StreamingFileSink<IN>
-
- Type Parameters:
IN- Type of the elements emitted by this sink
- All Implemented Interfaces:
Serializable,org.apache.flink.api.common.functions.Function,org.apache.flink.api.common.functions.RichFunction,org.apache.flink.api.common.state.CheckpointListener,org.apache.flink.streaming.api.checkpoint.CheckpointedFunction,org.apache.flink.streaming.api.functions.sink.legacy.SinkFunction<IN>
@Internal public class StreamingFileSink<IN> extends org.apache.flink.streaming.api.functions.sink.legacy.RichSinkFunction<IN> implements org.apache.flink.streaming.api.checkpoint.CheckpointedFunction, org.apache.flink.api.common.state.CheckpointListenerDeprecated.Useorg.apache.flink.connector.file.sink.FileSinkinstead.Sink that emits its input elements toFileSystemfiles within buckets. This is integrated with the checkpointing mechanism to provide exactly once semantics.When creating the sink a
basePathmust be specified. The base directory contains one directory for every bucket. The bucket directories themselves contain several part files, with at least one for each parallel subtask of the sink which is writing data to that bucket. These part files contain the actual output data.The sink uses a
BucketAssignerto determine in which bucket directory each element should be written to inside the base directory. TheBucketAssignercan, for example, use time or a property of the element to determine the bucket directory. The defaultBucketAssigneris aDateTimeBucketAssignerwhich will create one new bucket every hour. You can specify a customBucketAssignerusing thesetBucketAssigner(bucketAssigner)method, after callingforRowFormat(Path, Encoder)orforBulkFormat(Path, BulkWriter.Factory).The names of the part files could be defined using
OutputFileConfig. This configuration contains a part prefix and a part suffix that will be used with the parallel subtask index of the sink and a rolling counter to determine the file names. For example with a prefix "prefix" and a suffix ".ext", a file named"prefix-1-17.ext"contains the data fromsubtask 1of the sink and is the17thbucket created by that subtask.Part files roll based on the user-specified
RollingPolicy. By default, aDefaultRollingPolicyis used for row-encoded sink output; aOnCheckpointRollingPolicyis used for bulk-encoded sink output.In some scenarios, the open buckets are required to change based on time. In these cases, the user can specify a
bucketCheckInterval(by default 1m) and the sink will check periodically and roll the part file if the specified rolling policy says so.Part files can be in one of three states:
in-progress,pendingorfinished. The reason for this is how the sink works together with the checkpointing mechanism to provide exactly-once semantics and fault-tolerance. The part file that is currently being written to isin-progress. Once a part file is closed for writing it becomespending. When a checkpoint is successful the currently pending files will be moved tofinished.If case of a failure, and in order to guarantee exactly-once semantics, the sink should roll back to the state it had when that last successful checkpoint occurred. To this end, when restoring, the restored files in
pendingstate are transferred into thefinishedstate while anyin-progressfiles are rolled back, so that they do not contain data that arrived after the checkpoint from which we restore.- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classStreamingFileSink.BucketsBuilder<IN,BucketID,T extends StreamingFileSink.BucketsBuilder<IN,BucketID,T>>Deprecated.The base abstract class for theStreamingFileSink.RowFormatBuilderandStreamingFileSink.BulkFormatBuilder.static classStreamingFileSink.BulkFormatBuilder<IN,BucketID,T extends StreamingFileSink.BulkFormatBuilder<IN,BucketID,T>>Deprecated.A builder for configuring the sink for bulk-encoding formats, e.g.static classStreamingFileSink.DefaultBulkFormatBuilder<IN>Deprecated.Builder for the vanillaStreamingFileSinkusing a bulk format.static classStreamingFileSink.DefaultRowFormatBuilder<IN>Deprecated.Builder for the vanillaStreamingFileSinkusing a row format.static classStreamingFileSink.RowFormatBuilder<IN,BucketID,T extends StreamingFileSink.RowFormatBuilder<IN,BucketID,T>>Deprecated.A builder for configuring the sink for row-wise encoding formats.
-
Constructor Summary
Constructors Constructor Description StreamingFileSink(StreamingFileSink.BucketsBuilder<IN,?,? extends StreamingFileSink.BucketsBuilder<IN,?,?>> bucketsBuilder, long bucketCheckInterval)Deprecated.Creates a newStreamingFileSinkthat writes files to the given base directory with the give buckets properties.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description voidclose()Deprecated.static <IN> StreamingFileSink.DefaultBulkFormatBuilder<IN>forBulkFormat(org.apache.flink.core.fs.Path basePath, org.apache.flink.api.common.serialization.BulkWriter.Factory<IN> writerFactory)Deprecated.Creates the builder for aStreamingFileSinkwith bulk-encoding format.static <IN> StreamingFileSink.DefaultRowFormatBuilder<IN>forRowFormat(org.apache.flink.core.fs.Path basePath, org.apache.flink.api.common.serialization.Encoder<IN> encoder)Deprecated.Creates the builder for aStreamingFileSinkwith row-encoding format.voidinitializeState(org.apache.flink.runtime.state.FunctionInitializationContext context)Deprecated.voidinvoke(IN value, org.apache.flink.streaming.api.functions.sink.legacy.SinkFunction.Context context)Deprecated.voidnotifyCheckpointAborted(long checkpointId)Deprecated.voidnotifyCheckpointComplete(long checkpointId)Deprecated.voidsnapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context)Deprecated.-
Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction
getIterationRuntimeContext, getRuntimeContext, open, setRuntimeContext
-
-
-
-
Constructor Detail
-
StreamingFileSink
@VisibleForTesting public StreamingFileSink(StreamingFileSink.BucketsBuilder<IN,?,? extends StreamingFileSink.BucketsBuilder<IN,?,?>> bucketsBuilder, long bucketCheckInterval)
Deprecated.Creates a newStreamingFileSinkthat writes files to the given base directory with the give buckets properties.
-
-
Method Detail
-
forRowFormat
@Internal public static <IN> StreamingFileSink.DefaultRowFormatBuilder<IN> forRowFormat(org.apache.flink.core.fs.Path basePath, org.apache.flink.api.common.serialization.Encoder<IN> encoder)
Deprecated.Creates the builder for aStreamingFileSinkwith row-encoding format.- Type Parameters:
IN- the type of incoming elements- Parameters:
basePath- the base path where all the buckets are going to be created as sub-directories.encoder- theEncoderto be used when writing elements in the buckets.- Returns:
- The builder where the remaining of the configuration parameters for the sink can be
configured. In order to instantiate the sink, call
StreamingFileSink.RowFormatBuilder.build()after specifying the desired parameters.
-
forBulkFormat
@Internal public static <IN> StreamingFileSink.DefaultBulkFormatBuilder<IN> forBulkFormat(org.apache.flink.core.fs.Path basePath, org.apache.flink.api.common.serialization.BulkWriter.Factory<IN> writerFactory)
Deprecated.Creates the builder for aStreamingFileSinkwith bulk-encoding format.- Type Parameters:
IN- the type of incoming elements- Parameters:
basePath- the base path where all the buckets are going to be created as sub-directories.writerFactory- theBulkWriter.Factoryto be used when writing elements in the buckets.- Returns:
- The builder where the remaining of the configuration parameters for the sink can be
configured. In order to instantiate the sink, call
StreamingFileSink.BulkFormatBuilder.build()after specifying the desired parameters.
-
initializeState
public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) throws ExceptionDeprecated.- Specified by:
initializeStatein interfaceorg.apache.flink.streaming.api.checkpoint.CheckpointedFunction- Throws:
Exception
-
notifyCheckpointComplete
public void notifyCheckpointComplete(long checkpointId) throws ExceptionDeprecated.- Specified by:
notifyCheckpointCompletein interfaceorg.apache.flink.api.common.state.CheckpointListener- Throws:
Exception
-
notifyCheckpointAborted
public void notifyCheckpointAborted(long checkpointId)
Deprecated.- Specified by:
notifyCheckpointAbortedin interfaceorg.apache.flink.api.common.state.CheckpointListener
-
snapshotState
public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) throws ExceptionDeprecated.- Specified by:
snapshotStatein interfaceorg.apache.flink.streaming.api.checkpoint.CheckpointedFunction- Throws:
Exception
-
invoke
public void invoke(IN value, org.apache.flink.streaming.api.functions.sink.legacy.SinkFunction.Context context) throws Exception
Deprecated.
-
-