Description:

Fetches sequence files from the Hadoop Distributed File System (HDFS) into FlowFiles.

Tags: hadoop, HDFS, get, fetch, ingest, source, sequence file

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language (or simply EL), and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

| Name | Description | Default Value | Valid Values | EL | Sensitive |
|---|---|---|---|---|---|
| Hadoop Configuration Resources | A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. | | | No | No |
| Directory | The HDFS directory from which files should be read. | | | No | No |
| Recurse Subdirectories | Indicates whether to pull files from subdirectories of the HDFS directory. | true | true, false | No | No |
| Keep Source File | Determines whether to keep the file in HDFS after it has been successfully transferred; if false, the file is deleted. | false | true, false | No | No |
| File Filter Regex | A Java Regular Expression for filtering filenames. If a filter is supplied, only files whose names match that Regular Expression will be fetched; otherwise all files will be fetched. | | | No | No |
| Filter Match Name Only | If true, File Filter Regex will match on just the filename; otherwise subdirectory names will be included with the filename in the regex comparison. | true | true, false | No | No |
| Ignore Dotted Files | If true, files whose names begin with a dot (".") will be ignored. | true | true, false | No | No |
| Minimum File Age | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored. | 0 sec | | No | No |
| Maximum File Age | The maximum age that a file may be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored. | | | No | No |
| Polling Interval | Indicates how long to wait between performing directory listings. | 0 sec | | No | No |
| Batch Size | The maximum number of files to pull in each iteration, based on run schedule. | 100 | | No | No |
| IO Buffer Size | Amount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration. | | | No | No |
| FlowFile Content | Indicates whether the content is to be both the key and value of the Sequence File, or just the value. | VALUE ONLY | VALUE ONLY, KEY VALUE PAIR | No | No |
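
The filtering properties above (File Filter Regex, Filter Match Name Only, Ignore Dotted Files, Minimum File Age, Maximum File Age) combine into a single per-file decision. The following is a minimal Python sketch of that decision, not the processor's actual implementation. The `Candidate` class and `is_selected` function are hypothetical names introduced for illustration. The sketch assumes that the regex must match the whole name or path, that paths are taken relative to the configured Directory, and that ages are compared in milliseconds. Python's `re` module stands in for Java regular expressions, whose syntax differs in some details.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for one file found in a directory listing."""
    relative_path: str  # assumed relative to the configured Directory
    modified_ms: int    # last modification time, epoch milliseconds


def is_selected(
    f: Candidate,
    now_ms: int,
    file_filter_regex: Optional[str] = None,  # File Filter Regex (unset: no filter)
    match_name_only: bool = True,             # Filter Match Name Only
    ignore_dotted: bool = True,               # Ignore Dotted Files
    min_age_ms: int = 0,                      # Minimum File Age
    max_age_ms: Optional[int] = None,         # Maximum File Age (unset: no limit)
) -> bool:
    """Return True if the file passes every configured filter."""
    name = f.relative_path.rsplit("/", 1)[-1]

    # Files whose names begin with a dot are skipped when requested.
    if ignore_dotted and name.startswith("."):
        return False

    # The regex sees either the bare filename or the path with subdirectories.
    if file_filter_regex is not None:
        target = name if match_name_only else f.relative_path
        if re.fullmatch(file_filter_regex, target) is None:  # assumption: full match
            return False

    # Age is measured from the last modification date.
    age_ms = now_ms - f.modified_ms
    if age_ms < min_age_ms:
        return False  # younger than the minimum age
    if max_age_ms is not None and age_ms > max_age_ms:
        return False  # older than the maximum age
    return True
```

With Filter Match Name Only left at true, a pattern such as `part-\d+\.seq` is compared against the filename alone. Setting it to false means the same pattern would also have to account for the subdirectory prefix, for example `sub/part-\d+\.seq`.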

Relationships:

| Name | Description |
|---|---|
| passthrough | If this processor has an input queue for some reason, then FlowFiles arriving on that input are transferred to this relationship. |
| success | All files retrieved from HDFS are transferred to this relationship. |