Fetches files from the Hadoop Distributed File System (HDFS) into FlowFiles.
hadoop, HDFS, get, fetch, ingest, source, filesystem
In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.
Name | Default Value | Valid Values | Description |
---|---|---|---|
Hadoop Configuration Resources | | | A file, or comma-separated list of files, containing the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. |
Directory | | | The HDFS directory from which files should be read |
Recurse Subdirectories | true | true, false | Indicates whether to pull files from subdirectories of the HDFS directory |
Keep Source File | false | true, false | Determines whether to delete the file from HDFS after it has been successfully transferred; if false, the file is deleted, and if true, it is kept |
File Filter Regex | | | A Java regular expression for filtering filenames; if a filter is supplied, only files whose names match that regular expression will be fetched, otherwise all files will be fetched |
Filter Match Name Only | true | true, false | If true, File Filter Regex will match on just the filename; otherwise subdirectory names will be included with the filename in the regex comparison |
Ignore Dotted Files | true | true, false | If true, files whose names begin with a dot (".") will be ignored |
Minimum File Age | 0 sec | | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored |
Maximum File Age | | | The maximum age that a file may be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored |
Polling Interval | 0 sec | | Indicates how long to wait between performing directory listings |
Batch Size | 100 | | The maximum number of files to pull in each iteration, based on run schedule |
IO Buffer Size | | | Amount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration |
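
The way the filtering properties combine can be easier to follow in code. The following is a minimal sketch in Java, not the processor's actual implementation. The class name `HdfsFilterSketch`, the method `accept`, and all parameter names are hypothetical. It also assumes that the regular expression must match the whole candidate string, and that when Filter Match Name Only is false the candidate is the file's path relative to the configured Directory; the table above does not state either detail.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of the file-selection rules described by the
 * filtering properties. Not taken from the processor's source code.
 */
final class HdfsFilterSketch {

    /**
     * @param relativePath  file path relative to Directory, e.g. "sub/data.csv" (assumption)
     * @param ageMillis     time elapsed since the file's last modification
     * @param filterRegex   File Filter Regex, or null when the property is unset
     * @param matchNameOnly Filter Match Name Only
     * @param ignoreDotted  Ignore Dotted Files
     * @param minAgeMillis  Minimum File Age
     * @param maxAgeMillis  Maximum File Age, or null when the property is unset
     * @return true if the file would be pulled under these settings
     */
    static boolean accept(String relativePath, long ageMillis, Pattern filterRegex,
                          boolean matchNameOnly, boolean ignoreDotted,
                          long minAgeMillis, Long maxAgeMillis) {
        // The filename is everything after the last '/', or the whole string if there is none.
        String name = relativePath.substring(relativePath.lastIndexOf('/') + 1);

        // Ignore Dotted Files: skip names that begin with ".".
        if (ignoreDotted && name.startsWith(".")) {
            return false;
        }
        // Minimum File Age: skip files younger than the minimum.
        if (ageMillis < minAgeMillis) {
            return false;
        }
        // Maximum File Age: skip files older than the maximum, when one is set.
        if (maxAgeMillis != null && ageMillis > maxAgeMillis) {
            return false;
        }
        // No File Filter Regex: every remaining file is fetched.
        if (filterRegex == null) {
            return true;
        }
        // Filter Match Name Only decides whether subdirectory names take part in the comparison.
        String candidate = matchNameOnly ? name : relativePath;
        // Assumption: the expression must match the entire candidate string.
        return filterRegex.matcher(candidate).matches();
    }
}
```

Under these assumptions, a filter of `data.*` accepts `sub/data.csv` when Filter Match Name Only is true, because only `data.csv` is compared. With Filter Match Name Only set to false, the same file is rejected, because the compared string `sub/data.csv` does not start with `data`.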
Name | Description |
---|---|
passthrough | If this processor has an input queue for some reason, then FlowFiles arriving on that input are transferred to this relationship |
success | All files retrieved from HDFS are transferred to this relationship |