Fetches files from the Hadoop Distributed File System (HDFS) into FlowFiles.
Tags: hadoop, HDFS, get, fetch, ingest, source, filesystem
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language (EL), and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.
Name | Description | Default Value | Valid Values | EL | Sensitive |
---|---|---|---|---|---|
Hadoop Configuration Resources | A file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. | | | No | No |
Directory | The HDFS directory from which files should be read | | | No | No |
Recurse Subdirectories | Indicates whether to pull files from subdirectories of the HDFS directory | true | | No | No |
Keep Source File | Determines whether to delete the file from HDFS after it has been successfully transferred | false | | No | No |
File Filter Regex | A Java Regular Expression for filtering Filenames; if a filter is supplied then only files whose names match that Regular Expression will be fetched, otherwise all files will be fetched | | | No | No |
Filter Match Name Only | If true then File Filter Regex will match on just the filename, otherwise subdirectory names will be included with filename in the regex comparison | true | | No | No |
Ignore Dotted Files | If true, files whose names begin with a dot (".") will be ignored | true | | No | No |
Minimum File Age | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored | 0 sec | | No | No |
Maximum File Age | The maximum age that a file must be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored | | | No | No |
Polling Interval | Indicates how long to wait between performing directory listings | 0 sec | | No | No |
Batch Size | The maximum number of files to pull in each iteration, based on run schedule. | 100 | | No | No |
IO Buffer Size | Amount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration | | | No | No |
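The way the filter-related properties combine can be easier to see in code. The following is a minimal Python sketch of the selection rules as described in the table, not the processor's actual implementation. The names `FilterConfig` and `should_fetch` are hypothetical, ages are given in seconds, and two points are assumptions rather than documented behavior: the regex is treated as a whole-string match, and Python's `re` stands in for Java regular expressions, which differ in some syntax details.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterConfig:
    """Hypothetical container mirroring the filter-related properties."""
    file_filter_regex: Optional[str] = None  # File Filter Regex (unset: fetch all files)
    filter_match_name_only: bool = True      # Filter Match Name Only
    ignore_dotted_files: bool = True         # Ignore Dotted Files
    min_age_sec: float = 0.0                 # Minimum File Age
    max_age_sec: Optional[float] = None      # Maximum File Age (unset: no upper bound)


def should_fetch(relative_path: str, age_sec: float, cfg: FilterConfig) -> bool:
    """Return True if a file passes every configured filter.

    relative_path is the path below the configured Directory, using '/'
    as the separator; age_sec is the time since last modification.
    """
    name = relative_path.rsplit("/", 1)[-1]

    # Ignore Dotted Files: skip names that begin with a dot.
    if cfg.ignore_dotted_files and name.startswith("."):
        return False

    # Minimum / Maximum File Age: skip files outside the age window.
    if age_sec < cfg.min_age_sec:
        return False
    if cfg.max_age_sec is not None and age_sec > cfg.max_age_sec:
        return False

    # File Filter Regex: compare against the bare name or the relative path.
    if cfg.file_filter_regex is not None:
        candidate = name if cfg.filter_match_name_only else relative_path
        # Assumption: the whole candidate string must match the pattern.
        if re.fullmatch(cfg.file_filter_regex, candidate) is None:
            return False

    return True
```

Under these assumptions, a regex of `.*\.csv` with Filter Match Name Only left at true accepts `2024/data.csv` regardless of its subdirectory, while setting Filter Match Name Only to false lets a pattern such as `2024/.*\.csv` restrict fetching to one subdirectory.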
Relationships:
Name | Description |
---|---|
passthrough | If this processor has an input queue for some reason, then FlowFiles arriving on that input are transferred to this relationship |
success | All files retrieved from HDFS are transferred to this relationship |