Description:

Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines whether the cached value has already been seen. If so, the Processor routes the FlowFile to 'duplicate' with an attribute named 'original.flowfile.description' that holds the original FlowFile's "description", which is specified in the <FlowFile Description> property. If the FlowFile is not determined to be a duplicate, the Processor routes it to 'non-duplicate'.
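
The detection logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Processor's actual implementation: a plain dict stands in for the distributed cache, the function name `detect_duplicate` and its parameters are hypothetical, and the attribute name is the one given in the FlowFile Description property text.

```python
def detect_duplicate(cache, identifier, description):
    """Return (relationship, extra_attributes) for one FlowFile.

    cache       -- dict standing in for the distributed cache (assumption)
    identifier  -- evaluated Cache Entry Identifier, e.g. the value of ${hash.value}
    description -- evaluated FlowFile Description
    """
    if identifier not in cache:
        # First time this identifier is seen: remember the description.
        cache[identifier] = description
        return "non-duplicate", {}
    # Seen before: report the description stored for the original FlowFile.
    return "duplicate", {"original.flowfile.description": cache[identifier]}


cache = {}
first = detect_duplicate(cache, "abc123", "file-1.txt")
second = detect_duplicate(cache, "abc123", "file-2.txt")
```

Here `first` is routed to 'non-duplicate', while `second` is routed to 'duplicate' and carries the description of the first FlowFile.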

Tags: experimental, hash, dupe, duplicate, dedupe

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language (or simply EL), and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

| Name | Description | Default Value | Valid Values | EL | Sensitive |
|---|---|---|---|---|---|
| Cache Entry Identifier | A FlowFile attribute, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the value used to identify duplicates; it is this value that is cached | ${hash.value} | | Yes | No |
| FlowFile Description | When a FlowFile is added to the cache, this value is stored along with it so that if a duplicate is found, this description of the original FlowFile will be added to the duplicate's "original.flowfile.description" attribute | | | Yes | No |
| Age Off Duration | Time interval to age off cached FlowFiles | | | No | No |
| Distributed Cache Service | The Controller Service that is used to cache unique identifiers, used to determine duplicates | | | No | No |
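
To illustrate how Age Off Duration might interact with the cache, here is a minimal sketch. It assumes that a cached entry older than the configured duration is treated as unseen and replaced; how the actual cache service expires entries is not stated here. The function name `seen_recently`, the `(description, entry_time)` tuple layout, and the use of seconds are all hypothetical.

```python
import time


def seen_recently(cache, identifier, description, age_off_seconds=None, now=None):
    """Return the original description if a live entry exists, else None.

    cache maps identifier -> (description, entry_time_seconds).
    age_off_seconds=None means entries never age off.
    """
    now = time.time() if now is None else now
    entry = cache.get(identifier)
    if entry is not None:
        original_description, entry_time = entry
        expired = age_off_seconds is not None and now >= entry_time + age_off_seconds
        if not expired:
            return original_description
    # New or aged-off identifier: (re)cache it with the current time.
    cache[identifier] = (description, now)
    return None


cache = {}
r0 = seen_recently(cache, "k", "a", age_off_seconds=60, now=0)    # new entry
r1 = seen_recently(cache, "k", "b", age_off_seconds=60, now=30)   # still live
r2 = seen_recently(cache, "k", "c", age_off_seconds=60, now=90)   # aged off
r3 = seen_recently(cache, "k", "d", age_off_seconds=60, now=100)  # live again
```

With a 60-second duration, the lookup at `now=30` finds the original entry, the one at `now=90` treats it as aged off and caches the new description, and the one at `now=100` finds that replacement.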

Relationships:

| Name | Description |
|---|---|
| non-duplicate | If a FlowFile's Cache Entry Identifier was not found in the cache, it will be routed to this relationship |
| duplicate | If a FlowFile has been detected to be a duplicate, it will be routed to this relationship |
| failure | If unable to communicate with the cache, the FlowFile will be penalized and routed to this relationship |
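
The three relationships can be summarized in a short sketch. It is illustrative only: `route`, `UnreachableCache`, and the use of `ConnectionError` to model a failed cache call are assumptions, and a dict again stands in for a healthy cache.

```python
class UnreachableCache:
    """Hypothetical cache client whose every lookup fails."""

    def get(self, key):
        raise ConnectionError("cache service unreachable")


def route(cache, identifier, description):
    """Return (relationship, penalized) for one FlowFile."""
    try:
        original = cache.get(identifier)
        if original is None:
            # Identifier not found in the cache: store it and pass the FlowFile on.
            cache[identifier] = description
            return "non-duplicate", False
        return "duplicate", False
    except ConnectionError:
        # Could not talk to the cache: penalize and route to failure.
        return "failure", True


healthy = {}
results = [
    route(healthy, "abc", "f1"),
    route(healthy, "abc", "f2"),
    route(UnreachableCache(), "abc", "f3"),
]
```

The first two calls share a healthy cache and yield 'non-duplicate' then 'duplicate'; the third cannot reach its cache and yields a penalized 'failure'.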