Avinash Yerramsetty (Author), Posted August 11, 2017

Hi All,

I observed a difference in MD5 values between a normal discovery job and a streaming discovery job, and also a change in the counts, even though I provided the same input to both jobs.

Can anyone please let me know which values eCapture uses to generate MD5 hash values for normal discovery and streaming discovery?

Details:
Version: 2017.3.5
Application Version: 17.4.10010.1809
No. of workers used: 4 (Add enabled)
Input for discovery job: PST file (94 MB)

Thanks,
Avi

Michael Avakian (S.W.A.T. Engineer), Posted August 11, 2017

Avi,

Thank you for the details. For emails, the values available to calculate the hash are the same for both job types; however, the algorithm each one uses is slightly different due to a technology difference. For emails, you can see the fields used in the discovery options. For loose files or non-email file types, the entire file is hashed. Because of the algorithm difference, streaming jobs and standard jobs cannot deduplicate against each other.

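To illustrate why the same email fields can still yield different MD5 values, here is a minimal Python sketch. It is not eCapture's actual algorithm, which is not described in this thread. The field names, the field order, and the separators are all assumptions chosen for illustration. It shows only that two engines hashing the same inputs will disagree if they assemble those inputs differently, while a whole-file hash depends only on the file's bytes.

```python
import hashlib


def md5_of_file_bytes(data: bytes) -> str:
    """Hash the entire content, as described for loose (non-email) files."""
    return hashlib.md5(data).hexdigest()


def md5_of_email_fields(fields: dict, order: list, sep: str) -> str:
    """Hash selected email fields joined in a given order with a separator.

    The order and separator are hypothetical; they stand in for whatever
    each engine does internally before hashing.
    """
    joined = sep.join(fields.get(name, "") for name in order)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical email with made-up field names.
email = {
    "sender": "a@example.com",
    "recipient": "b@example.com",
    "subject": "Hello",
    "body": "Hi there",
}

# Same fields, but assembled differently by two imaginary engines.
engine_a = md5_of_email_fields(email, ["sender", "recipient", "subject", "body"], "|")
engine_b = md5_of_email_fields(email, ["subject", "sender", "recipient", "body"], "\n")

print("engine A:", engine_a)
print("engine B:", engine_b)
print("hashes match:", engine_a == engine_b)
```

Because the two hashes differ for the same message, a deduplication pass keyed on MD5 would treat the message as two distinct items, which is consistent with the statement that the two job types cannot deduplicate against each other.
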
Avinash Yerramsetty (Author), Posted August 25, 2017

Hi Michael,

Thank you for your reply. I also found a count difference between normal and streaming discovery when I gave the same input data to both jobs.

Counts:
Normal Discovery: 14056
Streaming Discovery: 14052
Input for Discovery/Streaming job: PST file (94 MB)

As mentioned above, is eCapture using two different methods to extract data? If so, can you please explain the differences?

Thanks,
Avi

Vaibhav, Posted September 6, 2017

Good one, Avi.

Sharmarke Afgab, Posted September 7, 2017

Avi,

One other item that can contribute to count differences is how the two engines extract embedded items. For example, standard discovery extracts more embedded file types at the top level, while streaming goes deeper and extracts within several layers of embedded files. The two methods are mostly similar in file type support but differ in certain extraction methods.

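The effect described above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the `Item` class, the file kinds, the depth limits, and the two engine profiles are hypothetical and do not reflect eCapture's real extraction logic. One profile recognizes more embedded types but stops at the top level; the other recognizes fewer types but recurses through nested containers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    """A document that may contain embedded child documents."""
    name: str
    kind: str
    children: List["Item"] = field(default_factory=list)


def count_items(item: Item, extractable: Set[str], max_depth: int, depth: int = 0) -> int:
    """Count the item plus every embedded item the engine would extract.

    A child is extracted only if its kind is supported and the engine is
    still allowed to descend (depth < max_depth).
    """
    total = 1
    if depth >= max_depth:
        return total
    for child in item.children:
        if child.kind in extractable:
            total += count_items(child, extractable, max_depth, depth + 1)
    return total


# Hypothetical message: a zip holding a docx with an embedded png,
# plus an inline emf image attached directly to the message.
message = Item("message.msg", "msg", [
    Item("archive.zip", "zip", [
        Item("report.docx", "docx", [Item("chart.png", "png")]),
    ]),
    Item("logo.emf", "emf"),
])

# Imaginary "wide but shallow" profile: many types, top level only.
shallow = count_items(message, {"zip", "docx", "png", "emf"}, max_depth=1)

# Imaginary "narrow but deep" profile: fewer types, several layers.
deep = count_items(message, {"zip", "docx", "png"}, max_depth=10)

print("shallow profile count:", shallow)
print("deep profile count:", deep)
```

In this toy case the two profiles report different totals for the same input, which is the kind of small discrepancy (such as 14056 versus 14052) that differing extraction rules can produce.
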
This topic is now archived and is closed to further replies.