
jfishcasselsbrock.com


Posts by jfishcasselsbrock.com

  1. De-duplication based on the custodian of the document.

    If a document was created by a custodian, that custodian's copy should be kept and all other copies of the document should be marked as duplicates, no matter the order of ingestion/processing.

    This was an option in Allegro.
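The custodian-priority rule above could be sketched as follows. This is a hypothetical Python illustration, not how Allegro or Eclipse implements it. It assumes each copy carries a content hash, the custodian it was collected from, a "created by" value (`author`), and an ingestion sequence number, and that the creating custodian can be recognized by `custodian == author`. Copies are grouped by hash, the creator's copy becomes the primary, and the earliest-ingested copy is used only when no creator copy exists, so ingestion order acts only as a tie-breaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    doc_id: str
    content_hash: str          # hash computed at processing time (assumed field)
    custodian: str             # custodian the copy was collected from
    author: str                # hypothetical "created by" metadata value
    ingest_order: int          # order in which this copy was ingested/processed
    duplicate_of: Optional[str] = None


def dedupe_by_creating_custodian(docs: list[Document]) -> list[Document]:
    """Mark duplicates so the creating custodian's copy is always the primary."""
    groups: dict[str, list[Document]] = {}
    for doc in docs:
        groups.setdefault(doc.content_hash, []).append(doc)

    for copies in groups.values():
        # Prefer copies held by the custodian who created the document
        # (assumption: custodian name matches the author value).
        creator_copies = [d for d in copies if d.custodian == d.author]
        # Fall back to all copies when no creator copy was collected.
        candidates = creator_copies or copies
        # Ingestion order only breaks ties among the candidates.
        primary = min(candidates, key=lambda d: d.ingest_order)
        for doc in copies:
            doc.duplicate_of = None if doc is primary else primary.doc_id
    return docs
```

Because the primary is chosen from the creator's copies first, re-running the job with the files ingested in a different order leaves the same copy as primary.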

  2. Hello,

    I've experienced the same issue with version 2017.4.1. The fix is to regenerate the Publish case. For some reason, this version does not always export the natives or set up the database properly the first time.

  3. Hello,

    If I include this field in a Data Extract export job and then load the data into Eclipse Web (either by direct push or manually), Eclipse does not perform its standard native file identification.

    As a result, the native identification fields are not populated.

    I also noticed that if I save a template to disk for a Data Extract export job and then use that template INI file in a new Data Extract export job, the NativeFileType field automatically appears in the list of fields set to export.

    I did not include it in the template INI file.

  4. I've also noticed that you can't run near duplicate detection immediately after running email threading; you have to wait a few minutes first. This is due in part to the way the values are imported into Eclipse after CAAT processes the documents: the Eclipse indexing agents and scheduler need to complete the import before another CAAT operation is run.
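One way to build that wait into a scripted workflow is to poll until the import has finished before starting the next CAAT operation. The sketch below is a generic Python polling helper. `is_import_complete` is a hypothetical callback you would have to supply yourself (for example, a check against whatever job or queue status your environment exposes), and the default timeout and polling interval are arbitrary assumptions, not Eclipse settings.

```python
import time
from typing import Callable


def wait_for_import(is_import_complete: Callable[[], bool],
                    timeout_s: float = 600.0,
                    poll_interval_s: float = 15.0) -> bool:
    """Poll a caller-supplied check until the import reports complete.

    Returns True as soon as the check passes, or False if timeout_s
    elapses first. The check itself is hypothetical and must be
    provided by the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_import_complete():
            return True
        if time.monotonic() >= deadline:
            return False
        # Give the indexing agents and scheduler time before checking again.
        time.sleep(poll_interval_s)
```

In use, you would run email threading, call `wait_for_import` with your own check, and start near duplicate detection only if it returns True.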

  5. Also, if you have to nest complex Boolean searches, using brackets correctly to nest them is important. It all goes back to math and the order of operations for nested logical statements.

    (a + b)/(c + d) is not the same as a + b/c + d.

    The only difference is the order in which the nested operations are evaluated. Eclipse evaluates nested statements from left to right, so it's important to put the nested statements that filter out the most documents at the start of a complex Boolean search. This reduces the processing time needed to evaluate the search and return the results.
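The arithmetic point holds with numbers: with a = b = c = d = 2, (a + b)/(c + d) = 1, while a + b/c + d = 5. The toy Python model below shows both ideas for searches. Each search term is mapped to a made-up set of hit document IDs, OR is set union and AND is set intersection, and `and_left_to_right` uses a rough cost (documents carried into each step) as a stand-in for processing time. It illustrates the principle only; it is not a model of Eclipse's search engine.

```python
# Toy data: each search term maps to the IDs of the documents it hits (made up).
hits = {
    "contract": {1, 2, 3, 4, 5, 6},
    "breach": {2, 3},
    "invoice": {5, 7},
    "smith": {3, 5, 7},
}

# Same terms, different brackets, different results.
grouped_or_first = (hits["contract"] | hits["invoice"]) & hits["smith"]
grouped_and_first = hits["contract"] | (hits["invoice"] & hits["smith"])


def and_left_to_right(term_sets: list[set[int]]) -> tuple[set[int], int]:
    """AND the term sets together strictly from left to right.

    Returns the final hit set and a rough cost: the total number of
    documents carried forward into each intersection step.
    """
    result = term_sets[0]
    cost = 0
    for term_set in term_sets[1:]:
        cost += len(result)  # work grows with the documents still in play
        result = result & term_set
    return result, cost


# Most selective term first versus broadest term first.
narrow_first = and_left_to_right([hits["breach"], hits["smith"], hits["contract"]])
broad_first = and_left_to_right([hits["contract"], hits["smith"], hits["breach"]])

print(sorted(grouped_or_first), sorted(grouped_and_first))
print(narrow_first, broad_first)
```

With this data, `(contract OR invoice) AND smith` returns documents 3, 5 and 7, while `contract OR (invoice AND smith)` returns all seven documents. Putting `breach`, the most selective term, first reaches the same final result with a cost of 3 instead of 8.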

  6. On the subject of CAAT and extracted text, the most important thing is to make sure the option to include or remove whitespace is set the same way for scanned images as for native e-docs and emails. The format of the email header fields is also paramount. Avoid processing email to HTML: doing so changes the format of the email header in the extracted text, and CAAT will have trouble combining threads.

    As an example, a thread like the following could be split into separate threads:

    from: jeff
    to: john
    sent: Jan 1 2017
    subject: hello

    Hi

    from: john
    to: jeff Sent: on Jan 2 2017
    Subject: re hello

    Right back at ya

    Original message from jeff

    hi

    This issue is more apparent for scanned emails whose header shows the user's name from Outlook, or that contain GIF images from an online mail service such as Gmail or Hotmail.

    The email header has to be normalized for CAAT email threading to work properly.
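A minimal sketch of header normalization, assuming you can isolate the header block of each email's extracted text before it goes to CAAT. This is a hypothetical Python pre-processing step, not part of CAAT: it applies one whitespace rule everywhere, splits out the From/To/Cc/Sent/Subject labels even when several share a line (as in `to: jeff Sent: on Jan 2 2017`), and rewrites them as one `Field: value` per line. The field list and the removal of a leading "on" from Sent values are assumptions based on the example above.

```python
import re

# Header labels to normalize (assumed set; extend for your data).
FIELDS = ("From", "To", "Cc", "Sent", "Subject")

# Matches a label such as "to:" or " Sent:" anywhere in the text, ignoring case.
LABEL_RE = re.compile(r"\b(" + "|".join(FIELDS) + r")\s*:", re.IGNORECASE)


def collapse_whitespace(text: str) -> str:
    """Apply one whitespace rule everywhere: runs of blanks become one space."""
    return re.sub(r"[ \t\r\u00a0]+", " ", text).strip()


def normalize_header(raw: str) -> str:
    """Rewrite a loosely formatted header block as one 'Field: value' per line."""
    # Join the block onto one line so fields merged onto a single line
    # are handled the same way as fields on separate lines.
    text = collapse_whitespace(raw.replace("\n", " "))
    matches = list(LABEL_RE.finditer(text))
    lines = []
    for i, match in enumerate(matches):
        # A field's value runs until the next label (or the end of the block).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = match.group(1).capitalize()
        value = text[match.end():end].strip()
        if name == "Sent":
            # Assumed cleanup: "Sent: on Jan 2 2017" -> "Sent: Jan 2 2017".
            value = re.sub(r"^on\s+", "", value, flags=re.IGNORECASE)
        lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

Run on the two headers in the example, this produces `From: jeff / To: john / Sent: Jan 1 2017 / Subject: hello` and `From: john / To: jeff / Sent: Jan 2 2017 / Subject: re hello`, each field on its own line. A label-matching approach like this can misfire when a value itself contains text such as "to:", so it needs checking against real data.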

