In the current age of electronic communication, discovery requests must include email and related attachments. Document requests no longer focus exclusively on boxes of paper and file cabinets of reports. The idea that a case can be handled without careful consideration of multiple media types is not only naïve, it is risky.
So how do you handle the variety of sources kept on laptops, stand-alone hard drives, pen drives, CDs/DVDs, servers, and so on, when you are more comfortable with paper? Productions have grown so large that the review hours they demand exceed almost any reasonable cost-to-results ratio. How do you adjust a style of practice that requires an attorney to read every document? The answer is simple: prioritize, and invest in products designed to keep costs down and to make the attorney time spent more effective.
Set up Protocols: Reinventing the wheel with each new piece of litigation wastes time and money. When discovery begins, an open line of communication with opposing counsel creates a level playing field for the exchange of data. The parties need to discuss how material will be exchanged. For example, will electronic data be provided in native format (the format in which it was created, e.g., Word, Excel, or image files) or in TIFF or PDF image format? Will each side provide OCR (optical character recognition) to allow text searching? What metadata fields will each party provide, and how will document relationships, such as email/attachment, be maintained? This is also the time to define clearly what a "document" is when paper is scanned into litigation management software. Paper should be scanned with unitization that follows the document breaks (on average 4-8 pages per record) to allow quicker review and indexing in your software product of choice (CT Summation, Concordance, and Excel are some examples). Likewise, be prepared to provide your own materials in the manner requested by opposing counsel.
It can be a huge advantage to your case to accept data from your own client in native format, so that search terms can be used to drill down to the substantive materials. To do the review, however, you need a product that supports it, or a hosting service that provides a temporary review platform. Printing electronic files to paper for manual review is cost prohibitive, and it forfeits the future savings in retrieval and review time that an electronic work product offers. Nor is there any need to convert every electronic document into a "paper/TIFF'd" production version; documents that are not responsive never need to be converted. A firm can own review software for a few thousand dollars. Price depends on the number of licenses, which in turn depends on how many people need access at any one time. The software can hold an unlimited number of cases, limited only by the space available on the firm's server.
Data Collection: Processing electronic material is more sophisticated than running paper through a copy machine. As outlined below, several steps taken at the start of the discovery process will help prioritize the data that needs attorney review.
It is imperative that the collection and the subsequent chain of custody be defensible in court. Knowing in advance how the data will be used is essential so that, for example, collections maintain the primary/attachment relationship of each record. If an email has a spreadsheet attached, that connection must be maintained throughout collection and review.
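To illustrate what "maintaining the relationship" means in practice, here is a minimal sketch, assuming the email is available as a native .eml file. It gives the email and each attachment a sequential control number and stamps every record with the parent's number as a family ID, so the spreadsheet can always be traced back to the email that carried it. The field names (control_no, family_id), the "ABC" prefix, and the sample message are hypothetical; review platforms use their own numbering schemes.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_family(raw_message: bytes, prefix: str, start: int) -> list[dict]:
    """Number an email and its attachments so they stay linked as one family.

    The parent email and every attachment share the parent's control number
    in "family_id"; attachments get the next sequential numbers.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    parent_id = f"{prefix}{start:07d}"
    records = [{"control_no": parent_id, "family_id": parent_id,
                "kind": "email", "name": str(msg["Subject"] or "")}]
    number = start
    # iter_attachments() skips the main body part and yields the attachments.
    for part in msg.iter_attachments():
        number += 1
        records.append({"control_no": f"{prefix}{number:07d}",
                        "family_id": parent_id, "kind": "attachment",
                        "name": part.get_filename() or ""})
    return records


# Usage: a hypothetical email with one spreadsheet attached.
demo = EmailMessage()
demo["From"] = "cfo@example.com"
demo["To"] = "staff@example.com"
demo["Subject"] = "Q3 budget"
demo.set_content("Spreadsheet attached.")
demo.add_attachment(b"dept,amount\nsales,100\n", maintype="application",
                    subtype="vnd.ms-excel", filename="budget.xls")

family = build_family(demo.as_bytes(), "ABC", 1)
for record in family:
    print(record)
```

Because the family ID travels with every record, a reviewer who marks the email privileged can find and withhold its attachment as well.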
It is best to identify, at the initial stage of discovery, which metadata fields each side will produce. Dozens of available metadata fields provide no substantive data, so what is the point of producing them? Basic fields such as From, To, Date Sent, Date Created, Date Modified, File Extension, and Folder Name may be important to the case. As mentioned earlier, protocols that identify the useful fields at the start help forestall discovery disputes, and motions to compel, later on.
Metadata may include important items such as who created or edited a document, and when it was created, edited, saved, or printed. Prior versions, deletions, and hidden comments are sometimes retained as application metadata in an electronic file. The new e-discovery amendments explicitly recognize the existence and discoverability of metadata. Application metadata, however, can be recovered only from files in native format or from other electronic files that have expressly preserved the metadata. A printed paper document will usually show none of it. (1)
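As a rough illustration of the basic fields named above, the following sketch pulls From, To, and Date Sent from a native email's headers, and File Extension, Folder Name, and Date Modified from the file system. It is a simplified example under stated assumptions, not a forensic tool: the output field names are hypothetical, Date Created is omitted because operating systems do not report creation dates consistently, and routine copying can alter file-system dates, which is one reason collection should be handled defensibly.

```python
from datetime import datetime, timezone
from email import policy
from email.parser import BytesParser
from pathlib import Path


def email_fields(raw_message: bytes) -> dict:
    """Pull basic header metadata from a native email (.eml bytes)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    date_header = msg["Date"]
    # With policy.default the Date header exposes a parsed .datetime value
    # (None if the header cannot be parsed).
    sent = date_header.datetime if date_header is not None else None
    return {
        "From": str(msg["From"] or ""),
        "To": str(msg["To"] or ""),
        "DateSent": sent.isoformat() if sent else "",
    }


def file_fields(path: Path) -> dict:
    """Pull file-system metadata for a native file.

    Only the modification time is reported, because creation dates are not
    exposed consistently across operating systems.
    """
    info = path.stat()
    return {
        "FileExtension": path.suffix.lower(),
        "FolderName": path.parent.name,
        "DateModified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }
```

None of these values would survive printing the document to paper, which is the practical point of the paragraph above.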
Search Terms: A list of search terms agreed upon by all parties can eliminate arguments about responsiveness while paring down the universe of documents that need review. Search terms can be run against a data set (e.g., a company server) to collect only the records that "hit" on the terms. The first pass at assembling potentially privileged documents can be as simple as searching for the names and email addresses of the attorneys involved in the matter. That search is mechanical, and it can be the first stage in prioritizing documents that may need to be withheld.
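The mechanical privilege screen described above can be sketched in a few lines. This is a minimal illustration assuming the extracted text of each document is already available; review platforms run such searches against a prebuilt index rather than scanning raw text. The function name, document IDs, attorney name, and email address are all hypothetical.

```python
import re


def find_hits(docs: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Return {doc_id: [terms found]} for every document with at least one hit.

    Matching is case-insensitive and the term must stand alone, so the
    term "Doe" does not hit the word "Doesn't".
    """
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    hits = {}
    for doc_id, text in docs.items():
        matched = [term for term, pattern in patterns.items() if pattern.search(text)]
        if matched:
            hits[doc_id] = matched
    return hits


# Usage: screen two hypothetical documents for an attorney's name and address.
docs = {
    "DOC001": "Please forward this to Jane Doe (jdoe@lawfirm.example) before Friday.",
    "DOC002": "Lunch order for the team is attached.",
}
privilege_terms = ["Jane Doe", "jdoe@lawfirm.example"]
print(find_hits(docs, privilege_terms))
```

Only DOC001 is flagged, and the list of matched terms tells the reviewer why it landed in the privilege queue.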
Constructing search terms with connectors such as "near" and "adjacent" reduces the number of false positives. A false positive is a hit on a phrase like "Main Street" when what you are looking for is an individual named William Street. A simple change in a search from "or" to "and" can change search and processing costs by thousands of dollars. In one recent matter, 90% of the records a client provided proved irrelevant to the case; they were eliminated only after the client admitted to having edited the search terms the attorney had specified.
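The William Street example can be made concrete. The sketch below, a simplified model assuming single-word terms and plain extracted text, compares an "or" search with a "near" search (both words within a set number of words of each other, in either order) and an "adjacent" search (the second word immediately follows the first). The function names are hypothetical, and the exact connector syntax differs from one review platform to the next.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return WORD.findall(text.lower())


def any_term(text: str, *terms: str) -> bool:
    """OR search: true if any single term appears anywhere."""
    words = set(tokens(text))
    return any(term.lower() in words for term in terms)


def near(text: str, first: str, second: str, within: int) -> bool:
    """NEAR search: both words appear within `within` words, in either order."""
    words = tokens(text)
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= within for i in first_at for j in second_at)


def adjacent(text: str, first: str, second: str) -> bool:
    """ADJACENT search: `second` immediately follows `first`."""
    words = tokens(text)
    return any(a == first.lower() and b == second.lower()
               for a, b in zip(words, words[1:]))


docs = {
    "A": "The office on Main Street is closed Friday.",
    "B": "William Street approved the transfer.",
    "C": "Street said William would call back.",
}
for doc_id, text in docs.items():
    print(doc_id,
          any_term(text, "william", "street"),
          near(text, "william", "street", 2),
          adjacent(text, "william", "street"))
```

The "or" search hits all three documents, including the Main Street false positive; "near" drops document A, and "adjacent" keeps only document B. Each step down that list is fewer records to process and review.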
DeNISTing removes operating system and application files from the data pool; the name refers to the list of known software files maintained by the National Institute of Standards and Technology (NIST). Many internal information technology departments withhold these files when producing data, but when a full copy of a server or laptop is taken, they may still be included. They increase the size of the production while offering little substantive data. When discussing data collection, identify any system files (e.g., Windows system files) that can be removed in batches.
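Mechanically, deNISTing is a hash comparison: compute a fingerprint (hash) of each collected file and drop any file whose hash appears on the list of known software files. The sketch below assumes that list has already been loaded into a set of SHA-1 values; downloading and parsing NIST's National Software Reference Library data is outside its scope, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def denist(paths: list[Path], known_hashes: set[str]) -> tuple[list[Path], list[Path]]:
    """Split files into (kept, removed) by checking each hash against a known-file list."""
    known = {h.upper() for h in known_hashes}
    kept, removed = [], []
    for path in paths:
        (removed if sha1_of(path) in known else kept).append(path)
    return kept, removed
```

Because the comparison is on file content rather than file names, a renamed system file is still removed, and a user document that happens to sit in a system folder is still kept.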
Deduplication compares the productions from each witness/custodian, either within a single custodian's data or across the entire universe of custodians, to identify and remove exact duplicates. It is possible to significantly reduce what you need to review IF the parties agree to deduplicate whatever data is produced. If you are intent on living in a paper world, keep in mind that documents that originated on paper or were printed to paper can be deduplicated only by hand.
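Exact deduplication is also hash-based. The sketch below, with hypothetical names and sample data, hashes each document's content and keeps only the first copy, either across all custodians ("global") or within each custodian's own production ("custodian"). It records which copies were suppressed, so you can still show that a document was in a given custodian's files. It uses SHA-256; MD5 and SHA-1 are also widely used for this purpose. For email, review platforms typically hash selected fields rather than the raw file, because each recipient's copy carries different transmission headers.

```python
import hashlib


def deduplicate(items, scope="global"):
    """Remove exact duplicates from (custodian, doc_id, content) tuples.

    scope="global" compares across all custodians; scope="custodian"
    compares only documents within the same custodian's production.
    Returns (unique doc_ids, {kept doc_id: [(custodian, duplicate doc_id), ...]}).
    """
    if scope not in ("global", "custodian"):
        raise ValueError("scope must be 'global' or 'custodian'")
    first_seen = {}
    unique, duplicates = [], {}
    for custodian, doc_id, content in items:
        digest = hashlib.sha256(content).hexdigest()
        key = digest if scope == "global" else (custodian, digest)
        if key in first_seen:
            duplicates.setdefault(first_seen[key], []).append((custodian, doc_id))
        else:
            first_seen[key] = doc_id
            unique.append(doc_id)
    return unique, duplicates


items = [
    ("Smith", "S-1", b"memo v1"),
    ("Smith", "S-2", b"memo v1"),
    ("Jones", "J-1", b"memo v1"),
    ("Jones", "J-2", b"budget"),
]
print(deduplicate(items, scope="global"))
print(deduplicate(items, scope="custodian"))
```

Global deduplication leaves two documents to review; custodian-level deduplication leaves three, because Jones's copy of the memo is kept alongside Smith's.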
Near-deduplication is the term used to describe the automatic grouping of documents by discussion thread and concept. Specialized software using complex algorithms performs the filtering and displays a solar-system pattern on screen to illustrate the proximity of one word to another. In theory, this lets you focus quickly on groups of documents likely to be non-responsive, privileged, or substantive. The cost of this sophisticated analysis is in keeping with the (significant) attorney time it is expected to save. It also requires a commitment to learning the process, because the reviewer must interpret the results accurately. At this stage in the game, expanding the use of search terms in a consistent, defensible process would be a significant step in the right direction. Wisconsin attorneys, for the most part, are not there yet.
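Commercial near-deduplication and concept-clustering tools use proprietary algorithms, and the sketch below does not reproduce them or their visual displays. It substitutes a simple, well-known technique: break each document into overlapping three-word "shingles" and measure how many shingles two documents share (Jaccard similarity). Documents above a threshold are grouped together. The 0.6 threshold, the function names, and the sample documents are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Break text into overlapping runs of `size` words ("shingles")."""
    words = re.findall(r"\w+", text.lower())
    if len(words) <= size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_groups(docs: dict[str, str], threshold: float = 0.6) -> list[list[str]]:
    """Greedy grouping: each unassigned document seeds a group and pulls in
    later documents whose similarity to it meets the threshold."""
    ids = list(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    assigned, groups = set(), []
    for i, seed in enumerate(ids):
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        for other in ids[i + 1:]:
            if other not in assigned and jaccard(sets[seed], sets[other]) >= threshold:
                group.append(other)
                assigned.add(other)
        groups.append(group)
    return groups


docs = {
    "D1": "Please review the attached draft settlement agreement before the call on Friday.",
    "D2": "Please review the attached draft settlement agreement before the call on Monday.",
    "D3": "The cafeteria menu for next week is posted.",
}
print(near_duplicate_groups(docs))
```

The two drafts differ by one word and land in the same group, so a reviewer can read one and compare the other quickly. This sketch compares every pair of documents, which is fine for a demonstration but far too slow for a real production; commercial tools rely on indexing techniques (MinHash is one published example) to scale.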
Cost and volume are directly linked in the world of e-discovery. Calculating those costs is difficult, because the number of pages a gigabyte of data represents varies drastically with the amount of color, graphics, or video it contains. Rough estimates range from about 80,000 to 125,000 "pages" (if TIFF'd) per 10 gigabytes of data. The duplication rate depends on the corporate environment: organizations that tend to send emails to numerous recipients produce high duplication rates. Processing speed can depend on the proportion of encrypted files, password-protected files, and proprietary software formats in a data set. It is not uncommon for emails with attachments to average 22,000 records PER gigabyte of data. With numbers such as these, even the smallest case can generate an abundance of data to review.
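The arithmetic behind these ratios is simple enough to script for a first-pass budget. The sketch below uses the article's own planning figures (80,000 to 125,000 pages per 10 gigabytes, 22,000 email records per gigabyte). The 30% duplicate rate and the pace of 50 records per reviewer-hour in the usage lines are hypothetical inputs, not figures from the article; substitute your own.

```python
import math

# Ratios quoted in the article; treat them as rough planning figures only.
PAGES_PER_10_GB = (80_000, 125_000)
EMAIL_RECORDS_PER_GB = 22_000


def estimate_pages(gigabytes: float) -> tuple[int, int]:
    """Low/high TIFF page estimate for a data set of the given size."""
    low, high = PAGES_PER_10_GB
    return round(gigabytes * low / 10), round(gigabytes * high / 10)


def estimate_records(gigabytes: float, duplicate_rate: float = 0.0) -> int:
    """Email-with-attachment records left after removing the duplicate share."""
    return round(gigabytes * EMAIL_RECORDS_PER_GB * (1 - duplicate_rate))


def review_hours(records: int, records_per_hour: float) -> int:
    """Hours of review at an assumed pace, rounded up to whole hours."""
    return math.ceil(records / records_per_hour)


# Hypothetical inputs: 5 GB collected, 30% duplicates, 50 records per hour.
pages = estimate_pages(5)                           # (40000, 62500)
records = estimate_records(5, duplicate_rate=0.3)   # 77000
hours = review_hours(records, records_per_hour=50)  # 1540
print(pages, records, hours)
```

Even on these modest assumptions, five gigabytes of email translates into well over a thousand review hours, which is why the filtering steps above matter.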
Tracking Records: If you currently use Microsoft products to track discovery records, stop using Word or WordPerfect tables. Excel exports easily to the standard litigation discovery tools and leaves open the option of moving up the technology ladder if and when you decide to take that step. As clients become more aware of the paperless practice, available software offers a much more powerful way to retrieve and maintain all elements of a case. The integrity of the data is locked down, and you can respond on the fly instead of manually reviewing folders or boxes of paper.
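One way to keep a tracking log portable is a plain CSV file, which Excel opens directly and litigation support tools can generally import. The sketch below writes and reads such a log; the column names are hypothetical and should match whatever fields your protocol settles on. Writing with the "utf-8-sig" encoding adds a byte-order mark that helps Excel recognize the file as UTF-8, and the csv module quotes values that contain commas.

```python
import csv
from pathlib import Path

# Hypothetical column set; adapt it to the fields your protocol agrees on.
FIELDS = ["BegDoc", "EndDoc", "DocDate", "Custodian", "DocType", "Description", "Privileged"]


def write_index(path: Path, rows: list[dict]) -> None:
    """Write the tracking log as CSV; the UTF-8 BOM helps Excel detect the encoding."""
    with path.open("w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_index(path: Path) -> list[dict]:
    """Read the tracking log back into a list of dictionaries."""
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

A description such as "Letter re: lease, draft 2" survives the round trip intact, which is exactly where hand-built Word tables tend to break down when exported.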
Sharing processing costs can also help all parties stay within their legal budgets. When a third-party vendor is used, each side can pull from the same data set, as long as each side signs off on the conflict created by such a collaborative arrangement.
Discovery practices have evolved as the media types being discovered have changed. The million-page case that once came along once in a lifetime now happens far more frequently. Whether a matter is worth litigating often depends on the desired end result as well as the cost of getting there. Attention to the intricacies of managing today's more complicated document productions allows discovery to proceed with thoughtful priorities and without excess. Used logically, e-discovery focuses attorney time and paralegal organizational skills on the most substantive data, the data worthy of their time and the client's expense.
Shawn R. Olley is the owner of Midwest Paralegal Services, Inc., a contract paralegal firm and full-service litigation support facility. Experienced paralegals are available on a contract basis for any project scope or practice area. Midwest also provides paper scanning, e-discovery, forensics, and trial presentation expertise. Midwest has offices in Milwaukee and Madison.
Endnotes:
(1) Metadata: A New Tool in e-Discovery. http://www.lexbe.com/hp/indepth-e-discovery-rule-metadata.htm