SeisCap Duo Encapsulation Standard

November 30, 2007


November 30, 2007 - There has been some confusion about the SeisCap Duo being a new format. Kelman's KQ, Seitel's SUOF and the CSEG SeisCap Duo are all the same with trivial differences. The basic extractor will work for all three flavours of the index file.

Processors access data using only the byte offset and length of the record to extract shot records from the .field file.
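
As a rough illustration of how little is involved, here is a minimal Python sketch of pulling one record out of a data file given only its byte offset and length. The function name and the in-memory stand-in for a .field file are made up for this example; this is not the C&C extractor.

```python
import io

def read_record(data_file, offset, length):
    """Return `length` bytes starting at byte `offset` of an open data file."""
    data_file.seek(offset)
    chunk = data_file.read(length)
    if len(chunk) != length:
        raise EOFError(f"wanted {length} bytes at offset {offset}, got {len(chunk)}")
    return chunk

# Stand-in for a .field file: three records packed back to back, nothing in between.
demo = io.BytesIO(b"HEADER" + b"SHOT-0001" + b"SHOT-0002")
print(read_record(demo, 6, 9))
```

With a real archive you would open the .field file in binary mode and take the offset and length from the matching index record.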

Kelman provides the following documentation for anyone wishing to extract KQ data. All three formats use 36 bytes of the index file to describe each record.

May 24, 2007 - We have removed the create_seiscap.exe from this web site. Please note that it took C&C only a week to write the code to perform the encapsulation.

December 5, 2006 - Here is my PowerPoint Presentation for the Mackenzie Delta SeisCap Project Status. To date we have completed 100 of our 500 lines. The most expensive step is scanning microfilm. We think we can reduce this from our current cost of $0.50 per image down to 5 to 10 cents per image.

November 20, 2006 - Here is a copy of an article that should be published in the December issue of the CSEG Recorder starting at page 34. It explains basic archive principles and how to convert tape shot records into record oriented encapsulated files suitable for storage on any media. Best of all, it's free!

October 30, 2006 - Found and fixed a bug in the de-encapsulator program. Under some conditions, the program would fail to open up a file. Here is the new version for seiscap_extractor.exe.

October 29, 2006 - Does anyone out there read this stuff? Do you think it has value? I'm looking for a donation of $30k for both the SeisCap creator and the SeisCap extractor. Here's what we will deliver:

  1. Provide additional details with any error messages
  2. Extract Kelman's KQ or Seitel's SUOF data.
  3. Verify KQ and SUOF format. Has it changed over time? EnCana had roughly 2000 encapsulated files go south and have to be re-archived due to a 1024 byte problem when our backup tape drives were changed. You must routinely verify that your data has not changed. Create a batch version of this program.
  4. Port both programs to both Unix and Linux.
  5. Add additional file types .zip
September 5, 2006 - Building a case to encapsulate and verify 355 2D seismic lines. Submitting a proposal at $200 per 2D seismic line to create the following quality control products: The data will be verified as complete and ready to process. You no longer have to wait days or weeks to have all your data collected. Here is my PowerPoint Presentation. Who is ready to have all your seismic data, segy files, field tapes and ob's organized on your hard drive?

June 29, 2006 - Build a web document to show how to Encapsulate data at EnCana. This link will provide details about the naming conventions and how we prepare the data to be encapsulated.

June 8, 2006 - Here's how to run the create_seiscap.exe file:

  1. Create a comma separated list of file names (and path), File associations, and Comments.
  2. Run the batch program. I place the program at the root level of my D: drive and then execute
    d:create_seiscap.exe P172685.BURNT_MALLIK_M3D.archive.csv P172685.BURNT_MALLIK_M3D.seiscap.field
    
  3. This will create three files, the data file, the index and a log file (human readable report). Note that the log file is a copy of the last file in the data file. Here's what it looks like:
  4. We are also adding a file association for compressed files, .zip. Note that the program design is recursive, so we can encapsulate encapsulated files. You can take all those encapsulated files from Kelman or Seitel and do another level of encapsulation!
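
For anyone who would rather script step 1 than type the list by hand, here is a small Python sketch that writes the three columns (file name and path, file association, comment). The file names and comments are invented for illustration, and the exact column conventions create_seiscap.exe expects are an assumption on my part; the comments deliberately contain no commas because quoting rules are not documented here.

```python
import csv
import io

def write_file_list(rows, out):
    """Write (path, file association, comment) rows as comma separated text."""
    writer = csv.writer(out, lineterminator="\n")
    for path, association, comment in rows:
        writer.writerow([path, association, comment])

# Hypothetical entries; these paths and comments are made up.
rows = [
    ("info.txt", "TXT", "Key summary of the line"),
    ("field/F0001.sgd", "SEGD", "Field shot first record"),
    ("notes/observer.tif", "TIF", "Scanned observer notes"),
]
buf = io.StringIO()
write_file_list(rows, buf)
print(buf.getvalue(), end="")
```

To produce a real list, pass an open text file instead of the `io.StringIO` buffer.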
May 5, 2006 - Met with data management, declared some naming conventions and created a work flow document on the steps to Archive your data. Consistent names are required to facilitate inventory reports. Anyone who is archiving data for EnCana, please use these standards!

May 2, 2006 - Terry has fixed a bug and provided another option. They are:

I have also been busy building some scripts to automatically create Containers that can be accessed from both unix and from the Windows side. I even create the info.txt file for each and every seismic line as well as deposit the SEGP file into the Survey_audited file. It's slick. Click here to see what the containers look like. We create directories for Archive, Basic, Business, Design, Field_shots, Gathers, Stacks, Survey_audited and Survey_original. This 3D is around 25gig and the field shots have been archived into around 4gig chunks.

April 20, 2006 - Here's a PowerPoint for my recommendation for EnCana's Geophysical Advisory Group (GAG) to approve the SeisCap encapsulation method and to continue to work with Archive companies (Seitel, Kelman), Field Instrument Manufacturers (ARAM, I/O, Sercel) and Seismic Processors (C&C, Geo-X, Sensor). Note the slide that shows the costs of Encapsulation going from $40,000 down to $0 when data are encapsulated directly in the field.

April 19, 2006 - Made numerous changes and SeisCapped about 30gig of field data. Seems to be working! Here's what else has been done.

March 8, 2006 - Bill Leakey of Seitel Solutions has provided additional reasons why we do not like Vanguard as a format.

March 6, 2006 - The check sum method has been upgraded to a 16bit CRC that is more robust than the 8bit XOR used by Kelman and Seitel. The SeisCap extractor now has the option to verify the check sums. This is especially useful when data are transferred across a network. We have also added a second text file that contains key information relative to this line. We are going to add a first line that contains an inventory of the archive. Defined archives now exist for prestack and historical.
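
To see why a 16bit CRC is more robust than an 8bit XOR, consider two bytes that get transposed. The sketch below is only an illustration: this page does not say which CRC-16 variant SeisCap uses, so I am assuming the CCITT polynomial that Python exposes as `binascii.crc_hqx`, with a start value of 0. The sample bytes are made up.

```python
import binascii
from functools import reduce

def xor8(data):
    """8-bit XOR checksum: every byte XORed together."""
    return reduce(lambda acc, b: acc ^ b, data, 0)

def crc16(data):
    """16-bit CRC, CCITT polynomial, start value 0 (assumed variant)."""
    return binascii.crc_hqx(data, 0)

original = b"FIELD RECORD 0327"
swapped = b"FIELD RECORD 0372"   # last two bytes transposed

# XOR does not care about byte order, so the swap goes unnoticed.
print("XOR8 match:", xor8(original) == xor8(swapped))
# The CRC depends on byte position, so the swap is caught.
print("CRC16 match:", crc16(original) == crc16(swapped))
```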

February 13, 2006 - Cleaned up the description, reserved a byte for future use. We are now working on a check program for data verification.

February 8, 2006 - The SeisCap Viewer has now been updated to handle field data in segy and segd formats. We will be working on examples of sega, segb and segc.

February 2, 2006 - The Proposed SeisCap Encapsulation standard now consists of only two files, a data and an index file. The log of the description of the encapsulation process is now the last file of the data file and contains enough information that can be used to recover the data if the index file is lost.

Seismic Archive Goals

  1. The data will reside in a format suitable for transfer over networks.
  2. The data will contain a signature that will ensure that modifications to the data can be detected at any time in the future.
  3. The data contains the necessary information required to perform data verification, embedded in the format of that data.
  4. The format has the ability to track errors that occurred in the archiving process.
  5. The archive format does not need to be extended or modified to support additional data formats as they evolve.
  6. The archive process logs track the processes and actions that took place during the archive process.
  7. The Original Data is not altered. The records remain in the original record order and all EOF (End of File) markers are preserved.
  8. This single format of the archive is independent of the format and media type of the input dataset.
  9. Information is not added or deleted from the binary data stream. Data order is preserved.
  10. The data can easily be restored to the original or like media. It is trivial to decode this format. I'll define trivial as less than a day!

What's Wrong with Existing Standards?

In a lot of ways, magnetic tape is our perfect format for encapsulation. It is a sequential, self-defining, record-oriented storage mechanism that does not depend upon the format definition of the input data. Data are stored based on sequential record order and size of the records. However, tape has two major problems:
  1. The media is not permanent; the magnetic coating doesn't always stay glued. Tape stiction is a major problem, especially in damp climates. Back in the old days, tape manufacturers recommended the rotation of the physical tape every six months. We stopped doing this during a downsize period.
  2. Tape is sequential; you have to spin the full tape in order to retrieve the last file. Disk has the advantage that the last record can be retrieved as quickly as the first.
The industry is moving to DVDs for small data sets and to hard drives for large data sets. IDE drives are available in 500gig sizes (February 2006). For a couple of thousand dollars you can build a 2 terabyte RAID 5 server. Everyone appreciates the random access ability of disk.

The SeisCap encapsulation standard is independent of media type. It can change with the future. So why do we need yet another encapsulation format? What is wrong with the existing formats? You can refer to SEG Technical Standards for more information on what has been blessed at the SEG. I was given a copy of an old NDS description of Lacey, Vanguard as well as a slew of other transfer formats. It's worth reading this document. Here's a brief review of existing encapsulation techniques that I have been able to evaluate.

SEG RODE

RODE was initially published in 1996 as a method to encapsulate well logs. The original author is reputed to have said that he would not recommend it for Seismic Data. This method has been adopted by several Oil Companies (Shell, Mobil...), numerous vendors (CGG) etc. We do not support the use of RODE for the following reasons:
  1. It's very complicated. It's expensive to purchase software to decode RODE. It takes around six months to write a program to decode RODE and it will only work for your current example of RODE. We are providing free software for the SeisCap format. Look at the source code and see how trivial it is to write your own software!
  2. RODE is not a standard. There is too much "wiggle room". Everyone's version of RODE is different. Every time you see a different RODE, you have to get your programmer to fix his code.
  3. I have even talked with a Geophysicist within SHELL and he does not recommend using RODE.
  4. The only companies that I can find that recommend RODE are the software vendors that have spent the investment to write the code that supports RODE.
  5. RODE is a format defined around magnetic tape.

VANGUARD - Dual files per shot

Vanguard is a method to transfer files from seismic field tapes to CD or DVD. It takes all the records from tape and splits them up into a header file and a record file. Here's why we don't like it:
  1. Does not represent a single format. A new variant has to be defined for each input format; so far they have been defined for sega through segd. To identify the header/data pairs, the file names usually contain a common prefix, 'FILE' or just 'F', and a suffix that separates the header and data portions, for example an extension of '.hb' and '.sb' for segb data. Notice that the selection of letters is critical: the header record has to sort before the data record.
  2. No check sum information. It also doubles the number of files to look after in an archive.
Processing shops generally like to work with Vanguard. Our encapsulation method provides all the advantages of a proper archive while preserving all the benefits of Vanguard. Note that our examples below have archived vanguard files. We have even preserved the directory paths. (Paths lacking data have been deleted). You can even use our viewer to verify the field file numbers.

If you still are not convinced, Bill Leakey of Seitel Solutions has authored the following document to explain why people may not understand the problem we are trying to solve.

LACEY - Eight byte preamble header

Lacey is an encapsulation format for any type of data. It's very simple and consists of a byte stream created by concatenating all blocks of data, each prefixed with an eight byte preamble header. The first four bytes are a sequence number of the block within the data set and the second four bytes are the byte size of the block directly following the preamble header. Support software is required to extract the blocks correctly by reading the preamble header to determine the size of the next block. Here's why we don't like it:
  1. Lack of check sum data verification
  2. Only defined for data blocks less than 2 gig
  3. No identification for each type of record
  4. Cannot handle different record types
We recommend that all Lacey data be reformatted into the SeisCap encapsulation format.
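
For anyone facing that reformatting job, the eight byte preamble layout described above can be walked with a few lines of Python. This is a sketch under assumptions: the description I was given does not state the byte order, so I am guessing big-endian unsigned 32 bit integers for both fields, and the function name is my own.

```python
import io
import struct

def iter_lacey_blocks(stream):
    """Yield (sequence_number, block_bytes) from a Lacey-style byte stream."""
    while True:
        preamble = stream.read(8)
        if not preamble:
            return                      # clean end of stream
        if len(preamble) != 8:
            raise EOFError("truncated preamble header")
        # Assumption: both fields are big-endian unsigned 32-bit integers.
        sequence, size = struct.unpack(">II", preamble)
        block = stream.read(size)
        if len(block) != size:
            raise EOFError("truncated data block")
        yield sequence, block

# Build a tiny two-block stream in memory and read it back.
demo = io.BytesIO(
    struct.pack(">II", 1, 5) + b"first" +
    struct.pack(">II", 2, 6) + b"second"
)
for sequence, block in iter_lacey_blocks(demo):
    print(sequence, block)
```

Notice that nothing in the stream lets you check a block once it has been read, which is the first objection in the list above.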

Other Encapsulation Techniques

The following techniques have also been examined and rejected:
  1. Geo-x : Four byte preamble header and postamble trailer. Similar to Lacey but provides the ability to retrieve a file in the reverse direction.
Please let me know if you know of another way to encapsulate seismic data. I'd like to be as complete as I can.

The Proposed SeisCap Encapsulation Format

The SeisCap Encapsulation Format consists of only two files, a data file and an index file. The data file contains enough information for self extraction. The separate index file contains record file lengths and checksum information. Data that has been electronically sent can be verified.

File Description
xxx.yyy.field The bytes read from the tape with NOTHING added between the data blocks. The first file contains a list of the original file names and file numbers. The second, optional file *info.txt contains a key summary of the data in the archive. The last file contains the human readable log of the archiving: who did it and when. It includes the first 32 bytes of field data plus a list of the record sizes just in case the index file is lost or corrupt. (This used to be the .log file). Data to include: field shot data, original field tape labels, field survey, observer's notes, chaining notes, driller's logs, and a scanned mylar of the oldest processed section that includes a side label.
xxx.field.idx Provides information to find each record/file boundary within the data file. Includes integrity checks (16bit CRC checksum). Note that Kelman uses .index and Seitel uses .indx, so we have to be different as well and use .idx for the index file. This way, a quick glance at the archive and you will know what version you are looking at!
xxx.field.log OPTIONAL separate file. Normally it's the last file in the data stream. It contains the human readable information at the time of encapsulation. It contains a 32 byte hex dump of each record that can be used to identify field file numbers. For data with a file type of SGY, the EBCDIC header is dumped out here. Each record length is also recorded so data can be recovered in case the idx file is lost or missing.

Note: Archive data sets will also be created for the following additional data types:

All 4 byte values are stored big-endian (native for the Sun, backwards for Intel)

Index Header

The Index Header occurs once, at the beginning of the file. It contains the version number and the total number of Files and Records.

Byte offset  Bytes 0 1 2 3    Format      Description
0            000 000 000 004  32 bit int  Version (currently 4)
4                             32 bit int  Number of Files
8                             32 bit int  Number of Records
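
Here is a short Python sketch of decoding those twelve bytes as three big-endian 32 bit integers. The function name and the sample values (11 files, 250 records) are invented for the example, and treating the fields as unsigned is my assumption.

```python
import struct

# Version, number of files, number of records: three big-endian 32-bit ints.
INDEX_HEADER = struct.Struct(">III")

def parse_index_header(raw):
    """Decode the first 12 bytes of an index file."""
    version, n_files, n_records = INDEX_HEADER.unpack(raw[:INDEX_HEADER.size])
    return {"version": version, "files": n_files, "records": n_records}

# Made-up header: version bytes 000 000 000 004, then 11 files and 250 records.
sample = bytes([0, 0, 0, 4]) + struct.pack(">II", 11, 250)
print(parse_index_header(sample))
```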

Note: One segy file consists of a record for the EBCDIC header, a record for the binary (line) header and a record for each trace header and trace data. The index Record is repeated for each record and EOF of the SeisCap.

Index Record

Byte offset  Bytes 0 1 2 3    Format      Description
0            000 000 000 001  32 bit int  File Number 1, 2, 3...
4                             32 bit int  Field File Number (1-9999)
8                             32 bit int  Record Number
12                            64 bit int  Start Location
20                            64 bit int  Record Length (0=EOF)
28                            4 bytes     8 bit Format, 16 bit CRC Checksum, 8 bit Status
32           o                4 bytes     Reserved, Optional 3 ascii character file suffix, eg sgd, sgy, tif, zip...
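
The 36 byte index record can be unpacked the same way. Treat this Python sketch as a reading of the table rather than a reference decoder: the order of the fields inside bytes 28 to 35 (format, CRC, status, then one reserved byte followed by the three character suffix) and the big-endian layout of the 64 bit fields are assumptions, and the sample values are made up.

```python
import struct
from collections import namedtuple

# Assumed order in bytes 28-35: format code, CRC, status, reserved byte, suffix.
INDEX_RECORD = struct.Struct(">IIIQQcHBB3s")

IndexRecord = namedtuple(
    "IndexRecord",
    "file_number field_file_number record_number start length "
    "format_code crc16 status reserved suffix",
)

def parse_index_record(raw):
    """Decode one 36-byte index record into named fields."""
    rec = IndexRecord(*INDEX_RECORD.unpack(raw))
    return rec._replace(
        format_code=rec.format_code.decode("latin-1"),
        suffix=rec.suffix.decode("latin-1").rstrip("\x00 "),
    )

# Made-up record: file 1, field file 327, record 2, 1024 bytes at offset 3600.
sample = INDEX_RECORD.pack(1, 327, 2, 3600, 1024, b"d", 0xBEEF, 1, 0, b"sgd")
rec = parse_index_record(sample)
print(rec)
print("EOF marker" if rec.length == 0 else f"{rec.length} bytes at offset {rec.start}")
```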

Here is a list of the Format codes currently defined; contact the author if you would like any others.

Format Codes

File Suffix                 Format Code (byte)
csv                         v
sega                        a
segb                        b
segc                        c
segd                        d
segy, sgy                   y
jpg                         j
tif                         t
png                         g
xml                         m
xls                         x
ppt                         p
doc                         o
lst, txt, prn               l
segp, ukooa, survey, sur    s
pdf                         f
zip                         z
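
In a program, the table above is just a lookup. The Python sketch below copies the codes from the table; the helper name and the choice to lower-case the suffix before looking it up are my own assumptions.

```python
import os

FORMAT_CODES = {
    "csv": "v", "sega": "a", "segb": "b", "segc": "c", "segd": "d",
    "segy": "y", "sgy": "y", "jpg": "j", "tif": "t", "png": "g",
    "xml": "m", "xls": "x", "ppt": "p", "doc": "o",
    "lst": "l", "txt": "l", "prn": "l",
    "segp": "s", "ukooa": "s", "survey": "s", "sur": "s",
    "pdf": "f", "zip": "z",
}

def format_code_for(filename):
    """Return the one-byte format code for a file name, or None if undefined."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return FORMAT_CODES.get(suffix)

for name in ("line_0327.SGY", "observer_notes.tif", "mystery.bin"):
    print(name, format_code_for(name))
```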

Status Codes

Here is a preliminary list of status codes: Please let me know if you want any more.

Status Code  Description
0            Unknown Error
1            Good data - no errors
2            Short file
3            ???

Important Information for the SeisCap Format

  1. The first record is a simple ascii flat file that contains a list of original file names and file types (TXT, BIN, SEGY, SEGA, SEGB, SEGC, SEGD) with an optional Notes field. Here is an example for this first file. You can use a spreadsheet program, Notepad or Wordpad, or the more or type command to view this file. This file contains a lot of useful information. I prefer to use Wordpad because it will properly open up unix text files. This file is to be a comma delimited .csv file.

    Or, here's how I do this from unix, I perform six steps:

    1. Create the info.txt file for the Meta data directly from EDM
    2. Use the find command to create a list of files
    3. Re order my sorted list so the important data is at the top
    4. Run my Perl script to build the list of files with the appropriate list of files
    5. Manually edit adding additional comments
    6. Create a DOS .bat file to execute the create_seiscap.exe
  2. The Data Checksum is now a 16 bit CRC checksum for each record of the data file. Please note that the EBCDIC header, Binary Header, Trace Header plus the data are considered to be records for SEGY format data.
  3. The Status will normally be a 1 if there are no read errors, otherwise it is a 0
  4. If an EOF exists on tape, or a series of EOF's, the record length will be zero.
  5. The log file is the last file in the data file and contains useful, human readable data. The size of each record is located here just in case the index file is corrupt. NOTE: all the data can easily be recovered! Here is an example of the log file for field data and another example for a stack data archive. Note the full EBCDIC header is displayed for each segy file.
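
Points 2 to 4 above can be pulled together into a simple verification pass. This Python sketch checks each record against its index entry and skips zero length EOF markers. It is an illustration only: the CRC variant (`binascii.crc_hqx` with a start value of 0) is an assumption, the entries are plain tuples rather than real index records, and the sample data is invented.

```python
import binascii
import io

def verify_records(data_file, entries):
    """Check each (start, length, crc16) entry; return a list of problems found."""
    problems = []
    for number, (start, length, expected_crc) in enumerate(entries, start=1):
        if length == 0:
            continue                      # zero length marks an EOF, nothing to check
        data_file.seek(start)
        chunk = data_file.read(length)
        if len(chunk) != length:
            problems.append((number, "short read"))
        elif binascii.crc_hqx(chunk, 0) != expected_crc:
            problems.append((number, "checksum mismatch"))
    return problems

payload = b"EBCDIC..." + b"BINARY.." + b"TRACE 1"
entries = [
    (0, 9, binascii.crc_hqx(b"EBCDIC...", 0)),
    (9, 8, binascii.crc_hqx(b"BINARY..", 0)),
    (17, 7, binascii.crc_hqx(b"TRACE 2", 0)),   # deliberately wrong checksum
    (24, 0, 0),                                  # EOF marker
]
print(verify_records(io.BytesIO(payload), entries))
```

Run routinely over an archive, a pass like this is how you would notice that data has changed after a move between tape drives or across a network.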

Advantages of the SeisCap Data Encapsulation Format

  1. This format is in the Public Domain and free for anyone to use. A SeisCap creator and a reader have been provided by C&C Systems in Calgary. We even provide the C source code so you can write your own program.
  2. The maximum data record and file size can now exceed 2gig.
  3. Simple, 36 byte index record, de-encapsulators are available for free, you can write your own in a couple of hours. Look at our source code to see how simple it is. All you have to do is read the Record Length as a 32 bit data field for data less than 2 gig and as a 64 bit field for larger data sets.
  4. Format will allow for record sizes greater than 2gig.
  5. All record types can be encapsulated, it doesn't have to be just seismic. We encapsulate all different data types, graphic images, word processing reports etc.
  6. It has been pointed out to me that this format will allow multiple index files to be created. You can create your own index, perhaps one that contains XY's.
  7. This archive format is a modern version of what Kelman, Seitel, Devon, EnCana... have been using for more than 10 years. Your old decoder will still work. There are millions of encapsulated files already in this format. No one knows about it because it has not been published. This web site now publishes the standard. I plan on submitting this "standard" first to the CSEG and then to the SEG Standards Committee for ratification. It should be noted that Seitel are currently using format 1, Kelman uses format 2 and 3. Hence we are now proposing format 4. Here are a few reasons why format 4 is "better":
    1. Supports files larger than 2gig for Files and Records
    2. Robust 16bit checksum is used
    3. A single byte has been reserved for future use
    4. The 3 letter suffix for each record is stored in the index file (optional)
    5. The first file contains a list of all the file names. All files can be restored back to their original names.
    6. The second file, info.txt contains a summary of important information for the archive. A single line summary of the inventory of the data is included.
    7. Format as defined can encapsulate ANY type of digital data!
    8. The log file now contains enough information to reconstruct the data if the index file is lost or corrupt. The log file is to be appended to the .stack and .field data stream as the last file. All data are stored in a single file.
    9. More format codes have been defined.
    10. All processing centers in Calgary can read data in this archive format.

The SeisCap Extractor

SeisCap Viewer for field SEGD data

Here is an example of a simple archive extractor written by C&C systems here in Calgary. This Windows XP program is free and you can download it by right clicking on this link and saving the executable file (it's now grown to 2.9meg in size).
SeisCap Extractor for Field SEGD
We are encouraging anyone else who has written a better one to place it into the public domain!

We are working on preparing a data set you can use to see how this all works. We will have a series of SEGA, SEGB, SEGC, SEGD, SEGY files you can use to verify your application.

Click on the image to see the full size version.

This archive contains a total of 399 files. The first file is a table of contents and it looks like this. It contains three columns, a file name, a file suffix and a description field. For sega, segb, segc and segd the first 32 bytes will be dumped into a hex viewer. See how easy it is to check your data, we can easily see the field file number eg 0327 and the format of the data, in this case 8058, 32 bit IEEE demultiplexed.

The Processors out there will like this format. You can easily verify the field file number for both the header and data files.

This archive file contains all data required for processing, survey, chaining, observer and drillers notes. In this case they have been scanned. Even the label of the scanned tape has been captured into our archive file. For this example, we have taken a CD of Vanguard data to the AEC standard and archived the data!

SeisCap Viewer for field SEGY data

This archive contains a total of only 11 files. It has the same basic data files, tape labels, chaining, driller's and observer's notes as well as the Field survey. The field shots are now a single segy file.
SeisCap Extractor for Field SEGY
The last file is the log of this archive (at the time of encapsulation). Examine this log and you will view the first 32 bytes of data for each File and Record as well as the length of each record. For text files, the spaces and carriage returns have been removed.

IMPORTANT - this single file contains all the information to reconstruct your data! The index file is not critical! It's a nice-to-have that contains data verification information (check sums for all the records).


SeisCap Viewer for stack SEGY data

This archive contains a total of only 23 files for the final processing report. In addition to the segy stack files it includes:
  1. final survey,
  2. The Seismic Line label as a graphic .png file
  3. The Processing text file
  4. The Processing Quality Control report (16meg)
  5. The Survey as provided to the processor
  6. All shot records with all geometry and statics in the trace headers and not applied to the data. This file can be used for further reprocessing. I might even process the data using this file!
  7. The seismic segy files, note that they are in EnCana Workstation segy format and need only to be attached in SeisX or SeisWare!
  8. Chaining, drillers and observer notes.
  9. Log of the archive. Note the record and record length can be used in a pinch to de-encapsulate this file if the index file is lost. Click here for the log file

The data can easily be extracted or the view button will automatically launch your associated raster graphics viewer, word processor or segy viewer. Pssst, we are building a segy viewer that only requires number of samples and data sample format. Should work for most data sets without having to read traces and shot points.

Here's what the files look like: there is a stack file that contains all the data, an index file that contains pointers to the data with check sum information, and a program to extract the data.

-rwxr-xr-x   1 ekeyser  expl       2101584 Feb  1 09:18 A44263.MKD-2.200601.cnc.idx
-rwxr-xr-x   1 ekeyser  expl     625381298 Feb  1 09:19 A44263.MKD-2.200601.cnc.stack
-rwxr-xr-x   1 ekeyser  expl        775680 Feb  1 09:19 seiscap_extractor.exe
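
As a quick sanity check, the size of the .idx file in the listing above should be a 12 byte index header plus a whole number of 36 byte index records. The Python sketch below does that arithmetic; the function name is mine, and it assumes the file holds nothing beyond the header and the records.

```python
HEADER_BYTES = 12    # three 32-bit integers
RECORD_BYTES = 36    # one index record

def index_record_count(index_size):
    """Return the number of index records implied by an index file size."""
    body = index_size - HEADER_BYTES
    if body < 0 or body % RECORD_BYTES:
        raise ValueError("size does not match a 12-byte header plus 36-byte records")
    return body // RECORD_BYTES

# 2101584 is the size of A44263.MKD-2.200601.cnc.idx in the listing.
print(index_record_count(2101584))
```

For this file the arithmetic works out to 58377 index records with nothing left over.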

That's it...
Site Owner: Eric Keyser