Proposed SEGY (SEGZ) changes

January 7, 2008


January 7, 2008 - Updated the spreadsheets that compare different interpretations of the segy standard; follow this link.

April 4, 2007. Times have changed. We need to move to a more modern way to describe a Seismic Data Exchange Format. We need to support:

After years of looking at SEGY files, the only agreement we can all come to is: Geophysicists are demanding flexibility in how we describe our data. Instead of standardizing our data, all we have to do is standardize how we describe our data. Follow this link to see how this is possible.

Recommendation for a Proposed SEGY rev(2). This simplified SEGY definition will contain only those items that can be agreed upon:

  1. The byte structure: 3200 bytes for the Textual Description, 400 bytes for line based information and 240 bytes for trace based information.
  2. The Sample Interval, Bytes 17-18, Binary16
  3. The Number of samples, Bytes 21-22, Binary16
In order to give every company the flexibility they desire, it is proposed we use two signature bytes (399,400 of binary header) to uniquely identify the specified format. Further details are contained here for SEGY rev(2)
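
The three agreed items and the signature bytes are enough to start reading a file. The following is a minimal sketch, not part of the proposal: it assumes big-endian unsigned 16-bit integers for the two Binary16 fields, and the function name and dictionary keys are invented for illustration.

```python
import struct

TEXT_HEADER_BYTES = 3200    # textual description
BINARY_HEADER_BYTES = 400   # line based information
TRACE_HEADER_BYTES = 240    # trace based information


def read_core_fields(file_header: bytes) -> dict:
    """Pull the agreed-upon fields out of the first 3600 bytes of a file."""
    needed = TEXT_HEADER_BYTES + BINARY_HEADER_BYTES
    if len(file_header) < needed:
        raise ValueError("need at least 3600 bytes of file header")
    binary = file_header[TEXT_HEADER_BYTES:needed]
    # Byte positions in the proposal are 1-based, so bytes 17-18 start at index 16.
    (sample_interval,) = struct.unpack(">H", binary[16:18])
    (num_samples,) = struct.unpack(">H", binary[20:22])
    signature = bytes(binary[398:400])  # bytes 399-400 of the binary header
    return {
        "sample_interval": sample_interval,
        "num_samples": num_samples,
        "signature": signature,
    }
```

A reader would use the two signature bytes to decide which company-specific layout applies to everything else in the headers.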

EnCana's proposed Workstation SEGY standard consists of three parts: the file name definition, the definition of the Workstation segy and an example of the EBCDIC header. What makes this standard work is the validation script called check.pl. It needs to be run before you transfer the data! The rest of this document outlines how and why this document has evolved.

For those of you who want to program to this standard, you may find it easier to examine the different parts (2D, 3D, Gathers and Shots) with the following spreadsheets.

The Excel spreadsheets above show the differences between various versions of the segy format. They also contain a list of names. The idea is that header entries are no longer defined just by byte location and description, but by a name. This is important: once each company can (or must) define the byte positions where each trace header entry is stored, the byte locations can no longer be relied upon to identify which header entry we are referring to, so a fixed set of names may well become essential.
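
To illustrate the idea of looking up header entries by name rather than by byte location, here is a small sketch. The two layout tables, the entry names and the byte positions are hypothetical and are not taken from the spreadsheets; each table maps a name to a 1-based first byte and a format.

```python
import struct

# Hypothetical layouts: name -> (first byte, 1-based; struct format).
# ">i" is a big-endian 4-byte integer, ">f" a big-endian IEEE 4-byte float.
LAYOUT_A = {"cdp": (21, ">i"), "cdp_x": (181, ">i")}
LAYOUT_B = {"cdp": (21, ">i"), "cdp_x": (181, ">f")}


def get_header_value(trace_header: bytes, layout: dict, name: str):
    """Return the value of a named entry from a 240 byte trace header."""
    first_byte, fmt = layout[name]
    start = first_byte - 1
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, trace_header[start:start + size])[0]
```

The calling code asks for "cdp_x" in both cases; only the layout table changes between formats.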

These data files make a good reference.

Properly implemented, data in this format need only be attached by users of SeisX and SeisWare after the file name has been updated with, e.g., .MIG.0.sgy. The perl script encana2segy.pl has been provided to rename the files (and update the SeisX fields) for those of you who do not want to code up the SeisX specific stuff.

Data are to be provided on an ftp site, CD, DVD, hard drive, Apple iPod, USB dongle etc. A second copy of the data is required for any data not sent directly to the EnCana Data Management Group.


Workstation SEGY Proposal Updates (changes)


Background

In 1975 Barry, Cavers and Kneale proposed a set of standards for the digital exchange of Seismic Stack Data. Today, practically all that is left of the original 27 year old standard is the format of the data layout. Segy data contains a 3200 byte descriptive header, a 400 byte binary header (line specific data) and a 240 byte trace header followed by variable length data. The original standard pertained to 9 track 1/2 inch tape and IBM computers. SEGY Rev0 covered IBM floating point.

In June 1994 a Standards Committee was formed by the C.S.E.G. in Calgary. They re-examined the standard and made their recommendations here. In October 1999, Doug Bath published an article in the CSEG Recorder: "This article outlines a GENERIC SEGY format for stacked data, providing essential information and workstation data formats while maintaining the original SEGY specifications as much as possible". This previous format recommended using integers and scalers of 1000. Our new format now recommends using IEEE floating point where appropriate in the headers as well as in the data samples. We agree with the CSEG that the number 6 should be used for IEEE float, not the number 5 as defined by Landmark and the SEG. This may cause you grief if you try to load data into Landmark without first changing the format code. Remember, Photon (SeisX) uses the number 5 to signify 8bit integer.

The data were originally stored as integer and IBM Floating point for numbers and EBCDIC for text descriptions. Today the standards are disk, CD, DVD, DLT etc with numbers stored as integer or IEEE float and ASCII for text. Even IBM has made the switch to the IEEE standard.

Any data on disk can be read more efficiently in IEEE than in IBM floating point!
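
The reason is that IEEE hardware can use IEEE samples as stored, while every IBM sample has to be converted first. The sketch below shows that per-sample conversion for 32-bit big-endian IBM floats (sign bit, 7-bit base-16 exponent with a bias of 64, 24-bit fraction). The function names are illustrative only.

```python
import struct


def ibm32_to_float(word: int) -> float:
    """Convert one 32-bit IBM floating point word to a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F                    # power of 16, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)   # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


def ibm_samples(raw: bytes) -> list:
    """Convert a block of big-endian IBM samples, one call per sample."""
    count = len(raw) // 4
    words = struct.unpack(">%dI" % count, raw[:count * 4])
    return [ibm32_to_float(w) for w in words]
```

With IEEE data the same block can be unpacked in one step with a format such as ">%df", with no arithmetic per sample.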

Both the former AEC and PCP had their own standards. In fact, PCP had two standards, one for 2D and another for 3D. AEC had one standard for both and included many (but not all) fields as defined by SeisX. The following proposal will define a new single standard for segy at EnCana.

This new proposal includes all of the fields defined by the 2002 SEG Rev 1 standard except for a few minor fields:

Adherence to the specifications outlined below will permit both 2D and 3D stacked seismic lines to be directly attached to a SeisX project thus removing one of the time consuming SeisX project management steps.


Proposal

Naming Convention

It is proposed that EnCana adopt the former PCP naming convention for the following reasons:
The following naming convention is to be used:
Note that the EDM line name is assigned by EnCana's operations group. It will represent the original historical line name.
2D Examples:

F73379.ECA-VER-993D-02P.3D-verger.mig.200004.abc.sgy
A44267.MKD-5.2D-MKD-5.f-fk-mig-100.200302.gox.sgy
P172939.02-SBL-2.2D-SBL-2.f-fk-mig-94.200302.gox.sgy
A43572.01-KUG-13.2D-KUG-13.f-fk-mig.200302.gox.sgy
A44263.MKD-2N.2D.f-fk-str.200302.gox.sgy
The external line names must adhere to unix naming conventions. Spaces, dots, brackets and special characters are not to be used! The date is now specified as year and then month, which makes more sense when the data are sorted by this field.

If the data are a brute stack, the descriptor temp should be included in the stack type field. These data will automatically be removed from our system after a period of time.

The common name is to be added to a 3D data set. Here is a 3D Example:

P173286.JENSEN3D.3D-JENSENMERGE.test-istk.200302.gox.sgy
If a 3D has to be broken into separate parts it should be named as follows:
P81848.PCP-MERG-003DM-08.3D-JENSEN.f-ma-mig-1of5.200203.gox.sgy
P81848.PCP-MERG-003DM-08.3D-JENSEN.f-ma-mig-2of5.200203.gox.sgy
P81848.PCP-MERG-003DM-08.3D-JENSEN.f-ma-mig-3of5.200203.gox.sgy
P81848.PCP-MERG-003DM-08.3D-JENSEN.f-ma-mig-4of5.200203.gox.sgy
P81848.PCP-MERG-003DM-08.3D-JENSEN.f-ma-mig-5of5.200203.gox.sgy
Note the 3D prefix; this will permit the 3D's to sort together in a list.
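
The naming convention above can be checked mechanically. The sketch below is one possible check and is not the logic of check.pl. The field names are assumptions inferred from the examples: reference number, line name, survey type with optional common name, stack type, date as year then month, processor code and extension.

```python
import re

# Assumed field order, inferred from the example file names.
FIELD_NAMES = ("reference", "line_name", "survey", "stack_type",
               "date", "processor", "extension")

# Letters, digits, hyphen and underscore only: no spaces, brackets or
# special characters. Dots are reserved as the field separator.
_SAFE = re.compile(r"^[A-Za-z0-9_-]+$")


def parse_name(file_name: str) -> dict:
    """Split a file name into its dot separated fields and validate them."""
    parts = file_name.split(".")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected %d dot separated fields" % len(FIELD_NAMES))
    fields = dict(zip(FIELD_NAMES, parts))
    for name, value in fields.items():
        if not _SAFE.match(value):
            raise ValueError("illegal characters in %s: %r" % (name, value))
    if not re.match(r"^(2D|3D)(-|$)", fields["survey"]):
        raise ValueError("survey field must start with 2D or 3D")
    date = fields["date"]
    if not re.match(r"^\d{6}$", date) or not 1 <= int(date[4:]) <= 12:
        raise ValueError("date must be year then month, e.g. 200302")
    return fields
```

For example, parse_name("A44263.MKD-2N.2D.f-fk-str.200302.gox.sgy") returns a dictionary whose stack_type entry is "f-fk-str", while a line name containing a space raises ValueError.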

Inventory List

In order to identify data on our ftp data server and to verify that we have the complete file, an ASCII text file is required containing the email addresses of the processor and the geophysicist, the file name and the file size. This file can be emailed to the Geophysicist and should exist with the data. The format for the file name is date.processor_email.txt and an example file will look like:

more 04mar30.obriend@geo-x.com.txt
obriend@geo-x.com eric.keyser@encana.com P12345.MODEL1-876.2D.mig.200303.gox.sgy 1424000
obriend@geo-x.com eric.keyser@encana.com P12345.MODEL1-876.2D.mig.200303.gox.txt   14415
obriend@geo-x.com eric.keyser@encana.com P12345.MODEL1-876.2D.ro.200303.gox.sgy 1424000
obriend@geo-x.com eric.keyser@encana.com P12345.MODEL1-876.2D.ro.200303.gox.txt   14415
This file will be treated as space delimited to simplify validation scripts.
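
Because the file is space delimited, a validation script can be very short. The following is a sketch only, assuming four columns per line in the order shown above (processor email, geophysicist email, file name, size in bytes); the function names are invented for illustration.

```python
import os


def parse_inventory(text: str) -> list:
    """Parse the space delimited inventory text into a list of entries."""
    entries = []
    for line in text.splitlines():
        cols = line.split()          # any run of spaces separates columns
        if not cols:
            continue                 # skip blank lines
        if len(cols) != 4:
            raise ValueError("expected 4 columns, got %d: %r" % (len(cols), line))
        processor, geophysicist, file_name, size = cols
        entries.append({
            "processor": processor,
            "geophysicist": geophysicist,
            "file_name": file_name,
            "size": int(size),
        })
    return entries


def find_size_mismatches(entries: list, directory: str) -> list:
    """Return (file name, expected size, actual size or None) for bad files."""
    problems = []
    for entry in entries:
        path = os.path.join(directory, entry["file_name"])
        actual = os.path.getsize(path) if os.path.exists(path) else None
        if actual != entry["size"]:
            problems.append((entry["file_name"], entry["size"], actual))
    return problems
```

An empty list from find_size_mismatches means every listed file is present with the expected size.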

Prefix Letter

Changes to the reference numbers will be required for EnCana Workstation. We have decided to use a series of prefix characters to preserve data integrity. The new prefix codes are:
Prefix  Line_ID  (Reference number)
A       XAEC     Alberta Energy Company
P       XPCE     Pan Canadian
F                EnCana Workstation (International)
D                EnCana Workstation (Domestic)
Blank            Don't know

3200 byte Descriptive Header

It is recommended that the Workstation SEGY format adopt the somewhat free format for the Descriptive (formerly EBCDIC) header as defined by AEC. The PCP standard was very rigid and sometimes PCP's processing partners found there was no room to record important descriptive data. The 3200 byte ASCII Descriptive header serves two very important roles. It provides:
  1. General line information to assist with data loading to an interpretation workstation
  2. Information conventionally written on the processing side label on traditional paper sections, allowing the interpreter to browse the Descriptive header for information about a line or 3D volume. By switching to ASCII format this header can easily be viewed. SeisX already uses the last two lines to store descriptive data in ASCII format (actually only the last 130 bytes). Be sure to leave these fields blank or loading into SeisX will trash them.
This header is used to store the Job Identification Header (Record 1, 3200 bytes, ASCII coded block) with the following information:
Here is an example of an EnCana EBCDIC header filled out by Mark Baxter of C&C.

See how he has left the last two lines blank (SeisX will trash these locations). In order to complete the SeisX definition, the last 130 bytes of the EBCDIC header must be filled out as ASCII. Blanks are ok but the following is recommended:
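
A quick way to inspect a Descriptive header is to split the 3200 bytes into 40 lines of 80 characters and to test whether the last 130 bytes are blank. This is a sketch under the assumption that the header is already ASCII, as proposed here; the function names are illustrative and are not taken from check.pl.

```python
TEXT_HEADER_BYTES = 3200
LINE_LENGTH = 80
SEISX_BYTES = 130   # trailing bytes that SeisX uses for its own data


def split_text_header(raw: bytes) -> list:
    """Return the ASCII Descriptive header as 40 lines of 80 characters."""
    if len(raw) != TEXT_HEADER_BYTES:
        raise ValueError("Descriptive header must be exactly 3200 bytes")
    text = raw.decode("ascii", errors="replace")
    return [text[i:i + LINE_LENGTH]
            for i in range(0, TEXT_HEADER_BYTES, LINE_LENGTH)]


def seisx_area_is_blank(raw: bytes) -> bool:
    """True if the last 130 bytes contain only ASCII spaces."""
    return raw[-SEISX_BYTES:].strip(b" ") == b""
```

Printing the result of split_text_header one line at a time gives the same view an interpreter would get by browsing the header.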

Binary and Trace Header definition

It is recommended that the following guidelines be used for a new segy stack standard:
  1. Start off with the SeisX segy_v3.fmt definition. All floating point numbers are stored as IEEE, today's computer standard. Note: the code -3 in the format descriptions refers to IEEE Big Endian four byte floating point (ie a float on the SUN). SeisX format files have the following advantages:
    • Format is well understood and many companies already generate this format
    • Data loading is eliminated for users of SeisX/SeisWare. Data are attached not loaded
    • Data loading is simplified for Landmark users with only one format defined
    • Check programs already exist for that format
    • Data can be previewed directly with SeisX before they are loaded
  2. Adopt the SEG Y rev1 data exchange standard, using reals instead of scaled integers
  3. Adopt the AEC standard for those fields not specified by SeisX or SEG Y rev1.
  4. The proposed new Workstation format will look like:
    • 400 bytes of the Binary (line constant) header as defined here.
    • 240 bytes of trace header definition stored here.
    • Variable length data with floating point numbers now stored as IEEE (change from IBM float). A job constant scaler is to be applied scaling the RMS for the line to the number 4000. This will enable multiple versions of the same data to be compared. Why the number 4000? Data can be converted directly to 16bit with minimum clipping.
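
The job constant scaler can be sketched as follows. This is an illustration only: it assumes the RMS is taken over every sample of every trace in the line (the proposal does not say whether muted or zero samples are excluded), and the function names are invented.

```python
import math

TARGET_RMS = 4000.0


def job_scalar(traces) -> float:
    """One scaler for the whole line that brings the line RMS to 4000."""
    total = 0.0
    count = 0
    for trace in traces:
        for sample in trace:
            total += sample * sample
            count += 1
    if count == 0 or total == 0.0:
        raise ValueError("cannot scale an empty or all-zero line")
    return TARGET_RMS / math.sqrt(total / count)


def to_int16(sample: float, scalar: float) -> int:
    """Scale one sample and clip it to the signed 16 bit range."""
    return max(-32768, min(32767, round(sample * scalar)))
```

With the RMS at 4000, a sample has to exceed roughly eight times the RMS (32767 / 4000) before it clips, which is why conversion to 16bit loses very little.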

Validation -- Check Program

A validation program is essential to ensure EnCana's standards are met. Seismic data can easily be loaded if key fields are filled out in a consistent format. Feedback from processing contractors has indicated that our prior standards (especially the EBCDIC header) were a bit much. There are two levels of checks, Lite and Heavy. The Lite check will be used for data received from third parties and the Heavy check for data from our preferred seismic partners.
The Lite checks are:
The Workstation heavy checks are:
The exact details of the fields that will be checked are contained in the Reference section below, in the column labelled Key. Items labelled with a one will be checked and it is expected that the vendor will correct the data to this standard. EnCana Workstation has provided the check.pl program to validate your data. You have the source code available and can determine exactly what the checks are!

The signature byte will enable software to automatically reformat to any future standard.


Reference

So what are we missing? This proposal contains 100% of the SeisX definition. Click here to see what is missing from our former standards.

The support documents for the AEC standard.

The support documents for the PCP segy standard. Comments from interested parties are located here.
Site Owner: Eric Keyser