Proposed XML SEG-Y Keyword Standard

August 21, 2006


Background

The SEG Y Data Exchange (rev 0) was first published in 1975 and achieved widespread usage. However, it seems that every company created its own proprietary variation. With the advent of the workstation in the late 1980s, it became obvious to anyone loading data that the SEGY standards were not being rigorously adhered to. Perhaps we are focusing on the wrong item: we should be focused on how to describe our SEGY files.

In 1987 the Canadian Society of Exploration Geophysicists (CSEG) formed a technical committee to address this issue. The C.S.E.G. Geophysical Workstation SEGY Standard was published in the October 1989 issue of the CSEG Recorder and again in the June 1994 issue. Ten years later, in 1999, Doug Bath published another article in the CSEG Recorder outlining a GENERIC SEGY format.

The Society of Exploration Geophysicists (SEG) Standards Committee was formed and, as a result, SEG-Y (rev 1) was released in May 2002. It attempted to update the original definition for modern usage: for example, IEEE floats are now allowed for data, the textual file header can now be ASCII, and the concept of an Extended Textual File Header was introduced (multiple blocks of 3200 bytes).

ARAM (seismic field instruments) tells me they require a longer Line and Trace header definition. We also need larger and more accurate numbers. Dennis Meisinger published an interesting article in 2004, pointing out the confusion between IEEE and IBM floats. Clearly we need a better way to label and describe our data.

The SEGY Dilemma

Here are a few reasons that the SEGY rev(1) format has yet to be accepted by the industry:

Here are a few reasons the geophysical community has difficulty agreeing on any single standard:

Possible Solutions

The more I talk to geophysicists, the more I realize that we are all different and all have different takes on what a SEGY standard should look like. So what can we all agree upon?

Not much, as it turns out: just the file layout and two fields! Even here, some of us would change things: the headers are not long enough, and 16-bit numbers are not sufficient to describe the number of samples or the sample interval for our modern data sets.

  1. The byte structure: 3200 bytes for the Textual Description, 400 bytes for line-based information and 240 bytes for trace-based information.
  2. The Sample Interval, Bytes 17-18, Binary16
  3. The Number of samples, Bytes 21-22, Binary16
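
The three items above can be sketched in a few lines of Python. This is only an illustration: the function name read_common_fields and the synthetic test data are made up here, and the two fields are read as big-endian two's complement 16-bit integers, which is how SEG-Y rev 1 describes its binary header values.

```python
import struct

TEXT_HEADER_LEN = 3200    # textual description
BINARY_HEADER_LEN = 400   # line-based information
TRACE_HEADER_LEN = 240    # trace-based information


def read_common_fields(segy_bytes):
    """Return (sample_interval, n_samples) from the binary file header."""
    header = segy_bytes[TEXT_HEADER_LEN:TEXT_HEADER_LEN + BINARY_HEADER_LEN]
    if len(header) < BINARY_HEADER_LEN:
        raise ValueError("file too short to contain a binary header")
    # Byte positions in the text are 1-based, so bytes 17-18 start at offset 16
    # and bytes 21-22 start at offset 20.
    (sample_interval,) = struct.unpack_from(">h", header, 16)
    (n_samples,) = struct.unpack_from(">h", header, 20)
    return sample_interval, n_samples


# Synthetic file start: a blank textual header followed by a binary header.
binary = bytearray(BINARY_HEADER_LEN)
struct.pack_into(">h", binary, 16, 2000)   # sample interval (microseconds)
struct.pack_into(">h", binary, 20, 1500)   # number of samples per trace
demo = bytes(TEXT_HEADER_LEN) + bytes(binary)

print(read_common_fields(demo))
```

A two's complement 16-bit field tops out at 32767, which is the limitation on sample counts and sample intervals mentioned above.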

It's been suggested by Don Robinson (Open Spirit) and Shayne Stogrin (Zokero) that we look at XML as a better way to describe SEGY files. XML has the following advantages:

So what does an XML keyword file look like? Here's a link to an XML file I created to describe the fields defined in the EnCana_101 SEGY format for SeisWare. This file describes all you need to know in order to extract values from EnCana's SEGY data sets. It even describes the length of the headers. Here is a list of some of the attributes.

All it takes is for Shayne to incorporate the XML in addition to the SeisWare keyword definition. If you think about this a little bit, this little self-defining XML file can enable any length of EBCDIC, file binary or trace header!
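
To make the idea concrete, here is a rough Python sketch of how software might use such a keyword file. Please note that the XML below is hypothetical: the element and attribute names (segy_format, field, byte, type and so on) are invented for this illustration and are not taken from the EnCana_101 file or from any agreed standard. The point is only that header lengths and byte locations come from the XML rather than from the program.

```python
import struct
import xml.etree.ElementTree as ET

# HYPOTHETICAL keyword file: every element and attribute name here is an
# assumption made for illustration, not part of any published format.
KEYWORD_XML = """
<segy_format name="example" text_header_length="3200"
             binary_header_length="400" trace_header_length="240">
  <field header="binary" name="sample_interval" byte="17" type="int16"/>
  <field header="binary" name="number_of_samples" byte="21" type="int16"/>
  <field header="trace" name="time_to_first_sample" byte="109" type="int16"/>
</segy_format>
"""

STRUCT_CODES = {"int16": ">h", "int32": ">i"}  # big-endian integers


def load_format(xml_text):
    """Parse the keyword file into header lengths and field definitions."""
    root = ET.fromstring(xml_text.strip())
    lengths = {
        "text": int(root.get("text_header_length")),
        "binary": int(root.get("binary_header_length")),
        "trace": int(root.get("trace_header_length")),
    }
    fields = [
        (f.get("header"), f.get("name"), int(f.get("byte")), f.get("type"))
        for f in root.findall("field")
    ]
    return lengths, fields


def extract(header_bytes, header_kind, fields):
    """Decode every field that the keyword file defines for one header."""
    values = {}
    for kind, name, byte, type_name in fields:
        if kind == header_kind:
            # Keyword byte positions are 1-based, struct offsets are 0-based.
            (values[name],) = struct.unpack_from(
                STRUCT_CODES[type_name], header_bytes, byte - 1)
    return values


lengths, fields = load_format(KEYWORD_XML)
binary_header = bytearray(lengths["binary"])
struct.pack_into(">h", binary_header, 16, 2000)
struct.pack_into(">h", binary_header, 20, 1500)
print(extract(bytes(binary_header), "binary", fields))
```

Because the header lengths are read from the file description, a longer binary or trace header would only require a change to the XML, not to the reading code.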

Here's what is being recommended

  1. The CSEG and the SEG create an XML standard to describe SEGY files. Application software developers will then provide the ability to automatically capture data.
  2. Two signature bytes be declared to uniquely identify an XML keyword file.

Signature Bytes

Photon (now Paradigm) pioneered this proposal by declaring the number 91 in the Binary header byte location 399. Paradigm now uses the number 92 as their key to tell their software how to interpret data for SeisX version 4. (Note: Zokero's SeisWare uses the same signature byte)

EnCana has selected the number 101 in byte position 400 as their signature location. Data can be in SeisX/SeisWare internal format and still store information we deem important. Data loading has been eliminated when the SEGY headers are correct!

Note: Bytes 399-400 are currently unassigned in the SEG Y rev 1 definition but are defined as the time to first sample in the 1994 CSEG SEGY format. Conflict! Dave Morgan (Geo-X / Divestco) suggests that we could "reserve" the multiples of 100 for those processors that follow the CSEG 1994 SEGY implementation. This would ensure backwards compatibility. It should be noted that Talisman and EnCana store the time to first sample at the trace level in bytes 109-110.

The solution is simple: all we need is someone to keep track of signature bytes and document the different SEGY definitions. Publish your standard and you get a number! Software can read these numbers and take appropriate action! (Data loading can be eliminated for all of us.)

To make this even better, Don Robinson and Shayne Stogrin (Zokero) have suggested that vendors provide an XML file that defines the byte locations. This would provide a great deal of flexibility and should allow different vendors to read each other's SEGY files without having to write custom code for each individual standard.

Here are the numbers (signature bytes) I have to date:

  1. SeisX Version 3.x.x SEGY Byte 399=91, Byte 400=0. The original SeisX format segy_v3.fmt. Remember, its data format code is WRONG: the number 1 is used to represent 32-bit IEEE float. These same data can be read using the SeisWare SeisX3 KeyWord file. These files are used by the different companies to extract information from the SEGY files. They are the Key!
  2. SeisX Version 4.x.x SEGY Byte 399=92, Byte 400=0. Paradigm has corrected the problem above and has selected the number 6 for 32-bit IEEE float and the number 1 for IBM float. Zokero's SeisWare has also adopted this convention. Their format description is located here. Here is SeisX 4: SeisX format segy_v4.fmt or the SeisWare Keyword file SeisWare.kwd. I just compared version 3 and version 4 to see what is different. Zokero has added both Elevation and Fold as additional fields; Paradigm has added only the Elevation. Both Paradigm and Zokero have extended the Line Name field from 20 to 32 characters. All good ideas. Now I'd like to encourage both of them to use our trace definitions for statics and fold, survey datum, central meridian, survey grid, etc.
  3. EnCana workstation SEGY Byte 399=92, Byte 400=100. EnCana has extended the SeisWare/SeisX format with numerous additional fields, trying to follow the SEGY rev(1) as closely as possible.
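
As a rough sketch of how software could "read these numbers and take appropriate action", the Python below looks up the two signature bytes in a small table and uses the result to interpret the data format code. The table name REGISTRY and the function identify are made up for this example, only the two SeisX entries from the list above are included, and the data format code is assumed to sit in its usual SEG-Y location, binary header bytes 25-26.

```python
import struct

# Signature (byte 399, byte 400) -> format name and meaning of the data
# format code, using only the two SeisX entries described in the list above.
REGISTRY = {
    (91, 0): {"name": "SeisX Version 3",
              "sample_codes": {1: "IEEE float32"}},
    (92, 0): {"name": "SeisX Version 4",
              "sample_codes": {1: "IBM float32", 6: "IEEE float32"}},
}


def identify(binary_header):
    """Look up the signature bytes and decode the data format code."""
    if len(binary_header) != 400:
        raise ValueError("expected a 400-byte binary header")
    # Bytes 399 and 400 are 1-based positions, so offsets 398 and 399.
    signature = (binary_header[398], binary_header[399])
    entry = REGISTRY.get(signature)
    if entry is None:
        return None  # unknown signature: fall back to manual data loading
    # Assumption: data format code in bytes 25-26 of the binary header.
    (code,) = struct.unpack_from(">h", binary_header, 24)
    return entry["name"], entry["sample_codes"].get(code, "unknown")


def make_header(byte_399, byte_400, format_code):
    """Build a synthetic binary header for the demonstration."""
    header = bytearray(400)
    header[398] = byte_399
    header[399] = byte_400
    struct.pack_into(">h", header, 24, format_code)
    return bytes(header)


# The same format code 1 means different things under the two signatures.
print(identify(make_header(91, 0, 1)))
print(identify(make_header(92, 0, 1)))
print(identify(make_header(92, 0, 6)))
```

Further signatures would be added to the table in the same way as each published standard receives its number.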

SEGY Documents

Here is a list of documentation for various SEGY definitions that I have been able to locate. If you know of any others that I can publish, please let me know:

Pre Stack (Field) SEGY document

Here is a link to this Web site's previous SEGY documentation. I've also created a partial record of the emails regarding this issue; you might be interested in reviewing the correspondence.


Site Owner: Eric Keyser
Last Updated: July 25, 2006