Proposed XML SEG-Y Keyword Standard
August 21, 2006
Background
The SEG Y Data Exchange format (rev 0) was first published in 1975 and achieved widespread usage. However, it seems that every company created its own proprietary variation. With the advent of the workstation in the late 1980s, it became obvious to anyone loading data that the SEGY standards were not being rigorously adhered to. Perhaps we are focusing on the wrong item; we should be focused on how to describe our SEGY files. In 1987 the Canadian Society of Exploration Geophysicists (CSEG) formed a technical committee to address this issue. The C.S.E.G. Geophysical Workstation SEGY Standard was published in the October 1989 issue of the CSEG Recorder and again in the June 1994 issue. Ten years later, in 1999, Doug Bath published another article in the CSEG Recorder outlining a GENERIC SEGY format.
The Society of Exploration Geophysicists (SEG) Standards Committee was formed and, as a result, SEG-Y rev 1 was released in May 2002. It attempted to update the original definition for modern usage: for example, IEEE floats are now allowed for data, the textual file header can now be ASCII, and the concept of an Extended Textual File Header was introduced (multiple blocks of 3200 bytes).
ARAM (seismic field instruments) tells me they require a longer line and trace header definition. We also need larger and more accurate numbers. Dennis Meisinger published an interesting article in 2004 pointing out issues with floating point confusion between IEEE and IBM floats. Clearly we need a better way to label and describe our data.
The SEGY Dilemma
Here are a few reasons that the SEGY rev 1 format has yet to be accepted by the industry, and that the geophysical community has difficulty agreeing on any single standard:
- Extracting information from the Extended Textual File is more complicated than extracting information from the binary header
- The format code definitions defined in the CSEG 1987, 1989, 1994 and 1999 documents were not communicated to the SEG Technical Standards Committee.
- Individual companies had already adapted the 1975 definition to their own needs and see little benefit in change.
- The SEGY rev 0 and rev 1 definitions state that the binary header will contain only numbers: 8-bit, 16-bit or 32-bit. All textual information is to be stored in the 3200-byte descriptive header. Various companies have nevertheless used the binary header to store both ASCII characters and IEEE floating point numbers.
- In 1989 the CSEG in Calgary confirmed the format code of 5 for 36-bit float (UNIVAC computers) and the number 6 for 32-bit IEEE float. The 2002 SEG rev 1 selected the number 5 for IEEE float, but that number had already been used. Talisman in Calgary supports the SEG and uses the number 5 for IEEE float.
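The format code conflict matters because the same four bytes decode to different sample values depending on whether the reader assumes IBM or IEEE float. The following is a minimal sketch of that effect, not taken from any of the standards documents. The function names are illustrative only, and big-endian byte order is assumed.

```python
import struct

def ibm32_to_float(raw: bytes) -> float:
    """Decode a 4-byte big-endian IBM System/360 single-precision float."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F                   # excess-64, base 16
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction, 0 <= f < 1
    return sign * fraction * 16.0 ** (exponent - 64)

def ieee32_to_float(raw: bytes) -> float:
    """Decode a 4-byte big-endian IEEE 754 single-precision float."""
    (value,) = struct.unpack(">f", raw)
    return value

# The same four bytes, read under the two competing interpretations.
sample = bytes.fromhex("42640000")
as_ibm = ibm32_to_float(sample)
as_ieee = ieee32_to_float(sample)
print(as_ibm, as_ieee)  # 100.0 57.0
```

A loader that guesses the wrong meaning for a format code therefore produces plausible looking but wrong amplitudes, with no error raised.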
Possible Solutions
So what can we all agree upon? The more I talk to geophysicists, the more I am coming to the realization that we geophysicists are all different and all have different takes on what a SEGY standard should look like. Not much as it turns out, just the file layout and two fields!
- The byte structure: 3200 bytes for the Textual Description, 400 bytes for line based information and 240 bytes for trace based information.
- The Sample Interval, Bytes 17-18, Binary16
- The Number of Samples, Bytes 21-22, Binary16
Some of us would change even this: the headers are not long enough, and 16-bit numbers are not sufficient to describe the number of samples or the sample interval for our modern data sets.
It's been suggested by Don Robinson (Open Spirit) and Shayne Stogrin (Zokero) that we look at XML as a better way to describe SEGY files. XML has the following advantages:
- Self defining format (a lot like SeisX's Photon Ascii)
- Designed to describe and exchange (share) data (business to business)
- Makes data independent of hardware, software and application
- Data parsers are readily available (and free)
So what does an XML keyword file look like? Here's a link to an XML file I created to describe those fields defined in the EnCana_101 segy format for SeisWare. This file describes all you need to know in order to extract values from EnCana's segy data sets. It even describes the length of the headers. Here is a list of some of the attributes:
- Header - File, EBCDIC or Trace
- Byte - The starting byte location (numbering starts with a 0)
- Type - 16BitInteger, 32BitInteger, IEEEFP, IBMFP, ASCII, EBCDIC ...
- Scale - Scaler, useful for values such as shotpoints that need a decimal point (for example, use a scaler of .01)
- Size - In bytes, for a type of ASCII or EBCDIC
All it takes is for Shayne to incorporate the XML in addition to the SeisWare keyword definition. If you think about this a little bit, this little self defining XML file can enable any length of EBCDIC, File binary or Trace header!
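As a rough illustration of how a keyword file with these attributes could drive a reader, here is a small sketch. The element and attribute spelling (`SegyKeywords`, `Keyword`, `Name`), the `LineName` and `ShotPoint` byte locations, and the header offsets are all assumptions made for the example. They are not the layout of the EnCana XML file. The sketch also assumes big-endian values, no Extended Textual File Header, and only the first trace header. IBMFP is left out.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical keyword file; Byte values are 0-based within each header.
KEYWORD_XML = """
<SegyKeywords>
  <Keyword Name="SampleInterval"  Header="File"  Byte="16"  Type="16BitInteger"/>
  <Keyword Name="NumberOfSamples" Header="File"  Byte="20"  Type="16BitInteger"/>
  <Keyword Name="LineName"        Header="File"  Byte="100" Type="ASCII" Size="8"/>
  <Keyword Name="ShotPoint"       Header="Trace" Byte="196" Type="32BitInteger" Scale="0.01"/>
</SegyKeywords>
"""

# Assumed file offsets: 3200-byte textual, 400-byte binary, then the first trace.
HEADER_OFFSETS = {"EBCDIC": 0, "File": 3200, "Trace": 3600}
NUMERIC_FORMATS = {"16BitInteger": ">h", "32BitInteger": ">i", "IEEEFP": ">f"}
TEXT_CODECS = {"ASCII": "ascii", "EBCDIC": "cp037"}  # cp037 is an assumed EBCDIC variant

def extract(segy: bytes, keyword_xml: str) -> dict:
    """Return {keyword name: value} for every keyword described in the XML."""
    values = {}
    for kw in ET.fromstring(keyword_xml).iter("Keyword"):
        start = HEADER_OFFSETS[kw.get("Header")] + int(kw.get("Byte"))
        kind = kw.get("Type")
        if kind in TEXT_CODECS:
            size = int(kw.get("Size"))
            value = segy[start:start + size].decode(TEXT_CODECS[kind]).rstrip()
        else:
            (value,) = struct.unpack_from(NUMERIC_FORMATS[kind], segy, start)
            if kw.get("Scale") is not None:
                value = value * float(kw.get("Scale"))
        values[kw.get("Name")] = value
    return values

# Synthetic file: headers only, filled with made-up values for the demonstration.
segy = bytearray(3200 + 400 + 240)
struct.pack_into(">h", segy, 3200 + 16, 2000)    # sample interval, bytes 17-18
struct.pack_into(">h", segy, 3200 + 20, 1501)    # number of samples, bytes 21-22
segy[3200 + 100:3200 + 108] = b"LINE-001"        # hypothetical ASCII field
struct.pack_into(">i", segy, 3600 + 196, 10150)  # shotpoint stored as an integer

fields = extract(bytes(segy), KEYWORD_XML)
print(fields)
```

The point of the design is that nothing about the byte layout lives in the reader itself. Supporting another company's flavour of SEGY would mean supplying a different keyword file, not writing new code.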
Here's what is being recommended
- The CSEG and the SEG create an XML standard to describe SEGY files. Application software developers will then provide the ability to automatically capture data.
- Two signature bytes be declared to uniquely identify an XML keyword file.
Signature Bytes
Photon (now Paradigm) pioneered this proposal by declaring the number 91 in the binary header byte location 399. Paradigm now uses the number 92 as their key to tell their software how to interpret data for SeisX version 4. (Note: Zokero's SeisWare uses the same signature byte.) EnCana has selected the number 101 in byte position 400 as their signature location. Data can be in SeisX/SeisWare internal format and still store information deemed important by us. Data loading has been eliminated when the SEGY headers are correct!
Note: Bytes 399-400 are currently unassigned in the SEG Y rev 1 definition but are defined as the time to first sample in the 1994 CSEG SEGY format. Conflict! Dave Morgan (Geo-X / Divestco) suggests that we could "reserve" the multiples of 100 for use by those processors that follow the CSEG 1994 SEGY implementation. This would ensure backwards compatibility. It should be noted that Talisman and EnCana store the time to first sample at the trace level in bytes 109-110.
The solution is simple: all we need is someone to keep track of signature bytes and to document the different SEGY definitions. Publish your standard and you get a number! Software can read these numbers and take appropriate action! (Data loading can be eliminated for all of us.)
To make this even better, Don Robinson and Shayne Stogrin (Zokero) have suggested that vendors provide an XML file that defines the byte locations. This would provide a great deal of flexibility and should allow different vendors to read each other's SEGY files without having to write custom code for each individual standard.
Here are the numbers (signature bytes) I have to date:
- SeisX Version 3.x.x SEGY: Byte 399=91, Byte 400=0. The original SeisX format: segy_v3.fmt. Remember, its data format code is WRONG: the number 1 is used to represent 32 bit IEEE float. These same data can be read with the SeisWare SeisX3 KeyWord file. These files are used by the different companies to extract information from the segy files. They are the Key!
- SeisX Version 4.x.x SEGY: Byte 399=92, Byte 400=0. Paradigm has corrected the problem above and has selected the number 6 for 32 bit IEEE float and the number 1 for IBM float. Zokero's SeisWare has also adopted this convention. Their format description is located here. Here is SeisX 4: SeisX format segy_v4.fmt or the SeisWare Keyword file SeisWare.kwd. I just compared what is different between version 3 and version 4. Zokero has added both Elevation and Fold as additional fields; Paradigm has added only the Elevation. Both Paradigm and Zokero have extended the Line Name field from 20 to 32 characters. All good ideas. Now I'd like to encourage both of them to use our trace definitions for statics and fold, survey datum, central meridian, survey grid etc...
- EnCana workstation SEGY: Byte 399=92, Byte 400=100. EnCana has extended the SeisWare/SeisX format with numerous additional fields, trying to follow the SEGY rev 1 as closely as possible.
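To show how software could "read these numbers and take appropriate action", here is a minimal lookup sketch built only from the signature values listed above. It assumes that bytes 399 and 400 are counted from 1 within the 400-byte binary header, which itself follows the 3200-byte textual header. The registry structure and the function name are hypothetical, and the IEEE format code is filled in only where this page states it.

```python
from typing import Optional

BINARY_HEADER_START = 3200  # the binary header follows the 3200-byte textual header

# (byte 399, byte 400) -> what this page says about that flavour of SEGY.
SIGNATURE_REGISTRY = {
    (91, 0): {"name": "SeisX 3.x", "ieee_float_code": 1},
    (92, 0): {"name": "SeisX 4.x / SeisWare", "ieee_float_code": 6},
    (92, 100): {"name": "EnCana workstation", "ieee_float_code": None},  # not stated here
}

def identify_flavour(segy: bytes) -> Optional[dict]:
    """Return the registry entry for a file's signature bytes, or None if unknown."""
    if len(segy) < BINARY_HEADER_START + 400:
        return None
    byte_399 = segy[BINARY_HEADER_START + 398]  # 1-based byte 399
    byte_400 = segy[BINARY_HEADER_START + 399]  # 1-based byte 400
    return SIGNATURE_REGISTRY.get((byte_399, byte_400))

# Synthetic header carrying the SeisX 4.x / SeisWare signature.
demo = bytearray(3600)
demo[BINARY_HEADER_START + 398] = 92
demo[BINARY_HEADER_START + 399] = 0
flavour = identify_flavour(bytes(demo))
print(flavour["name"] if flavour else "unknown signature")
```

A file with no registered signature returns `None`, so a loader can fall back to asking the user instead of guessing.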
SEGY Documents
Here is a list of documentation for various SEGY definitions that I have been able to locate. If you know of any others that I can publish, please let me know:
- The SEG 1975 rev(0) and the SEG 2002 rev(1) standards are available from the SEG in Tulsa.
- 1989 CSEG SEGY Standards, published in the CSEG Recorder June, 1994.
- Talisman has provided their SEGY documentation. A quick check of their document shows that their textual header is now ASCII and that they follow the Doug Bath CSEG 1994 definition, with the exception of the Data Sample Format Code: Talisman follows the SEG rev 1 standard and uses the number 5 for IEEE float.
- EnCana has provided their SEGY documentation. It's a superset of SeisX or SeisWare, and these spreadsheets are for both Pre and Post Stack SEGY data.
- EnCana 400 byte line (binary) header Spread sheet
- EnCana 240 byte (trace) header Spread sheet
- Geo-X / Divestco, for a client that does not request a specific format, Geo-X / Divestco provides a format similar to the CSEG 1994 format for post stack and the SEG format (pre-rev 1) for pre-stack. Geo-X / Divestco default SEGY
Pre Stack (Field) SEGY document
Here is a link to this Web site's previous SEGY documentation. I've created a partial record of the emails regarding this issue; you might be interested in reviewing the correspondence.
- Geo-X ARAM field data. ARAM uses SEGY for their internal program file format for field data, just like SeisX and SeisWare use another flavour of SEGY for their internal format. The complete Geo-X ARAM documentation can be found in the file Aram Aries Tape and Disk Formats V2.zip. Here are a few points you will encounter with ARAM's flavour of SEGY:
- Format code of 1 means IEEE float. (We think ARAM should consider changing to the number 5 or 6.)
- Note: For multiple shots that are combined into one file, the number of samples in the binary header may be WRONG. Use the value in the Trace Header. (The value from the binary header is the value from the first shot. If it's a test shot with a different length, you have a problem)
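The advice to trust the trace header can be sketched as a reader that sizes every trace from its own header instead of from the binary header. This is an illustration under stated assumptions, not ARAM's documented layout. It assumes the standard SEG Y position for the number of samples in the trace header (bytes 115-116), read as an unsigned big-endian value, with 4-byte samples and no Extended Textual File Header. The function names are hypothetical.

```python
import struct

TRACE_HEADER_BYTES = 240
SAMPLE_BYTES = 4  # assumed 4-byte samples (IBM or IEEE float)

def iter_traces(segy: bytes, first_trace_offset: int = 3600):
    """Yield (number_of_samples, raw_sample_bytes) using each trace's own header."""
    offset = first_trace_offset
    while offset + TRACE_HEADER_BYTES <= len(segy):
        # Bytes 115-116 of the trace header (1-based) hold the sample count.
        (ns,) = struct.unpack_from(">H", segy, offset + 114)
        data_start = offset + TRACE_HEADER_BYTES
        data_end = data_start + ns * SAMPLE_BYTES
        if data_end > len(segy):
            raise ValueError(f"trace at offset {offset} is truncated")
        yield ns, segy[data_start:data_end]
        offset = data_end

def make_trace(ns: int) -> bytes:
    """Build a zero-filled synthetic trace with ns samples (demonstration only)."""
    header = bytearray(TRACE_HEADER_BYTES)
    struct.pack_into(">H", header, 114, ns)
    return bytes(header) + bytes(ns * SAMPLE_BYTES)

# Synthetic file: the binary header carries the length of the first (test) shot,
# but the second shot is longer.
segy = bytearray(3600)
struct.pack_into(">H", segy, 3200 + 20, 3)  # binary header bytes 21-22
segy += make_trace(3) + make_trace(5)

counts = [ns for ns, _ in iter_traces(bytes(segy))]
print(counts)  # [3, 5]
```

A reader that used the binary header value of 3 for every trace would lose its place in the file after the first trace.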
Site Owner: Eric Keyser
Last Updated: July 25, 2006