Proposed CSEGZ Standard

May 14, 2010


May 14, 2010 GeoCanada 2010 has just finished here in Calgary for the CSEG and CSPG. I had the chance to talk with some of the geophysical hardware manufacturers. They all acknowledged that they typically store data in IEEE little-endian format, which then has to be converted to SEGD. Any time data are converted, something is lost. If SEGY had a variable-length format with the ability to define your own data fields, it would make a great field recording format.

Geophysicists have been asking for a replacement for the 35-year-old SEGD field format.

  1. SEGY has started to become more of a standard for field data recording. Data is archived in the number format of acquisition and is not converted into Binary Coded Decimal.
  2. Numerous free viewers are available to look at your data. I am unaware of any free SEGD viewers.
  3. A standard way to read any SEGY-like format containing a descriptive Textual Header, a binary File Header, a Trace Header and Data. Headers can either be separate, providing fast access to disk data sets, or interleaved with the data (current SEGY standards).
  4. The option for Variable Length, Dynamic Headers. As the field instruments become more sophisticated, space for additional numbers is required. You only need to define what you are using, instead of using up header space on fields you do not need. The standard needs to have the ability to evolve over time.
  5. Additional data types: unsigned integers and IEEE floats for numbers stored in the line and trace headers. Creation of CUSTOM formats to handle your number compression. The ability to handle 64-bit numbers (e.g. required to accurately specify UTM XYs).
  6. A built-in mapping or keyword file called the DESCRIPTOR to enable automatic data loading to the workstation. The idea is that header entries are no longer defined just by byte location and description, but by a name. This is important: once the byte position of each trace header name can (or must) be defined per file, the byte locations can no longer be relied upon to identify which header entry we are referring to, so a fixed set of names may well become essential!
So what was the motivation to set up a Technical Standards Sub Committee for SEG-Z? The recommended SEGZ format can be thought of as a SEGY object containing a preamble that describes the format of the data and headers:

Note: The DESCRIPTOR contains an ascii description of the organization of the data that can easily be parsed.

I have spent years trying to get geophysicists to agree on a SEGY standard with little success. The recommended SEGZ format contains only those items that we can all agree upon from the original SEGY definition:

  1. A Textual File Header
  2. A Binary File Header
  3. and a Trace Header
Note: We are now recommending that the information stored be dynamic and that the headers can become variable length! It is recommended that all data processors provide the client with a mapping file that contains sufficient detail to enable automatic workstation data loading.

The proposed SEGZ contains a DESCRIPTOR header, in ASCII, to describe the data that follows. The DESCRIPTOR could be prepended to the beginning of a data file or could be a separate file. If it is a separate file, it has the unique suffix ".fmt".

Considerable debate has occurred on the format of the descriptor. After considering XML or YAML, it was decided that we do not require this level of flexibility and should rather go for something rigid and easily parsed. Defaults are generally not allowed. The use of defaults tends to make things more complex and less human readable/understandable. Here is what a typical DESCRIPTOR might look like:

  SEGZ Format Definition V1
# This format definition is for EnCana WORKSTATION SEGY with IEEE floating point numbers and ASCII in the headers
# Dated August 20, 2008

 SECTION FileStructureDefinition
# SECTION is the keyword identifier for the beginning of a section
# Comment - This header defines a normal SEGY file
  Name,             Length, Description
  TextualHeader,      3200, # SEGY Rev(0) Default length of character line header (bytes)
  FileHeader,          400, # Length of binary line header (bytes)
  TraceHeader,         240, # Length of trace header (bytes)
  TRACE_SAMP_FORMAT, IEEE4, # Format 6 in the binary header over rides the value in binary header
  Endian,              BIG, # Big Endian data suitable for SUN type computers
 ENDSECTION

 SECTION LineHeaderDefinition
  Name,                    Byte, Type, Count,Scalar,Addend,Description
  LINE_ID,                    1, ASCII, 8      # Line_ID (Reference Number)
  FILE_SAMP_RATE,            17,  INT2         # Sample Interval
  FILE_SAMP_NUM,             21,  INT2         # Number of data samples per trace or the maximum number of Samples
  TRACE_SAMP_FORMAT,         25,  INT2         # Data sample format code 1=IBM floating point 6=IEEE Canadian
  LENGTH_SYS,                55,  INT2         # 1=metres 2=feet
  DOMAIN,                    69,  INT2         # Domain code: 0=time 1=amplitude 3=phase 4=depth
  SURVEY_DATUM,              71, ASCII, 6      # Survey Datum eg NAD27 WGS84
  SURVEY_GRID,               77, ASCII, 6      # Survey Grid eg ATS2.6
  TIME2FIRST_SAMPLE,         97,  INT2         # Time to first sample in ms
  SEIS_DATUM,               125, IEEE4         # elevation
  SEIS_REPLACEMENT_VELOCITY,129, IEEE4         # Seismic replacement velocity
  LINE_NAME,                301, ASCII,20      # Internal Line Name
  GEOMETRY,                 393,  INT2         # 2=2D line 3=3D
 ENDSECTION

  SECTION TraceHeaderDefinition
  Name,       Byte,Type
  ITRACE_LINE,   1,INT4         # Trace sequence no within line
  SOURCE_POINT, 17,IEEE4        # Energy source point number (shot peg)
  CDP,          21,INT4         # Ensemble no. (CDP CMP CRP etc)
  TRTYPE,       29,INT2,        # 1=data 2=dead 3=dummy 4=time break 5=uphole (etc)
  HORI_NSUM,    33,INT2,        # No of horizontally summed traces in this trace
  REC_HT,       41,IEEE4        # Elevation at receiver group
  SOURCE_DEPTH, 49,IEEE4        # Depth of source below surface (+ve)
  CDP_X,        81,IEEE4        # X-coordinate of ensemble (CDP) for this trace
  CDP_Y,        85,IEEE4        # CDP coordinate (Y)
  SOURCE_STATIC,99,INT2,        # SOURCE_STATIC Source static correction [ms]
  REC_STATIC,  101,INT2,        # REC_STATIC Receiver group static correction [ms]
  TOTAL_STATIC,103,INT2,        # TOTAL_STATIC Total static applied [ms]
 ENDSECTION
ENDSEGZ
Complete Processing flow, velocity functions, mutes go here
ENDFILE

This is the format that Precision Seismic used to create their segy.
Any description can be located after the ENDSEGZ, it will be ignored by the program that reads this format file!

Recommended processing flow, velocity functions can go here
...
ENDFILE
Note the ENDSEGZ keyword, which marks the end of the format definition; any text after this flag will be ignored. It must be included so that a reader knows when it has reached the end of the definition. If there is no ENDSEGZ, we can assume the file is corrupt!
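As a rough illustration of how easily this syntax can be parsed, here is a minimal sketch in Python (not the Perl script mentioned elsewhere on this page). The function name parse_descriptor and the returned dictionary layout are assumptions made for this example only; the sketch treats the first non-comment line of each SECTION as the column names, takes everything after "#" as the description, and rejects input with no ENDSEGZ.

```python
def parse_descriptor(text):
    """Parse DESCRIPTOR text into {section_name: [row_dict, ...]}.

    Hypothetical helper for illustration. Raises ValueError when no
    ENDSEGZ line is found, since the file is then presumed corrupt.
    """
    sections = {}
    current = None   # name of the section being read, if any
    columns = None   # column names taken from the section's first line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or whole-line comment
        if line == "ENDSEGZ":
            return sections               # everything after this is ignored
        if line.startswith("SECTION "):
            current = line.split(None, 1)[1].strip()
            sections[current] = []
            columns = None
            continue
        if line == "ENDSECTION":
            current = None
            continue
        if current is None:
            continue                      # e.g. the version banner line
        body, _, comment = line.partition("#")
        fields = [f.strip() for f in body.split(",")]
        while fields and fields[-1] == "":
            fields.pop()                  # drop empties left by trailing commas
        if columns is None:
            columns = fields              # first line names the columns
            continue
        row = dict(zip(columns, fields))
        if comment.strip():
            row["Description"] = comment.strip()
        sections[current].append(row)
    raise ValueError("no ENDSEGZ found: descriptor is presumed corrupt")
```

All values are returned as strings; converting Byte or Length to integers is left to the caller, so that entries such as IEEE4 or BIG in the Length column pass through unchanged.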

Note: There has been considerable debate on the syntax for describing our SEGZ format. Some of us believe that we should be using XML or YAML; others believe we should be using a simpler key=value syntax. Personally, I prefer the simpler, human-readable syntax described above. This syntax has evolved from the segy format files first used by Photon (SeisX) pre-1990. A syntax that has its foundation defined and used for almost twenty years is a pretty good place to start.

The complete list of Valid Names, descriptions and probable SEGY byte positions is stored here as a csv file.

I have also updated my spreadsheets that compare various interpretations of SEGY and how they would relate to SEGZ.

August 28, 2008 I am a little excited... What we are really developing is a format definition for any trace sequential data set. So far I have taken a FreeUSP data set and can extract data from the USPfile header. This program can be used to define and extract usp data! It can even convert FreeUSP to the CSEG 1994 segy standard. Here is what the start of the FreeUSP format file looks like:

SEGZ Format Definition V1
 SECTION FileStructureDefinition
# SECTION is the keyword identifier for the beginning of a section
# Comment - This header defines a normal SEGY file
  Name,             Length, Type, Description
  TextualHeader,         0        # SEGY Rev(0) Default length of character line header (bytes), not used with FreeUSP
  FileHeader,            1, INT4  # Length of binary line header (bytes) GreenWord 1, Includes the HLH descriptive header
  TraceHeaderData,       1, INT4  # Length of trace header (bytes) GreenWord 2, Includes both the trace header and the data length
  TRACE_SAMP_FORMAT, IEEE4,       # 
  Endian,              BIG,       # Big Endian data suitable for SUN type computers
 ENDSECTION

 SECTION LineHeaderDefinition
  Name,                    Byte, Type, Count
  PrcDat,                   25, ASCII, 8      # PrcDat
  LINE_ID,                  33, ASCII, 8      # OACLine Line_ID (Reference Number) 
  LINE_NAME,                41, ASCII, 8      # JobNum Line_ID (Reference Number)
  NUM_TRACES,               49,  INT4         # NumTrc Number of traces in file
  NUM_REC,                  53,  INT4         # NumRec Number of records in file
  FILE_SAMP_NUM,            57,  INT4         # SmpInt Number of data samples per trace or the maximum number of Samples
 ENDSECTION
ENDSEGZ
August 20, 2008 I have created a series of real examples that cover a range of different interpretations that I have observed over the years. I have placed into the public domain the Perl script I have used to parse a segy file and to extract and display header values. Just follow this link. Please note that I need to update these examples with the thinking shown above!

August 19, 2008 Added support to store dual projections in the trace header. It is felt that WGS84 projection will be stored in addition to the local coordinate system. This will facilitate the support for ESRI products and even Google Earth!

August 18, 2008 ARAM just provided me with the documentation for their software version 3. The examples they provided were disk datasets (little endian with ASCII in the TRACE headers). The number format code is now the number 5, as declared by SEGY rev(1). Use the following links to examine these field data.

  1. ARAM DISK Format - I have coded up most of the ARAM data fields aram_disk.fmt This is the file that I am recommending be provided along with the segy data set. It's the segy definition and tells you how to read the file!
  2. The ARAM test DISK segy data file 00000007.sgy (right click and download this file if you like)
  3. The output from my segz2segy.pl script 00000007.txt
August 13, 2008 My script will now read ARAM field data as well. You can now easily check any data fields you want from ARAM segy field data. Just follow this link! Please note that the number of samples in this data example has been set to 20,001 in the Field header and the first thirty or so traces use that value. It then switches to 23,001 samples and then back to 3001 samples. The correlate flag switches from 1 to 2! I also checked the ARAM shot record and found it is not complete; some of the fields in their documentation are not filled out. This is a very good reason why ARAM should provide the SEGZ format file to declare the format of the data. This example demonstrates how EBCDIC data can be extracted from the ARAM TAPE trace headers. ARAM is sending me more test data. I will make sure that my script can decode any and all fields!

August 8, 2008 I have made some good progress, I have now added the code to automatically detect if the data are BIG or LITTLE endian and do the necessary byte swaps. My example is now a 3D that I snitched from a SeisWare little endian data set. The format file is here. Here is what the script writes out:

-Input SEGZ format name = _P107217_MRG_3D_CRP.FMIG.0.fmt

SEGZ Version number = V1

SEGZ  Header 0  NAME                      LENGTH  Description
SEGZ  Header 1  TEXTUAL-HEADER            3200    SEGY Rev(0) Default length of character line header (bytes)
SEGZ  Header 2  FILE-HEADER               400     Length of binary line header (bytes)
SEGZ  Header 3  TRACE-HEADER              240     Length of trace header (bytes)
SEGZ  Header 4  TRACE_SAMP_FORMAT         IBM4    Format 1 in the binary header over rides the value in binary header
SEGZ  Header 5  ENDIANESS                 LITTLE  Big Endian data suitable for SUN type computers

FILE  Header 0  NAME                           BYTE   TYPE   VECTOR SCALAR ADDEND Description
FILE  Header 1  FILE_SAMP_RATE                 17     INT2   1      1      0      Sample Interval
FILE  Header 2  FILE_SAMP_NUM                  21     INT2   1      1      0      Number of data samples per trace or the maximum number of Samples
FILE  Header 3  TRACE_SAMP_FORMAT              25     INT2   1      1      0      Data sample format code 1=IBM floating point
FILE  Header 4  SEIS_DATUM                     125    IEEE4  1      1      0      elevation
FILE  Header 5  SEIS_REPLACEMENT_VELOCITY      129    IEEE4  1      1      0      Seismic replacement velocity
FILE  Header 6  LINE_NAME                      301    ASCII  32     1      0      Internal Line Name
FILE  Header 7  GEOMETRY                       393    INT4   1      1      0      2=2D line 3=3D

TRACE Header 0  NAME                           BYTE   TYPE   VECTOR SCALAR ADDEND Description
TRACE Header 1  INLINE                         9      INT4   1      1      0      Ensemble no. (CDP CMP CRP etc)
TRACE Header 2  CROSSLINE                      13     INT4   1      1      0      Ensemble no. (CDP CMP CRP etc)
TRACE Header 3  ITRACE_FILE                    25     INT4   1      1      0      Trace sequence no within this file
TRACE Header 4  CDP_X                          81     IEEE4  1      1      0      X-coordinate of ensemble (CDP) for this trace
TRACE Header 5  CDP_Y                          85     IEEE4  1      1      0      CPD coordinate (Y)

TEXTUAL_HEADER =   3200
FILE_HEADER    =    400
TRACE_HEADER   =    240

-Input segy/segz data file  = _P107217_MRG_3D_CRP.FMIG.0.sgy
-Output 1994 CSEG SEGY file = _P107217_MRG_3D_CRP.FMIG.0.csgy

File is LITTLE endian

                FILE_SAMP_RATE=           2000
                 FILE_SAMP_NUM=             10
             TRACE_SAMP_FORMAT=              3
                    SEIS_DATUM=            800
     SEIS_REPLACEMENT_VELOCITY=           3100
                     LINE_NAME=_P107217_MRG_3D_CRP
                      GEOMETRY=              3

     INLINE   CROSSLINE ITRACE_FILE       CDP_X       CDP_Y 
     407.00      169.00        1.00   338630.00  6101030.00 
     407.00      170.00        2.00   338628.00  6100968.00 
     407.00      171.00        3.00   338626.00  6100905.00 
 ...
     408.00      143.00       40.00   338647.00  6102656.00 
     408.00      153.00       50.00   338625.00  6102031.00 
     408.00      163.00       60.00   338604.00  6101406.00 
     408.00      173.00       70.00   338582.00  6100782.00 
     408.00      183.00       80.00   338560.00  6100157.00 
     408.00      193.00       90.00   338538.00  6099533.00 
 ...
     498.00      173.00     7000.00   334984.00  6100907.00 
     511.00      172.00     8000.00   334466.00  6100988.00 
     524.00      171.00     9000.00   333949.00  6101069.00 
     530.00      206.00     9497.00   333633.00  6098891.00 

End of File _P107217_MRG_3D_CRP.FMIG.0.sgy
I am now working on defining the format file to read ARAM SEGY data. It's a bit of a trick because ARAM changes the number of samples per trace. This has caused some seismic processors grief!
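A file whose sample count changes from trace to trace cannot be walked with a fixed trace stride. The sketch below is one possible way to handle it in Python, not how segz2segy.pl does it. It assumes that each 240-byte trace header carries its own sample count in bytes 115-116 (the standard SEG-Y location for "number of samples in this trace"), that samples are 4 bytes wide, and that traces start after the 3200 + 400 bytes of file headers. The name iter_traces and all of its parameters are hypothetical.

```python
import struct

def iter_traces(data, first_trace_offset=3600, trace_header_len=240,
                sample_bytes=4, endian=">"):
    """Yield (header, samples) byte strings for each trace in `data`.

    The length of every trace is taken from its own header, so the
    sample count may change from one trace to the next.
    """
    pos = first_trace_offset
    while pos + trace_header_len <= len(data):
        header = data[pos:pos + trace_header_len]
        # Bytes 115-116 (1-based) hold the sample count, read here as unsigned.
        (nsamp,) = struct.unpack_from(endian + "H", header, 114)
        start = pos + trace_header_len
        end = start + nsamp * sample_bytes
        if end > len(data):
            raise ValueError("truncated trace at byte offset %d" % pos)
        yield header, data[start:end]
        pos = end
```

If a file leaves the per-trace sample count empty, this approach fails and the count from the file header has to be used instead.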

August 6, 2008 I am now thinking that this format file should also contain a detailed text description for acquisition and processing details to be stored after the ENDSEGZ statement. Velocity Time-RMS card images or mute intervals could be stored as additional details in this file. I will provide some more examples as time permits.

July 28, 2008 I have written a perl script segz2segy.pl that will reformat a SEGZ file into the 1994 CSEGY file format (numbers are either 16 or 32 bit integers and shot points are scaled by 1000). This format has been in common use here in Calgary. It was very easy to write a parser to handle the above format example. Check the source code above! I now need to add floating point numbers and to assemble several data examples.
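The Scalar and Addend columns of the format file are not spelled out on this page. The short Python sketch below assumes the simplest reading, value = raw * Scalar + Addend, which is consistent with a SOURCE_POINT entry whose Scalar is 0.001 being printed as 1002.50. Both function names are hypothetical, and the inverse function shows the "scaled by 1000" integer storage of shot points under the same assumption.

```python
def apply_scaling(raw, scalar=1.0, addend=0.0):
    """Convert a stored header number to its real-world value (assumed rule)."""
    return raw * scalar + addend

def encode_shot_point(shot_point, factor=1000):
    """Store a fractional shot point as an integer scaled by `factor`."""
    return int(round(shot_point * factor))

stored = encode_shot_point(1002.5)           # integer written to the header
recovered = apply_scaling(stored, 0.001)     # value a reader would report
```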

Here is an example of a SEGY file that I have recently parsed. As the first step, I have made a list of all the format files that accompany the sgy, segy or segz files in the file called junk:

ls *fmt > junk
 ~ekeyser/Perl/segz2segy.pl | more

-Input SEGZ format name = A2370.65148.gox99.ustr.CD-08334.fmt
SEGZ Version number = V1
SEGZ  Header 0  NAME                      LENGTH  Description
SEGZ  Header 1  TEXTUAL-HEADER            3200    SEGY Rev(0) Default length of character line header (bytes)
SEGZ  Header 2  FILE-HEADER               400     Length of binary line header (bytes)
SEGZ  Header 3  TRACE-HEADER              240     Length of trace header (bytes)
SEGZ  Header 4  TRACE_SAMP_FORMAT         IBM4    Format 1 in the binary header over rides the value in binary header
SEGZ  Header 5  ENDIANESS                 BIG     Big Endian data suitable for SUN type computers

FILE  Header 0  NAME                           BYTE   TYPE   VECTOR SCALAR ADDEND Description
FILE  Header 1  FILE_SAMP_RATE                 17     INT2   1      1      0      Sample Interval
FILE  Header 2  FILE_SAMP_NUM                  21     INT2   1      1      0      Number of data samples per trace or the maximum number of Samples
FILE  Header 3  TRACE_SAMP_FORMAT              25     INT2   1      1      0      Data sample format code 1=IBM floating point
FILE  Header 4  LENGTH_SYS                     55     INT2   1      1      0      1=metres 2=feet
FILE  Header 5  TIME2FIRST_SAMPLE              399    INT2   1      1      0      Time to first sample in ms

TRACE Header 0  NAME                           BYTE   TYPE   VECTOR SCALAR ADDEND Description
TRACE Header 1  ITRACE_FILE                    5      INT4   1      1      0      Trace sequence no within this file
TRACE Header 2  SOURCE_POINT                   17     INT4   1      0.001  0      Energy source point number (shot peg)
TRACE Header 3  CDP                            21     INT4   1      1      0      Ensemble no. (CDP CMP CRP etc)
TRACE Header 4  HORI_NSUM                      33     INT2   1      1      0      No of horizontally summed traces in this trace
TRACE Header 5  REC_HT                         41     INT4   1      1      0      Elevation at receiver group
TRACE Header 6  SOURCE_HT                      45     INT4   1      1      0      Surface elevation at source point
TRACE Header 7  SOURCE_DEPTH                   49     INT4   1      1      0      Depth of source below surface (+ve)
TRACE Header 8  COORD_SCALAR                   71     INT2   1      1      0      Scalar for spatial coordinates (SOURCE_X to REC_Y)
TRACE Header 9  CDP_X                          81     INT4   1      1      0      X-coordinate of ensemble (CDP) for this trace
TRACE Header 10 CDP_Y                          85     INT4   1      1      0      CPD coordinate (Y)
TRACE Header 11 COORD_UNITS                    89     INT2   1      1      0      1=length (metres/feet) 2=lat/Lon 3=Decimal degrees 4=DMS 
TRACE Header 12 SOURCE_STATIC                  99     INT2   1      1      0      SOURCE_STATIC Source static correction [ms]
TRACE Header 13 REC_STATIC                     101    INT2   1      1      0      REC_STATIC Receiver group static correction [ms]
TRACE Header 14 TOTAL_STATIC                   103    INT2   1      1      0      TOTAL_STATIC Total static applied [ms]

TEXTUAL_HEADER =   3200
FILE_HEADER    =    400
TRACE_HEADER   =    240

-Input segy/segz data file  = A2370.65148.gox99.ustr.CD-08334.sgy
-Output 1994 CSEG SEGY file = A2370.65148.gox99.ustr.CD-08334.csgy

                FILE_SAMP_RATE=        2000.00
                 FILE_SAMP_NUM=        1500.00
             TRACE_SAMP_FORMAT=           1.00
                    LENGTH_SYS=           1.00
             TIME2FIRST_SAMPLE=        -100.00

ITRACE_FILE SOURCE_POINT         CDP   HORI_NSUM      REC_HT   SOURCE_HT SOURCE_DEPTH COORD_SCALAR       CDP_X       CDP_Y COORD_UNITS SOURCE_STATIC  REC_STATIC TOTAL_STATIC 
       1.00     1002.00        3.00        1.00      703.00      703.00        0.00        1.00   534487.00  6187685.00        1.00        0.00       68.00      154.00 
       2.00     1002.50        4.00        2.00      703.00      703.00        0.00        1.00   534486.00  6187671.00        1.00        0.00       68.00      154.00 
       3.00     1003.00        5.00        2.00      702.00      702.00        0.00        1.00   534486.00  6187657.00        1.00        0.00       68.00      153.00 
       4.00     1003.50        6.00        1.00      702.00      702.00        0.00        1.00   534485.00  6187643.00        1.00        0.00       66.00      151.00 
       5.00     1004.00        7.00        2.00      702.00      702.00       16.00        1.00   534484.00  6187629.00        1.00       85.00       63.00      148.00 
       6.00     1004.50        8.00        2.00      702.00      702.00        0.00        1.00   534483.00  6187615.00        1.00        0.00       62.00      147.00 
       7.00     1005.00        9.00        3.00      702.00      702.00        0.00        1.00   534482.00  6187601.00        1.00        0.00       61.00      146.00 
       8.00     1005.50       10.00        4.00      703.00      703.00        0.00        1.00   534481.00  6187587.00        1.00        0.00       65.00      151.00 
       9.00     1006.00       11.00        4.00      703.00      703.00        0.00        1.00   534481.00  6187573.00        1.00        0.00       69.00      155.00 
      10.00     1006.50       12.00        3.00      703.00      703.00        0.00        1.00   534480.00  6187559.00        1.00        0.00       69.00      155.00 
      20.00     1011.50       22.00        8.00      704.00      704.00        0.00        1.00   534471.00  6187419.00        1.00        0.00       66.00      153.00 
      30.00     1016.50       32.00        9.00      704.00      704.00        0.00        1.00   534463.00  6187279.00        1.00        0.00       68.00      155.00 
      40.00     1021.50       42.00       12.00      704.00      704.00        0.00        1.00   534455.00  6187140.00        1.00        0.00       66.00      154.00 
      50.00     1026.50       52.00       14.00      703.00      703.00        0.00        1.00   534446.00  6187000.00        1.00        0.00       65.00      150.00 
      60.00     1031.50       62.00       13.00      702.00      702.00        0.00        1.00   534438.00  6186860.00        1.00        0.00       70.00      156.00 
      70.00     1036.50       72.00       16.00      700.00      700.00        0.00        1.00   534430.00  6186721.00        1.00        0.00       69.00      156.00 
      80.00     1041.50       82.00       13.00      699.00      699.00        0.00        1.00   534421.00  6186581.00        1.00        0.00       73.00      161.00 
      90.00     1046.50       92.00       14.00      699.00      699.00        0.00        1.00   534412.00  6186441.00        1.00        0.00       73.00      161.00 
     100.00     1051.50      102.00       15.00      698.00      698.00        0.00        1.00   534404.00  6186302.00        1.00        0.00       73.00      161.00 
     200.00     1101.50      202.00       14.00      679.00      679.00        0.00        1.00   534314.00  6184905.00        1.00        0.00       82.00      178.00 
     300.00     1151.50      302.00       14.00      675.00      675.00        0.00        1.00   534223.00  6183509.00        1.00        0.00       50.00      133.00 
     400.00     1201.50      402.00       16.00      656.00      656.00        0.00        1.00   534125.00  6182113.00        1.00        0.00       70.00      168.00 
     500.00     1251.50      502.00       12.00      632.00      632.00        0.00        1.00   534022.00  6180717.00        1.00        0.00       87.00      183.00 
     585.00     1294.00      587.00        1.00      630.00      630.00        0.00        1.00   533934.00  6179531.00        1.00        0.00       84.00      174.00 
End of File A2370.65148.gox99.ustr.CD-08334.sgy
Note: This script has to run on a little-endian Linux box. At present it assumes all data are in BIG (SUN-like) endian byte order.

What I am currently working on is the ability to handle Little Endian SEGY data (the data needs to be byte swapped).
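One common way to decide whether byte swapping is needed is to look at a header number whose legal values are small, and see which byte order gives a sensible result. The Python sketch below applies that heuristic to the data sample format code at bytes 25-26 of the 400-byte binary file header. It is an assumption about how detection could be done, not a description of the Perl script; the function name detect_endian and the accepted code range 1 to 8 are choices made for this example.

```python
import struct

VALID_FORMAT_CODES = range(1, 9)   # assumed range of legal format codes

def detect_endian(file_header):
    """Guess the byte order of a SEG-Y style binary file header.

    Reads the 2-byte format code at bytes 25-26 (1-based) both ways
    and returns "BIG" or "LITTLE" for whichever reading is plausible.
    """
    (as_big,) = struct.unpack_from(">h", file_header, 24)
    (as_little,) = struct.unpack_from("<h", file_header, 24)
    if as_big in VALID_FORMAT_CODES:
        return "BIG"
    if as_little in VALID_FORMAT_CODES:
        return "LITTLE"
    raise ValueError("cannot determine byte order from the format code")
```

Once the order is known, Python's struct module does the swap implicitly: the same field is unpacked with a ">" prefix for big-endian data and a "<" prefix for little-endian data.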

Note: It would be really simple to add an export function to dump any field into a tab-delimited, Excel-type spreadsheet format. All the user would have to do is specify the format of the field being dumped. I'd add a parameter option to the program to dump the file. If someone asks, I will do the coding!
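For what it is worth, such an export takes only a few lines. This Python sketch (not part of the Perl script) writes already-decoded header values as tab-delimited text using the standard csv module; the function name dump_tab_delimited and the list-of-dictionaries input are assumptions for illustration.

```python
import csv
import io

def dump_tab_delimited(names, rows, out):
    """Write one heading line, then one tab-delimited line per trace."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    for row in rows:
        writer.writerow([row[name] for name in names])

buffer = io.StringIO()
dump_tab_delimited(
    ["INLINE", "CROSSLINE", "CDP_X"],
    [{"INLINE": 407, "CROSSLINE": 169, "CDP_X": 338630.0}],
    buffer,
)
```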

Check this web site for updates.

Introduction

This web site describes a new seismic data exchange format which we propose to name SEG-Z, and which is intended to be largely a replacement for SEG-Y. SEG-Z is clearly modelled on SEG-Y: there are many similarities in terms of terminology and file structure, and in fact the majority of SEG-Y files can be expressed as SEG-Z files, with the addition of a separate text file that describes where data is located within the file (e.g. linename.sgz, linename.sgztxt).

SEG-Y has been both one of the most successful seismic data exchange formats and also the most broken. It must be deemed successful in terms of its uptake throughout the seismic processing and interpretation industry, but the format itself could be deemed a failure since many (perhaps the majority of) so-called SEG-Y files do not follow the format correctly. Loading of data from SEG-Y files has consequently rarely been the straightforward process it was intended to be. Contrast the reading in of a SEG-Y file with other common file formats (for example, JPEG, TIFF, PDF or MPEG): today's file formats should load seamlessly without the user having to know any details of the format itself; this is certainly not generally true of SEG-Y.

Part of the reason for the failure of SEG-Y was its inflexibility: the format decreed specific byte locations for the various bits of information, and fixed byte-lengths for the components of the file structure. Consequently, when users needed to add new types of information into the file beyond what was originally envisaged, it could only be done by breaking the format. Most processing houses had to do this, and all did it in different ways, leading to a sub-industry dedicated solely to seismic data loading. Another key failure was the inability of SEG-Y to incorporate new ways of storing the data (for example, only 4-byte IBM-format floating point values were supported).

A key difference between SEG-Z and SEG-Y is that whilst the older format decrees what information can be placed in the file, and where it has to go, the new format defines a set of rules about how to describe where data has been placed. The user then has flexibility to add more information to the file without breaking the format. The other key difference is that the components of the file structure (specifically, the header lengths) no longer have a prescribed size, but can be any length, with the lengths of the components being specified at the start of the file or potentially in a separate file.

Design Aims

Our aims in designing SEG-Z have been to provide the industry with a seismic data exchange format that is : In short, we wish to remove the Tower of Babel erected by SEG-Y, and let Geophysicists get on with their real job of seismic processing or interpretation.

The desirability of being able to read the majority of existing SEG-Y files as SEG-Z files may seem odd at first sight. However many of those SEG-Y files are not "true SEG-Y", so require custom enhancements to the reading software to load correctly. As the new SEG-Z format is able to describe the contents of the file precisely, most SEG-Y files will read correctly via SEG-Z reading software, with the addition of a separate descriptive text file. These text files should then be easily available to those who want to read in "true SEG-Y".

We intend only that SEG-Z be able to read "most" SEG-Y files, for two reasons. Firstly, if SEG-Z included support for all known datatypes used in SEG-Y over the last 30 years (3-byte floats are just one example), valid reading software would in principle have to support all of those data types. We feel this could present a barrier to widespread acceptance of the format. Secondly, there are probably many datatypes used historically which are essentially unknown, or at least forgotten, and hence cannot be included. We do provide a mechanism within SEG-Z to describe files which fall outside the format, so that at least the reading software can fail cleanly with a useful message to the user.

SEGZ Rules

SEG-Z Assumptions

There are a few assumptions inherent in the proposed SEG-Z format :

Separate DESCRIPTOR files to read SEG-Y

Existing SEG-Y files can be coded as SEG-Z-formatted files by specifying an external DESCRIPTOR file. This file will contain header length declarations that match the SEG-Y header lengths, and will define the byte positions for all header entries defined by the SEG-Y standard. Different DESCRIPTOR files are available for reading both Revision 0 and Revision 1 SEG-Y files.

It is anticipated that most SEG-Z reading software will give the user the option of specifying a separate DESCRIPTOR file. If the data file to be read does not start with "SEGZ", it should be presumed to be a SEG-Y formatted file and read using the descriptions in the descriptor provided. If the data file to be read starts with "SEGZ", it should be assumed that both the Descriptor and the Identification header contained in the file are incorrect, and the information in the separate descriptor file should be used instead.
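The rule for choosing between an embedded and a separate descriptor can be written down compactly. The Python sketch below is one reading of it; the function name select_descriptor and its return values are hypothetical, and raising an error for a plain SEG-Y file with no separate descriptor is an assumption, since that case is not covered by the text.

```python
def select_descriptor(data, external_descriptor=None):
    """Return (descriptor_source, presumed_format) for a data file.

    data: the leading bytes of the data file.
    external_descriptor: text of a user-supplied DESCRIPTOR file, or None.
    """
    starts_with_segz = data[:4] == b"SEGZ"
    if external_descriptor is not None:
        # A separate descriptor always takes precedence, even when the
        # file carries its own (presumed incorrect) embedded one.
        return ("external", "SEGZ" if starts_with_segz else "SEGY")
    if starts_with_segz:
        return ("embedded", "SEGZ")
    # Assumption: plain SEG-Y cannot be read without some descriptor.
    raise ValueError("SEG-Y file needs a separate DESCRIPTOR file")
```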

The Descriptor file must also contain the "ByteOrdering" definition. This would normally be present in the Identification header, which is absent in the case of a SEGY file.

Identifying Header

The Identification header at the start of the file is SEGZ-Format-Definition-V1 (coded in ASCII). The same byte ordering must be used throughout the file. Bytes 7-8 contain the 16-bit version of the number 49 (the ASCII code for the character "1"). Bytes 9 to 12 hold the length of the preamble, in bytes, expressed as a 4-byte unsigned integer. The byte order is given within the SECTION SEGZ-parameters section, as shown below:
 Name,                Type, Description
 Endianess,            BIG, Big Endian, data suitable for SUN type computers
 Endianess,         LITTLE, Little Endian, data suitable for PC type computers
The major focus of this committee (the HARD part) will be the creation of a standard set of names. This is where the RODE encapsulation format failed: it was too flexible! If you can't agree to a standard name, then please provide an alias table for the rest of us. The complete list of Valid Names, descriptions and probable SEGY byte positions is stored here.

Here is a list of proposed number(character) formats.

Data Type   Description
INT1        1-byte integer
INT2        2-byte integer
INT4        4-byte integer
INT8        8-byte integer
UINT1       1-byte unsigned integer
UINT2       2-byte unsigned integer
UINT4       4-byte unsigned integer
UINT8       8-byte unsigned integer
BCD2        2-byte Binary Coded Decimal
BCD4        4-byte Binary Coded Decimal
IEEE4       4-byte IEEE float
IEEE8       8-byte IEEE float
IBM4        4-byte IBM float
ASCII       American Standard Code for Information Interchange
EBCDIC      Extended Binary Coded Decimal Interchange Code
UNICODE     Universal character set

The data types are straightforward and can even handle 64-bit numbers!
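To show how these type names might drive a reader, here is a minimal Python sketch that decodes one header entry from its Byte and Type. The names STRUCT_CODES, ibm4_to_float and read_field are hypothetical. The sketch assumes that Byte positions are 1-based, that the INT types are signed, that IBM floats are stored big-endian, and that EBCDIC text uses code page 037; the BCD and UNICODE types are left out.

```python
import struct

# Assumed mapping from SEGZ type names to Python struct format codes.
STRUCT_CODES = {
    "INT1": "b", "INT2": "h", "INT4": "i", "INT8": "q",
    "UINT1": "B", "UINT2": "H", "UINT4": "I", "UINT8": "Q",
    "IEEE4": "f", "IEEE8": "d",
}

def ibm4_to_float(four_bytes):
    """Convert a 4-byte IBM hexadecimal float (big-endian) to a Python float."""
    (word,) = struct.unpack(">I", four_bytes)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F             # base-16 exponent, excess 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)

def read_field(buf, byte, type_name, endian="BIG", count=1):
    """Decode the header entry that starts at 1-based position `byte`."""
    offset = byte - 1
    type_name = type_name.upper()
    if type_name == "ASCII":
        return bytes(buf[offset:offset + count]).decode("ascii").strip("\x00 ")
    if type_name == "EBCDIC":
        return bytes(buf[offset:offset + count]).decode("cp037").strip("\x00 ")
    if type_name == "IBM4":
        return ibm4_to_float(bytes(buf[offset:offset + 4]))
    prefix = ">" if endian.upper() == "BIG" else "<"
    (value,) = struct.unpack_from(prefix + STRUCT_CODES[type_name], buf, offset)
    return value
```

With a table like this, supporting a new fixed-width numeric type is a one-line addition to the mapping rather than a change to the reading code.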

PS: For the past six months we have had a working sub committee of the following individuals:

Significant input has been provided by the following individuals:

Here is a link to the previous versions of the documentation; a lot has evolved over time!
Site Owner: Eric Keyser