Proposed CSEGZ Standard
May 14, 2010
Geo Canada 2000 has just finished here in Calgary for the CSEG and CSPG. I had the chance to talk with some of the geophysical hardware manufacturers. They all acknowledged that they typically store data in IEEE, little endian format and that it has to be converted to SEGD. Any time data are converted, something is lost. If SEGY had a variable length format with the ability to define your own data fields, it would make a great field recording format. Geophysicists have been asking for a replacement for the 35 year old SEGD field format.
So what was the motivation to set up a Technical Standards Subcommittee for SEG-Z? The recommended SEGZ format can be thought of as a segy object containing a preamble that describes the format of the data and headers:
- SEGY has started to become more of a standard for field data recording. Data is archived in the number format of acquisition and is not converted into Binary Coded Decimal.
- Numerous free viewers are available to look at your data. I am unaware of any free SEGD viewers.
- A standard way to read any SEGY like format containing a Descriptive Textual Header, a binary File header, a Trace Header and Data. Headers can either be separate, providing fast access to disk data sets or interleaved with the data (current SEGY standards)
- The option for Variable Length, Dynamic Headers. As the field instruments become more sophisticated, space for additional numbers is required. You only need to define what you are using instead of reserving unused space in the headers. The standard needs to have the ability to evolve over time.
- Additional data types, unsigned integers, IEEE floats for numbers stored in the line and trace headers. Creation of CUSTOM formats to handle your number compression. The ability to handle 64 bit numbers (eg. required to accurately specify UTM XY's)
- Built in mapping or keyword file called the DESCRIPTOR to enable automatic data loading to the workstation. The idea is that the header entries are no longer to be defined just by the byte-location and description, but instead by a name. This is important since with the ability (or necessity) to define the byte positions where each trace header name is located in the headers, the byte locations can no longer be relied upon to define which header entry we are referring to, so the fixed set of names may well become essential!
- Add a keyword, mapping file (DESCRIPTOR) to the front of every segy that can describe any adaptation of SEGY since the original 1975 definition.
- Provide a new format that can easily be extended to handle the needs of the future, i.e. variable header lengths, dynamic field definitions, trace headers stored as a separate index file for fast access.
Note: The DESCRIPTOR contains an ASCII description of the organization of the data that can easily be parsed.
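To illustrate why the list above asks for 64 bit numbers for UTM XY's, here is a minimal Python sketch. It is not part of the proposal: the helper name roundtrip and the northing value are made up for illustration. It stores the same coordinate as a 4-byte and as an 8-byte IEEE float and reads it back.

```python
import struct

def roundtrip(value, code):
    """Pack and unpack a number with a struct code ('f' = 4-byte, 'd' = 8-byte IEEE float)."""
    return struct.unpack(">" + code, struct.pack(">" + code, value))[0]

northing = 6187685.37  # hypothetical UTM northing in metres, not taken from a real file

as_ieee4 = roundtrip(northing, "f")
as_ieee8 = roundtrip(northing, "d")

print("IEEE4:", as_ieee4, "error (m):", abs(as_ieee4 - northing))
print("IEEE8:", as_ieee8, "error (m):", abs(as_ieee8 - northing))
```

At this magnitude a 4-byte float can only resolve the northing to the nearest half metre, while the 8-byte float returns the value unchanged.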
I have spent years trying to get geophysicists to agree on a SEGY standard with little success. The recommended SEGZ format contains only those items that we can all agree upon from the original SEGY definition:
- A Textual File Header
- A Binary File Header
- and a Trace Header
Note: We are now recommending that the information stored in these headers be dynamic, and that the headers can become variable length! It is recommended that all data processors provide the client with a mapping file that contains sufficient detail to enable automatic workstation data loading.
The Proposed SEGZ contains a DESCRIPTOR header, in ASCII, to describe the data that follows. The DESCRIPTOR could be placed at the beginning of a data file or could be a separate file. If it is a separate file, it has the unique suffix ".fmt".
Considerable debate has occurred on the format of the descriptor. After considering XML or YAML, it was decided that we do not require this level of flexibility and rather should go for something rigid and easily parsed. Defaults are generally not allowed. The use of defaults tends to make things more complex and less human readable/understandable. Here is what a typical DESCRIPTOR might look like:
    SEGZ Format Definition V1
    # This format definition is for EnCana WORKSTATION SEGY with IEEE floating point numbers and ASCII in the headers
    # Dated August 20, 2008
    SECTION FileStructureDefinition   # SECTION is the keyword identifier for the beginning of a section
    # Comment - This header defines a normal SEGY file
    Name, Length, Description
    TextualHeader, 3200,        # SEGY Rev(0) Default length of character line header (bytes)
    FileHeader, 400,            # Length of binary line header (bytes)
    TraceHeader, 240,           # Length of trace header (bytes)
    TRACE_SAMP_FORMAT, IEEE4,   # Format 6 in the binary header over rides the value in binary header
    Endian, BIG,                # Big Endian data suitable for SUN type computers
    ENDSECTION
    SECTION LineHeaderDefinition
    Name, Byte, Type, Count,Scalar,Addend,Description
    LINE_ID, 1, ASCII, 8              # Line_ID (Reference Number)
    FILE_SAMP_RATE, 17, INT2          # Sample Interval
    FILE_SAMP_NUM, 21, INT2           # Number of data samples per trace or the maximum number of Samples
    TRACE_SAMP_FORMAT, 25, INT2       # Data sample format code 1=IBM floating point 6=IEEE Canadian
    LENGTH_SYS, 55, INT2              # 1=metres 2=feet
    DOMAIN, 69, INT2                  # Domain code: 0=time 1=amplitude 3=phase 4=depth
    SURVEY_DATUM, 71, ascii, 6        # Survey Datum eg NAD27 WGS84
    SURVEY_GRID, 77, ASCII, 6         # Survey Grid eg ATS2.6
    TIME2FIRST_SAMPLE, 97, INT2       # Time to first sample in ms
    SEIS_DATUM, 125, IEEE4            # elevation
    SEIS_REPLACEMENT_VELOCITY,129, IEEE4   # Seismic replacement velocity
    LINE_NAME, 301, ASCII,20          # Internal Line Name
    GEOMETRY, 393, INT2               # 2=2D line 3=3D
    ENDSECTION
    SECTION TraceHeaderDefinition
    Name, Byte,Type
    ITRACE_LINE, 1,INT4       # Trace sequence no within line
    SOURCE_POINT, 17,IEEE4    # Energy source point number (shot peg)
    CDP, 21,INT4              # Ensemble no. (CDP CMP CRP etc)
    TRTYPE, 29,INT2,          # 1=data 2=dead 3=dummy 4=time break 5=uphole (etc
    HORI_NSUM, 33,INT2,       # No of horizontally summed traces in this trace
    REC_HT, 41,IEEE4          # Elevation at receiver group
    SOURCE_DEPTH, 49,IEEE4    # Depth of source below surface (+ve)
    CDP_X, 81,IEEE4           # X-coordinate of ensemble (CDP) for this trace
    CDP_Y, 85,IEEE4           # CPD coordinate (Y)
    SOURCE_STATIC,99,INT2,    # SOURCE_STATIC Source static correction [ms]
    REC_STATIC, 101,INT2,     # REC_STATIC Receiver group static correction [ms]
    TOTAL_STATIC,103,INT2,    # TOTAL_STATIC Total static applied [ms]
    ENDSECTION
    ENDSEGZ
    Complete Processing flow, velocity functions, mutes go here
    ENDFILE

    This is the format that Precision Seismic used to create their segy. Any description can be located after the ENDSEGZ, it will be ignored by the program that reads this format file! Recommended processing flow, velocity functions can go here ...
    ENDFILE
Note the ENDSEGZ keyword identifier for the end of the (format) file; any text after this flag will be ignored. It needs to be included so that you know when you have reached the end of the format definition. If there is no ENDSEGZ then we can assume the file is corrupt!
Note: There has been considerable debate on the syntax for describing our SEGZ format. Some of us believe that we should be using XML or YAML, others believe we should be using a simpler key=value syntax. Personally, I prefer the simpler, human readable syntax described above. This syntax has evolved from the segy format files first used by Photon (SeisX) pre 1990. A syntax that has had its foundation defined and used for almost twenty years is a pretty good place to start.
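To show how little code the rigid syntax above needs, here is a minimal Python sketch of a DESCRIPTOR reader. It is an illustration only and is not the segz2segy.pl script. The function name parse_descriptor and the choice to return a dictionary of sections are assumptions. It treats # as the start of a comment, takes the first row of each section as the tag list, stops at ENDSEGZ, and reports a corrupt file when ENDSEGZ is missing.

```python
def parse_descriptor(text):
    """Parse DESCRIPTOR text into {section_name: [ {TAG: value, ...}, ... ]}."""
    lines = text.splitlines()
    if not lines or not lines[0].strip().upper().startswith("SEGZ"):
        raise ValueError("descriptor does not start with the SEGZ signature line")
    sections = {}
    current = None   # name of the section being read, if any
    tags = None      # tag list taken from the first row of the section
    for raw in lines[1:]:
        line = raw.split("#", 1)[0].strip()     # drop comments and padding
        if not line:
            continue
        words = line.split()
        keyword = words[0].upper()
        if keyword == "ENDSEGZ":
            return sections                     # anything after this is free text
        if keyword == "SECTION" and len(words) > 1:
            current, tags = words[1], None
            sections[current] = []
        elif keyword == "ENDSECTION":
            current = None
        elif current is not None:
            fields = [f.strip() for f in line.split(",")]
            if tags is None:
                tags = [f.upper() for f in fields]
            else:
                row = {t: f for t, f in zip(tags, fields) if t and f}
                sections[current].append(row)
    raise ValueError("no ENDSEGZ found, descriptor is assumed to be corrupt")


sample = """SEGZ Format Definition V1
SECTION FileStructureDefinition # start of a section
Name, Length, Description
TextualHeader, 3200, # length of character line header
Endian, BIG, # byte order
ENDSECTION
SECTION TraceHeaderDefinition
Name, Byte,Type
CDP, 21,INT4 # Ensemble no.
ENDSECTION
ENDSEGZ
Processing notes, ignored by the reader
ENDFILE
"""
parsed = parse_descriptor(sample)
print(parsed["TraceHeaderDefinition"])
```

Because the Description text lives in the comment, it is dropped here; a reader that wants to keep it would have to capture the text after # instead of discarding it.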
The complete list of Valid Names, descriptions and probable SEGY byte positions are stored here as a csv file.
I have also updated my spreadsheets that compare various interpretations of SEGY and how they would relate to SEGZ.
- 400 byte line (binary) header Spread sheet (updated Aug 7, 2008)
- 240 byte trace header Spread sheet (updated Aug 7, 2008)
August 28, 2008 I am a little excited... What we are really developing is a format definition for any trace sequential data set. So far I have taken a FreeUSP data set and can extract data from the USPfile header. This program can be used to define and extract usp data! It can even convert FreeUSP to the CSEG 1994 segy standard. Here is what the start of the FreeUSP format file looks like:
    SEGZ Format Definition V1
    SECTION FileStructureDefinition   # SECTION is the keyword identifier for the beginning of a section
    # Comment - This header defines a normal SEGY file
    Name, Length, Type, Description
    TextualHeader, 0             # SEGY Rev(0) Default length of character line header (bytes), not used with FreeUSP
    FileHeader, 1, INT4          # Length of binary line header (bytes) GreenWord 1, Includes the HLH descriptive header
    TraceHeaderData, 1, INT4     # Length of trace header (bytes) GreenWord 2, Includes both the trace header and the data length
    TRACE_SAMP_FORMAT, IEEE4,    #
    Endian, BIG,                 # Big Endian data suitable for SUN type computers
    ENDSECTION
    SECTION LineHeaderDefinition
    Name, Byte, Type, Count
    PrcDat, 25, ASCII, 8         # PrcDat
    LINE_ID, 33, ASCII, 8        # OACLine Line_ID (Reference Number)
    LINE_NAME, 41, ASCII, 8      # JobNum Line_ID (Reference Number)
    NUM_TRACES, 49, INT4         # NumTrc Number of traces in file
    NUM_REC, 53, INT4            # NumRec Number of traces in file
    FILE_SAMP_NUM, 57, INT4      # SmpInt Number of data samples per trace or the maximum number of Samples
    ENDSECTION
    ENDSEGZ
August 20, 2008 I have created a series of real examples that cover a range of different interpretations that I have observed over the years. I have placed into the public domain the Perl script I have used to parse a segy file, extract and display header values. Just follow this link. Please note that I need to update these examples with the thinking shown above!
August 19, 2008 Added support to store dual projections in the trace header. It is felt that the WGS84 projection will be stored in addition to the local coordinate system. This will facilitate support for ESRI products and even Google Earth!
August 18, 2008 ARAM just provided me with the documentation for their software version 3. The examples they provided were disk datasets (little endian with ASCII in the TRACE headers). The number format code is now the number 5, as declared by SEGY rev(1). Follow the links below to examine these field data.
August 13, 2008 My script will now read ARAM field data as well. You can now easily check any data fields you want from ARAM segy field data. Just follow this link! Please note that the number of samples in this data example has been set to 20,001 in the Field header and the first thirty or so traces use that value. It then switches to 23,001 samples and then back to 3001 samples. The correlate flag switches from 1 to 2! I also checked the ARAM shot record and found it is not complete; some of the fields in their documentation are not filled out. This is a very good reason why ARAM should provide the SEGZ format file to declare the format of the data. This example demonstrates how EBCDIC data can be extracted from the ARAM TAPE trace headers. ARAM is sending me more test data. I will make sure that my script can decode any and all fields!
- ARAM DISK Format - I have coded up most of the ARAM data fields in aram_disk.fmt. This is the file that I am recommending be provided along with the segy data set. It's the segy definition and tells you how to read the file!
- The ARAM test DISK segy data file 00000007.sgy (right click and download this file if you like)
- The output from my segz2segy.pl script 00000007.txt
August 8, 2008 I have made some good progress: I have now added the code to automatically detect whether the data are BIG or LITTLE endian and do the necessary byte swaps. My example is now a 3D that I snitched from a SeisWare little endian data set. The format file is here. Here is what the script writes out:
    -Input SEGZ format name = _P107217_MRG_3D_CRP.FMIG.0.fmt
    SEGZ Version number = V1
    SEGZ Header 0 NAME LENGTH Description
    SEGZ Header 1 TEXTUAL-HEADER 3200 SEGY Rev(0) Default length of character line header (bytes)
    SEGZ Header 2 FILE-HEADER 400 Length of binary line header (bytes)
    SEGZ Header 3 TRACE-HEADER 240 Length of trace header (bytes)
    SEGZ Header 4 TRACE_SAMP_FORMAT IBM4 Format 1 in the binary header over rides the value in binary header
    SEGZ Header 5 ENDIANESS LITTLE Big Endian data suitable for SUN type computers
    FILE Header 0 NAME BYTE TYPE VECTOR SCALAR ADDEND Description
    FILE Header 1 FILE_SAMP_RATE 17 INT2 1 1 0 Sample Interval
    FILE Header 2 FILE_SAMP_NUM 21 INT2 1 1 0 Number of data samples per trace or the maximum number of Samples
    FILE Header 3 TRACE_SAMP_FORMAT 25 INT2 1 1 0 Data sample format code 1=IBM floating point
    FILE Header 4 SEIS_DATUM 125 IEEE4 1 1 0 elevation
    FILE Header 5 SEIS_REPLACEMENT_VELOCITY 129 IEEE4 1 1 0 Seismic replacement velocity
    FILE Header 6 LINE_NAME 301 ASCII 32 1 0 Internal Line Name
    FILE Header 7 GEOMETRY 393 INT4 1 1 0 2=2D line 3=3D
    TRACE Header 0 NAME BYTE TYPE VECTOR SCALAR ADDEND Description
    TRACE Header 1 INLINE 9 INT4 1 1 0 Ensemble no. (CDP CMP CRP etc)
    TRACE Header 2 CROSSLINE 13 INT4 1 1 0 Ensemble no. (CDP CMP CRP etc)
    TRACE Header 3 ITRACE_FILE 25 INT4 1 1 0 Trace sequence no within this file
    TRACE Header 4 CDP_X 81 IEEE4 1 1 0 X-coordinate of ensemble (CDP) for this trace
    TRACE Header 5 CDP_Y 85 IEEE4 1 1 0 CPD coordinate (Y)
    TEXTUAL_HEADER = 3200
    FILE_HEADER = 400
    TRACE_HEADER = 240
    -Input segy/segz data file = _P107217_MRG_3D_CRP.FMIG.0.sgy
    -Output 1994 CSEG SEGY file = _P107217_MRG_3D_CRP.FMIG.0.csgy
    File is LITTLE endian
    FILE_SAMP_RATE= 2000
    FILE_SAMP_NUM= 10
    TRACE_SAMP_FORMAT= 3
    SEIS_DATUM= 800
    SEIS_REPLACEMENT_VELOCITY= 3100
    LINE_NAME=_P107217_MRG_3D_CRP
    GEOMETRY= 3
    INLINE   CROSSLINE  ITRACE_FILE  CDP_X      CDP_Y
    407.00   169.00     1.00         338630.00  6101030.00
    407.00   170.00     2.00         338628.00  6100968.00
    407.00   171.00     3.00         338626.00  6100905.00
    ...
    408.00   143.00     40.00        338647.00  6102656.00
    408.00   153.00     50.00        338625.00  6102031.00
    408.00   163.00     60.00        338604.00  6101406.00
    408.00   173.00     70.00        338582.00  6100782.00
    408.00   183.00     80.00        338560.00  6100157.00
    408.00   193.00     90.00        338538.00  6099533.00
    ...
    498.00   173.00     7000.00      334984.00  6100907.00
    511.00   172.00     8000.00      334466.00  6100988.00
    524.00   171.00     9000.00      333949.00  6101069.00
    530.00   206.00     9497.00      333633.00  6098891.00
    End of File _P107217_MRG_3D_CRP.FMIG.0.sgy
I am now working on defining the format file to read ARAM SEGY data. It's a bit of a trick because ARAM changes the number of samples per trace. This has caused some seismic processors grief!
August 6, 2008 I am now thinking that this format file should also contain a detailed text description for acquisition and processing details to be stored after the ENDSEGZ statement. Velocity Time-RMS card images or mute intervals could be stored as additional details in this file. I will provide some more examples as time permits.
July 28, 2008 I have written a Perl script, segz2segy.pl, that will reformat a SEGZ file into the 1994 CSEGY file format (numbers are either 16 or 32 bit integers and shot points are scaled by 1000). This format has been in common use here in Calgary. It was very easy to write a parser to handle the above format example. Check the source code above! I now need to add floating point numbers and to assemble several data examples.
Here is an example of a SEGY file that I have recently parsed. As the first step, I have made a list of all the format files that accompany the sgy, segy or segz files in the file called junk:
    ls *fmt > junk
    ~ekeyser/Perl/segz2segy.pl | more
    -Input SEGZ format name = A2370.65148.gox99.ustr.CD-08334.fmt
    SEGZ Version number = V1
    SEGZ Header 0 NAME LENGTH Description
    SEGZ Header 1 TEXTUAL-HEADER 3200 SEGY Rev(0) Default length of character line header (bytes)
    SEGZ Header 2 FILE-HEADER 400 Length of binary line header (bytes)
    SEGZ Header 3 TRACE-HEADER 240 Length of trace header (bytes)
    SEGZ Header 4 TRACE_SAMP_FORMAT IBM4 Format 1 in the binary header over rides the value in binary header
    SEGZ Header 5 ENDIANESS BIG Big Endian data suitable for SUN type computers
    FILE Header 0 NAME BYTE TYPE VECTOR SCALAR ADDEND Description
    FILE Header 1 FILE_SAMP_RATE 17 INT2 1 1 0 Sample Interval
    FILE Header 2 FILE_SAMP_NUM 21 INT2 1 1 0 Number of data samples per trace or the maximum number of Samples
    FILE Header 3 TRACE_SAMP_FORMAT 25 INT2 1 1 0 Data sample format code 1=IBM floating point
    FILE Header 4 LENGTH_SYS 55 INT2 1 1 0 1=metres 2=feet
    FILE Header 5 TIME2FIRST_SAMPLE 399 INT2 1 1 0 Time to first sample in ms
    TRACE Header 0 NAME BYTE TYPE VECTOR SCALAR ADDEND Description
    TRACE Header 1 ITRACE_FILE 5 INT4 1 1 0 Trace sequence no within this file
    TRACE Header 2 SOURCE_POINT 17 INT4 1 0.001 0 Energy source point number (shot peg)
    TRACE Header 3 CDP 21 INT4 1 1 0 Ensemble no. (CDP CMP CRP etc)
    TRACE Header 4 HORI_NSUM 33 INT2 1 1 0 No of horizontally summed traces in this trace
    TRACE Header 5 REC_HT 41 INT4 1 1 0 Elevation at receiver group
    TRACE Header 6 SOURCE_HT 45 INT4 1 1 0 Surface elevation at source point
    TRACE Header 7 SOURCE_DEPTH 49 INT4 1 1 0 Depth of source below surface (+ve)
    TRACE Header 8 COORD_SCALAR 71 INT2 1 1 0 Scalar for spatial coordinates (SOURCE_X to REC_Y)
    TRACE Header 9 CDP_X 81 INT4 1 1 0 X-coordinate of ensemble (CDP) for this trace
    TRACE Header 10 CDP_Y 85 INT4 1 1 0 CPD coordinate (Y)
    TRACE Header 11 COORD_UNITS 89 INT2 1 1 0 1=length (metres/feet) 2=lat/Lon 3=Decimal degrees 4=DMS
    TRACE Header 12 SOURCE_STATIC 99 INT2 1 1 0 SOURCE_STATIC Source static correction [ms]
    TRACE Header 13 REC_STATIC 101 INT2 1 1 0 REC_STATIC Receiver group static correction [ms]
    TRACE Header 14 TOTAL_STATIC 103 INT2 1 1 0 TOTAL_STATIC Total static applied [ms]
    TEXTUAL_HEADER = 3200
    FILE_HEADER = 400
    TRACE_HEADER = 240
    -Input segy/segz data file = A2370.65148.gox99.ustr.CD-08334.sgy
    -Output 1994 CSEG SEGY file = A2370.65148.gox99.ustr.CD-08334.csgy
    FILE_SAMP_RATE= 2000.00
    FILE_SAMP_NUM= 1500.00
    TRACE_SAMP_FORMAT= 1.00
    LENGTH_SYS= 1.00
    TIME2FIRST_SAMPLE= -100.00
    ITRACE_FILE SOURCE_POINT CDP HORI_NSUM REC_HT SOURCE_HT SOURCE_DEPTH COORD_SCALAR CDP_X CDP_Y COORD_UNITS SOURCE_STATIC REC_STATIC TOTAL_STATIC
    1.00 1002.00 3.00 1.00 703.00 703.00 0.00 1.00 534487.00 6187685.00 1.00 0.00 68.00 154.00
    2.00 1002.50 4.00 2.00 703.00 703.00 0.00 1.00 534486.00 6187671.00 1.00 0.00 68.00 154.00
    3.00 1003.00 5.00 2.00 702.00 702.00 0.00 1.00 534486.00 6187657.00 1.00 0.00 68.00 153.00
    4.00 1003.50 6.00 1.00 702.00 702.00 0.00 1.00 534485.00 6187643.00 1.00 0.00 66.00 151.00
    5.00 1004.00 7.00 2.00 702.00 702.00 16.00 1.00 534484.00 6187629.00 1.00 85.00 63.00 148.00
    6.00 1004.50 8.00 2.00 702.00 702.00 0.00 1.00 534483.00 6187615.00 1.00 0.00 62.00 147.00
    7.00 1005.00 9.00 3.00 702.00 702.00 0.00 1.00 534482.00 6187601.00 1.00 0.00 61.00 146.00
    8.00 1005.50 10.00 4.00 703.00 703.00 0.00 1.00 534481.00 6187587.00 1.00 0.00 65.00 151.00
    9.00 1006.00 11.00 4.00 703.00 703.00 0.00 1.00 534481.00 6187573.00 1.00 0.00 69.00 155.00
    10.00 1006.50 12.00 3.00 703.00 703.00 0.00 1.00 534480.00 6187559.00 1.00 0.00 69.00 155.00
    20.00 1011.50 22.00 8.00 704.00 704.00 0.00 1.00 534471.00 6187419.00 1.00 0.00 66.00 153.00
    30.00 1016.50 32.00 9.00 704.00 704.00 0.00 1.00 534463.00 6187279.00 1.00 0.00 68.00 155.00
    40.00 1021.50 42.00 12.00 704.00 704.00 0.00 1.00 534455.00 6187140.00 1.00 0.00 66.00 154.00
    50.00 1026.50 52.00 14.00 703.00 703.00 0.00 1.00 534446.00 6187000.00 1.00 0.00 65.00 150.00
    60.00 1031.50 62.00 13.00 702.00 702.00 0.00 1.00 534438.00 6186860.00 1.00 0.00 70.00 156.00
    70.00 1036.50 72.00 16.00 700.00 700.00 0.00 1.00 534430.00 6186721.00 1.00 0.00 69.00 156.00
    80.00 1041.50 82.00 13.00 699.00 699.00 0.00 1.00 534421.00 6186581.00 1.00 0.00 73.00 161.00
    90.00 1046.50 92.00 14.00 699.00 699.00 0.00 1.00 534412.00 6186441.00 1.00 0.00 73.00 161.00
    100.00 1051.50 102.00 15.00 698.00 698.00 0.00 1.00 534404.00 6186302.00 1.00 0.00 73.00 161.00
    200.00 1101.50 202.00 14.00 679.00 679.00 0.00 1.00 534314.00 6184905.00 1.00 0.00 82.00 178.00
    300.00 1151.50 302.00 14.00 675.00 675.00 0.00 1.00 534223.00 6183509.00 1.00 0.00 50.00 133.00
    400.00 1201.50 402.00 16.00 656.00 656.00 0.00 1.00 534125.00 6182113.00 1.00 0.00 70.00 168.00
    500.00 1251.50 502.00 12.00 632.00 632.00 0.00 1.00 534022.00 6180717.00 1.00 0.00 87.00 183.00
    585.00 1294.00 587.00 1.00 630.00 630.00 0.00 1.00 533934.00 6179531.00 1.00 0.00 84.00 174.00
    End of File A2370.65148.gox99.ustr.CD-08334.sgy
Note: This script has to run under a Little Endian Linux box. At present it assumes all data are in BIG (SUN like) Endian byte order.
What I am currently working on is the ability to handle Little Endian SEGY data (the data needs to be byte swapped).
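For readers who want to try the byte-swapping idea, here is a minimal Python sketch. It is not the method used in segz2segy.pl, which is not shown on this page. It assumes the byte order can be guessed from the 2-byte sample format code at byte 25 of the binary file header: a small code read in the wrong byte order turns into a large, implausible number. The function names guess_endian and swap_words, and the range of codes treated as valid, are assumptions made for illustration.

```python
import struct

def guess_endian(binary_header, byte=25, valid_codes=range(1, 17)):
    """Guess the byte order from a 2-byte format code at a 1-based byte position."""
    raw = binary_header[byte - 1: byte + 1]
    as_big = struct.unpack(">h", raw)[0]
    as_little = struct.unpack("<h", raw)[0]
    if as_big in valid_codes and as_little not in valid_codes:
        return "BIG"
    if as_little in valid_codes and as_big not in valid_codes:
        return "LITTLE"
    raise ValueError("cannot determine the byte order from the format code")

def swap_words(data, width=4):
    """Reverse the bytes inside every fixed-width word (e.g. 4-byte samples)."""
    if len(data) % width:
        raise ValueError("data length is not a multiple of the word width")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))

# Hypothetical 400-byte binary headers carrying format code 1 in each byte order.
big_header = bytearray(400)
big_header[24:26] = struct.pack(">h", 1)
little_header = bytearray(400)
little_header[24:26] = struct.pack("<h", 1)

print(guess_endian(bytes(big_header)))
print(guess_endian(bytes(little_header)))
print(swap_words(b"\x01\x02\x03\x04\x05\x06\x07\x08"))
```

A guess like this only works when the format code is populated, which is one more argument for declaring the byte order explicitly in the format file.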
Note: It would be really simple to add an export function to dump any field into a tab delimited, Excel type spreadsheet format. All the user would have to do is specify the format of the field being dumped. I'd add a parameter option to the program to dump the file. If someone asks, I will do the coding!
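As a rough idea of what such an export could look like, here is a short Python sketch using the standard csv module. The function name dump_tsv is hypothetical, and the two sample rows are copied from the 3D example output earlier on this page.

```python
import csv
import io

def dump_tsv(names, rows, out):
    """Write header names and rows of values as tab delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)

names = ["INLINE", "CROSSLINE", "CDP_X", "CDP_Y"]
rows = [
    [407, 169, 338630.0, 6101030.0],
    [407, 170, 338628.0, 6100968.0],
]
buffer = io.StringIO()
dump_tsv(names, rows, buffer)
print(buffer.getvalue())
```

Writing to a file instead of a StringIO only needs an open file handle passed as out.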
Check this web site for updates.
Introduction
This web site describes a new seismic data exchange format which we propose to name SEG-Z, and which is intended to be largely a replacement for SEG-Y. SEG-Z is clearly modelled on SEG-Y: there are many similarities in terms of terminology and file structure, and in fact the majority of SEG-Y files can be expressed as SEG-Z files, with the addition of a separate text file that describes where data is located within the file (eg. linename.sgz, linename.sgztxt).
SEG-Y has been both one of the most successful seismic data exchange formats and also the most broken. It must be deemed successful in terms of its uptake throughout the seismic processing and interpretation industry, but the format itself could be deemed a failure since many (perhaps the majority of) so-called SEG-Y files do not follow the format correctly. Loading of data from SEG-Y files has consequently rarely been the straightforward process it was intended to be. Contrast the reading in of a SEG-Y file with other common file formats (for example, Jpeg, TIFF, PDF or Mpeg): today's file formats should load seamlessly without the user having to know any details of the format itself; this is certainly not generally true of SEG-Y.
Part of the reason for the failure of SEG-Y was its inflexibility: the format decreed specific byte locations for the various bits of information, and fixed byte-lengths for the components of the file structure. Consequently when users needed to add new types of information into the file beyond what was originally envisaged, it could only be done by breaking the format. Most processing houses had to do this, and all did it in different ways, leading to a sub-industry dedicated solely to seismic data loading. Another key failure was the inability of SEG-Y to incorporate new ways of storing the data (for example, only 4-byte IBM-format floating point values were supported).
A key difference between SEG-Z and SEG-Y is that whilst the older format decrees what information can be placed in the file, and where it has to go, the new format defines a set of rules about how to describe where data has been placed. The user then has flexibility to add more information to the file without breaking the format. The other key difference is that the components of the file structure (specifically, the header lengths) no longer have a prescribed size, but can be any length, with the lengths of the components being specified at the start of the file or potentially as separate files.
Design Aims
Our aims in designing SEG-Z have been to provide the industry with a seismic data exchange format that is:
- Flexible enough that it should serve us well for the next few decades.
- Easy enough to use and implement that it is quickly adopted as a working standard throughout the industry.
- Able to save arbitrary length header structures, eg as are now typically output from processing software using dynamically sized header arrays.
- Able to save header entries which are vectors instead of single values.
- Able to save header entries in a wide range of datatypes.
- Able to read existing files: as the majority of seismic data is currently archived in the myriad versions of SEG-Y, we think it desirable to be able to read in those files using SEG-Z reading software, with minimal user input.
In short, we wish to remove the Tower of Babel erected by SEG-Y, and let geophysicists get on with their real job of seismic processing or interpretation.
The desirability of being able to read the majority of existing SEG-Y files as SEG-Z files may seem odd at first sight. However many of those SEG-Y files are not "true SEG-Y", so require custom enhancements to the reading software to load correctly. As the new SEG-Z format is able to describe the contents of the file precisely, most SEG-Y files will read correctly via SEG-Z reading software, with the addition of a separate descriptive text file. These text files should then be easily available to those who want to read in "true SEG-Y".
We intend only that SEG-Z be able to read "most" SEG-Y files, for two reasons. Firstly, if SEG-Z included support for all known datatypes used in SEG-Y over the last 30 years (3-byte floats is just one example), valid reading software should in principle be able to support all of those data types. We feel this could present a barrier to widespread acceptance of the format. Secondly, there are probably many datatypes used historically which are essentially unknown, or at least forgotten, and hence cannot be included. We do provide a mechanism within SEG-Z to be able to describe files which fall outside the format, so that at least the reading software can fail cleanly with a useful message to the user.
SEGZ Rules
- Fields are comma-delimited, case insensitive.
- White Space is legal
- The letters SEGZ-Format-Definition-V1 are the first line of the descriptor. In addition to signature bytes, the version number is also contained on this line.
- The first line of each preamble stanza should be a list of TAGS. Tags currently include : Name, Length, Byte, Type, Count, Scalar, Addend, Description. Some tags can be optional (Scalar and Addend) and some may only apply to certain stanzas (eg Length applies to the SEG-Z parameters stanza). Reading software must recognise the tags that are described by the format. SEG-Z files can contain new tags, but the reading software must bypass any fields for tags that it doesn't recognise. The SEG-Z standard will be deemed to be broken if REQUIRED TAGS (eg Name, Byte..) are missing from the file.
- Normally, defaults are not allowed. To define SEGY with SEG-Z, the Textual-header length is 3200, the File-header length is 400 and the Trace-header length is 240.
- The default for SCALAR is 1.0, and ADDEND, if absent, is 0.0. COUNT is assumed to be 1. The scalar is a multiplier: a scalar of 10 means you multiply the number in the field by 10, and a scalar of 0.25 means you multiply by 0.25 (the same as dividing by 4). The order of operations is to multiply by the scalar first and then apply the addend.
- Comments start with the character #
- Values stored in the header override the data. Values stored in a separate format file override both the preamble header and the Line header.
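The SCALAR and ADDEND rule in the list above can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary form of a descriptor row are assumptions, not part of the proposal.

```python
def apply_scaling(raw, scalar=1.0, addend=0.0):
    """Multiply by the scalar first, then apply the addend."""
    return raw * scalar + addend

def scaled_value(raw, row):
    """Apply the SCALAR and ADDEND of a descriptor row, using the defaults when absent."""
    scalar = float(row.get("SCALAR", 1.0))
    addend = float(row.get("ADDEND", 0.0))
    return apply_scaling(raw, scalar, addend)

print(apply_scaling(8, 0.25))        # a scalar of 0.25 is the same as dividing by 4
print(apply_scaling(7, 10, 5))       # multiply by 10, then add 5
print(scaled_value(1002500, {"SCALAR": "0.001"}))   # shot point stored times 1000
```

The last call mirrors the SOURCE_POINT entry with a scalar of 0.001 in the script output shown earlier, where shot points are stored as integers scaled by 1000.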
SEG-Z Assumptions
There are a few assumptions inherent in the proposed SEG-Z format :
- Bytes are 8 bits long
- The DESCRIPTOR header is coded in ASCII
- Existing commonly used header entries are now defined not by byte position within the header, but by an agreed list of header entry names. Whereas in the past, a plethora of different names has been used to refer to the integer value in bytes 17-20 of the trace header, in SEG-Z this will always be referred to as "SOURCE_POINT" and it can be anywhere in the header (and no longer has to be an integer).
- The Identification header for the Descriptor will only be defined as ASCII. We assume that no more information will be needed in this header other than that originally provided for, since all it needs to do is describe how to read the Descriptor, which can then contain all other file structure descriptions.
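The name-based lookup described in the assumptions above can be sketched as follows in Python. The function read_field, the dictionary layout of the definitions and the second byte position (201) are hypothetical; only the three data types needed for the example are mapped.

```python
import struct

STRUCT_CODES = {"INT2": "h", "INT4": "i", "IEEE4": "f"}  # subset, for illustration only

def read_field(header, definitions, name, endian="BIG"):
    """Read one header entry by name, using the byte position and type from a descriptor."""
    row = definitions[name.upper()]
    prefix = ">" if endian.upper() == "BIG" else "<"
    fmt = prefix + STRUCT_CODES[row["TYPE"].upper()]
    return struct.unpack_from(fmt, header, row["BYTE"] - 1)[0]   # BYTE is 1-based

# Two hypothetical layouts that store SOURCE_POINT in different places and types.
classic = {"SOURCE_POINT": {"BYTE": 17, "TYPE": "INT4"}}
custom = {"SOURCE_POINT": {"BYTE": 201, "TYPE": "IEEE4"}}

trace_header = bytearray(240)
struct.pack_into(">i", trace_header, 16, 1002)      # integer at bytes 17-20
struct.pack_into(">f", trace_header, 200, 1002.5)   # float at bytes 201-204

print(read_field(trace_header, classic, "SOURCE_POINT"))
print(read_field(trace_header, custom, "source_point"))
```

The calling code asks for SOURCE_POINT in both cases and never needs to know the byte position, which is the point of agreeing on a fixed set of names.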
Separate DESCRIPTOR files to read SEG-Y
Existing SEG-Y files can be coded as SEG-Z-formatted files by specifying an external DESCRIPTOR file. This file will contain header length declarations that match the SEG-Y header lengths, and will define the byte positions for all header entries defined by the SEG-Y standard. Different DESCRIPTOR files are available for reading both Revision 0 and Revision 1 SEG-Y files.
It is anticipated that most SEG-Z reading software will give the user the option of specifying a separate DESCRIPTOR file. If the data file to be read does not start with "SEGZ", it should be presumed to be a SEG-Y formatted file, and read using the descriptions in the descriptor provided. If the data file to be read starts with "SEGZ", it should be assumed that both the Descriptor and the Identification header contained in the file are incorrect, and the information in the separate descriptor file should be used instead.
The Descriptor file must also contain the "ByteOrdering" definition, which would normally be present in the Identification header, which would be absent in the case of a SEGY file.
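The decision described above, which descriptor a reader should trust, can be sketched in Python as below. The function name choose_descriptor and its return values are assumptions for illustration. The case where a plain SEG-Y file is given without a separate descriptor is not covered by the text, so the sketch simply reports an error.

```python
def choose_descriptor(data_start, external_descriptor=None):
    """Return (descriptor source, presumed file kind) for the first bytes of a data file."""
    has_embedded = data_start[:4] == b"SEGZ"
    if external_descriptor is not None:
        # A separate descriptor always wins, even over an embedded one.
        return ("external", "SEG-Z" if has_embedded else "SEG-Y")
    if has_embedded:
        return ("embedded", "SEG-Z")
    raise ValueError("plain SEG-Y data needs a separate DESCRIPTOR file")

print(choose_descriptor(b"SEGZ-Format-Definition-V1"))
print(choose_descriptor(b"C 1 CLIENT", external_descriptor="line.fmt"))
print(choose_descriptor(b"SEGZ-Format-Definition-V1", external_descriptor="line.fmt"))
```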
Identifying Header
The Identification header at the start of the file is SEGZ-Format-Definition-V1 (coded in ASCII). The same byte ordering must be used throughout the file. Bytes 7-8 contain the 16-bit version of the number 49 (the ASCII character "1"). Bytes 9 to 12 hold the length of the preamble, in bytes, expressed as a 4-byte 2's complement unsigned integer. The byte order is contained within the SECTION SEGZ-parameters section as per below:
    Name, Type, Description
    Endianess, BIG, Big Endian, data suitable for SUN type computers
    Endianess, LITTLE, Little Endian, data suitable for PC type computers
The major focus of this committee (the HARD part) will be the creation of a standard set of names. This is where the RODE encapsulation format failed: it was too flexible! If you can't agree to a standard name, then please provide an alias table for the rest of us. The complete list of Valid Names, descriptions and probable SEGY byte positions is stored here.
Here is a list of proposed number (character) formats.
    Data Type   Description
    INT1        1-byte integer
    INT2        2-byte integer
    INT4        4-byte integer
    INT8        8-byte integer
    UINT1       1-byte unsigned integer
    UINT2       2-byte unsigned integer
    UINT4       4-byte unsigned integer
    UINT8       8-byte unsigned integer
    BCD2        2-Byte Binary Coded Decimal
    BCD4        4-Byte Binary Coded Decimal
    IEEE4       4-byte IEEE float
    IEEE8       8-byte IEEE float
    IBM4        4-byte IBM float
    ASCII       American Standard Code for Information Interchange
    EBCDIC      Extended Binary Coded Decimal Interchange Code
    UNICODE     Universal character set
The data types are straightforward and can even handle 64 bit numbers!
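As a rough guide to how a reader might decode these types, here is a Python sketch built on the standard struct module. It is an illustration under stated assumptions, not part of the proposal: the names STRUCT_CODES, ibm4_to_float and decode are made up, EBCDIC is assumed to be code page 037, the IBM float is assumed to be the usual single precision layout (sign bit, 7-bit base-16 exponent with a bias of 64, 24-bit fraction), and BCD and UNICODE are left out.

```python
import struct

# Proposed type names mapped to struct codes (sizes are fixed when a byte order prefix is used).
STRUCT_CODES = {
    "INT1": "b", "INT2": "h", "INT4": "i", "INT8": "q",
    "UINT1": "B", "UINT2": "H", "UINT4": "I", "UINT8": "Q",
    "IEEE4": "f", "IEEE8": "d",
}

def ibm4_to_float(raw):
    """Convert 4 big endian bytes holding an IBM single precision float."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                   # base-16 exponent, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction, 0 <= f < 1
    return sign * fraction * 16.0 ** (exponent - 64)

def decode(raw, type_name, endian="BIG"):
    """Decode raw bytes according to one of the proposed type names."""
    type_name = type_name.upper()
    big = endian.upper() == "BIG"
    if type_name == "IBM4":
        return ibm4_to_float(raw if big else raw[::-1])
    if type_name == "ASCII":
        return raw.decode("ascii").strip()
    if type_name == "EBCDIC":
        return raw.decode("cp037").strip()           # assumed code page
    return struct.unpack((">" if big else "<") + STRUCT_CODES[type_name], raw)[0]

print(decode(b"\x41\x10\x00\x00", "IBM4"))
print(decode(b"\x00\x00\x03\xea", "INT4"))
print(decode(b"NAD27 ", "ASCII"))
```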
PS: For the past six months we have had a working subcommittee of the following individuals:
- Eric Keyser - EnCana, Calgary
- Johathan Ravens - Claritas, New Zealand
- Reginald Beardsley - Consultant, Houston
Significant input has been provided by the following individuals:
- Mike Dennis, John Townsley, Dave Morgan - Divestco
- Shane Stogrin - Zokero
- Rick Macdonald, Doug Horton - CGGVERITAS
- Carmine Militano, Terry Perkin and Bob Loblaw - C&C Systems
- Dave D'Amico - Talisman
- James Chornopsky - Paradigm
- Randy Selzler - BP
Here is a link to the previous versions of the documentation; a lot has evolved over time!
Site Owner: Eric Keyser