Proposed CSEGZ Standard

March 19, 2008


Geophysicists the world over have been asking for the following enhancements to SEGY:
  1. A standard way to read any SEGY-like format containing a Descriptive Textual Header, File Header, Trace Header and Data.
  2. Optional, variable-length headers. As field instruments become more sophisticated, space for additional numbers is required.
  3. Additional data types (unsigned integers, IEEE floats) for numbers stored in the line and trace headers, and the ability to handle 64-bit numbers (e.g. required to accurately specify UTM XYs).
  4. A built-in mapping or keyword file to enable automatic data loading to the workstation. The idea is that header entries are no longer defined just by byte location and description, but by a name. This matters because once the byte position of each trace header entry can be (or must be) declared in the file, byte locations can no longer be relied upon to identify which entry we are referring to, so a fixed set of names may well become essential!
The recommended SEGZ format can be thought of as a segy object containing a preamble that describes the format of the data and headers:

Note: The PREAMBLE contains an ASCII description of the organization of the data that can easily be parsed.

I have spent years trying to get geophysicists to agree on a SEGY standard with little success. The recommended SEGZ format contains only those items that we can all agree upon from the original SEGY definition:

  1. A Textual File Header (formerly the EBCDIC header; usually free-form ASCII text, though sometimes fields are hardwired)
  2. A Binary File Header (stores numbers and characters in fixed byte locations)
  3. A Trace Header
Note: We are now recommending that these can become variable length! It is also recommended that all data processors provide the client with a mapping file that contains sufficient detail to enable automatic workstation data loading.

The proposed SEGZ header is a fixed 12 bytes, with the last four bytes containing the length of the Preamble that describes the SEGY-like definition. The first four signature bytes contain the ASCII characters "SEGZ", followed by a reserved byte (a null) in byte 5. Byte 6 contains the Big/Little Endian flag ("B" or "L"), and bytes 7-8 contain the 16-bit version number 49 (the ASCII character "1"), so you can examine the start of a SEGZ file with a text editor. This is followed by a four-byte integer containing the length of the Preamble Definition.

The endianness must be checked before the length of the preamble is read, or you will extract the wrong number. The endianness is given by the B or L flag and can be cross-checked against the 16-bit version number. A further check should be made by examining the Trace Sample Format of the data (it should be a number between 1 and 6). This is what SeisX and SeisWare use to automatically detect the endianness of your data.
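As a rough illustration of the order of operations described above (read the flag first, then the numbers), here is a minimal Python sketch. The byte offsets follow the paragraph above literally, and treating the preamble length as an unsigned 32-bit integer is an assumption; the function name `parse_segz_header` is hypothetical and not part of any existing tool. Check the offsets against a real file before relying on them.

```python
import struct

def parse_segz_header(header: bytes) -> dict:
    """Decode a 12-byte SEGZ header (layout assumed from the text above)."""
    if len(header) < 12:
        raise ValueError("SEGZ header must be at least 12 bytes")
    if header[0:4] != b"SEGZ":
        raise ValueError("missing SEGZ signature")
    flag = header[5:6]                 # byte 6 (1-based): b"B" or b"L"
    if flag == b"B":
        order = ">"                    # big endian
    elif flag == b"L":
        order = "<"                    # little endian
    else:
        raise ValueError("endian flag must be B or L")
    # Bytes 7-8: 16-bit version; bytes 9-12: preamble length (assumed unsigned).
    version, preamble_length = struct.unpack(order + "HI", header[6:12])
    return {
        "endian": flag.decode("ascii"),
        "version": version,
        "version_ok": version == 49,   # cross-check mentioned in the text
        "preamble_length": preamble_length,
    }

# Example with a synthetic big-endian header.
example = b"SEGZ\x00B" + struct.pack(">HI", 49, 931)
info = parse_segz_header(example)
```

Reading the flag before unpacking means the preamble length is never interpreted with the wrong byte order.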

Update Information

So do we have other comments in order to make this format even better?

March 19, 2008 - We now have a program to take a SEGZ file, perform some validation checks, write out some statistics, the preamble data dictionary and the segy file.

Here is the Perl script segz2segy.pl. Here is how to run it on our data samples:

ls *segz
_P107217_MRG_3D_CRP.FMIG.0.segz       A29250.80-C-2.ksp93.um.CD-31807.segz
A2370.65148.gox99.ustr.CD-08334.segz  A31523.90G01-03.un.fm.CD-30506.segz

ls *segz > junk
~ekeyser/Perl/segz2segy.pl
SEGZ ENDIAN PREAMBLE SAMPLES TRACES FILE_NAME
SEGZ   L     2269      10      9497 _P107217_MRG_3D_CRP.FMIG.0.segz
SEGZ   B      931      10       585 A2370.65148.gox99.ustr.CD-08334.segz
SEGZ   B     1245      10       319 A29250.80-C-2.ksp93.um.CD-31807.segz
SEGZ   B     1728      10       775 A31523.90G01-03.un.fm.CD-30506.segz
-fini
I will spend some effort on parsing the preamble and generalizing the code. At least it now takes a SEGZ file and converts it back to the SEGY file I started with!

March 11, 2008 - Shayne Stogrin has provided a list of comments, most of which I have addressed (and agreed to!)

Other than that I don't think it would be a monumental programming task to get SeisWare to read these files.

SEGZ Valid Format Codes

FORMAT Number CODE
IBM4 1
INT4 2
INT2 3
IEEE4 6
INT1 8 (bits of course)

Note: A number CODE of 5 as defined in SEGY rev(0) is no longer recognized. The reason is simple: the number 5 has previously been used for other number representations, including a 36-bit number on the Univac and an 8-bit integer by Paradigm!

March 6, 2008 - I have finally fixed up my data examples to support this new format. (Please let me know if there is a problem.) Examples of what the format file might look like for SEGZ:

SEGZ-parameters {
#Comment - This header defines a normal SEGY file
 Name,            Length,Description
 Textual-header,    3200, SEGY Rev(0) Default, length of character line header (bytes)
 File-header,        400, Length of binary line header (bytes)
 Trace-header,       240, Length of trace header (bytes)
 Trace-trailer,        0, Length of trace trailer, Normally not used
}
File-header-definition {
 Name,           Byte,Type,Vector,Scalar,Addend,Description
 FILE_SAMP_RATE,   17,INT2,1,1,0,Sample Interval
 FILE_SAMP_NUM,    21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
 TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point
}
Trace-header-definition {
 Name,Byte,Type,Vector,Scalar,Addend,Description
 ITRACE_LINE,1,INT4,1,1,0,Trace sequence no within line
 ITRACE_FILE,5,INT4,1,1,0,Trace sequence no within this file
 CDP,21,INT4,1,1,0,Ensemble no. (CDP CMP CRP etc)
 HORI_NSUM,33,INT2,1,1,0,No of horizontally summed traces in this trace
 SOURCE_HT,41,INT4,1,1,0,Surface elevation at source point
 CDP_X,73,INT4,1,1,0,X-coordinate of ensemble (CDP) for this trace
 CDP_Y,77,INT4,1,1,0,CDP coordinate (Y)
 SOURCE_POINT,53,IEEE4,1,1,0,Energy source point number (shot peg)
}
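Since the preamble is meant to be easy to parse, a short Python sketch of one possible reader may help. It assumes the conventions visible in the example above: a block starts with `Name {`, ends with `}`, lines starting with `#` are comments, and the first row in each block names the columns. Because the Description column can itself contain commas, each row is split only as many times as there are columns. The function name `parse_preamble` is hypothetical.

```python
SAMPLE = """SEGZ-parameters {
#Comment - This header defines a normal SEGY file
 Name,            Length,Description
 Textual-header,    3200, SEGY Rev(0) Default, length of character line header (bytes)
 Trace-header,       240, Length of trace header (bytes)
}
File-header-definition {
 Name,           Byte,Type,Vector,Scalar,Addend,Description
 FILE_SAMP_RATE,   17,INT2,1,1,0,Sample Interval
}
"""

def parse_preamble(text: str) -> dict:
    """Return {block name: [row dict, ...]} for a SEGZ-style preamble."""
    blocks = {}
    current = None      # name of the block being read, if any
    columns = None      # column names of the current block
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            current = line[:-1].strip()
            blocks[current] = []
            columns = None
        elif line == "}":
            current = None
        elif current is not None:
            if columns is None:
                columns = [c.strip() for c in line.split(",")]
            else:
                # Limit the split so commas in the last column survive.
                values = [v.strip() for v in line.split(",", len(columns) - 1)]
                blocks[current].append(dict(zip(columns, values)))
    return blocks

preamble = parse_preamble(SAMPLE)
```

All values are kept as strings here; converting Byte, Scalar and Addend to numbers is left to the caller.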
Use Microsoft WordPad to open up the SEGZ file and you will see this. Note the signature bytes SEGZ followed by two nulls and then the letters B1. This tells you that the data are Big Endian and the SEGZ version is 1 (in ASCII). The next four bytes are the 32-bit binary integer for the length of the preamble! Simple, eh!

Here is a table with several examples of SEGY files defined as SEGZ, the SEGZ binary header (12 bytes), the variable length format file and the segy file.

SEGZ Header Preamble SEGY
3D Little Endian X X X
2D Big Endian simple example X X X
2D Big Endian X X X
2D Big Endian numerous defined fields X X X

In order to create the SEGZ file I used the Unix utility cat to append all the above files into a single SEGZ file. The next step is to create a SEGZ validation program. But first we need some rules:
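The concatenation step can also be sketched in Python. This is only an illustration, not the procedure used to build the sample files: it assumes the 12-byte header layout described earlier on this page (signature, reserved null, endian flag, 16-bit version 49, 32-bit preamble length), and the names `build_segz` and `split_segz` are hypothetical.

```python
import struct

def build_segz(preamble: bytes, segy: bytes, big_endian: bool = True) -> bytes:
    """Concatenate header + preamble + SEGY, like `cat` on three files."""
    order, flag = (">", b"B") if big_endian else ("<", b"L")
    header = b"SEGZ" + b"\x00" + flag + struct.pack(order + "HI", 49, len(preamble))
    return header + preamble + segy

def split_segz(blob: bytes):
    """Undo build_segz: return (preamble, segy)."""
    if blob[:4] != b"SEGZ":
        raise ValueError("missing SEGZ signature")
    order = {b"B": ">", b"L": "<"}[blob[5:6]]
    (length,) = struct.unpack(order + "I", blob[8:12])
    return blob[12:12 + length], blob[12 + length:]

# Round trip on synthetic data.
demo_preamble = b"File-header-definition {\n}\n"
demo_segy = b"\x00" * 3600
demo_blob = build_segz(demo_preamble, demo_segy)
```

Keeping the original SEGY bytes untouched after the preamble is what lets a splitter recover the file you started with.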

Defaults for File-header-definition are:
 Name,           Byte,Type,Vector,Scalar,Addend,Description
 FILE_SAMP_RATE,   17,INT2,1,1,0,Sample Interval
 FILE_SAMP_NUM,    21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
 TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point
Note - While the headers are real, the seismic data have been cut down so that each trace now contains only ten samples.

Here is the current list of valid names and descriptions. Please examine this list and suggest corrections, updates or additions. This is where we need your help.

SEGYZ Viewer/Dumper
February 22, 2008 - We have our first SEGY/Z viewer and dumper operational as seen on the series of images to the right.

SEGZ has the advantage of a small text file, as shown under the SEG-Z tab, that tells what (field name), where (byte location) and format type (integer, floating point, etc.).

By having the fields predeclared, the user of SEGY data no longer has to inspect and test each and every data field.

Once these data are read, it becomes trivial to press a button to reformat the data into your unique standard of SEGY!

This dumper will read data from any SEGY format file, including OPSeis with its trace trailers!



February 20, 2008 - Jonathan makes a series of small points: name changes are made to more closely follow the names in SEGY rev(1), and endianness is now defined in the SEGZ Header before any numbers are read.

February 11, 2008 - It has been suggested that we define a Velocity header that can easily be parsed, similar to the format definition. Here's an example:

Velocity-definition {
CDP,TIME,TIME_DATUM,TIME_SURFACE,RMS,INT_VEL,AVE_VEL,DEPTH
11,142,42,86,2310,2310,2310,99
,215,115,159,2626,2955,2606,207
,322,222,266,2897,3258,2868,381
,422,322,366,3077,3511,3044,557
,545,445,489,3100,3167,3075,752
,717,617,661,3190,3433,3168,1047
,828,728,772,3213,3347,3194,1233
,962,862,906,3235,3359,3218,1458
...
51,138,38,82,2400,2400,2400,98
,192,92,136,2761,3233,2731,186
,299,199,243,2919,3108,2897,352
,410,310,354,2987,3131,2970,526
,579,479,523,3100,3324,3085,807
,740,640,684,3100,3100,3088,1056
,867,767,811,3168,3512,3155,1279
,1001,901,945,3281,3896,3260,1540
,1143,1043,1087,3303,3446,3284,1785
...
}
The above will easily load into a spreadsheet for CDPs 11 and 51. What do you think? Is this a good idea? A bad idea? Should we do something similar for the mute pattern?
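To show how little code such a block needs, here is a hedged Python sketch that reads a Velocity-definition block into rows. It assumes that an empty CDP column means "same CDP as the row above", which is how the example reads but is not stated as a rule, and it skips the `...` elision lines. The function name `parse_velocity_block` is hypothetical.

```python
import csv

VELOCITY_SAMPLE = """Velocity-definition {
CDP,TIME,TIME_DATUM,TIME_SURFACE,RMS,INT_VEL,AVE_VEL,DEPTH
11,142,42,86,2310,2310,2310,99
,215,115,159,2626,2955,2606,207
...
51,138,38,82,2400,2400,2400,98
,192,92,136,2761,3233,2731,186
}
"""

def parse_velocity_block(text: str) -> list:
    """Return one dict per time/velocity row, filling down the CDP number."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines, elisions and the block delimiters.
    lines = [ln for ln in lines
             if ln and ln != "..." and ln != "}" and not ln.endswith("{")]
    reader = csv.reader(lines)
    columns = next(reader)
    rows = []
    last_cdp = None
    for fields in reader:
        if fields[0] != "":
            last_cdp = int(fields[0])      # a new CDP starts here
        record = {"CDP": last_cdp}
        for name, value in zip(columns[1:], fields[1:]):
            record[name] = int(value)
        rows.append(record)
    return rows

velocity_rows = parse_velocity_block(VELOCITY_SAMPLE)
```

The same approach would carry over to a mute-pattern block if one were defined along similar lines.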

January 16, 2008 - Randy Selzler (BP) recommends that the endianness of binary types be addressed by the standard and specified explicitly in the preamble. I'd avoid draconian rules that force external representation to always be big (or small). Field-by-field control is probably overkill. That leaves a flag per dataset that specifies big or little. It is recommended that the endianness be verified by examining the format code (TRACE_SAMP_FORMAT) in the binary header. All you have to do is test for a number between one and six. Workstation vendors make this test so interpretation projects can be a mixture of Big and Little Endian data and the interpreter doesn't even know it!

January 10, 2008 - Reginald Beardsley (su) - I think a key part of this should be an official mapping of vendor keywords to the SEG-Z header name (done by the vendor). There's no reason to get concerned about names matching. What matters is getting the usage correct. The key to that is describing the content of a header properly, so it is unambiguous what it contains and what the representation is.

January 7, 2008 - Jonathan Ravens (GLOBE Claritas Development NZ) provided an extensive list of Suggested Names and SEG description. I combined this file with most of the fields from a SeisX/SeisWare definition as well as a few survey names.

This is our initial stab at proposing a list of names. What do you think? What have we missed? The above tables were created by examining several different interpretations of SEGY. These files can be examined in spreadsheet form. It is very important that we agree on a list of names; this isn't just a neater system. The old system of declaring specific byte locations and number formats clearly is not working. The new proposed system will provide flexibility for the future and can even enable data to be properly loaded. Doug Bath (ITS segytool) believes we are on the right track and that it should not be hard to update his software to read and write this new format. This will also make life easier for seismic folk to converse without having to translate (e.g. between SOURCE_POINT and ep, or ITRACE_FILE and tracr, to use a couple of su (Colorado School of Mines Seismic Unix) examples).

My next step is to generate a series of data examples. I will window the data down to ten samples so we can see how this works. I will write two Perl scripts: one that will create a SEGZ file by combining a SeisWare format file and a SEGY file (seismicline.sgy, seismicline.fmt), and a second that will take a SEGZ data file and split it into a SEGY file and a format file.

December 20, 2007 - Carmine Militano, Terry Perkin and Bob Loblaw (C&C Systems) propose a variable-length PREAMBLE, prepended to SEGY files, that contains a list of STANDARD NAMES to describe the FORMAT of the data. We are proposing that SEGZ Version 1 will handle most SEGY files currently in existence. The length of the Text-header is assumed to be 3200, the File-header 400 and the Trace-header 240. Version 1 would be used for most data. Here is what a typical Preamble might look like:

File-header {
 Name,Byte,Type,Vector,Scalar,Addend,Description
 FILE_SAMP_RATE,17,INT2,1,1,0,Sample Interval
 FILE_SAMP_NUM,21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
 TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point 
}
Trace-header {
 Name,Byte,Type,Vector,Scalar,Addend,Description
 SOURCE_POINT,25,INT4,1,.25,-100,Energy source point number (for some old Amoco data)
 CDP,21,INT4,1,1,0,Ensemble number (CDP CMP CRP etc)
 CDP_X,81,INT4,1,1,0,CoordinateX
 CDP_Y,85,INT4,1,1,0,CoordinateY
}

Note: The File-header-definition above can be considered to be the default. If you don't provide an override, then this is how the binary header is to be defined. These are the ONLY fields on which I can find consistent agreement between the numerous different SEGY files I have seen in my career.

Due to the confusion over what floating point number format the actual data are in, we have provided for a Number-format-definition to override what is stored in the Binary (File) header.
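As an illustration of how the default definition could drive a reader, here is a minimal Python sketch that extracts the three default fields from a 400-byte binary header. It assumes the Byte column is 1-based (as in SEGY documentation) and that the byte order is already known; the Vector, Scalar and Addend columns are ignored. The names `DEFAULT_FILE_HEADER` and `read_fields` are hypothetical.

```python
import struct

# (Name, Byte, Type) taken from the default File-header-definition above.
DEFAULT_FILE_HEADER = [
    ("FILE_SAMP_RATE", 17, "INT2"),
    ("FILE_SAMP_NUM", 21, "INT2"),
    ("TRACE_SAMP_FORMAT", 25, "INT2"),
]

# Only the integer types needed for this sketch.
STRUCT_CODES = {"INT2": "h", "INT4": "i"}

def read_fields(header: bytes, definition, order: str = ">") -> dict:
    """Read each named field; `order` is '>' (big) or '<' (little endian)."""
    values = {}
    for name, byte, type_name in definition:
        code = order + STRUCT_CODES[type_name]
        # Byte positions in the definition are 1-based, Python offsets 0-based.
        values[name] = struct.unpack_from(code, header, byte - 1)[0]
    return values

# Synthetic 400-byte header: 2000 us sample interval, 10 samples, format 1.
demo_header = bytearray(400)
struct.pack_into(">h", demo_header, 16, 2000)
struct.pack_into(">h", demo_header, 20, 10)
struct.pack_into(">h", demo_header, 24, 1)
demo_values = read_fields(bytes(demo_header), DEFAULT_FILE_HEADER)
```

An override from the preamble would simply replace `DEFAULT_FILE_HEADER` with the parsed rows.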

We are also advocating the creation of a Version 2 that will contain the SEGZ-header-definition for variable length headers as well as the ability to handle trace trailers (OPSeis field data). Here is what this would look like:

SEGZ-parameters {
#Comment - This header defines a normal SEGY file
 Name,            Length,Description
 Textual-header,    3200, SEGY Rev(0) Default, length of character line header (bytes)
 File-header,        400, Length of binary line header (bytes)
 Trace-header,       240, Length of trace header (bytes)
 Trace-trailer,      960, Opseis format for field data 
 Trace_Sample_Rate, IBM4, Format 1 in the binary header
} 
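With variable header lengths and an optional trailer, the size of one trace record is no longer a fixed formula. The Python sketch below works through the arithmetic under two assumptions that the text does not spell out: the trailer follows the samples of each trace, and the bytes per sample follow the format codes listed in the Valid Format Codes table on this page. The function names are hypothetical.

```python
# Bytes per sample for the format codes listed on this page.
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 6: 4, 8: 1}

def trace_record_length(n_samples, format_code, trace_header=240, trace_trailer=0):
    """Bytes occupied by one trace: header + samples + optional trailer."""
    return trace_header + n_samples * BYTES_PER_SAMPLE[format_code] + trace_trailer

def count_traces(file_size, preamble_len, n_samples, format_code,
                 textual=3200, file_header=400,
                 trace_header=240, trace_trailer=0, segz_header=12):
    """Number of traces implied by the file size, or raise if it does not fit."""
    data_bytes = file_size - segz_header - preamble_len - textual - file_header
    record = trace_record_length(n_samples, format_code, trace_header, trace_trailer)
    traces, leftover = divmod(data_bytes, record)
    if leftover:
        raise ValueError("file size is not a whole number of trace records")
    return traces

plain = trace_record_length(10, 1)                        # 240 + 10*4
with_trailer = trace_record_length(10, 1, trace_trailer=960)
```

A validation program could use a check like `count_traces` to confirm that the lengths declared in SEGZ-parameters are consistent with the file on disk.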

The major focus of this committee (the HARD part) will be the creation of a standard set of names. This is where the RODE encapsulation format failed, it was too flexible! If you can't agree to a standard name, then please provide an alias table for the rest of us. I will work on providing an alias table for Colorado School of Mines su and Stanford's SEP processing documentation.

Here is a list of proposed number(character) formats. Can you think of any number types we might be missing?
Data Type Description
INT1 1-byte integer
INT2 2-byte integer
INT4 4-byte integer
INT8 8-byte integer
UINT1 1-byte unsigned integer
UINT2 2-byte unsigned integer
UINT4 4-byte unsigned integer
UINT8 8-byte unsigned integer
BCD2 2-Byte Binary Coded Decimal
BCD4 4-Byte Binary Coded Decimal
IEEE4 4-byte IEEE float
IEEE8 8-byte IEEE float
IBM4 4-byte IBM float
ASCII American Standard Code for Information Interchange
EBCDIC Extended Binary Coded Decimal Interchange Code
UNICODE Universal character set
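To make the list concrete, here is a hedged Python sketch that maps the fixed-width numeric types to `struct` format characters and decodes IBM4 by hand, since `struct` has no IBM hexadecimal float. IBM4 is assumed to be stored big endian, and the BCD and character types are left out. The names `STRUCT_CODES`, `ibm4_to_float` and `decode` are hypothetical.

```python
import struct

# Proposed type name -> Python struct format character.
STRUCT_CODES = {
    "INT1": "b", "INT2": "h", "INT4": "i", "INT8": "q",
    "UINT1": "B", "UINT2": "H", "UINT4": "I", "UINT8": "Q",
    "IEEE4": "f", "IEEE8": "d",
}

def ibm4_to_float(raw: bytes) -> float:
    """Decode a 4-byte IBM hexadecimal float (big-endian storage assumed)."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F      # base-16 exponent, excess 64
    fraction = word & 0x00FFFFFF        # 24-bit fraction, no hidden bit
    return sign * (fraction / 2**24) * 16.0 ** (exponent - 64)

def decode(type_name: str, raw: bytes, order: str = ">"):
    """Decode one value of the named type from its raw bytes."""
    if type_name == "IBM4":
        return ibm4_to_float(raw)
    return struct.unpack(order + STRUCT_CODES[type_name], raw)[0]

ten = decode("INT2", b"\x00\x0a")
ibm_value = decode("IBM4", bytes.fromhex("C276A000"))
```

The 8-byte entries are what make it possible to hold full-precision UTM coordinates without a scalar.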

Here's an example of a File/Line/Binary/Reel header for some typical segy data:
Name Byte Type Vector Scalar Addend Description
FILE_SAMP_RATE 17 INT2 1 1 0 Sample Interval
FILE_SAMP_NUM 21 INT2 1 1 0 Number of data samples per trace or the maximum number of Samples
TRACE_SAMP_FORMAT 25 INT2 1 1 0 Data sample format code 1=IBM floating point
SEIS_DATUM 125 IEEE4 1 1 0 Seismic Datum elevation
SEIS_REPLACEMENT_VELOCITY 129 IEEE4 1 1 0 Seismic Replacement Velocity
LINE_NAME 301 ASCII 32 0 0 LineName

Here's an example for a trace header definition:
Name Byte Type Vector Scalar Addend Description
INLINE 9 INT4 1 1 0 Original field record number 3D inline number
CROSSLINE 13 INT4 1 1 0 Original field trace number 3D cross line number
SOURCE_POINT 17 IEEE4 1 1 0 Energy source point number
SOURCE_POINT 25 INT4 1 0.25 -100 Energy source point number (for some old Amoco data)
SOURCE_POINT 17 INT4 1 1000 0 Energy source point number (CSEG SEGY)
CDP 21 INT4 1 1 0 Ensemble number (CDP CMP CRP etc)
ITRACE_LINE 1 UINT4 1 1 0 TraceCounter for line
TRTYPE 31 INT2 1 1 0 DeadTraceFlag, 1=data 2=dead 3=dummy 4=time break 5=uphole (etc)
HORI_NSUM 33 INT2 1 1 0 Fold, No of horizontally summed traces in this trace
REC_HT 41 IEEE4 1 1 0 Elevation, Elevation at receiver group
CDP_X 81 IEEE4 1 1 0 CoordinateX, X-coordinate of ensemble (CDP) for this trace
CDP_Y 85 IEEE4 1 1 0 CoordinateY

Note how I have provided an example of how we can properly scale a shot point field (CSEG 1994 SEGY standard) and how we can calculate a shot point field from a trace counter in some old Amoco data (four traces per shot point, first trace is shotpoint -100).

December 13, 2007 - Zig Doborzynski (EnCana) - provided the Opseis Documentation for a SEGY format that uses a 960 byte trace trailer. (This is an acquisition system in common use during the 1990's in the southern US). In order to handle these data, we have added another 32 bit unsigned integer to the SEGZ header to handle segy trace trailers. Zig does point out that the proposed SEGZ headers can be a maximum size of 4.294 gigabytes, probably larger than we really require.

December 12, 2007 - Jonathan Ravens (GLOBE Claritas Development NZ) has put a lot of thought into coming up with the following improvements for consideration:

The hard thing about this process is can we all agree on how we are going to describe our data fields and our number types? Take a look at this Field Naming Conventions (comma separated file) to appreciate the different ways we describe the same item. I have captured how Zokero, Paradigm, ARAM and CGGVeritas describe the same number fields. My suggestion is to come up with a few rules to be used to define field names: If we can agree, we can have a new standard.

November 20, 2007 - Tom Feigs (EnCana) suggests that there should be an auxiliary file (meta) that describes the format of the data. That way, you can avoid modifying the original SEGY record of legacy data. Basic archive principle states that the "archive" always remains as the original.

This second file, perhaps called seismic_line_name.xml, sits next to seismic_line_name.sgy on your online file system. Here is an example xml file that describes part of the EnCana SEGY format standard. It would be logical to have such a meta file that describes the internal SEGY format for automatic data loading. This file could be part of the variable-length EBCDIC header for the proposed SEGZ format.

May 8, 2007 - Mike Dennis (Divestco - WinPics) - We need a flag in the binary header to define the number of samples at the trace level. It is proposed that a flag of SamplesPerTrace=0 in the binary header be used to indicate that the number of samples is read from the trace header. At present, workstation software (WinPics, SeisWare and SeisX) reads the number of samples from the binary header; the trace header is ignored. The workstation vendors prefer that all traces contain the same number of samples.

May 1, 2007 - John Townsley (Divestco - WinPics) points out that we need a definition of the UTM Zone or the Coordinate System. The names have been added, including the international EPSG codes www.epsg.org

April 27, 2007 - Dave Morgan (Divestco) makes the observation that the CSEGZ definition can be thought of as a generalized SEGY object, with additional data types and variable-length headers.

April 19, 2007 - Rick makes the suggestion that the mapping file should become part of the Text header and be easy for a computer program to parse. We now have the tags map and /map to surround the map file format. It should be simple for the workstation data loader to automatically parse the Text header and build its format file. Here is what a Talisman-like mapping file (comma-separated data fields) would look like.

April 18, 2007 - Rick Macdonald (cggveritas) hopes "the format of your info to be strict and program-readable so we could read it directly instead of converting it manually to our mapping file format every time."

I have provided a Perl script to demonstrate one way to reformat the Proposed SEGYZ mapping file into SeisWare Keywords.

April 18, 2007 - Shayne Stogrin comments "One other minor thing. I am not really sure what to say about it, but the whole Scale thing I find a bit confusing. I am not confused by it, but it is non-transparent. When I think of scale I think that you should apply that scale to something, as opposed to the fact that the "scale" has already been applied. I personally prefer a scale factor that you need to apply to get the real data, so when you have a scale of 100, I would rather see a scalar of 0.01. I think the use of scale is definitely because of historical reasons, and I won't quibble with them. I just wonder if maybe a better terminology is all that's needed."

April 18, 2007 - Doug Horton (cggveritas) reports that they suggest "that we need to be able to handle big/little endian formats in the numbers definition".

Good idea. We can easily do this by fixing the position of the DataFormat Integer16Bit in bytes 25-26 of the Line Header (SEGY byte position). These numbers are in the range of 1 to 6. A simple test with a two-byte swap of this data field will identify the endianness of the data. (That's how SeisX and SeisWare automatically identify whether the data came from a Sun-like machine or a PC.)
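The byte-swap test just described can be sketched in a few lines of Python. This is an illustration of the idea, not the SeisX or SeisWare code: it reads bytes 25-26 of the binary header both ways and accepts whichever interpretation lands in the 1 to 6 range. The function name `detect_endianness` and the adjustable `low`/`high` limits are assumptions.

```python
import struct

def detect_endianness(file_header: bytes, low: int = 1, high: int = 6) -> str:
    """Return 'B' or 'L' based on the format code in bytes 25-26 (1-based)."""
    raw = file_header[24:26]
    as_big = struct.unpack(">h", raw)[0]
    as_little = struct.unpack("<h", raw)[0]
    big_ok = low <= as_big <= high
    little_ok = low <= as_little <= high
    if big_ok and not little_ok:
        return "B"
    if little_ok and not big_ok:
        return "L"
    # Neither reading is a plausible format code.
    raise ValueError("cannot determine endianness from the format code")

# Synthetic header with format code 1 written big endian.
demo = bytearray(400)
struct.pack_into(">h", demo, 24, 1)
demo_endian = detect_endianness(bytes(demo))
```

A small code can never be valid in both byte orders: a value of 1 to 6 read the wrong way round becomes a multiple of 256, so the test is unambiguous within that range.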

April 17, 2007 - Shayne Stogrin (Zokero) makes the following suggestions

Prefix Header Definition

The following table is an example of what a SEGY file would look like converted to CSEGZ.

LABEL 4 ASCII characters SEGZ
Descriptive (text) Header Integer32Bit 3200
File Header Integer32Bit 400
Trace Header Integer32Bit 240

There are two different ways that the loading information needs to be specified. We require specifications for Numbers and for Text description. Here is how to describe numbers for files that are formatted like a SeisX or a SeisWare segy file:

CSEGZ Numbers Definition

Only one number position will be defined as fixed, and that is the DataFormat code in bytes 25-26 of the Binary Header (SEGY byte position). It provides a convenient number for identifying whether the data are Big or Little Endian.

Please note we have a Scalar field that needs some clarification. Previous versions of SEGY have used this field in order to handle fractional numbers. The 1999 CSEG standard declared that shot points were to be multiplied by the number 1000. E.g., shot point 100.25 would be stored as the number 100,250. Workstation software needs to divide the stored number by the Scalar.
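The arithmetic is small enough to write out. The Python sketch below follows the rule in the paragraph above (divide the stored integer by the Scalar) and is only meant to pin down the direction of the conversion; the function names `apply_scalar` and `store_with_scalar` are hypothetical.

```python
def apply_scalar(stored: int, scalar: float) -> float:
    """Recover the real value from a stored integer: real = stored / scalar."""
    return stored / scalar

def store_with_scalar(real: float, scalar: float) -> int:
    """Inverse direction: what a writer would put in the header."""
    return round(real * scalar)

shot_point = apply_scalar(100250, 1000)     # stored 100,250 with Scalar 1000
stored = store_with_scalar(100.25, 1000)
```

Stating the direction explicitly in the standard would address the ambiguity between a scale that "has been applied" and one that "must be applied".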

Numbers Header Byte Type Scalar Comments
SampleInterval File 17 Integer16Bit 1 Sample Interval
SamplesPerTrace File 21 Integer16Bit 1 Number of data samples per trace or the maximum number of Samples
DataFormat File 25 Integer16Bit 1 Data sample format code, 1=IBM floating point, 6=IEEE floating point
SeismicDatum File 125 FloatIEEE32Bit 1 Seismic Datum elevation
SeismicReplacementVelocity File 129 FloatIEEE32Bit 1 Seismic Replacement Velocity
LineSequenceNumber Trace 9 Integer32Bit 1 Original field record number, 3D inline number
TraceSequenceNumber Trace 13 Integer32Bit 1 Original field trace number, 3D cross line number
ShotSequenceNumber Trace 17 FloatIEEE32Bit 1 Energy source point number
CDPNumber Trace 21 Integer32Bit 1 Ensemble number (CDP CMP CRP etc)
TraceCounter Trace 25 Integer32Bit 1 Trace number within the ensemble
DeadTraceFlag Trace 31 Integer16Bit 1 Number of vertically summed traces
Fold Trace 33 Integer16Bit 1 Number of horizontally stacked traces
Elevation Trace 41 FloatIEEE32Bit 1 Receiver group elevation
CoordinateX Trace 81 FloatIEEE32Bit 1 Group Coordinate X
CoordinateY Trace 85 FloatIEEE32Bit 1 Group Coordinate Y

Here is how to describe character (text) information:

CSEGZ Text Description (formerly the EBCDIC Header)

Characters Header Byte Size Type
LineName File 301 32 StringASCII
Description Text 3120 80 StringASCII

Talisman like CSEGZ Description

Talisman (and many CSEG members) like to adhere to the original SEGY rev(0) definition that specifies all numbers specified in the header are integers. They use Scalars to handle the decimal point.

Numbers Header Byte Type Scalar
SampleInterval File 17 Integer16Bit 1
SamplesPerTrace File 21 Integer16Bit 1
DataFormat File 25 Integer16Bit 1
LineSequenceNumber Trace 9 Integer32Bit 1
TraceSequenceNumber Trace 13 Integer32Bit 1
ShotSequenceNumber Trace 17 Integer32Bit 1000
CDPNumber Trace 21 Integer32Bit 1
DeadTraceFlag Trace 31 Integer16Bit 1
Fold Trace 33 Integer16Bit 1
Elevation Trace 41 Integer32Bit 100
CoordinateX Trace 81 Integer32Bit 100
CoordinateY Trace 85 Integer32Bit 100

Note that the shot points are multiplied by 1000 and the CDP bin coordinates by 100. Click here to view what the Talisman file would look like. This file makes use of the Characters format in order to extract information such as Line Name out of the Descriptive (text) header.

Data Field Naming Conventions

To make this work between different computers, we will need to standardize on what we call the different items. If we cannot agree on what to call different items, we need at a minimum to be consistent. I have assembled the following table to illustrate some of the different names that are currently being used. Here is a link to the naming conventions spreadsheet. We can see that every company likes to be a little bit different. The real trick with standards is to see if we can all come to a consensus on what to call the various items!

Proposed SEGZ, SeisWare/Zokero, SeisX/Paradigm, ARAM-Airies, Veritas
SampleInterval, Sample Interval, SAMPLE_INTERVAL, Sample Interval, SAMPLE_RATE
SamplesPerTrace, Samples Per Trace, Undefined, Number of Samples per Trace, SAMPLES_PER_TRACE
DataFormat, Data Format, Undefined, Sample Format Code, FORMAT_CODE
SeismicDatum, Seismic Datum, SEIS_DATUM, Undefined, CDP_DATUM
SeismicReplacementVelocity, Seismic Replacement Velocity, SEIS_RELACEMENT_VELOCITY, Undefined, CDP_VELR
LineSequenceNumber, Line Sequence Number, LINE_SEQUENCE_NUMBER, File Number, INLINE
TraceSequenceNumber, Trace Sequence Number, TRACE_SEQUENCE_NUMBER, Trace Number Within File, CROSS_LINE
ShotSequenceNumber, Shot Sequence Number, SHOT_SEQUENCE_NUMBER, Source point Number, CDP_FSTN
CDPNumber, CDP Number, CDP_NUMBER, Undefined, GROUP
TraceCounter, Trace Counter, Undefined, Undefined, GROUP_TRACE
DeadTraceFlag, Dead Trace Flag, Undefined, Number of Composites, VSUM
Fold, Fold, Undefined, Horizontal Sums per trace, CDP_FOLD
Elevation, Elevation, Undefined, Undefined, CDP_ELEV
CoordinateX, UTM X, UTMX, Undefined, CDP_X
CoordinateY, UTM Y, UTMY, Undefined, CDP_Y
EPSGCode
SurveyDatum
SurveyGrid

Valid Data Types

Of course for this to all work we have to agree on our naming convention. Here is what is proposed for Valid Data types. Please contact the author if you can suggest better names:

Data Types Text
Integer8Bit 8 bit integer
Integer16Bit 16 bit integer
Integer32Bit 32 bit integer
Integer64Bit 64 bit integer
FloatIEEE32Bit 32 bit real number
FloatIEEE64Bit 64 bit real number
FloatIBM32Bit 32 bit IBM real number
FloatIBM64Bit 64 bit IBM real number
StringASCII TEXT
StringEBCDIC TEXT
StringUnicode TEXT

The data types are straightforward and can even handle 64-bit numbers!

Perl Script to generate SeisWare KeyWord files

Here is a simple script build_kwd.pl that will take as input our CSEGZ mapping file and generate a SeisWare KeyWord (.kwd) file that can be used to load your seismic into SeisWare.

First we need to look at what the CSEGZ mapping file looks like:

bash-2.03$ more veritas_3d_map.csv
Numbers,Header,Byte,Type,Scalar
SampleInterval,File,17,Integer16Bit,1
SamplesPerTrace,File,21,Integer16Bit,1
DataFormat,File,25,Integer16Bit,1
SeismicDatum,File,125,FloatIEEE32Bit,1
SeismicReplacementVelocity,File,129,FloatIEEE32Bit,1
LineSequenceNumber,Trace,9,Integer32Bit,1
TraceSequenceNumber,Trace,13,Integer32Bit,1
ShotSequenceNumber,Trace,17,FloatIEEE32Bit,1
CDPNumber,Trace,21,Integer32Bit,1
TraceCounter,Trace,25,Integer32Bit,1
DeadTraceFlag,Trace,31,Integer16Bit,1
Fold,Trace,33,Integer16Bit,1
Elevation,Trace,41,FloatIEEE32Bit,1
CoordinateX,Trace,81,FloatIEEE32Bit,1
CoordinateY,Trace,85,FloatIEEE32Bit,1
Characters,Header,Byte,Size,Type
LineName,File,301,32,StringASCII
Description,Text,3120,80,StringASCII
This example script will create the keywords to load most of the data into SeisWare. Use the -f option to specify the mapping file to use and direct the output to the file veritas_3d.kwd.
bash-2.03$ ~ekeyser/Perl/build_kwd.pl -f veritas_3d_map.csv > veritas_3d.kwd
Let's take a look at our new keyword file:
bash-2.03$ more veritas_3d.kwd
# Self describing ascii 

struct HeaderWord {
Name Header Byte Type Scale
}
struct HeaderString {
Name Header Byte Size Type
}
HeaderWord "Line Sequence Number"  Trace 8 32BitInteger 1
HeaderWord "Trace Sequence Number" Trace 12 32BitInteger 1
HeaderWord "UTM X" Trace 80 16BitInteger 1
HeaderWord "UTM Y" Trace 84 16BitInteger 1
HeaderString "Line Name" File 300 32 ASCII
HeaderString "Description" EBCDIC 3119 80 ASCII
Just point the SeisWare data loader to this file and the data can be loaded! It should be a trivial exercise to edit this script to generate all the KeyWords that SeisWare can accept!
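For readers who do not use Perl, here is a hedged Python sketch of the first half of such a conversion: reading the CSEGZ mapping file into its Numbers and Characters sections. It does not reproduce build_kwd.pl and does not attempt the SeisWare keyword names or type spellings. It assumes each section starts with a header row whose first cell is `Numbers` or `Characters`, and it adds a zero-based `Offset` because the .kwd listing above shows byte positions one lower than the mapping file (9 becomes 8, 301 becomes 300). The function name `parse_mapping` is hypothetical.

```python
import csv

MAPPING_SAMPLE = """Numbers,Header,Byte,Type,Scalar
SampleInterval,File,17,Integer16Bit,1
LineSequenceNumber,Trace,9,Integer32Bit,1
CoordinateX,Trace,81,FloatIEEE32Bit,1
Characters,Header,Byte,Size,Type
LineName,File,301,32,StringASCII
Description,Text,3120,80,StringASCII
"""

def parse_mapping(lines) -> dict:
    """Split a CSEGZ mapping file into its Numbers and Characters rows."""
    sections = {"Numbers": [], "Characters": []}
    current = None
    columns = None
    for fields in csv.reader(lines):
        if not fields:
            continue
        if fields[0] in sections:
            # Section header row: first column holds the field names.
            current = fields[0]
            columns = ["Name"] + fields[1:]
            continue
        if current is None:
            raise ValueError("data row found before any section header")
        row = dict(zip(columns, fields))
        row["Byte"] = int(row["Byte"])
        row["Offset"] = row["Byte"] - 1    # zero-based, as in the .kwd listing
        sections[current].append(row)
    return sections

mapping = parse_mapping(MAPPING_SAMPLE.splitlines())
```

From here, writing HeaderWord and HeaderString lines is a matter of looking up each name in a vendor alias table.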
Site Owner: Eric Keyser