Proposed CSEGZ Standard
March 19, 2008
The recommended SEGZ format can be thought of as a SEGY object containing a preamble that describes the format of the data and headers.
Geophysicists the world over have been asking for the following enhancements to SEGY:
- A standard way to read any SEGY like format containing a Descriptive Textual Header, File header, Trace Header and Data.
- Optional, variable length headers. As the field instruments become more sophisticated, space for additional numbers is required.
- Additional data types, unsigned integers, IEEE floats for numbers stored in the line and trace headers. The ability to handle 64 bit numbers (eg. required to accurately specify UTM XY's)
- Built in mapping or keyword file to enable automatic data loading to the workstation. The idea is that the header entries are no longer to be defined just by the byte-location and description, but instead by a name. This is important since with the ability (or necessity) to define the byte positions where each trace header name is located in the headers, the byte locations can no longer be relied upon to define which header entry we are referring to, so the fixed set of names may well become essential!
- Add a keyword mapping file to the front of every segy that can describe any adaptation of SEGY since the original 1975 definition.
- Provide a new format that can easily be extended to handle the needs of the future, i.e. variable header lengths
Note: The PREAMBLE contains an ascii description of the organization of the data that can easily be parsed.
I have spent years trying to get geophysicists to agree on a SEGY standard with little success. The recommended SEGZ format contains only those items that we can all agree upon from the original SEGY definition:
- A Textual File Header (was the EBCDIC header; usually free-form ASCII text, though sometimes fields are hardwired)
- A Binary File Header (stores numbers and characters in fixed byte locations)
- and a Trace Header
Note: We are now recommending that these can become variable length! It is now recommended that all data processors provide the client with a mapping file that contains sufficient detail to enable automatic workstation data loading.
The proposed SEGZ header contains a fixed 12 bytes of information, with the last four bytes containing the length of the Preamble that describes the SEGY-like definition. The first four signature bytes contain the ASCII characters "SEGZ", followed by a reserved byte (a null) in byte 5. Byte 6 contains the Big/Little Endian flag ("B" or "L"), and bytes 7-8 contain the 16-bit version number 49 (the ASCII character "1"), so you can examine the start of a SEGZ file with a text editor. This is followed by a four-byte integer containing the length of the Preamble Definition.
The endianness must be checked before the length of the preamble is read or you will extract the wrong number. The endianness is given by the B or L flag and can be confirmed by checking the 16-bit version number. A further check should be made by examining the Trace Sample Format of the data (it should be a number between 1 and 6). This is what SeisX and SeisWare use to automatically detect the endianness of your data.
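As a rough illustration of the header check described above, here is a minimal Python sketch. The function name `parse_segz_header` and the returned dictionary keys are hypothetical. The byte offsets follow the paragraph above (signature in bytes 1-4, null in byte 5, flag in byte 6, version in bytes 7-8, length in bytes 9-12), and the length is assumed to be unsigned. A note elsewhere on this page describes the bytes after the signature as two nulls followed by "B1", so treat the exact offsets as an assumption to confirm against real files.

```python
import struct

def parse_segz_header(header: bytes) -> dict:
    """Parse the fixed 12-byte SEGZ header (byte layout assumed from the text)."""
    if len(header) < 12:
        raise ValueError("a SEGZ header needs at least 12 bytes")
    if header[0:4] != b"SEGZ":
        raise ValueError("missing SEGZ signature")
    flag = header[5:6]                      # byte 6 (1-based): 'B' or 'L'
    if flag == b"B":
        order = ">"                         # big endian
    elif flag == b"L":
        order = "<"                         # little endian
    else:
        raise ValueError(f"unknown endian flag {flag!r}")
    # Read the version only after the byte order is known.
    (version,) = struct.unpack(order + "H", header[6:8])
    if version != 49:                       # 49 is the ASCII code for "1"
        raise ValueError("version does not match the endian flag")
    # Assumption: the preamble length is an unsigned 32-bit integer.
    (preamble_length,) = struct.unpack(order + "I", header[8:12])
    return {"endian": flag.decode("ascii"),
            "version": chr(version),
            "preamble_length": preamble_length}
```

The version check doubles as the cross-check mentioned above: read with the wrong byte order, the value 49 would come out as 12544.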
Update Information
So do we have other comments in order to make this format even better?
March 19, 2008 - We now have a program to take a SEGZ file, perform some validation checks, write out some statistics, the preamble data dictionary and the segy file.
Here is the Perl script segz2segy.pl. Here is how to run it on our data samples:

    ls *segz
    _P107217_MRG_3D_CRP.FMIG.0.segz
    A29250.80-C-2.ksp93.um.CD-31807.segz
    A2370.65148.gox99.ustr.CD-08334.segz
    A31523.90G01-03.un.fm.CD-30506.segz
    ls *segz > junk
    ~ekeyser/Perl/segz2segy.pl
    SEGZ ENDIAN PREAMBLE SAMPLES TRACES FILE_NAME
    SEGZ L 2269 10 9497 _P107217_MRG_3D_CRP.FMIG.0.segz
    SEGZ B 931 10 585 A2370.65148.gox99.ustr.CD-08334.segz
    SEGZ B 1245 10 319 A29250.80-C-2.ksp93.um.CD-31807.segz
    SEGZ B 1728 10 775 A31523.90G01-03.un.fm.CD-30506.segz
    -fini

I will spend some effort on parsing the preamble and generalizing the code. At least it now works to take a SEGZ file and convert it back to the SEGY file I started with!
March 11, 2008 - Shayne Stogrin has provided a list of comments, most of which I have addressed (and agreed to!)
- Minor thing but "Scaler" is spelled "Scalar".
- I think that the Textual-header in the SEGZ-parameters needs a type associated with it (ASCII, EBCDIC or UNICODE).
- Meaning of Scalar needs to be defined. In the example, it is specified as a divisor. In SeisWare we use the Scalar as a multiplier. I believe that a multiplier is more flexible and intuitive. In his example you could use a Scalar of 0.25 rather than the 4. Using the divisor method, if I wanted to scale my data by 10 times I would have to provide a Scalar of 0.1 which I think would just be confusing. Alternately you could add a Divisor field. If a Divisor field is added the order of the scaling and division operations would need to be strictly defined.
- I think that you need to give an explicit table of the accepted types of trace data (e.g. 1 = IBM4) just so there is no room for misinterpretation with previous "standards". It might be worth giving a code to IEEE8 before someone decides to invent their own.
- I would remove IBM8 as a data type. Unless someone knows something I don't there is no such animal.
- The meaning of Addend needs to be precisely defined (i.e. whether it is applied before or after the scalar).
- If the preamble part is to be used as a format file I think that it needs to be a little more rigorously defined so that it can be easily verified as a proper format file. For example you might make it mandatory that "SEGZ-parameters" is the first word on the first line.
Other than that I don't think it would be a monumental programming task to get SeisWare to read these files.
SEGZ Valid Format Codes
| FORMAT | Number CODE |
| --- | --- |
| IBM4 | 1 |
| INT4 | 2 |
| INT2 | 3 |
| IEEE4 | 6 |
| INT1 | 8 (bits of course) |

Note: A number CODE of 5 as defined in SEGY rev(0) is no longer recognized. The reason is simple: the number 5 has been previously used for other number representations, a 36 bit number for the Univac and an 8 bit integer by Paradigm!
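To make the table concrete, here is a small Python sketch of a lookup for the valid format codes, plus a decoder for the IBM4 type. The names `FORMAT_CODES`, `sample_format` and `ibm4_to_float` are illustrative only. The byte widths are inferred from the type names, and the IBM decoder assumes the four bytes are in big-endian order.

```python
# Format code -> (type name, bytes per sample); widths inferred from the names.
FORMAT_CODES = {
    1: ("IBM4", 4),
    2: ("INT4", 4),
    3: ("INT2", 2),
    6: ("IEEE4", 4),
    8: ("INT1", 1),
}

def sample_format(code: int):
    """Return (type name, sample size in bytes) or reject an unlisted code."""
    try:
        return FORMAT_CODES[code]
    except KeyError:
        raise ValueError(f"format code {code} is not a valid SEGZ code") from None

def ibm4_to_float(raw: bytes) -> float:
    """Decode one 4-byte IBM hexadecimal float (big-endian byte order assumed)."""
    word = int.from_bytes(raw, "big")
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F            # base-16 exponent, excess 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)
```

With this table, code 5 is rejected rather than guessed at, which is the behaviour the note above asks for.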
March 6, 2008 - I have finally fixed up my data examples to support this new format. (Please let me know if there is a problem.) Examples of what the format file might look like for SEGZ:
    SEGZ-parameters {
    #Comment - This header defines a normal SEGY file
    Name, Length,Description
    Textual-header, 3200, SEGY Rev(0) Default, length of character line header (bytes)
    File-header, 400, Length of binary line header (bytes)
    Trace-header, 240, Length of trace header (bytes)
    Trace-trailer, 0, Length of trace trailer, Normally not used
    }
    File-header-definition {
    Name, Byte,Type,Vector,Scalar,Addend,Description
    FILE_SAMP_RATE, 17,INT2,1,1,0,Sample Interval
    FILE_SAMP_NUM, 21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
    TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point
    }
    Trace-header-definition {
    Name,Byte,Type,Vector,Scalar,Addend,Description
    ITRACE_LINE,1,INT4,1,1,0,Trace sequence no within line
    ITRACE_FILE,5,INT4,1,1,0,Trace sequence no within this file
    CDP,21,INT4,1,1,0,Ensemble no. (CDP CMP CRP etc)
    HORI_NSUM,33,INT2,1,1,0,No of horizontally summed traces in this trace
    SOURCE_HT,41,INT4,1,1,0,Surface elevation at source point
    CDP_X,73,INT4,1,1,0,X-coordinate of ensemble (CDP) for this trace
    CDP_Y,77,INT4,1,1,0,CPD coordinate (Y)
    SOURCE_POINT,53,IEEE4,1,1,0,Energy source point number (shot peg)
    }

Use Microsoft WordPad to open up the segz file and you will see this. Note the signature bytes SEGZ followed by two nulls and then the letters B1. This tells you that the data are Big Endian and the SEGZ version is 1 (in ASCII). The next four bytes are the 32-bit binary integer for the length of the preamble! Simple, eh!
Here is a table with several examples of SEGY files defined as SEGZ: the SEGZ binary header (12 bytes), the variable length format file and the segy file.
| Example | SEGZ Header | Preamble | SEGY |
| --- | --- | --- | --- |
| 3D Little Endian | X | X | X |
| 2D Big Endian simple example | X | X | X |
| 2D Big Endian | X | X | X |
| 2D Big Endian numerous defined fields | X | X | X |

In order to create the SEGZ file I used the unix utility cat to append all the above files to a single SEGZ file. Next step is to create a SEGZ validation program. But first we need some rules:
Defaults for File-header-definition are:
- Fields are comma-delimited, case insensitive.
- White Space is legal
- The letters SEGZ-parameters is the first word on the first line of the preamble.
- The first line of each preamble stanza should be a list of TAGS. Tags currently include : Name, Length, Byte, Type, Count, Scalar, Addend, Description. Some tags can be optional (Scalar and addend) and some may only apply to certain stanzas (eg Length applies to the SEG-Z parameters stanza). Reading software must recognise the tags that are described by the format. SEG-Z files can contain new tags, but the reading software must bypass any fields for tags that it doesn't recognise. The SEG-Z standard will be deemed to be broken if REQUIRED TAGS (eg Name, Byte..) are missing from the file.
- Defaults for Textual-header length is 3200, File-header length is 400, Trace-header length is 240 and Trace-trailer is 0, ie what the SEGY header lengths are.
- If SCALAR is absent, use 1.0; if ADDEND is absent, use 0.0; if COUNT is absent, use 1. The Scalar is a multiplier: a Scalar of 10 means you multiply the number in the field by 10, and a Scalar of 0.25 means you multiply by 0.25 (the same as dividing by 4). The order of operations is to multiply by the Scalar first and then apply the Addend.
- Comments start with the character #
- Values stored in the header override the data. Values stored in a separate format file override both the preamble header and the Line header.
- We re-addressed the question of whether XML should be used. We decided against it, because XML would require more complicated coding (or importing other code such as libxml), and because XML would make the stanzas difficult for people to read and would make them quite a bit bigger, although that is a relatively minor issue. Using white space to line up the columns in the current preamble stanza format will make it very human-readable. So we still vote for NOT using XML.
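The rules above are simple enough to sketch a parser in a few lines. The following Python is only an illustration under some assumptions: the function names are made up, a "#" is treated as starting a comment anywhere on a line, each stanza is written as "Name {" on its own line and closed by "}" on its own line, and the last tag (Description) is allowed to contain commas.

```python
DEFAULT_LENGTHS = {"textual-header": 3200, "file-header": 400,
                   "trace-header": 240, "trace-trailer": 0}

def parse_preamble(text):
    """Return {stanza name: [row dict, ...]} with lower-cased tag names."""
    stanzas, current, tags = {}, None, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments and white space
        if not line:
            continue
        if line.endswith("{"):                  # start of a stanza
            current, tags = line[:-1].strip().lower(), None
            stanzas[current] = []
        elif line == "}":                       # end of a stanza
            current = None
        elif current is None:
            raise ValueError(f"text outside a stanza: {line!r}")
        elif tags is None:                      # first line lists the tags
            tags = [t.strip().lower() for t in line.split(",")]
        else:
            # Split at most len(tags)-1 times so the last field keeps its commas.
            values = [v.strip() for v in line.split(",", len(tags) - 1)]
            row = dict(zip(tags, values))
            row["name"] = row.get("name", "").upper()      # case insensitive
            row["scalar"] = float(row.get("scalar") or 1.0)
            row["addend"] = float(row.get("addend") or 0.0)
            row["count"] = int(row.get("count") or 1)
            stanzas[current].append(row)
    return stanzas

def header_lengths(stanzas):
    """Overlay the SEGZ-parameters stanza on the default SEGY header lengths."""
    lengths = dict(DEFAULT_LENGTHS)
    for row in stanzas.get("segz-parameters", []):
        key = row["name"].lower()
        if key in lengths:
            lengths[key] = int(row["length"])
    return lengths
```

A validator could additionally insist that "SEGZ-parameters" is the first word of the preamble and that the required tags are present; that check is left out here to keep the sketch short.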
    Name, Byte,Type,Vector,Scalar,Addend,Description
    FILE_SAMP_RATE, 17,INT2,1,1,0,Sample Interval
    FILE_SAMP_NUM, 21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
    TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point

Note - While the headers are real, the seismic data has been removed and now contains only ten samples for each trace.
Here is the current list of Valid Names and descriptions. Please examine this list and suggest corrections, updates or additions. This is where we need your help.
February 22, 2008 - We have our first SEGY/Z viewer and dumper operational as seen on the series of images to the right.
SEGYZ Viewer/Dumper
SEGZ has the advantage of a small text file as shown under the SEG-Z tab that tells what (field name), where (byte location) and format type (integer floating point etc).
By having the fields pre declared, the user of segy data no longer has to inspect and test each and every data field.
Once these data are read, it becomes trivial to press a button to reformat the data into your unique standard of segy!
This dumper will read data from any segy format file including OPSeis with its trace trailers!
February 20, 2008 - Jonathan makes a series of small points: name changes are made to more closely follow the names in SEGY rev(1), and endianness is now defined in the SEGZ Header before any numbers are read.
February 11, 2008 - It has been suggested that we define a Velocity header that can easily be parsed, similar to the format definition. Here's an example:

    Velocity-definition {
    CDP,TIME,TIME_DATUM,TIME_SURFACE,RMS,INT_VEL,AVE_VEL,DEPTH
    11,142,42,86,2310,2310,2310,99
    ,215,115,159,2626,2955,2606,207
    ,322,222,266,2897,3258,2868,381
    ,422,322,366,3077,3511,3044,557
    ,545,445,489,3100,3167,3075,752
    ,717,617,661,3190,3433,3168,1047
    ,828,728,772,3213,3347,3194,1233
    ,962,862,906,3235,3359,3218,1458
    ...
    51,138,38,82,2400,2400,2400,98
    ,192,92,136,2761,3233,2731,186
    ,299,199,243,2919,3108,2897,352
    ,410,310,354,2987,3131,2970,526
    ,579,479,523,3100,3324,3085,807
    ,740,640,684,3100,3100,3088,1056
    ,867,767,811,3168,3512,3155,1279
    ,1001,901,945,3281,3896,3260,1540
    ,1143,1043,1087,3303,3446,3284,1785
    ...
    }

The above will easily load into a spreadsheet for CDPs 11 and 51. What do you think? Is this a good idea? A bad idea? Should we do something similar for the mute pattern?
January 16, 2008 - Randy Selzler (BP) - Recommends the endian of binary types be addressed explicitly in the preamble description. The endian of binary types needs to be addressed by the standard and probably specified explicitly in the preamble. I'd avoid draconian rules that force external representation to always be big (or small). Field by field control is probably overkill. That leaves a flag per dataset that specifies big or little. It is recommended that the endianness should be verified by examining the format code (TRACE_SAMP_FORMAT) in the binary header. All you have to do is test for a number between one and six. Workstation vendors make this test so interpretation projects can be a mixture of Big and Little Endian and the interpreter doesn't even know it!
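For the Velocity-definition stanza proposed in the February 11 entry, a reader might look something like the Python sketch below. It assumes, as the example suggests but does not state, that a row with an empty first field belongs to the same CDP as the row before it, and that "..." lines only mark omitted rows. The function name is hypothetical.

```python
def parse_velocity_rows(lines):
    """Turn the body of a Velocity-definition stanza into a list of dicts."""
    rows, header, last_cdp = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line or line == "..." or line.startswith("#"):
            continue                            # skip blanks, elisions, comments
        fields = [f.strip() for f in line.split(",")]
        if header is None:                      # first line names the columns
            header = [f.upper() for f in fields]
            continue
        if fields[0] == "":                     # assumption: blank CDP = same CDP
            if last_cdp is None:
                raise ValueError("first data row has no CDP")
            fields[0] = str(last_cdp)
        row = {name: int(value) for name, value in zip(header, fields)}
        last_cdp = row[header[0]]
        rows.append(row)
    return rows
```

Each returned dictionary then carries its own CDP number, which is the form a spreadsheet or a database table would want.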
January 10, 2008 - Reginald Beardsley (su) - I think a key part of this should be an official mapping of vendor keywords to the SEG-Z header name (done by the vendor). There's no reason to get concerned about names matching. What matters is getting the usage correct. The key to that is describing the content of a header properly, so it is unambiguous what it contains and what the representation is.
January 7, 2008 - Jonathan Ravens (GLOBE Claritas Development NZ) provided an extensive list of Suggested Names and SEG description. I combined this file with most of the fields from a SeisX/SeisWare definition as well as a few survey names.
This is our initial stab at proposing a list of names. What do you think? What have we missed? The above tables were created by examining several different interpretations of SEGY. These files can be examined in spreadsheet form:
- Suggested names and SEG descriptions for the SEGZ File and trace headers
- 400 byte line (binary) header spreadsheet (updated Jan 7, 2008)
- 240 byte trace header spreadsheet (updated Jan 15, 2008)
CGGVeritas have provided a snippet of their mapping file.
It is very important that we agree on a list of names; this isn't just a neater system. The old system of declaring specific byte locations and number formats clearly is not working. The new proposed system will provide flexibility for the future and can even enable data to be properly loaded. Doug Bath (ITS segytool) believes we are on the right track and that it should not be hard to update his software to read and write this new format. This will also make life easier for seismic folk to converse without having to translate (e.g. between SOURCE_POINT and ep, or ITRACE_FILE and tracr, to use a couple of su (Colorado School of Mines Seismic Unix) examples).
My next step is to generate a series of data examples, I will window the data down to ten samples so we can see how this works. I will write two perl scripts that will create a SEGZ file by combining a SeisWare format file and a SEGY file, (seismicline.sgy, seismicline.fmt) and a second script that will take a SEGZ data file and split it into a SEGY file and a format file.
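The two scripts described above are written in Perl and are not reproduced on this page. As a hedged illustration of the same idea, the Python sketch below joins a format file and a SEGY file into SEGZ bytes and splits them again. The function names are hypothetical, the 12-byte header layout is assumed from the header description earlier on this page, and everything is done on in-memory bytes for brevity.

```python
import struct

def build_segz(preamble: bytes, segy: bytes, big_endian: bool = True) -> bytes:
    """Prefix a SEGY byte string with a SEGZ header and the preamble."""
    order, flag = (">", b"B") if big_endian else ("<", b"L")
    header = (b"SEGZ\x00" + flag
              + struct.pack(order + "H", 49)            # version "1" as ASCII 49
              + struct.pack(order + "I", len(preamble)))
    return header + preamble + segy

def split_segz(segz: bytes):
    """Return (preamble, segy) from SEGZ bytes built with the layout above."""
    if segz[:4] != b"SEGZ":
        raise ValueError("missing SEGZ signature")
    flag = segz[5:6]
    if flag not in (b"B", b"L"):
        raise ValueError(f"unknown endian flag {flag!r}")
    order = ">" if flag == b"B" else "<"
    (length,) = struct.unpack(order + "I", segz[8:12])
    return segz[12:12 + length], segz[12 + length:]
```

For real data the SEGY part would be streamed from disk rather than held in memory, since seismic files are often far larger than the preamble.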
December 20, 2007 - Carmine Militano, Terry Perkin and Bob Loblaw (C&C Systems) propose a variable length PREAMBLE to be appended to the front of SEGY files that contains a list of STANDARD NAMES to describe the FORMAT of the data. We are proposing that SEGZ Version 1 will handle most SEGYs that are currently in existence. The length of the Text-header is assumed to be 3200, the File-header 400 and the Trace-header 240. Version 1 would be used for most data. Here is what a typical Preamble might look like:

    File-header {
    Name,Byte,Type,Vector,Scalar,Addend,Description
    FILE_SAMP_RATE,17,INT2,1,1,0,Sample Interval
    FILE_SAMP_NUM,21,INT2,1,1,0,Number of data samples per trace or the maximum number of Samples
    TRACE_SAMP_FORMAT,25,INT2,1,1,0,Data sample format code 1=IBM floating point
    }
    Trace-header {
    Name,Byte,Type,Vector,Scalar,Addend,Description
    SOURCE_POINT,25,INT4,1,.25,-100,Energy source point number (for some old Amoco data)
    CDP,21,INT4,1,1,0,Ensemble number (CDP CMP CRP etc)
    CDP_X,81,INT4,1,1,0,CoordinateX
    CDP_Y,85,INT4,1,1,0,CoordinateY
    }

Note: The File-header-definition above can be considered to be the default. If you don't provide an override then this is how the binary header is to be defined. These are the ONLY fields on which I can find consistent agreement among the numerous different segy files I have seen in my career.
Due to the confusion of what floating point number format the actual data are in, we have provided for a Number-format-definition to over ride what is stored in the Binary (File) header.
We are also advocating the creation of a Version 2 that will contain the SEGZ-header-definition for variable length headers as well as the ability to handle trace trailers (OPSeis field data). Here is what this would look like:
    SEGZ-parameters {
    #Comment - This header defines a normal SEGY file
    Name, Length,Description
    Textual-header, 3200, SEGY Rev(0) Default, length of character line header (bytes)
    File-header, 400, Length of binary line header (bytes)
    Trace-header, 240, Length of trace header (bytes)
    Trace-trailer, 960, Opseis format for field data
    Trace_Sample_Rate, IBM4, Format 1 in the binary header
    }

The major focus of this committee (the HARD part) will be the creation of a standard set of names. This is where the RODE encapsulation format failed: it was too flexible! If you can't agree to a standard name, then please provide an alias table for the rest of us. I will work on providing an alias table for Colorado School of Mines su and Stanford's SEP processing documentation.
Here is a list of proposed number(character) formats. Can you think of any number types we might be missing?
| Data Type | Description |
| --- | --- |
| INT1 | 1-byte integer |
| INT2 | 2-byte integer |
| INT4 | 4-byte integer |
| INT8 | 8-byte integer |
| UINT1 | 1-byte unsigned integer |
| UINT2 | 2-byte unsigned integer |
| UINT4 | 4-byte unsigned integer |
| UINT8 | 8-byte unsigned integer |
| BCD2 | 2-Byte Binary Coded Decimal |
| BCD4 | 4-Byte Binary Coded Decimal |
| IEEE4 | 4-byte IEEE float |
| IEEE8 | 8-byte IEEE float |
| IBM4 | 4-byte IBM float |
| ASCII | American Standard Code for Information Exchange |
| EBCDIC | Extended Binary Coded Decimal Interchange Code |
| UNICODE | Universal character set |

Here's an example of a File/Line/Binary/Reel header for some typical segy data:
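Most of the proposed types map directly onto fixed-size codes in Python's `struct` module, which gives one way to read a named field once its byte position and type are known. The sketch below is an assumption-laden illustration: `STRUCT_CODES` and `read_field` are hypothetical names, byte positions are taken to be 1-based as in SEGY, and the BCD, IBM4, EBCDIC and UNICODE types are deliberately left out because they need their own decoders.

```python
import struct

# Proposed type name -> struct code (fixed sizes when a byte order is given).
STRUCT_CODES = {
    "INT1": "b", "INT2": "h", "INT4": "i", "INT8": "q",
    "UINT1": "B", "UINT2": "H", "UINT4": "I", "UINT8": "Q",
    "IEEE4": "f", "IEEE8": "d",
}

def read_field(header: bytes, byte: int, type_name: str,
               big_endian: bool = True, count: int = 1):
    """Read one field from a header; `byte` is the 1-based SEGY byte position."""
    offset = byte - 1
    type_name = type_name.upper()               # names are case insensitive
    if type_name == "ASCII":
        text = header[offset:offset + count].decode("ascii", errors="replace")
        return text.rstrip("\x00 ")
    code = STRUCT_CODES.get(type_name)
    if code is None:
        raise NotImplementedError(f"no decoder in this sketch for {type_name}")
    fmt = (">" if big_endian else "<") + str(count) + code
    values = struct.unpack_from(fmt, header, offset)
    return values[0] if count == 1 else values
```

For example, `read_field(file_header, 17, "INT2")` would return the sample interval from a default binary header.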
| Name | Byte | Type | Vector | Scalar | Addend | Description |
| --- | --- | --- | --- | --- | --- | --- |
| FILE_SAMP_RATE | 17 | INT2 | 1 | 1 | 0 | Sample Interval |
| FILE_SAMP_NUM | 21 | INT2 | 1 | 1 | 0 | Number of data samples per trace or the maximum number of Samples |
| TRACE_SAMP_FORMAT | 25 | INT2 | 1 | 1 | 0 | Data sample format code 1=IBM floating point |
| SEIS_DATUM | 125 | IEEE4 | 1 | 1 | 0 | Seismic Datum elevation |
| SEIS_REPLACEMENT_VELOCITY | 129 | IEEE4 | 1 | 1 | 0 | Seismic Replacement Velocity |
| LINE_NAME | 301 | ASCII | 32 | 0 | 0 | LineName |

Here's an example for a trace header definition:

| Name | Byte | Type | Vector | Scalar | Addend | Description |
| --- | --- | --- | --- | --- | --- | --- |
| INLINE | 9 | INT4 | 1 | 1 | 0 | Original field record number 3D inline number |
| CROSSLINE | 13 | INT4 | 1 | 1 | 0 | Original field record number 3D inline number |
| SOURCE_POINT | 17 | IEEE4 | 1 | 1 | 0 | Energy source point number |
| SOURCE_POINT | 25 | INT4 | 1 | 0.25 | -100 | Energy source point number (for some old Amoco data) |
| SOURCE_POINT | 17 | INT4 | 1 | 1000 | 0 | Energy source point number (CSEG SEGY) |
| CDP | 21 | INT4 | 1 | 1 | 0 | Ensemble number (CDP CMP CRP etc) |
| ITRACE_LINE | 1 | UINT4 | 1 | 1 | 0 | TraceCounter for line |
| TRTYPE | 31 | INT2 | 1 | 1 | 0 | DeadTraceFlag, 1=data 2=dead 3=dummy 4=time break 5=uphole (etc) |
| HORI_NSUM | 33 | INT2 | 1 | 1 | 0 | Fold, No of horizontally summed traces in this trace |
| REC_HT | 41 | IEEE4 | 1 | 1 | 0 | Elevation, Elevation at receiver group |
| CDP_X | 81 | IEEE4 | 1 | 1 | 0 | CoordinateX, X-coordinate of ensemble (CDP) for this trace |
| CDP_Y | 85 | IEEE4 | 1 | 1 | 0 | CoordinateY |

Note how I have provided an example of how we can properly scale a shot point field (CSEG 1994 segy standard) and how we can calculate a shot point field from a trace counter in some old Amoco data (four traces per shot point, first trace is shotpoint -100).
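As a small worked example of the Scalar and Addend columns, the Python sketch below applies the multiplier convention adopted in the later (March 2008) rules on this page: multiply by the Scalar first, then add the Addend. The function name is hypothetical. Note that under the multiplier convention the CSEG shot point entry would carry a Scalar of 0.001 rather than the 1000 shown in the table, which dates from the earlier divisor convention; that reading is my assumption.

```python
import math

def apply_scaling(stored, scalar=1.0, addend=0.0):
    """Multiply by the Scalar first, then add the Addend."""
    return stored * scalar + addend

# Old Amoco data: shot point derived from the sequential trace counter.
shot_points = [apply_scaling(n, 0.25, -100) for n in (1, 2, 3, 4)]
# -> [-99.75, -99.5, -99.25, -99.0]

# CSEG convention: shot point 100.25 stored as the integer 100250.
cseg_shot_point = apply_scaling(100250, 0.001)
assert math.isclose(cseg_shot_point, 100.25)
```

With this order of operations trace counter 1 maps to -99.75, so whether the first trace lands exactly on shot point -100 depends on whether the counter starts at 0 or 1, which the text leaves open.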
December 13, 2007 - Zig Doborzynski (EnCana) - provided the Opseis Documentation for a SEGY format that uses a 960 byte trace trailer. (This is an acquisition system in common use during the 1990's in the southern US). In order to handle these data, we have added another 32 bit unsigned integer to the SEGZ header to handle segy trace trailers. Zig does point out that the proposed SEGZ headers can be a maximum size of 4.294 gigabytes, probably larger than we really require.
December 12, 2007 - Jonathan Ravens (GLOBE Claritas Development NZ) has put a lot of thought into coming up with the following improvements for consideration:
- Expand the SEGZ header to 20 bytes to contain a version number
- Place the header mapping files inside the EBCDIC header
- Keep the mapping file as simple as possible, a simple column or comma delimited. XML is more complicated and verbose than what we need.
- Define a short field name, with a lengthy description
- Use short standard data types, i.e. INT2 for 16 bit integer.
- Provide for unsigned integers, i.e. UINT2 for unsigned 16 bit integers
- Provide for vectors (element count). Used for character (ASCII) fields or perhaps an AGC Scalar (two fields, window length and RMS amplitude value)
- Add field (linear shift). This would be useful in order to describe how to read for example old Mobil (or Amoco) segy data that does not have any shot points defined in the header. Workstation data loaders load such data by creating a formula that uses the sequential trace counter field and computes a shot point. For example, specifying an addend of -100 and a Scalar of 4 would label traces 1,2,3,4.... as shot points -100, -99.75, -99.5, -99.25.... The Scalar needs to be defined as the number which was used to scale, i.e. we need to divide by the number 4 in this example.
The hard thing about this process is whether we can all agree on how we are going to describe our data fields and our number types. Take a look at this Field Naming Conventions (comma separated file) to appreciate the different ways we describe the same item. I have captured how Zokero, Paradigm, ARAM and CGGVeritas describe the same number fields. My suggestion is to come up with a few rules to be used to define field names:
- No more than 32 characters
- Upper/lower case insensitive. The programmer will convert every character to upper or lower case. (NUMSAMPLES is the same as NumSamples)
- If a number is placed in the Type data field, it will override the stored data. This can be used to correct the bad format numbers out there. SeisX used the number 1 for IEEE float (instead of IBM float). If you are in Calgary, you would use the number 5; if you follow the SEG rev(1) standard you would override the value with a five.
- A description or comment field must be provided
- Companies will provide an alias table if they can't use the agreed upon names.
If we can agree, we can have a new standard.
November 20, 2007 - Tom Feigs (EnCana) suggests that there should be an auxiliary file (meta) that describes the format of the data. That way, you can avoid modifying the original SEGY record of legacy data. Basic archive principle states that the "archive" always remains as the original.
This second file, perhaps called seismic_line_name.xml, sits next to seismic_line_name.sgy on your online file system. Here is an example xml file that describes part of the EnCana segy format standard. It would be logical to have such a meta file that describes the internal segy format for automatic data loading. This file could be part of the variable length EBCDIC header for the proposed SEGZ format.
May 8, 2007 - Mike Dennis (Divestco - WinPics) We need a flag in the binary header to define the number of samples at the trace level. It is proposed a flag of SamplesPerTrace=0 in the binary header be used to indicate that the number of samples are read from the trace header. At present, workstation software (WinPics, SeisWare and SeisX) read the number of samples from the binary header, the trace header is ignored. The Workstation Vendors prefer that all traces contain the same number of samples.
May 1, 2007 - John Townsley (Divestco - WinPics) points out that we need a definition of the UTM Zone or the Coordinate System. The names have been added, including the international EPSG codes www.epsg.org
April 27, 2007 - Dave Morgan (Divestco) makes the observation that the CSEGZ definition can be thought of as generalized SEGY object, with additional data types and variable length headers.
April 19, 2007 - Rick makes the suggestion that the mapping file should become part of the Text header and be easy for a computer program to parse. We now have the tags map and /map to surround the map file format. It should be simple for the Workstation data loader to automatically parse the Text header and build its format file. Here is what a Talisman-like mapping file (comma separated data fields) would look like.
April 18, 2007 - Rick Macdonald (cggveritas) hopes "the format of your info to be strict and program-readable so we could read it directly instead of converting it manually to our mapping file format every time."
I have provided a Perl script to demonstrate one way to reformat the Proposed SEGYZ mapping file into SeisWare Keywords.
April 18, 2007 - Shayne Stogrin comments "One other minor thing. I am not really sure what to say about it, but the whole Scale thing I find a bit confusing. I am not confused by it, but it is non-transparent. When I think of scale I think that you should apply that scale to something, as opposed to the fact that the "scale" has already been applied. I personally prefer a scale factor that you need to apply to get the real data, so when you have a scale of 100, I would rather see a scalar of 0.01. I think the use of scale is definitely because of historical reasons, and I won't quibble with them. I just wonder if maybe a better terminology is all that's needed.
April 18, 2007 - Doug Horton (cggveritas) reports that they suggest "that we need to be able to handle big/little endian formats in the numbers definition".
Good idea. We can easily do this by fixing the position of the DataFormat Integer16Bit in bytes 25-26 of the Line Header (SEGY byte position). These numbers are in the range of 1 to 6. A simple test with a two-byte swap of this data field will identify the endianness of the data. (That's how SeisX and SeisWare automatically identify whether the data came from a SUN-like machine or a PC.)
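The byte-swap test can be sketched in a few lines of Python. This is only an illustration of the idea, not the SeisX or SeisWare code: the function name is made up, the input is assumed to be the 400-byte binary (line) header, and the 1 to 6 range is taken directly from the text.

```python
import struct

def detect_endianness(file_header: bytes) -> str:
    """Return 'B' or 'L' by reading the format code in bytes 25-26 both ways."""
    raw = file_header[24:26]                    # bytes 25-26, 1-based
    (as_big,) = struct.unpack(">h", raw)
    (as_little,) = struct.unpack("<h", raw)
    big_ok = 1 <= as_big <= 6                   # range as stated in the text
    little_ok = 1 <= as_little <= 6
    if big_ok and not little_ok:
        return "B"
    if little_ok and not big_ok:
        return "L"
    raise ValueError("format code is not in the range 1 to 6 in either byte order")
```

Because a small value such as 1 becomes 256 when its two bytes are swapped, only one of the two readings can fall in the valid range.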
April 17, 2007 - Shayne Stogrin (Zokero) makes the following suggestions
- Mixed case with capitalization will make names more XML like (possible way of the future).
- Things should be be more descriptive than terse. So for types I would suggest something like Integer16Bit vs INT2.
- CoordinateX would be a better name than UTMX
- The industry needs to support 64bit floats, especially for the surface coordinates in the US.
Prefix Header Definition
The following table is an example of what a SEGY file would look like converted to CSEGZ.
| Field | Type | Value |
| --- | --- | --- |
| LABEL | 4 ASCII characters | SEGZ |
| Descriptive (text) Header | Integer32Bit | 3200 |
| File Header | Integer32Bit | 400 |
| Trace Header | Integer32Bit | 240 |

There are two different ways that the loading information needs to be specified. We require specifications for Numbers and for Text description. Here is how to describe numbers for files that are formatted like a SeisX or a SeisWare segy file:
CSEGZ Numbers Definition
Only one number position will be defined as fixed, and that is the DataFormat code in bytes 25-26 of the Binary Header (SEGY byte position). It is needed as a convenient number for identifying whether the data are Big or Little Endian.
Please note we have a Scalar field that needs some clarification. Previous versions of SEGY have used this field in order to handle fractional numbers. The 1999 CSEG standard declared that shot points were to be multiplied by the number 1000. E.g., shot point 100.25 would be stored as the number 100,250. Workstation software needs to divide the number by the Scalar.
| Numbers | Header | Byte | Type | Scalar | Comments |
| --- | --- | --- | --- | --- | --- |
| SampleInterval | File | 17 | Integer16Bit | 1 | Sample Interval |
| SamplesPerTrace | File | 21 | Integer16Bit | 1 | Number of data samples per trace or the maximum number of Samples |
| DataFormat | File | 25 | Integer16Bit | 1 | Data sample format code, 1=IBM floating point, 6=IEEE floating point |
| SeismicDatum | File | 125 | FloatIEEE32Bit | 1 | Seismic Datum elevation |
| SeismicReplacementVelocity | File | 129 | FloatIEEE32Bit | 1 | Seismic Replacement Velocity |
| LineSequenceNumber | Trace | 9 | Integer32Bit | 1 | Original field record number, 3D inline number |
| TraceSequenceNumber | Trace | 13 | Integer32Bit | 1 | Original field trace number, 3D cross line number |
| ShotSequenceNumber | Trace | 17 | FloatIEEE32Bit | 1 | Energy source point number |
| CDPNumber | Trace | 21 | Integer32Bit | 1 | Ensemble number (CDP CMP CRP etc) |
| TraceCounter | Trace | 25 | Integer32Bit | 1 | Trace number within the ensemble |
| DeadTraceFlag | Trace | 31 | Integer16Bit | 1 | Number of vertically summed traces |
| Fold | Trace | 33 | Integer16Bit | 1 | Number of horizontally stacked traces |
| Elevation | Trace | 41 | FloatIEEE32Bit | 1 | Receiver group elevation |
| CoordinateX | Trace | 81 | FloatIEEE32Bit | 1 | Group Coordinate X |
| CoordinateY | Trace | 85 | FloatIEEE32Bit | 1 | Group Coordinate Y |

Here is how to describe character (text) information:
CSEGZ Text Description (formerly the EBCDIC Header)

| Characters | Header | Byte | Size | Type |
| --- | --- | --- | --- | --- |
| LineName | File | 301 | 32 | StringASCII |
| Description | Text | 3120 | 80 | StringASCII |
Talisman like CSEGZ Description
Talisman (and many CSEG members) like to adhere to the original SEGY rev(0) definition that specifies all numbers specified in the header are integers. They use Scalars to handle the decimal point.
| Numbers | Header | Byte | Type | Scalar |
| --- | --- | --- | --- | --- |
| SampleInterval | File | 17 | Integer16Bit | 1 |
| SamplesPerTrace | File | 21 | Integer16Bit | 1 |
| DataFormat | File | 25 | Integer16Bit | 1 |
| LineSequenceNumber | Trace | 9 | Integer32Bit | 1 |
| TraceSequenceNumber | Trace | 13 | Integer32Bit | 1 |
| ShotSequenceNumber | Trace | 17 | Integer32Bit | 1000 |
| CDPNumber | Trace | 21 | Integer32Bit | 1 |
| DeadTraceFlag | Trace | 31 | Integer16Bit | 1 |
| Fold | Trace | 33 | Integer16Bit | 1 |
| Elevation | Trace | 41 | Integer32Bit | 100 |
| CoordinateX | Trace | 81 | Integer32Bit | 100 |
| CoordinateY | Trace | 85 | Integer32Bit | 100 |

Note that the shot points are multiplied by 1000 and the CDP bin coordinates by 100. Click here to view what the Talisman file would look like. This file makes use of the Characters format in order to extract information such as Line Name out of the Descriptive (text) header.
Data Field Naming Conventions
To make this work between different computers, we will need to standardize on what we call the different items. If we cannot agree what to call different items, at least we need to be as a minimum consistent. I have assembled the following table to illustrate some of the different names that are currently being used. Here is a link to the naming conventions spread sheet. We can see that every company likes to be a little bit different. The real trick with standards is to see if we can all come to consensus of what to call the various items!
| Proposed SEGZ | SeisWare/Zokero | SeisX/Paradigm | ARAM-Airies | Veritas |
| --- | --- | --- | --- | --- |
| SampleInterval | Sample Interval | SAMPLE_INTERVAL | Sample Interval | SAMPLE_RATE |
| SamplesPerTrace | Samples Per Trace | Undefined | Number of Samples per Trace | SAMPLES_PER_TRACE |
| DataFormat | Data Format | Undefined | Sample Format Code | FORMAT_CODE |
| SeismicDatum | Seismic Datum | SEIS_DATUM | Undefined | CDP_DATUM |
| SeismicReplacementVelocity | Seismic Replacement Velocity | SEIS_RELACEMENT_VELOCITY | Undefined | CDP_VELR |
| LineSequenceNumber | Line Sequence Number | LINE_SEQUENCE_NUMBER | File Number | INLINE |
| TraceSequenceNumber | Trace Sequence Number | TRACE_SEQUENCE_NUMBER | Trace Number Within File | CROSS_LINE |
| ShotSequenceNumber | Shot Sequence Number | SHOT_SEQUENCE_NUMBER | Source point Number | CDP_FSTN |
| CDPNumber | CDP Number | CDP_NUMBER | Undefined | GROUP |
| TraceCounter | Trace Counter | Undefined | Undefined | GROUP_TRACE |
| DeadTraceFlag | Dead Trace Flag | Undefined | Number of Composites | VSUM |
| Fold | Fold | Undefined | Horizontal Sums per trace | CDP_FOLD |
| Elevation | Elevation | Undefined | Undefined | CDP_ELEV |
| CoordinateX | UTM X | UTMX | Undefined | CDP_X |
| CoordinateY | UTM Y | UTMY | Undefined | CDP_Y |
| EPSGCode | | | | |
| SurveyDatum | | | | |
| SurveyGrid | | | | |
Valid Data Types
Of course for this to all work we have to agree on our naming convention. Here is what is proposed for Valid Data types. Please contact the author if you can suggest better names:
| Data Types | Text |
| --- | --- |
| Integer8Bit | 8 bit integer |
| Integer16Bit | 16 bit integer |
| Integer32Bit | 32 bit integer |
| Integer64Bit | 64 bit integer |
| FloatIEEE32Bit | 32 bit real number |
| FloatIEEE64Bit | 64 bit real number |
| FloatIBM32Bit | 32 bit IBM real number |
| FloatIBM64Bit | 64 bit IBM real number |
| StringASCII | TEXT |
| StringEBCDIC | TEXT |
| StringUnicode | TEXT |

The Data types are straightforward and can even handle 64 bit numbers!
Perl Script to generate SeisWare KeyWord files
Here is a simple script build_kwd.pl that will take as input our CSEGZ mapping file and generate a SeisWare KeyWord (.kwd) file that can be used to load your seismic into SeisWare. Here is how it works. First we need to look at what the CSEGZ mapping file looks like:

    bash-2.03$ more veritas_3d_map.csv
    Numbers,Header,Byte,Type,Scalar
    SampleInterval,File,17,Integer16Bit,1
    SamplesPerTrace,File,21,Integer16Bit,1
    DataFormat,File,25,Integer16Bit,1
    SeismicDatum,File,125,FloatIEEE32Bit,1
    SeismicReplacementVelocity,File,129,FloatIEEE32Bit,1
    LineSequenceNumber,Trace,9,Integer32Bit,1
    TraceSequenceNumber,Trace,13,Integer32Bit,1
    ShotSequenceNumber,Trace,17,FloatIEEE32Bit,1
    CDPNumber,Trace,21,Integer32Bit,1
    TraceCounter,Trace,25,Integer32Bit,1
    DeadTraceFlag,Trace,31,Integer16Bit,1
    Fold,Trace,33,Integer16Bit,1
    Elevation,Trace,41,FloatIEEE32Bit,1
    CoordinateX,Trace,81,FloatIEEE32Bit,1
    CoordinateY,Trace,85,FloatIEEE32Bit,1
    Characters,Header,Byte,Size,Type
    LineName,File,301,32,StringASCII
    Description,Text,3120,80,StringASCII

This example script will create the keywords to load most of the data into SeisWare. Use the -f option to specify the mapping file to use and direct the output to the file veritas_3d.kwd.

    bash-2.03$ ~ekeyser/Perl/build_kwd.pl -f veritas_3d_map.csv > veritas_3d.kwd

Let's take a look at our new mapping file:

    bash-2.03$ more veritas_3d.kwd
    # Self describing ascii
    struct HeaderWord { Name Header Byte Type Scale }
    struct HeaderString { Name Header Byte Size Type }
    HeaderWord "Line Sequence Number" Trace 8 32BitInteger 1
    HeaderWord "Trace Sequence Number" Trace 12 32BitInteger 1
    HeaderWord "UTM X" Trace 80 16BitInteger 1
    HeaderWord "UTM Y" Trace 84 16BitInteger 1
    HeaderString "Line Name" File 300 32 ASCII
    HeaderString "Description" EBCDIC 3119 80 ASCII

Just point the SeisWare data loader to this file and the data can be loaded! It should be a trivial exercise to edit this script to generate all the KeyWords that SeisWare can accept!
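The build_kwd.pl source is not shown here, so the following Python sketch covers only the first half of the job: reading the two-section mapping file into a structure a keyword writer could use. The function name and the added "Offset" key are hypothetical. The zero-based offset is computed because the sample .kwd output above lists byte positions one lower than the mapping file (8 for 9, 80 for 81, 300 for 301).

```python
import csv
import io

def parse_mapping(text):
    """Parse a CSEGZ mapping file into {"Numbers": {...}, "Characters": {...}}."""
    sections, current, columns = {}, None, None
    for fields in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in fields]
        if not any(fields):
            continue                            # skip blank lines
        if fields[0] in ("Numbers", "Characters"):
            # A section header row names the remaining columns.
            current, columns = fields[0], fields[1:]
            sections[current] = {}
        elif current is None:
            raise ValueError("data row before any section header")
        else:
            entry = dict(zip(columns, fields[1:]))
            entry["Offset"] = int(entry["Byte"]) - 1    # zero-based position
            sections[current][fields[0]] = entry
    return sections
```

Turning these entries into SeisWare keywords would still need a table of SeisWare's own field and type names, which is not documented on this page.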
Site Owner: Eric Keyser