Data section
Handling text, dates, timestamps, or any non-numeric characters
By default, lasio will attempt to convert each column of the data section
into floating-point numbers. If that fails, as it will for non-numeric
characters, then the column will be returned as text (str). The behavour
can be controlled by specifing the data type as either int, float or
str per column using the dtypes keyword argument to
lasio.LASFile.read().
See the example data_characters.las:
~A TIME DATE DEPT ARC_GR_UNC_RT
00:00:00 01-Jan-20 1500.2435 126.56
00:00:01 01-Jan-20 1500.3519 126.56
>>> import lasio.examples
>>> las = lasio.examples.open("data_characters.las")
>>> las["TIME"]
array(['00:00:00', '00:00:01'], dtype='<U32')
>>> las["DATE"]
array(['01-Jan-20', '01-Jan-20'], dtype='<U32')
>>> las["DEPT"]
array([1500.2435, 1500.3519])
>>> las["ARC_GR_UNC_RT"]
array([126.56, 126.56])
>>> las.df().reset_index().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TIME 2 non-null object
1 DATE 2 non-null object
2 DEPT 2 non-null float64
3 ARC_GR_UNC_RT 2 non-null float64
dtypes: float64(2), object(2)
memory usage: 192.0+ bytes
lasio doesn’t yet understand dates and timestamps natively, but you can do these conversions with pandas:
>>> las["DATE_DT"] = pd.to_datetime(las["DATE"]).values
Repeated/duplicate curve mnemonics
LAS files don’t always have unique mnemonics for each curve, but that
makes it difficult to retrieve curves by their mnemonic! lasio handles this
by appending :1, :2, etc. to the end of repeat/duplicate mnemonics.
For an example, see a LAS file with this ~C section, with “SFLU” duplicated:
~CURVE INFORMATION
#MNEM.UNIT API CODE CURVE DESCRIPTION
#--------- ------------- ------------------------------
DEPT.M : 1 DEPTH
DT .US/M : 2 SONIC TRANSIT TIME
RHOB.K/M3 : 3 BULK DENSITY
NPHI.V/V : 4 NEUTRON POROSITY
SFLU.OHMM : 5 RXO RESISTIVITY
SFLU.OHMM : 6 SHALLOW RESISTIVITY
ILM .OHMM : 7 MEDIUM RESISTIVITY
ILD .OHMM : 8 DEEP RESISTIVITY
This is represented in the following way:
>>> import lasio.examples
>>> las = lasio.examples.open("mnemonic_duplicate.las")
>>> print(las.curves)
Mnemonic Unit Value Description
-------- ---- ----- -----------
DEPT M 1 DEPTH
DT US/M 2 SONIC TRANSIT TIME
RHOB K/M3 3 BULK DENSITY
NPHI V/V 4 NEUTRON POROSITY
SFLU:1 OHMM 5 RXO RESISTIVITY
SFLU:2 OHMM 6 SHALLOW RESISTIVITY
ILM OHMM 7 MEDIUM RESISTIVITY
ILD OHMM 8 DEEP RESISTIVITY
>>> las["SFLU:1"]
array([123.45, 123.45, 123.45])
>>> las["SFLU:2"]
array([125.45, 125.45, 125.45])
Note that the actual mnemonic is not present, to avoid ambiguity about which curve would be expected to be returned:
>>> las["SFLU"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\devapps\kinverarity\projects\lasio\lasio\las.py", line 661, in __getitem__
raise KeyError("{} not found in curves ({})".format(key, curve_mnemonics))
KeyError: "SFLU not found in curves (['DEPT', 'DT', 'RHOB', 'NPHI', 'SFLU:1', 'SFLU:2', 'ILM', 'ILD'])"
Note also that lasio remembers the original mnemonic so that on writing the file out, the original mnemonics are replicated:
>>> import sys
>>> las.write(sys.stdout)
...
~Curve Information -----------------------------------------
DEPT.M : 1 DEPTH
DT .US/M : 2 SONIC TRANSIT TIME
RHOB.K/M3 : 3 BULK DENSITY
NPHI.V/V : 4 NEUTRON POROSITY
SFLU.OHMM : 5 RXO RESISTIVITY
SFLU.OHMM : 6 SHALLOW RESISTIVITY
ILM .OHMM : 7 MEDIUM RESISTIVITY
ILD .OHMM : 8 DEEP RESISTIVITY
...
Ignoring commented-out lines
Sometimes data sections have comment line inside them. By default lasio will ignore
any lines starting with the “#” character within the data section. You can
control this using the remove_data_line_filter='#' argument to
lasio.LASFile.read().
Ignoring the data section
- Lasio can ignore the data section by setting ignore_data to true:
lasio.read(file, ignore_date=True)
This will completely skip reading the data section and the returned object will just contain the header metadata section.
- A quick way to see the expected column names is:
lasio.read(file, ignore_data=True).keys()- To re-run without ignore_data:
lasio.read(file).keys()
If this returns a different set of columns then there may be a data parsing error. In this case, if incorrect parsing causes lasio to create extra columns they will be named ‘UKNOWN:1’, ‘UNKNOWN:2’, ‘UNKNOWN:<n>’… This can usually be fixed by tuning lasio.read()’s read_policy or null_policy options.
Handling errors with read_policy and null_policy
lasio has a flexible way of handling “errors” in the ~ASCII data section to
accommodate how strict or flexible you want to be. The two main tools are
read_policy and null_policy. These are optional arguments to
lasio.LASFile.read(). Each defaults to common options which can be
overridden either by other pre-set options or by a list of specific options.
These policy settings are configured in lasio/defaults.py.
By default, lasio.read(f) runs as if explicitly set to lasio.read(f,
read_policy='default', null_policy='common').
Examples of policy override syntax
- Change only read_policy with one of the builtin policy sets:
lasio.read(f, read_policy='comma-delimiter')- Change only null_policy with one of the builtin policy sets:
lasio.read(f, null_policy='aggressive')- Change both read_policy and null_policy with builtin policies:
lasio.read(f, read_policy='comma-delimiter', null_policy='none')- Change read_policy with specific policies (found in defaults.py):
lasio.read(f, read_policy=["comma-decimal-mark", "run-on(.)"])- Change null_policy with your own hard-coded options:
lasio.read(f, null_policy=["9999.25", "999.25", "NA", "INF", "IO", "IND"])
Example errors
Here are some examples of errors.
Files could contain a variety of indicators for an invalid data point other than that defined by the NULL line in the LAS header (usually -999.25).
Fixed-width columns could run into each other:
7686.500 64.932 0.123 0.395 12.403 156.271 10.649 -0.005 193.223 327.902 -0.023 4.491 2.074 29.652
7686.000 67.354 0.140 0.415 9.207 4648.011 10.609 -0.004 3778.709 1893.751 -0.048 4.513 2.041 291.910
7685.500 69.004 0.151 0.412 7.020101130.188 10.560 -0.004 60000.000 2901.317 -0.047 4.492 2.046 310.119
7685.000 68.809 0.150 0.411 7.330109508.961 10.424 -0.005 60000.000 2846.619 -0.042 4.538 2.049 376.968
7684.500 68.633 0.149 0.402 7.345116238.453 10.515 -0.005 60000.000 2290.275 -0.051 4.543 2.063 404.972
7684.000 68.008 0.144 0.386 7.682 4182.679 10.515 -0.004 3085.681 1545.842 -0.046 4.484 2.089 438.195
Odd text such as
(null):
8090.00 -999.25 -999.25 -999.25 0 0 0 0 0 0 0 0
8091.000 0.70 337.70 (null) 0 0 0 0 0 0 0 0
8092.000 -999.25 -999.25 -999.25 0 0 0 0 0 0 0 0
Handling run-on errors
lasio detects and handles these problems by default using lasio.read(f,
read_policy='default'). For example a file with this data section:
~A
7686.000 67.354 0.140 0.415 9.207 4648.011 10.609
7685.500 69.004 0.151 0.412 7.020101130.188 10.560
7685.000 68.809 0.150 0.411 7.330-19508.961 10.424
7684.500 68.633 0.149 0.402 7.345116238.453 10.515
7684.000 68.008 0.144 0.386 7.682 4182.679 10.515
is loaded by default as the following:
>>> import lasio.examples
>>> las = lasio.examples.open('null_policy_runon.las')
>>> las.data
array([[7686.0, 67.354, 0.14, 0.415, 9.207, 4648.011, 10.609],
[7685.5, 69.004, 0.151, 0.412, nan, nan, 10.56],
[7685.0, 68.809, 0.15, 0.411, 7.33, -19508.961, 10.424],
[7684.5, 68.633, 0.149, 0.402, nan, nan, 10.515],
[7684.0, 68.008, 0.144, 0.386, 7.682, 4182.679, 10.515]])
Handling invalid data indicators automatically
These are detected by lasio to a degree which you can control with the null_policy keyword argument.
You can specify a policy of ‘none’, ‘strict’, ‘common’, ‘aggressive’, or
‘all’. These policies all include a subset of pre-defined substitutions. Or
you can give your own list of substitutions. Here is the list of predefined
policies and substitutions from lasio.defaults.
Policies that you can pick with e.g. null_policy='common':
NULL_POLICIES = {
'none': [],
'strict': ['NULL', ],
'common': ['NULL', '(null)', '-',
'9999.25', '999.25', 'NA', 'INF', 'IO', 'IND'],
'aggressive': ['NULL', '(null)', '--',
'9999.25', '999.25', 'NA', 'INF', 'IO', 'IND',
'999', '999.99', '9999', '9999.99' '2147483647', '32767',
'-0.0', ],
'all': ['NULL', '(null)', '-',
'9999.25', '999.25', 'NA', 'INF', 'IO', 'IND',
'999', '999.99', '9999', '9999.99' '2147483647', '32767', '-0.0',
'numbers-only', ],
'numbers-only': ['numbers-only', ]
}
Or substitutions you could specify with e.g. null_policy=['NULL', '999.25',
'INF']:
NULL_SUBS = {
'NULL': [None, ], # special case to be handled
'999.25': [-999.25, 999.25],
'9999.25': [-9999.25, 9999.25],
'999.99': [-999.99, 999.99],
'9999.99': [-9999.99, 9999.99],
'999': [-999, 999],
'9999': [-9999, 9999],
'2147483647': [-2147483647, 2147483647],
'32767': [-32767, 32767],
'NA': [(re.compile(r'(#N/A)[ ]'), ' NaN '),
(re.compile(r'[ ](#N/A)'), ' NaN '), ],
'INF': [(re.compile(r'(-?1\.#INF)[ ]'), ' NaN '),
(re.compile(r'[ ](-?1\.#INF)'), ' NaN '), ],
'IO': [(re.compile(r'(-?1\.#IO)[ ]'), ' NaN '),
(re.compile(r'[ ](-?1\.#IO)'), ' NaN '), ],
'IND': [(re.compile(r'(-?1\.#IND)[ ]'), ' NaN '),
(re.compile(r'[ ](-?1\.#IND)'), ' NaN '), ],
'-0.0': [(re.compile(r'(-?0\.0+)[ ]'), ' NaN '),
(re.compile(r'[ ](-?0\.0+)'), ' NaN '), ],
'numbers-only': [(re.compile(r'([^ 0-9.\-+]+)[ ]'), ' NaN '),
(re.compile(r'[ ]([^ 0-9.\-+]+)'), ' NaN '), ],
}
You can also specify substitutions directly. E.g. for a file with this data section:
~A DEPTH DT RHOB NPHI SFLU SFLA ILM ILD
1670.000 9998 2550.000 0.450 123.450 123.450 110.200 105.600
1669.875 9999 2550.000 0.450 123.450 123.450 110.200 105.600
1669.750 10000 ERR 0.450 123.450 -999.25 110.200 105.600
By default, it will read all data as a string due to the presence of “ERR”:
>>> las = lasio.examples.open('null_policy_ERR.las')
>>> las.data
array([['1670.0', '9998.0', '2550.0', '0.45', '123.45', '123.45',
'110.2', '105.6'],
['1669.875', '9999.0', '2550.0', '0.45', '123.45', '123.45',
'110.2', '105.6'],
['1669.75', '10000.0', 'ERR', '0.45', '123.45', '-999.25',
'110.2', '105.6']], dtype='<U32')
We can fix it by using an explicit NULL policy.
>>> las = lasio.examples.open('null_policy_ERR.las', null_policy=[('ERR', ' NaN ')])
>>> las.data
array([[ 1.670000e+03, 9.998000e+03, 2.550000e+03, 4.500000e-01,
1.234500e+02, 1.234500e+02, 1.102000e+02, 1.056000e+02],
[ 1.669875e+03, 9.999000e+03, 2.550000e+03, 4.500000e-01,
1.234500e+02, 1.234500e+02, 1.102000e+02, 1.056000e+02],
[ 1.669750e+03, 1.000000e+04, nan, 4.500000e-01,
1.234500e+02, -9.992500e+02, 1.102000e+02, 1.056000e+02]])
See tests/test_null_policy.py (link)
for some examples.