Data section

Handling text, dates, timestamps, or any non-numeric characters

By default, lasio attempts to convert each column of the data section into floating-point numbers. If that fails, as it will for non-numeric characters, the column is returned as text (str). This behaviour can be controlled by specifying the data type (int, float or str) per column using the dtypes keyword argument to lasio.LASFile.read().

See the example data_characters.las:

~A TIME       DATE       DEPT ARC_GR_UNC_RT
00:00:00 01-Jan-20  1500.2435        126.56
00:00:01 01-Jan-20  1500.3519        126.56
>>> import lasio.examples
>>> las = lasio.examples.open("data_characters.las")
>>> las["TIME"]
array(['00:00:00', '00:00:01'], dtype='<U32')
>>> las["DATE"]
array(['01-Jan-20', '01-Jan-20'], dtype='<U32')
>>> las["DEPT"]
array([1500.2435, 1500.3519])
>>> las["ARC_GR_UNC_RT"]
array([126.56, 126.56])
>>> las.df().reset_index().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   TIME           2 non-null      object
 1   DATE           2 non-null      object
 2   DEPT           2 non-null      float64
 3   ARC_GR_UNC_RT  2 non-null      float64
dtypes: float64(2), object(2)
memory usage: 192.0+ bytes
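
The default conversion rule described above (try float, fall back to str, or force a type per column) can be sketched in plain Python. This is only an illustration of the rule, not lasio's own code; convert_column is a hypothetical helper, and with lasio itself you would pass the dtypes keyword argument instead.

```python
def convert_column(values, dtype=None):
    """Convert the raw text values of one column.

    dtype=None mimics the default rule: try float first, fall back to str.
    Passing int, float or str forces that type (and raises if it fails).
    """
    if dtype is not None:
        return [dtype(v) for v in values]
    try:
        return [float(v) for v in values]
    except ValueError:
        # At least one value is not numeric, so keep the whole column as text.
        return [str(v) for v in values]


time_col = convert_column(["00:00:00", "00:00:01"])              # stays text
dept_col = convert_column(["1500.2435", "1500.3519"])            # becomes float
dept_text = convert_column(["1500.2435", "1500.3519"], dtype=str)  # forced to text
print(time_col, dept_col, dept_text)
```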

lasio doesn’t yet understand dates and timestamps natively, but you can do these conversions with pandas:

>>> import pandas as pd
>>> las["DATE_DT"] = pd.to_datetime(las["DATE"]).values
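
If you want a single timestamp per row, the DATE and TIME text columns can also be combined using only the standard library. The following is a minimal sketch that assumes the formats seen in data_characters.las (day-month abbreviation-two digit year, and HH:MM:SS); combine_date_time is a hypothetical helper, not part of lasio.

```python
from datetime import datetime


def combine_date_time(dates, times, fmt="%d-%b-%y %H:%M:%S"):
    """Parse paired DATE and TIME strings into datetime objects.

    The default format is an assumption based on the example file;
    %b relies on English month abbreviations in the current locale.
    """
    return [datetime.strptime(f"{d} {t}", fmt) for d, t in zip(dates, times)]


# Values as they appear in the DATE and TIME columns of the example file.
dates = ["01-Jan-20", "01-Jan-20"]
times = ["00:00:00", "00:00:01"]
timestamps = combine_date_time(dates, times)
print(timestamps[1].isoformat())
```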

Repeated/duplicate curve mnemonics

LAS files don’t always have unique mnemonics for each curve, but that makes it difficult to retrieve curves by their mnemonic! lasio handles this by appending :1, :2, etc. to the end of repeat/duplicate mnemonics. For an example, see a LAS file with this ~C section, with “SFLU” duplicated:

~CURVE INFORMATION
#MNEM.UNIT      API CODE      CURVE DESCRIPTION
#---------    -------------   ------------------------------
DEPT.M                      :  1  DEPTH
DT  .US/M                        :  2  SONIC TRANSIT TIME
RHOB.K/M3                   :  3  BULK DENSITY
NPHI.V/V                    :  4   NEUTRON POROSITY
SFLU.OHMM                   :  5  RXO RESISTIVITY
SFLU.OHMM                   :  6  SHALLOW RESISTIVITY
ILM .OHMM                   :  7  MEDIUM RESISTIVITY
ILD .OHMM                   :  8  DEEP RESISTIVITY

This is represented in the following way:

>>> import lasio.examples
>>> las = lasio.examples.open("mnemonic_duplicate.las")
>>> print(las.curves)
Mnemonic  Unit  Value  Description
--------  ----  -----  -----------
DEPT      M            1  DEPTH
DT        US/M         2  SONIC TRANSIT TIME
RHOB      K/M3         3  BULK DENSITY
NPHI      V/V          4   NEUTRON POROSITY
SFLU:1    OHMM         5  RXO RESISTIVITY
SFLU:2    OHMM         6  SHALLOW RESISTIVITY
ILM       OHMM         7  MEDIUM RESISTIVITY
ILD       OHMM         8  DEEP RESISTIVITY
>>> las["SFLU:1"]
array([123.45, 123.45, 123.45])
>>> las["SFLU:2"]
array([125.45, 125.45, 125.45])

Note that the actual mnemonic is not present, to avoid ambiguity about which curve would be expected to be returned:

>>> las["SFLU"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\devapps\kinverarity\projects\lasio\lasio\las.py", line 661, in __getitem__
    raise KeyError("{} not found in curves ({})".format(key, curve_mnemonics))
KeyError: "SFLU not found in curves (['DEPT', 'DT', 'RHOB', 'NPHI', 'SFLU:1', 'SFLU:2', 'ILM', 'ILD'])"

Note also that lasio remembers the original mnemonic so that on writing the file out, the original mnemonics are replicated:

>>> import sys
>>> las.write(sys.stdout)
...
~Curve Information -----------------------------------------
DEPT.M     : 1  DEPTH
DT  .US/M  : 2  SONIC TRANSIT TIME
RHOB.K/M3  : 3  BULK DENSITY
NPHI.V/V   : 4   NEUTRON POROSITY
SFLU.OHMM  : 5  RXO RESISTIVITY
SFLU.OHMM  : 6  SHALLOW RESISTIVITY
ILM .OHMM  : 7  MEDIUM RESISTIVITY
ILD .OHMM  : 8  DEEP RESISTIVITY
...
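
The renaming scheme described in this section can be sketched in a few lines of plain Python. This illustrates the idea only and is not lasio's implementation; disambiguate is a hypothetical helper.

```python
from collections import Counter


def disambiguate(mnemonics):
    """Append :1, :2, ... to every mnemonic that occurs more than once."""
    totals = Counter(mnemonics)   # how often each mnemonic appears overall
    seen = Counter()              # how many copies have been emitted so far
    result = []
    for name in mnemonics:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}:{seen[name]}")
        else:
            result.append(name)
    return result


original = ["DEPT", "DT", "RHOB", "NPHI", "SFLU", "SFLU", "ILM", "ILD"]
renamed = disambiguate(original)
print(renamed)
```

Because every copy of a repeated mnemonic receives a suffix, the bare name "SFLU" no longer appears in the result, which matches the KeyError shown earlier.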

Ignoring commented-out lines

Sometimes data sections have comment lines inside them. By default, lasio ignores any line within the data section that starts with the “#” character. You can control this using the remove_data_line_filter='#' argument to lasio.LASFile.read().
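
The effect of this filter can be illustrated with a small standalone sketch. It is an assumption about the behaviour described above (dropping lines that start with the prefix), not lasio's actual code, and drop_comment_lines is a hypothetical name.

```python
def drop_comment_lines(lines, prefix="#"):
    """Keep only the data lines that do not start with the comment prefix."""
    return [line for line in lines if not line.startswith(prefix)]


data_section = [
    "1670.000  123.450",
    "# tool stuck, readings suspect",
    "1669.875  123.450",
]
kept = drop_comment_lines(data_section)
print(kept)
```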

Ignoring the data section

lasio can ignore the data section if you set ignore_data to True:
lasio.read(file, ignore_data=True)

This skips reading the data section entirely, and the returned object contains only the header sections.

A quick way to see the expected column names is:
lasio.read(file, ignore_data=True).keys()
To re-run without ignore_data:
lasio.read(file).keys()

If this returns a different set of columns then there may be a data parsing error. In this case, if incorrect parsing causes lasio to create extra columns, they will be named ‘UNKNOWN:1’, ‘UNKNOWN:2’, … ‘UNKNOWN:<n>’. This can usually be fixed by tuning the read_policy or null_policy options of lasio.read().
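
The comparison itself is easy to automate. The sketch below works on two plain lists of column names; in practice they would come from the two lasio.read() calls shown above. The helper name and the sample values are illustrative assumptions.

```python
def unexpected_columns(header_keys, data_keys):
    """Return the column names that appear only after the data section is parsed."""
    expected = set(header_keys)
    return [key for key in data_keys if key not in expected]


# Illustrative values; in practice these come from
# lasio.read(file, ignore_data=True).keys() and lasio.read(file).keys().
header_keys = ["DEPT", "DT", "RHOB"]
data_keys = ["DEPT", "DT", "RHOB", "UNKNOWN:1", "UNKNOWN:2"]
extras = unexpected_columns(header_keys, data_keys)
print(extras)
```

An empty result means both reads agree on the columns; any UNKNOWN:<n> entries point to a likely parsing problem in the data section.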

Handling errors with read_policy and null_policy

lasio has a flexible way of handling “errors” in the ~ASCII data section to accommodate how strict or flexible you want to be. The two main tools are read_policy and null_policy. These are optional arguments to lasio.LASFile.read(). Each defaults to common options which can be overridden either by other pre-set options or by a list of specific options. These policy settings are configured in lasio/defaults.py.

By default, lasio.read(f) runs as if explicitly set to lasio.read(f, read_policy='default', null_policy='common').

Examples of policy override syntax

Change only read_policy with one of the builtin policy sets:
lasio.read(f, read_policy='comma-delimiter')
Change only null_policy with one of the builtin policy sets:
lasio.read(f, null_policy='aggressive')
Change both read_policy and null_policy with builtin policies:
lasio.read(f, read_policy='comma-delimiter', null_policy='none')
Change read_policy with specific policies (found in defaults.py):
lasio.read(f, read_policy=["comma-decimal-mark", "run-on(.)"])
Change null_policy with your own hard-coded options:
lasio.read(f, null_policy=["9999.25", "999.25", "NA", "INF", "IO", "IND"])

Example errors

Here are some examples of errors.

  • Files could contain a variety of indicators for an invalid data point other than that defined by the NULL line in the LAS header (usually -999.25).
  • Fixed-width columns could run into each other:
7686.500    64.932     0.123     0.395    12.403   156.271    10.649    -0.005   193.223   327.902    -0.023     4.491     2.074    29.652
7686.000    67.354     0.140     0.415     9.207  4648.011    10.609    -0.004  3778.709  1893.751    -0.048     4.513     2.041   291.910
7685.500    69.004     0.151     0.412     7.020101130.188    10.560    -0.004 60000.000  2901.317    -0.047     4.492     2.046   310.119
7685.000    68.809     0.150     0.411     7.330109508.961    10.424    -0.005 60000.000  2846.619    -0.042     4.538     2.049   376.968
7684.500    68.633     0.149     0.402     7.345116238.453    10.515    -0.005 60000.000  2290.275    -0.051     4.543     2.063   404.972
7684.000    68.008     0.144     0.386     7.682  4182.679    10.515    -0.004  3085.681  1545.842    -0.046     4.484     2.089   438.195
  • Odd text such as (null):
8090.00         -999.25         -999.25         -999.25               0               0               0               0               0               0               0               0
8091.000          0.70          337.70          (null)               0               0               0               0               0               0               0               0
8092.000        -999.25         -999.25         -999.25               0               0               0               0               0              0               0               0

Handling run-on errors

lasio detects and handles these problems by default using lasio.read(f, read_policy='default'). For example a file with this data section:

~A
    7686.000    67.354     0.140     0.415     9.207  4648.011    10.609
    7685.500    69.004     0.151     0.412     7.020101130.188    10.560
    7685.000    68.809     0.150     0.411     7.330-19508.961    10.424
    7684.500    68.633     0.149     0.402     7.345116238.453    10.515
    7684.000    68.008     0.144     0.386     7.682  4182.679    10.515

is loaded by default as the following:

>>> import lasio.examples
>>> las = lasio.examples.open('null_policy_runon.las')
>>> las.data
array([[7686.0, 67.354, 0.14, 0.415, 9.207, 4648.011, 10.609],
       [7685.5, 69.004, 0.151, 0.412, nan, nan, 10.56],
       [7685.0, 68.809, 0.15, 0.411, 7.33, -19508.961, 10.424],
       [7684.5, 68.633, 0.149, 0.402, nan, nan, 10.515],
       [7684.0, 68.008, 0.144, 0.386, 7.682, 4182.679, 10.515]])
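
To make the result above easier to follow, here is a rough sketch of how such run-ons could be repaired with regular expressions. The two patterns are illustrative assumptions written for this example, not copied from lasio/defaults.py, and repair_line is a hypothetical helper.

```python
import math
import re

# A digit directly followed by a minus sign and another digit: the minus sign
# marks the start of a new (negative) number, so a space can be restored.
HYPHEN_RUN_ON = re.compile(r"(\d)-(\d)")
# A single token with two decimal points: two numbers have merged and the
# boundary cannot be recovered, so both become NaN.
DOUBLE_DOT_RUN_ON = re.compile(r"-?\d*\.\d*\.\d*")


def repair_line(line):
    """Split one data line into floats, repairing the two kinds of run-on."""
    line = HYPHEN_RUN_ON.sub(r"\1 -\2", line)
    line = DOUBLE_DOT_RUN_ON.sub(" NaN NaN ", line)
    return [float(token) for token in line.split()]


data_lines = [
    "    7686.000    67.354     0.140     0.415     9.207  4648.011    10.609",
    "    7685.500    69.004     0.151     0.412     7.020101130.188    10.560",
    "    7685.000    68.809     0.150     0.411     7.330-19508.961    10.424",
]
rows = [repair_line(line) for line in data_lines]
nan_count = sum(math.isnan(value) for value in rows[1])
print(rows[2][4:6], nan_count)
```

Note that the hyphen pattern in this sketch would also split text such as ISO dates (2020-01-01), so it is only suitable for purely numeric data lines.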

Handling invalid data indicators automatically

lasio detects invalid data indicators to a degree that you can control with the null_policy keyword argument.

You can specify a policy of ‘none’, ‘strict’, ‘common’, ‘aggressive’, or ‘all’. These policies all include a subset of pre-defined substitutions. Or you can give your own list of substitutions. Here is the list of predefined policies and substitutions from lasio.defaults.

Policies that you can pick with e.g. null_policy='common':

NULL_POLICIES = {
    'none': [],
    'strict': ['NULL', ],
    'common': ['NULL', '(null)', '-',
               '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND'],
    'aggressive': ['NULL', '(null)', '--',
                   '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND',
                   '999', '999.99', '9999', '9999.99' '2147483647', '32767',
                   '-0.0', ],
    'all': ['NULL', '(null)', '-',
            '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND',
            '999', '999.99', '9999', '9999.99' '2147483647', '32767', '-0.0',
            'numbers-only', ],
    'numbers-only': ['numbers-only', ]
    }

Or substitutions you could specify with e.g. null_policy=['NULL', '999.25', 'INF']:

NULL_SUBS = {
    'NULL': [None, ],                       # special case to be handled
    '999.25': [-999.25, 999.25],
    '9999.25': [-9999.25, 9999.25],
    '999.99': [-999.99, 999.99],
    '9999.99': [-9999.99, 9999.99],
    '999': [-999, 999],
    '9999': [-9999, 9999],
    '2147483647': [-2147483647, 2147483647],
    '32767': [-32767, 32767],
    'NA': [(re.compile(r'(#N/A)[ ]'), ' NaN '),
           (re.compile(r'[ ](#N/A)'), ' NaN '), ],
    'INF': [(re.compile(r'(-?1\.#INF)[ ]'), ' NaN '),
            (re.compile(r'[ ](-?1\.#INF)'), ' NaN '), ],
    'IO': [(re.compile(r'(-?1\.#IO)[ ]'), ' NaN '),
           (re.compile(r'[ ](-?1\.#IO)'), ' NaN '), ],
    'IND': [(re.compile(r'(-?1\.#IND)[ ]'), ' NaN '),
            (re.compile(r'[ ](-?1\.#IND)'), ' NaN '), ],
    '-0.0': [(re.compile(r'(-?0\.0+)[ ]'), ' NaN '),
             (re.compile(r'[ ](-?0\.0+)'), ' NaN '), ],
    'numbers-only': [(re.compile(r'([^ 0-9.\-+]+)[ ]'), ' NaN '),
                     (re.compile(r'[ ]([^ 0-9.\-+]+)'), ' NaN '), ],
    }
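
NULL_SUBS contains two kinds of entries: plain numeric values and (pattern, replacement) pairs. The sketch below shows one way each kind could be applied to a single data line, using the 'NA' and '999.25' entries copied from the listing above. How lasio applies them internally is not shown here; parse_line is a hypothetical helper.

```python
import math
import re

# Copied from the 'NA' and '999.25' entries of NULL_SUBS above.
NA_SUBS = [(re.compile(r'(#N/A)[ ]'), ' NaN '),
           (re.compile(r'[ ](#N/A)'), ' NaN ')]
NULL_VALUES = [-999.25, 999.25]


def parse_line(line):
    """Apply the text substitutions first, then replace numeric null values."""
    for pattern, replacement in NA_SUBS:
        line = pattern.sub(replacement, line)
    values = [float(token) for token in line.split()]
    return [math.nan if value in NULL_VALUES else value for value in values]


row = parse_line("1670.000 #N/A 999.25 0.450")
print(row)
```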

You can also specify substitutions directly. E.g. for a file with this data section:

~A  DEPTH     DT       RHOB     NPHI     SFLU     SFLA      ILM      ILD
1670.000    9998  2550.000    0.450  123.450  123.450  110.200  105.600
1669.875    9999  2550.000    0.450  123.450  123.450  110.200  105.600
1669.750   10000       ERR    0.450  123.450  -999.25  110.200  105.600

By default, it will read all data as a string due to the presence of “ERR”:

>>> las = lasio.examples.open('null_policy_ERR.las')
>>> las.data
array([['1670.0', '9998.0', '2550.0', '0.45', '123.45', '123.45',
        '110.2', '105.6'],
       ['1669.875', '9999.0', '2550.0', '0.45', '123.45', '123.45',
        '110.2', '105.6'],
       ['1669.75', '10000.0', 'ERR', '0.45', '123.45', '-999.25',
        '110.2', '105.6']], dtype='<U32')

We can fix it by using an explicit NULL policy.

>>> las = lasio.examples.open('null_policy_ERR.las', null_policy=[('ERR', ' NaN ')])
>>> las.data
array([[ 1.670000e+03,  9.998000e+03,  2.550000e+03,  4.500000e-01,
         1.234500e+02,  1.234500e+02,  1.102000e+02,  1.056000e+02],
       [ 1.669875e+03,  9.999000e+03,  2.550000e+03,  4.500000e-01,
         1.234500e+02,  1.234500e+02,  1.102000e+02,  1.056000e+02],
       [ 1.669750e+03,  1.000000e+04,           nan,  4.500000e-01,
         1.234500e+02, -9.992500e+02,  1.102000e+02,  1.056000e+02]])

See tests/test_null_policy.py for some examples.