Data section ============ Handling text, dates, timestamps, or any non-numeric characters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ By default, lasio will attempt to convert each column of the data section into floating-point numbers. If that fails, as it will for non-numeric characters, then the column will be returned as text (``str``). The behavour can be controlled by specifing the data type as either ``int``, ``float`` or ``str`` per column using the ``dtypes`` keyword argument to :meth:`lasio.LASFile.read`. See the example ``data_characters.las``: .. code-block:: none ~A TIME DATE DEPT ARC_GR_UNC_RT 00:00:00 01-Jan-20 1500.2435 126.56 00:00:01 01-Jan-20 1500.3519 126.56 .. code-block:: python >>> import lasio.examples >>> las = lasio.examples.open("data_characters.las") >>> las["TIME"] array(['00:00:00', '00:00:01'], dtype='>> las["DATE"] array(['01-Jan-20', '01-Jan-20'], dtype='>> las["DEPT"] array([1500.2435, 1500.3519]) >>> las["ARC_GR_UNC_RT"] array([126.56, 126.56]) >>> las.df().reset_index().info() RangeIndex: 2 entries, 0 to 1 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 TIME 2 non-null object 1 DATE 2 non-null object 2 DEPT 2 non-null float64 3 ARC_GR_UNC_RT 2 non-null float64 dtypes: float64(2), object(2) memory usage: 192.0+ bytes lasio doesn't yet understand dates and timestamps natively, but you can do these conversions with pandas: >>> las["DATE_DT"] = pd.to_datetime(las["DATE"]).values Repeated/duplicate curve mnemonics ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LAS files don't always have unique mnemonics for each curve, but that makes it difficult to retrieve curves by their mnemonic! lasio handles this by appending ``:1``, ``:2``, etc. to the end of repeat/duplicate mnemonics. For an example, see a LAS file with this ~C section, with "SFLU" duplicated: .. code-block:: none ~CURVE INFORMATION #MNEM.UNIT API CODE CURVE DESCRIPTION #--------- ------------- ------------------------------ DEPT.M : 1 DEPTH DT .US/M : 2 SONIC TRANSIT TIME RHOB.K/M3 : 3 BULK DENSITY NPHI.V/V : 4 NEUTRON POROSITY SFLU.OHMM : 5 RXO RESISTIVITY SFLU.OHMM : 6 SHALLOW RESISTIVITY ILM .OHMM : 7 MEDIUM RESISTIVITY ILD .OHMM : 8 DEEP RESISTIVITY This is represented in the following way: >>> import lasio.examples >>> las = lasio.examples.open("mnemonic_duplicate.las") >>> print(las.curves) Mnemonic Unit Value Description -------- ---- ----- ----------- DEPT M 1 DEPTH DT US/M 2 SONIC TRANSIT TIME RHOB K/M3 3 BULK DENSITY NPHI V/V 4 NEUTRON POROSITY SFLU:1 OHMM 5 RXO RESISTIVITY SFLU:2 OHMM 6 SHALLOW RESISTIVITY ILM OHMM 7 MEDIUM RESISTIVITY ILD OHMM 8 DEEP RESISTIVITY >>> las["SFLU:1"] array([123.45, 123.45, 123.45]) >>> las["SFLU:2"] array([125.45, 125.45, 125.45]) Note that the actual mnemonic is not present, to avoid ambiguity about which curve would be expected to be returned: .. code-block:: python >>> las["SFLU"] .. code-block:: console Traceback (most recent call last): File "", line 1, in File "C:\devapps\kinverarity\projects\lasio\lasio\las.py", line 661, in __getitem__ raise KeyError("{} not found in curves ({})".format(key, curve_mnemonics)) KeyError: "SFLU not found in curves (['DEPT', 'DT', 'RHOB', 'NPHI', 'SFLU:1', 'SFLU:2', 'ILM', 'ILD'])" Note also that lasio remembers the original mnemonic so that on writing the file out, the original mnemonics are replicated: .. code-block:: python >>> import sys >>> las.write(sys.stdout) ... ~Curve Information ----------------------------------------- DEPT.M : 1 DEPTH DT .US/M : 2 SONIC TRANSIT TIME RHOB.K/M3 : 3 BULK DENSITY NPHI.V/V : 4 NEUTRON POROSITY SFLU.OHMM : 5 RXO RESISTIVITY SFLU.OHMM : 6 SHALLOW RESISTIVITY ILM .OHMM : 7 MEDIUM RESISTIVITY ILD .OHMM : 8 DEEP RESISTIVITY ... Ignoring commented-out lines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sometimes data sections have comment line inside them. By default lasio will ignore any lines starting with the "#" character within the data section. You can control this using the ``remove_data_line_filter='#'`` argument to :meth:`lasio.LASFile.read`. Ignoring the data section ~~~~~~~~~~~~~~~~~~~~~~~~~ Lasio can ignore the data section by setting ignore_data to true: ``lasio.read(file, ignore_date=True)`` This will completely skip reading the data section and the returned object will just contain the header metadata section. A quick way to see the expected column names is: ``lasio.read(file, ignore_data=True).keys()`` To re-run without ignore_data: ``lasio.read(file).keys()`` If this returns a different set of columns then there may be a data parsing error. In this case, if incorrect parsing causes lasio to create extra columns they will be named 'UKNOWN:1', 'UNKNOWN:2', 'UNKNOWN:'... This can usually be fixed by tuning lasio.read()'s read_policy or null_policy options. Handling errors with read_policy and null_policy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ lasio has a flexible way of handling "errors" in the ~ASCII data section to accommodate how strict or flexible you want to be. The two main tools are ``read_policy`` and ``null_policy``. These are optional arguments to :meth:`lasio.LASFile.read`. Each defaults to common options which can be overridden either by other pre-set options or by a list of specific options. These policy settings are configured in ``lasio/defaults.py``. By default, ``lasio.read(f)`` runs as if explicitly set to ``lasio.read(f, read_policy='default', null_policy='common')``. Examples of policy override syntax ---------------------------------- Change only read_policy with one of the builtin policy sets: ``lasio.read(f, read_policy='comma-delimiter')`` Change only null_policy with one of the builtin policy sets: ``lasio.read(f, null_policy='aggressive')`` Change both read_policy and null_policy with builtin policies: ``lasio.read(f, read_policy='comma-delimiter', null_policy='none')`` Change read_policy with specific policies (found in defaults.py): ``lasio.read(f, read_policy=["comma-decimal-mark", "run-on(.)"])`` Change null_policy with your own hard-coded options: ``lasio.read(f, null_policy=["9999.25", "999.25", "NA", "INF", "IO", "IND"])`` Example errors -------------- Here are some examples of errors. * Files could contain a variety of indicators for an invalid data point other than that defined by the NULL line in the LAS header (usually -999.25). * Fixed-width columns could run into each other: .. code-block:: none 7686.500 64.932 0.123 0.395 12.403 156.271 10.649 -0.005 193.223 327.902 -0.023 4.491 2.074 29.652 7686.000 67.354 0.140 0.415 9.207 4648.011 10.609 -0.004 3778.709 1893.751 -0.048 4.513 2.041 291.910 7685.500 69.004 0.151 0.412 7.020101130.188 10.560 -0.004 60000.000 2901.317 -0.047 4.492 2.046 310.119 7685.000 68.809 0.150 0.411 7.330109508.961 10.424 -0.005 60000.000 2846.619 -0.042 4.538 2.049 376.968 7684.500 68.633 0.149 0.402 7.345116238.453 10.515 -0.005 60000.000 2290.275 -0.051 4.543 2.063 404.972 7684.000 68.008 0.144 0.386 7.682 4182.679 10.515 -0.004 3085.681 1545.842 -0.046 4.484 2.089 438.195 * Odd text such as ``(null)``: .. code-block:: none 8090.00 -999.25 -999.25 -999.25 0 0 0 0 0 0 0 0 8091.000 0.70 337.70 (null) 0 0 0 0 0 0 0 0 8092.000 -999.25 -999.25 -999.25 0 0 0 0 0 0 0 0 Handling run-on errors ---------------------- lasio detects and handles these problems by default using ``lasio.read(f, read_policy='default')``. For example a file with this data section: .. code-block:: none ~A 7686.000 67.354 0.140 0.415 9.207 4648.011 10.609 7685.500 69.004 0.151 0.412 7.020101130.188 10.560 7685.000 68.809 0.150 0.411 7.330-19508.961 10.424 7684.500 68.633 0.149 0.402 7.345116238.453 10.515 7684.000 68.008 0.144 0.386 7.682 4182.679 10.515 is loaded by default as the following: .. code-block:: python >>> import lasio.examples >>> las = lasio.examples.open('null_policy_runon.las') >>> las.data array([[7686.0, 67.354, 0.14, 0.415, 9.207, 4648.011, 10.609], [7685.5, 69.004, 0.151, 0.412, nan, nan, 10.56], [7685.0, 68.809, 0.15, 0.411, 7.33, -19508.961, 10.424], [7684.5, 68.633, 0.149, 0.402, nan, nan, 10.515], [7684.0, 68.008, 0.144, 0.386, 7.682, 4182.679, 10.515]]) Handling invalid data indicators automatically ---------------------------------------------- These are detected by lasio to a degree which you can control with the null_policy keyword argument. You can specify a policy of 'none', 'strict', 'common', 'aggressive', or 'all'. These policies all include a subset of pre-defined substitutions. Or you can give your own list of substitutions. Here is the list of predefined policies and substitutions from :mod:`lasio.defaults`. Policies that you can pick with e.g. ``null_policy='common'``: .. code-block:: python NULL_POLICIES = { 'none': [], 'strict': ['NULL', ], 'common': ['NULL', '(null)', '-', '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND'], 'aggressive': ['NULL', '(null)', '--', '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND', '999', '999.99', '9999', '9999.99' '2147483647', '32767', '-0.0', ], 'all': ['NULL', '(null)', '-', '9999.25', '999.25', 'NA', 'INF', 'IO', 'IND', '999', '999.99', '9999', '9999.99' '2147483647', '32767', '-0.0', 'numbers-only', ], 'numbers-only': ['numbers-only', ] } Or substitutions you could specify with e.g. ``null_policy=['NULL', '999.25', 'INF']``: .. code-block:: python NULL_SUBS = { 'NULL': [None, ], # special case to be handled '999.25': [-999.25, 999.25], '9999.25': [-9999.25, 9999.25], '999.99': [-999.99, 999.99], '9999.99': [-9999.99, 9999.99], '999': [-999, 999], '9999': [-9999, 9999], '2147483647': [-2147483647, 2147483647], '32767': [-32767, 32767], 'NA': [(re.compile(r'(#N/A)[ ]'), ' NaN '), (re.compile(r'[ ](#N/A)'), ' NaN '), ], 'INF': [(re.compile(r'(-?1\.#INF)[ ]'), ' NaN '), (re.compile(r'[ ](-?1\.#INF)'), ' NaN '), ], 'IO': [(re.compile(r'(-?1\.#IO)[ ]'), ' NaN '), (re.compile(r'[ ](-?1\.#IO)'), ' NaN '), ], 'IND': [(re.compile(r'(-?1\.#IND)[ ]'), ' NaN '), (re.compile(r'[ ](-?1\.#IND)'), ' NaN '), ], '-0.0': [(re.compile(r'(-?0\.0+)[ ]'), ' NaN '), (re.compile(r'[ ](-?0\.0+)'), ' NaN '), ], 'numbers-only': [(re.compile(r'([^ 0-9.\-+]+)[ ]'), ' NaN '), (re.compile(r'[ ]([^ 0-9.\-+]+)'), ' NaN '), ], } You can also specify substitutions directly. E.g. for a file with this data section: .. code-block:: none ~A DEPTH DT RHOB NPHI SFLU SFLA ILM ILD 1670.000 9998 2550.000 0.450 123.450 123.450 110.200 105.600 1669.875 9999 2550.000 0.450 123.450 123.450 110.200 105.600 1669.750 10000 ERR 0.450 123.450 -999.25 110.200 105.600 By default, it will read all data as a string due to the presence of "ERR": .. code-block:: python >>> las = lasio.examples.open('null_policy_ERR.las') >>> las.data array([['1670.0', '9998.0', '2550.0', '0.45', '123.45', '123.45', '110.2', '105.6'], ['1669.875', '9999.0', '2550.0', '0.45', '123.45', '123.45', '110.2', '105.6'], ['1669.75', '10000.0', 'ERR', '0.45', '123.45', '-999.25', '110.2', '105.6']], dtype='>> las = lasio.examples.open('null_policy_ERR.las', null_policy=[('ERR', ' NaN ')]) >>> las.data array([[ 1.670000e+03, 9.998000e+03, 2.550000e+03, 4.500000e-01, 1.234500e+02, 1.234500e+02, 1.102000e+02, 1.056000e+02], [ 1.669875e+03, 9.999000e+03, 2.550000e+03, 4.500000e-01, 1.234500e+02, 1.234500e+02, 1.102000e+02, 1.056000e+02], [ 1.669750e+03, 1.000000e+04, nan, 4.500000e-01, 1.234500e+02, -9.992500e+02, 1.102000e+02, 1.056000e+02]]) See ``tests/test_null_policy.py`` (`link `__) for some examples.