Displaying Pandas DataFrames as org-mode tables

The tabulate package includes formatting of pandas dataframes as org-mode tables, except it appears not to do it correctly (-+- separator markup between header row(s) and table body instead of -|-). While that could be relatively easily corrected, tabulate also doesn't do anything in particular for pandas' MultiIndexes either. Here's some code that can display dataframes with simple or MultiIndexes in a relatively pleasing way:

from IPython.display import publish_display_data
from pandas import MultiIndex

def table(df):
    """Format a small :class:`~pandas.DataFrame` as an `org-mode table<https://orgmode.org/manual/Tables.html>`_.

    :param df: input DataFrame
    :type df: :class:`~pandas.DataFrame`
    :returns: org-mode table as IPython display string with 'text/org' MIME type

    """
    def index(idx):
        if isinstance(idx, MultiIndex):
            x = list(idx)
            return [x[0]] +[[' ' if x[i][j] == z else z for j, z in enumerate(y)]
                            for i, y in enumerate(x[1:])]
        else:
            return [[i] for i in idx]

    idx = index(df.index)
    cols = index(df.columns)
    M = df.as_matrix()
    s = '|\n|'.join('|'.join(' ' for _ in range(len(idx[0]))) + '|' + \
                          '|'.join(c[i] for c in cols) for i in range(len(cols[0]))) + \
        '|\n|' + '|'.join('-' for _ in range(len(idx[0]) + len(M[0]))) + '|\n|' + \
        '|\n|'.join('|'.join(str(i) for j in z for i in j) for z in zip(idx, M))
    return publish_display_data({'text/org': '|' + s + '|'})

You can download this as a file from my github repo. The call to IPython's publish_display_data attaches the MIME type 'text/org' to the returned string so that ob-ipython displays it as is. The result is a regular org-mode table and not a table.el one, so the index/header cells of MultiIndexes aren't merged according to the hierarchy but spread out over as many cells as necessary (to remain maximally compatible with org mode):

#+begin_src ipython :results raw :session
  import pandas as pd
  from pandas2org import table

  idx = pd.MultiIndex.from_product((['one'], ['two', 'three']))
  table(pd.DataFrame([[1, 2], [3, 4]], index=idx, columns=idx))
#+end_src

#+RESULTS:
|     |       | one |       |
|     |       | two | three |
|-----+-------+-----+-------|
| one | two   |   1 |     2 |
|     | three |   3 |     4 |

© 2018. All rights reserved.

Powered by Hydejack v8.0.0