Legate Pandas API Reference¶

DataFrame¶

class legate.pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False, frame=None)¶

Attributes

at: Access a single value for a row/column label pair.
axes: Return a list representing the axes of the DataFrame.
columns: The column labels of the DataFrame.
dtypes: Return the dtypes in the DataFrame.
empty: Indicator whether DataFrame is empty.
iat: Access a single value for a row/column pair by integer position.
iloc: Purely integer-location based indexing for selection by position.
index: The index (row labels) of the DataFrame.
loc: Access a group of rows and columns by label(s) or a boolean array.
ndim: Return an int representing the number of axes / array dimensions.
shape: Return a tuple representing the dimensionality of the DataFrame.
size: Return an int representing the number of elements in this object.

Methods

`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, axis, level, fill_value])	Get Addition of dataframe and other, element-wise (binary operator add).
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`all`([axis, bool_only, skipna, level])	Return whether all elements are True, potentially over an axis.
`any`([axis, bool_only, skipna, level])	Return whether any element is True, potentially over an axis.
`append`(other[, ignore_index, …])	Append rows of other to the end of caller, returning a new object.
`astype`(dtype[, copy, errors])	Cast a pandas object to a specified dtype `dtype`.
`bool`()	Return the bool of a single element Series or DataFrame.
`count`([axis, level, numeric_only])	Count non-NA cells for each column or row.
`cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`div`(other[, axis, level, fill_value])	Get Floating division of dataframe and other, element-wise (binary operator truediv).
`divide`(other[, axis, level, fill_value])	Get Floating division of dataframe and other, element-wise (binary operator truediv).
`drop`([labels, axis, index, columns, level, …])	Drop specified labels from rows or columns.
`droplevel`(level[, axis])	Return DataFrame with requested index / column level(s) removed.
`dropna`([axis, how, thresh, subset, inplace])	Remove missing values.
`eq`(other[, axis, level, fill_value])	Get Equal to of dataframe and other, element-wise (binary operator eq).
`equals`(other)	Test whether two objects contain the same elements.
`fillna`([value, method, axis, inplace, …])	Fill NA/NaN values using the specified method.
`floordiv`(other[, axis, level, fill_value])	Get Integer division of dataframe and other, element-wise (binary operator floordiv).
`ge`(other[, axis, level, fill_value])	Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
`get`(key[, default])	Get item from object for given key (ex: DataFrame column).
`groupby`([by, axis, level, as_index, sort])	Group DataFrame using a mapper or by a Series of columns.
`gt`(other[, axis, level, fill_value])	Get Greater than of dataframe and other, element-wise (binary operator gt).
`head`([n])	Return the first n rows.
`insert`(loc, column, value[, allow_duplicates])	Insert column into DataFrame at specified location.
`isna`()	Detect missing values.
`isnull`()	Detect missing values.
`join`(other[, on, how, lsuffix, rsuffix, sort])	Join columns of another DataFrame.
`keys`()	Get the ‘info axis’ (see Indexing for more).
`le`(other[, axis, level, fill_value])	Get Less than or equal to of dataframe and other, element-wise (binary operator le).
`lt`(other[, axis, level, fill_value])	Get Less than of dataframe and other, element-wise (binary operator lt).
`mask`(cond[, other, inplace, axis, level, …])	Replace values where the condition is True.
`max`([axis, skipna, level, numeric_only])	Return the maximum of the values over the requested axis.
`mean`([axis, skipna, level, numeric_only])	Return the mean of the values over the requested axis.
`merge`(right[, how, on, left_on, right_on, …])	Merge DataFrame or named Series objects with a database-style join.
`min`([axis, skipna, level, numeric_only])	Return the minimum of the values over the requested axis.
`mod`(other[, axis, level, fill_value])	Get Modulo of dataframe and other, element-wise (binary operator mod).
`mul`(other[, axis, level, fill_value])	Get Multiplication of dataframe and other, element-wise (binary operator mul).
`multiply`(other[, axis, level, fill_value])	Get Multiplication of dataframe and other, element-wise (binary operator mul).
`ne`(other[, axis, level, fill_value])	Get Not equal to of dataframe and other, element-wise (binary operator ne).
`notna`()	Detect existing (non-missing) values.
`notnull`()	Detect existing (non-missing) values.
`pow`(other[, axis, level, fill_value])	Get Exponential power of dataframe and other, element-wise (binary operator pow).
`prod`([axis, skipna, level, numeric_only, …])	Return the product of the values over the requested axis.
`product`([axis, skipna, level, numeric_only, …])	Return the product of the values over the requested axis.
`query`(expr[, inplace])	Query the columns of a DataFrame with a boolean expression.
`radd`(other[, axis, level, fill_value])	Get Addition of dataframe and other, element-wise (binary operator radd).
`rdiv`(other[, axis, level, fill_value])	Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
`rename`([mapper, index, columns, axis, copy, …])	Alter axes labels.
`reset_index`([level, drop, inplace, …])	Reset the index, or a level of it.
`rfloordiv`(other[, axis, level, fill_value])	Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
`rmod`(other[, axis, level, fill_value])	Get Modulo of dataframe and other, element-wise (binary operator rmod).
`rmul`(other[, axis, level, fill_value])	Get Multiplication of dataframe and other, element-wise (binary operator rmul).
`rpow`(other[, axis, level, fill_value])	Get Exponential power of dataframe and other, element-wise (binary operator rpow).
`rsub`(other[, axis, level, fill_value])	Get Subtraction of dataframe and other, element-wise (binary operator rsub).
`rtruediv`(other[, axis, level, fill_value])	Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
`set_axis`(labels[, axis, inplace])	Assign desired index to given axis.
`set_index`(keys[, drop, append, inplace, …])	Set the DataFrame index using existing columns.
`sort_index`([axis, level, ascending, …])	Sort object by labels (along an axis).
`sort_values`(by[, axis, ascending, inplace, …])	Sort by the values along either axis.
`squeeze`([axis])	Squeeze 1 dimensional axis objects into scalars.
`std`([axis, skipna, level, ddof, numeric_only])	Return sample standard deviation over requested axis.
`sub`(other[, axis, level, fill_value])	Get Subtraction of dataframe and other, element-wise (binary operator sub).
`subtract`(other[, axis, level, fill_value])	Get Subtraction of dataframe and other, element-wise (binary operator sub).
`sum`([axis, skipna, level, numeric_only, …])	Return the sum of the values over the requested axis.
`tail`([n])	Return the last n rows.
`to_csv`([path_or_buf, sep, na_rep, columns, …])	Write object to a comma-separated values (csv) file.
`to_pandas`([schema_only])	Convert a distributed DataFrame into a Pandas DataFrame.
`to_parquet`(path[, engine, compression, …])	Write a DataFrame to the binary parquet format.
`truediv`(other[, axis, level, fill_value])	Get Floating division of dataframe and other, element-wise (binary operator truediv).
`var`([axis, skipna, level, ddof, numeric_only])	Return unbiased variance over requested axis.
`where`(cond[, other, inplace, axis, level, …])	Replace values where the condition is False.

copy

abs()¶

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns

abs: Series/DataFrame containing the absolute value of each element.

See also

numpy.absolute: Calculate the absolute value element-wise.

Notes

For a complex input a + bj (for example, 1.2 + 1j), the absolute value is \(\sqrt{ a^2 + b^2 }\).
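A minimal sketch of the behavior described above. It uses stock pandas as a stand-in; that legate.pandas returns the same values is an assumption, not something this page confirms.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [-1, 2, -3], "b": [-0.5, 0.0, 1.5]})
abs_df = df.abs()

# For a complex value a + bj the result is sqrt(a**2 + b**2).
z = pd.Series([3 + 4j])
abs_z = z.abs()

print(abs_df)
print(abs_z)
```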

add(other, axis='columns', level=None, fill_value=None)¶

Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
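The difference between the `+` operator and `add` with a `fill_value` can be sketched as follows. The frames `left` and `right` are made up for illustration, and stock pandas stands in for legate.pandas (assumed to match).

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
left = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
right = pd.DataFrame({"x": [10.0, 20.0]}, index=["b", "c"])

# Labels 'a' and 'c' exist on one side only, so the plain operator yields NaN there.
plain = left + right

# With fill_value, the missing side is treated as 0 before the addition.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Both results carry the union of the two indices (a, b, c); only the handling of the one-sided labels differs.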

add_prefix(prefix)¶

Prefix labels with string prefix.

For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.

Parameters

prefix (str): The string to add before each label.

Returns

Series or DataFrame: New Series or DataFrame with updated labels.

See also

Series.add_suffix: Suffix row labels with string suffix.
DataFrame.add_suffix: Suffix column labels with string suffix.

add_suffix(suffix)¶

Suffix labels with string suffix.

For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.

Parameters

suffix (str): The string to add after each label.

Returns

Series or DataFrame: New Series or DataFrame with updated labels.

See also

Series.add_prefix: Prefix row labels with string prefix.
DataFrame.add_prefix: Prefix column labels with string prefix.
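A short sketch of `add_prefix` and `add_suffix` on a DataFrame, using stock pandas as a stand-in (legate.pandas is assumed to match). The column names are illustrative.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

prefixed = df.add_prefix("col_")   # column labels become col_a, col_b
suffixed = df.add_suffix("_raw")   # column labels become a_raw, b_raw

print(list(prefixed.columns))
print(list(suffixed.columns))
```

Both calls return a new object; the labels of `df` itself are unchanged.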

all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶

Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

Parameters

axis ({0 or ‘index’, 1 or ‘columns’, None}, default 0)

Indicate which axis or axes should be reduced.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

bool_only (bool, default None)

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipna (bool, default True)

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

level (int or level name, default None)

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargs (any, default None)

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: If level is specified, a DataFrame is returned; otherwise, a Series is returned.

See also

Series.all: Return True if all elements are True.
DataFrame.any: Return True if one (or more) elements are True.

any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters

axis ({0 or ‘index’, 1 or ‘columns’, None}, default 0)

Indicate which axis or axes should be reduced.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

bool_only (bool, default None)

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipna (bool, default True)

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

level (int or level name, default None)

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargs (any, default None)

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: If level is specified, a DataFrame is returned; otherwise, a Series is returned.

See also

numpy.any: Numpy version of this method.
Series.any: Return whether any element is True.
Series.all: Return whether all elements are True.
DataFrame.any: Return whether any element is True over requested axis.
DataFrame.all: Return whether all elements are True over requested axis.
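The reductions performed by `all` and `any` along each axis, and their results on empty input, can be sketched as follows. Stock pandas stands in for legate.pandas (assumed to match); the `level` argument is left out because it is not available in every pandas release.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"flags": [True, True], "mixed": [True, False], "zeros": [0, 0]})

all_cols = df.all()            # one result per column (axis=0)
any_cols = df.any()            # one result per column (axis=0)
all_rows = df.all(axis=1)      # one result per row
any_scalar = df.any(axis=None) # single result over the whole frame

# An empty input has no False element and no True element.
empty = pd.Series([], dtype=bool)
empty_all = empty.all()
empty_any = empty.any()

print(all_cols, any_cols, all_rows, any_scalar, empty_all, empty_any, sep="\n")
```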

append(other, ignore_index=False, verify_integrity=False, sort=False)¶

Append rows of other to the end of caller, returning a new object.

Columns in other that are not in the caller are added as new columns.

Parameters

other (DataFrame or Series/dict-like object, or list of these): The data to append.
ignore_index (bool, default False): If True, the resulting axis will be labeled 0, 1, …, n - 1.
verify_integrity (bool, default False): If True, raise ValueError on creating index with duplicates.
sort (bool, default False): Sort columns if the columns of self and other are not aligned.

Changed in version 1.0.0: Changed to not sort by default.

Returns

DataFrame

See also

concat: General function to concatenate DataFrame or Series objects.

Notes

If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
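The pattern recommended in the note above (collect the rows first, then concatenate once) might look like the sketch below. It uses `pandas.concat` from stock pandas; whether legate.pandas exposes an equivalent `concat` is an assumption, and the frame contents are made up.

```python
import pandas as pd

# Stock pandas as a stand-in; an equivalent concat in legate.pandas is assumed.
base = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7]})

# Collect the new rows first instead of growing the frame one row at a time.
pending = [pd.DataFrame({"id": [i], "score": [i / 10]}) for i in range(3, 6)]

# A single concatenation; ignore_index=True relabels the rows 0..n-1.
combined = pd.concat([base, *pending], ignore_index=True)

print(combined)
```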

astype(dtype, copy=True, errors='raise')¶

Cast a pandas object to a specified dtype `dtype`.

Parameters

dtype (data type, or dict of column name -> data type)

Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copy (bool, default True)

Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

errors ({‘raise’, ‘ignore’}, default ‘raise’)

Control raising of exceptions on invalid data for provided dtype.

raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.

Returns

casted (same type as caller)

See also

to_datetime: Convert argument to datetime.
to_timedelta: Convert argument to timedelta.
to_numeric: Convert argument to a numeric type.
numpy.ndarray.astype: Cast a numpy array to a specified type.
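The two forms of the `dtype` argument, a single type for the whole object or a per-column mapping, can be sketched as follows. Stock pandas stands in for legate.pandas (assumed to match); the column names are illustrative.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"n": [1, 2, 3], "x": [1.5, 2.5, 3.5]})

# One dtype for every column.
as_float = df.astype("float64")

# Column-specific dtypes via a {column: dtype} mapping.
per_column = df.astype({"n": "int32", "x": "float32"})

print(as_float.dtypes)
print(per_column.dtypes)
```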

property at¶

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

Raises

KeyError: If ‘label’ does not exist in DataFrame.

See also

DataFrame.iat: Access a single value for a row/column pair by integer position.
DataFrame.loc: Access a group of rows and columns by label(s).
Series.at: Access a single value using a label.
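A brief sketch of getting and setting a single value with `at`, and of the KeyError raised for a missing label. Stock pandas is used as a stand-in; that legate.pandas supports both reads and writes through `at` in the same way is an assumption.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

value = df.at["r2", "a"]   # read one cell by row and column label
df.at["r1", "b"] = 30      # write one cell by row and column label

try:
    df.at["missing", "a"]
    missing_raised = False
except KeyError:
    missing_raised = True

print(value, df.at["r1", "b"], missing_raised)
```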

property axes¶

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

bool()¶

Return the bool of a single element Series or DataFrame.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

Returns

bool: The value in the Series or DataFrame.

See also

Series.astype: Change the data type of a Series, including to boolean.
DataFrame.astype: Change the data type of a DataFrame, including to boolean.
numpy.bool_: NumPy boolean data type, used by pandas for boolean values.

property columns¶: The column labels of the DataFrame.

count(axis=0, level=None, numeric_only=False)¶

Count non-NA cells for each column or row.

The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
level (int or str, optional): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.
numeric_only (bool, default False): Include only float, int or boolean data.

Returns

Series or DataFrame: For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.

See also

Series.count: Number of non-NA elements in a Series.
DataFrame.value_counts: Count unique combinations of columns.
DataFrame.shape: Number of DataFrame rows and columns (including NA elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA elements.
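Counting non-NA cells per column and per row can be sketched as follows, with stock pandas as a stand-in (legate.pandas is assumed to match). The data is made up.

```python
import numpy as np
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})

per_column = df.count()        # non-NA cells in each column
per_row = df.count(axis=1)     # non-NA cells in each row

print(per_column)
print(per_row)
```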

cummax(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna (bool, default True): Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative maximum of Series or DataFrame.

See also

core.window.Expanding.max: Similar functionality but ignores NaN values.
DataFrame.max: Return the maximum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

cummin(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna (bool, default True): Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative minimum of Series or DataFrame.

See also

core.window.Expanding.min: Similar functionality but ignores NaN values.
DataFrame.min: Return the minimum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

cumprod(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna (bool, default True): Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative product of Series or DataFrame.

See also

core.window.Expanding.prod: Similar functionality but ignores NaN values.
DataFrame.prod: Return the product over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

cumsum(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipna (bool, default True): Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative sum of Series or DataFrame.

See also

core.window.Expanding.sum: Similar functionality but ignores NaN values.
DataFrame.sum: Return the sum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.
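The effect of `skipna` on the cumulative methods can be sketched as follows. Stock pandas stands in for legate.pandas (assumed to match); the values are illustrative.

```python
import numpy as np
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"v": [1.0, np.nan, 3.0, 2.0]})

# skipna=True (default): the NaN stays NaN in the output but does not
# interrupt the running total or running maximum.
running_sum = df["v"].cumsum()
running_max = df.cummax()

# skipna=False: every position from the first NaN onward becomes NaN.
strict = df["v"].cumsum(skipna=False)

print(running_sum, running_max, strict, sep="\n")
```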

div(other, axis='columns', level=None, fill_value=None)¶

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

divide(other, axis='columns', level=None, fill_value=None)¶

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing, the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
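One common use of the `axis` argument is dividing each row by a per-row value held in a Series. The sketch below does this with both `div` and `divide`, using stock pandas as a stand-in (legate.pandas is assumed to match); the data and the name `row_totals` are illustrative.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [2.0, 4.0], "b": [6.0, 8.0]})
row_totals = df.sum(axis=1)

# axis=0 matches the Series index against the row labels of df,
# so each row is divided by its own total.
shares = df.div(row_totals, axis=0)
same = df.divide(row_totals, axis="index")

print(shares)
```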

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')¶

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters

labels (single label or list-like): Index or column labels to drop.
axis ({0 or ‘index’, 1 or ‘columns’}, default 0): Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
index (single label or list-like): Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns (single label or list-like): Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level (int or level name, optional): For MultiIndex, level from which the labels will be removed.
inplace (bool, default False): If False, return a copy. Otherwise, do operation inplace and return None.
errors ({‘ignore’, ‘raise’}, default ‘raise’): If ‘ignore’, suppress error and only existing labels are dropped.

Returns

DataFrame or None: DataFrame without the removed index or column labels or None if inplace=True.

Raises

KeyError: If any of the labels is not found in the selected axis.

See also

DataFrame.loc: Label-location based indexer for selection by label.
DataFrame.dropna: Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop: Return Series with specified index labels removed.
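The `index`, `columns`, and `errors` arguments can be sketched as follows, with stock pandas as a stand-in (legate.pandas is assumed to match). The label "nope" is a deliberately missing label used for illustration.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}, index=["x", "y", "z"]
)

no_b = df.drop(columns="b")   # same as df.drop("b", axis=1)
no_y = df.drop(index="y")     # same as df.drop("y", axis=0)

# errors="ignore" drops the labels that exist and skips the rest.
tolerant = df.drop(columns=["c", "nope"], errors="ignore")

try:
    df.drop(columns="nope")
    drop_raised = False
except KeyError:
    drop_raised = True

print(list(no_b.columns), list(no_y.index), list(tolerant.columns), drop_raised)
```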

droplevel(level, axis=0)¶

Return DataFrame with requested index / column level(s) removed.

New in version 0.24.0.

Parameters

level (int, str, or list-like)

If a string is given, it must be the name of a level. If list-like, elements must be names or positional indexes of levels.

axis ({0 or ‘index’, 1 or ‘columns’}, default 0)

Axis along which the level(s) is removed:

0 or ‘index’: remove level(s) from the row index.
1 or ‘columns’: remove level(s) from the column index.

Returns

DataFrame: DataFrame with requested index / column level(s) removed.
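Removing a level from a two-level row index, by name and by position, can be sketched as follows. Stock pandas stands in for legate.pandas; MultiIndex support in legate.pandas is assumed, and the level names "group" and "item" are made up.

```python
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
idx = pd.MultiIndex.from_tuples(
    [("g1", 1), ("g1", 2), ("g2", 1)], names=["group", "item"]
)
df = pd.DataFrame({"v": [10, 20, 30]}, index=idx)

by_item = df.droplevel("group")   # drop a level by name; "item" remains
by_group = df.droplevel(1)        # drop a level by position; "group" remains

print(by_item)
print(by_group)
```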

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)¶

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0)

Determine if rows or columns which contain missing values are removed.

0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.

Changed in version 1.0.0: Passing a tuple or list to drop on multiple axes is no longer supported. Only a single axis is allowed.

how ({‘any’, ‘all’}, default ‘any’)

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.

thresh (int, optional)

Require that many non-NA values.

subset (array-like, optional)

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace (bool, default False)

If True, do operation inplace and return None.

Returns

DataFrame or None: DataFrame with NA entries dropped from it or None if inplace=True.

See also

DataFrame.isna: Indicate missing values.
DataFrame.notna: Indicate existing (non-missing) values.
DataFrame.fillna: Replace missing values.
Series.dropna: Drop missing values.
Index.dropna: Drop missing indices.
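How `how`, `thresh`, and `subset` select the rows to drop can be sketched as follows. Stock pandas stands in for legate.pandas (assumed to match); the data is made up.

```python
import numpy as np
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [4.0, 5.0, np.nan],
        "c": [7.0, 8.0, np.nan],
    }
)

any_na = df.dropna()                  # drop rows with at least one NA
all_na = df.dropna(how="all")         # drop rows where every value is NA
two_valid = df.dropna(thresh=2)       # keep rows with at least 2 non-NA values
only_a = df.dropna(subset=["a"])      # look for NA in column "a" only

print(list(any_na.index), list(all_na.index), list(two_valid.index), list(only_a.index))
```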

property dtypes¶

Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.

Returns

pandas.Series: The data type of each column.

property empty¶

Indicator whether DataFrame is empty.

True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

Returns

bool: If DataFrame is empty, return True, if not return False.

See also

Series.dropna: Return series without null values.
DataFrame.dropna: Return DataFrame with labels on given axis omitted where (all or any) data are missing.

Notes

If DataFrame contains only NaNs, it is still not considered empty.
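The distinction between a frame with no rows and a frame holding only NaN can be sketched as follows, with stock pandas as a stand-in (legate.pandas is assumed to match).

```python
import numpy as np
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
no_rows = pd.DataFrame({"a": []})
only_nan = pd.DataFrame({"a": [np.nan]})

no_rows_empty = no_rows.empty      # an axis of length 0, so empty
only_nan_empty = only_nan.empty    # one row exists, so not empty

# Dropping the NaN rows first leaves a frame that is empty.
after_dropna_empty = only_nan.dropna().empty

print(no_rows_empty, only_nan_empty, after_dropna_empty)
```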

eq(other, axis='columns', level=None, fill_value=None)¶

Get Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}, default ‘columns’): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
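Element-wise comparison, including the NaN != NaN rule from the note above, can be sketched as follows. Stock pandas stands in for legate.pandas (assumed to match); the frames are illustrative.

```python
import numpy as np
import pandas as pd

# Stock pandas as a stand-in; legate.pandas is assumed to behave the same way.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
other = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 5.0]})

# Frame against frame: NaN compared with NaN gives False.
elementwise = df.eq(other)

# Frame against scalar: the scalar is compared with every cell.
is_three = df.eq(3.0)

print(elementwise)
print(is_three)
```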

equals(other)¶

Test whether two objects contain the same elements.

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.

The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.

Parameters

other (Series or DataFrame): The other Series or DataFrame to be compared with the first.

Returns

bool: True if all elements are the same in both objects, False otherwise.

See also

Series.eq: Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq: Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal: Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal: Like assert_series_equal, but targets DataFrames.
numpy.array_equal: Return True if two arrays have the same shape and elements, False otherwise.
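A short sketch of how equals differs from the element-wise eq, using stock pandas as a stand-in (assuming legate.pandas behaves the same way); all frames here are illustrative.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

left = pd.DataFrame({"a": [1.0, np.nan]})
right = pd.DataFrame({"a": [1.0, np.nan]})
as_int = pd.DataFrame({"a": [1, 2]})
as_float = pd.DataFrame({"a": [1.0, 2.0]})

whole = left.equals(right)                # NaNs in the same location count as equal
elementwise = left.eq(right)              # element-wise, NaN != NaN
dtype_mismatch = as_int.equals(as_float)  # same values, different column dtype
```

equals returns a single bool for the whole object, while eq returns a frame of bools, so the two answer different questions.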

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)¶

Fill NA/NaN values using the specified method.

Parameters

value (scalar, dict, Series, or DataFrame): Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
method ({‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None): Method to use for filling holes in reindexed Series. pad / ffill: propagate the last valid observation forward to the next valid one. backfill / bfill: use the next valid observation to fill the gap.
axis ({0 or ‘index’, 1 or ‘columns’}): Axis along which to fill missing values.
inplace (bool, default False): If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
limit (int, default None): If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast (dict, default None): A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Returns

DataFrame or None: Object with missing values filled or None if inplace=True.

See also

interpolate: Fill NaN values using interpolation.
reindex: Conform object to new index.
asfreq: Convert TimeSeries to specified frequency.
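As a minimal sketch of the value parameter, the snippet below fills with a scalar and with a per-column dict. It uses stock pandas as a stand-in (legate.pandas parity is an assumption) and deliberately avoids the method argument, whose availability varies across pandas versions.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# A scalar fills every missing cell.
zeros = df.fillna(0)

# A dict fills only the listed columns; column "b" keeps its NaNs.
per_col = df.fillna({"a": -1.0})
```

Both calls return new objects because inplace defaults to False, so df itself still contains its original NaN values.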

floordiv(other, axis='columns', level=None, fill_value=None)¶

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
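The difference between the // operator and floordiv with fill_value can be sketched as follows. Stock pandas is used as a stand-in (assuming legate.pandas matches), and the two frames are illustrative.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

a = pd.DataFrame({"x": [7.0, np.nan]})
b = pd.DataFrame({"x": [2.0, 2.0]})

# The plain operator propagates the missing value.
plain = a // b

# fill_value substitutes 4.0 for the NaN in `a` before dividing.
filled = a.floordiv(b, fill_value=4.0)
```

A cell that is missing in both inputs stays missing even when fill_value is given, as described for the fill_value parameter.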

ge(other, axis='columns', level=None, fill_value=None)¶

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}, default ‘columns’): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

get(key, default=None)¶

Get item from object for given key (e.g. a DataFrame column).

Returns default value if not found.

Parameters

key (object)

Returns

value (same type as items contained in object)
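A minimal sketch of get with and without a default, using stock pandas as a stand-in (assuming legate.pandas behaves the same); the column names are illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [1, 2]})

col = df.get("a")                      # existing column, returned as a Series
missing = df.get("z")                  # absent key, returns None instead of raising
fallback = df.get("z", default="n/a")  # absent key with an explicit default
```

Unlike df["z"], which raises KeyError for an unknown column, get lets the caller decide what a missing key should yield.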

groupby(by=None, axis=0, level=None, as_index=True, sort=False, **kwargs)¶

Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters

by (mapping, function, label, or list of labels): Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis ({0 or ‘index’, 1 or ‘columns’}, default 0): Split along rows (0) or columns (1).
level (int, level name, or sequence of such, default None): If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_index (bool, default True): For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sort (bool, default False): Sort group keys. Leaving this off gives better performance. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
group_keys (bool, default True): When calling apply, add group keys to index to identify pieces.
squeeze (bool, default False): Reduce the dimensionality of the return type if possible, otherwise return a consistent type. Deprecated since version 1.1.0.
observed (bool, default False): This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
dropna (bool, default True): If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups. New in version 1.1.0.

Returns

DataFrameGroupBy: Returns a groupby object that contains information about the groups.

See also

resample: Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more.
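The split-apply-combine description can be sketched with a small aggregation. This uses stock pandas as a stand-in (legate.pandas parity is an assumption), and sort=True is passed explicitly so the result order does not depend on the default.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# Group labels become the index of the aggregated result.
totals = df.groupby("key", sort=True).sum()

# as_index=False keeps the labels as an ordinary column ("SQL-style" output).
flat = df.groupby("key", as_index=False, sort=True).sum()
```

The two results hold the same sums; only the placement of the group labels differs.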

gt(other, axis='columns', level=None, fill_value=None)¶

Get Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}, default ‘columns’): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

head(n=5)¶

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].

Parameters

n (int, default 5): Number of rows to select.

Returns

same type as caller: The first n rows of the caller object.

See also

DataFrame.tail: Returns the last n rows.
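A brief sketch of positive and negative n, using stock pandas as a stand-in (assuming legate.pandas matches); the ten-row frame is illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": range(10)})

first3 = df.head(3)          # rows 0, 1, 2
all_but_last2 = df.head(-2)  # rows 0 through 7, same as df[:-2]
```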

property iat¶

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises

IndexError: When integer position is out of bounds.

See also

DataFrame.at: Access a single value for a row/column label pair.
DataFrame.loc: Access a group of rows and columns by label(s).
DataFrame.iloc: Access a group of rows and columns by integer position(s).
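A minimal sketch of reading and writing one cell by position, using stock pandas as a stand-in. Whether legate.pandas supports assignment through iat is an assumption here.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [10, 20], "b": [1, 2]})

value = df.iat[1, 0]  # row position 1, column position 0
df.iat[0, 1] = 99     # overwrite the cell at row 0, column 1
```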

property iloc¶

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at Selection by Position.

See also

DataFrame.iat: Fast integer location scalar accessor.
DataFrame.loc: Purely label-location based indexer for selection by label.
Series.iloc: Purely integer-location based indexing for selection by position.
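The allowed inputs and the out-of-bounds rule can be sketched as follows, using stock pandas as a stand-in (assuming legate.pandas behaves the same); the frame is illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]})

row = df.iloc[1]          # single integer: one row, returned as a Series
picked = df.iloc[[2, 0]]  # list of integers: rows in the requested order
clipped = df.iloc[1:100]  # slice past the end is truncated, not an error

# A scalar position past the end raises IndexError.
try:
    df.iloc[100]
    raised = False
except IndexError:
    raised = True
```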

property index¶: The index (row labels) of the DataFrame.

insert(loc, column, value, allow_duplicates=False)¶

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

Parameters

loc (int): Insertion index. Must satisfy 0 <= loc <= len(columns).
column (str, number, or hashable object): Label of the inserted column.
value (int, Series, or array-like)
allow_duplicates (bool, optional)
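A short sketch of insert, including the duplicate-label error described above. Stock pandas is used as a stand-in (legate.pandas parity is an assumption); column names are illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# insert modifies the frame in place: "b" lands at column position 1.
df.insert(1, "b", [3, 4])

# Re-inserting an existing label raises unless allow_duplicates=True.
try:
    df.insert(0, "a", [0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```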

isna()¶

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.
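Which values count as NA can be sketched with a frame holding NaN, None, an empty string and infinity. This uses stock pandas with its default options as a stand-in (legate.pandas parity is an assumption).

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({
    "num": [1.0, np.nan, np.inf],
    "txt": ["x", "", None],
})

# True only where a value is missing; '' and inf are ordinary values.
na_mask = df.isna()
```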

isnull()¶

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.

join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, **kwargs)¶

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters

other (DataFrame, Series, or list of DataFrame): Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.

on (str, list of str, or array-like, optional): Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.

how ({‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’): How to handle the operation of the two objects.

left: use calling frame’s index (or column if on is specified).
right: use other’s index.
outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling frame’s index.

lsuffix (str, default ‘’): Suffix to use from left frame’s overlapping columns.

rsuffix (str, default ‘’): Suffix to use from right frame’s overlapping columns.

sort (bool, default False): Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).

Returns

DataFrame: A dataframe containing columns from both the caller and other.

See also

DataFrame.merge: For column(s)-on-column(s) operations.

Notes

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

Support for specifying index levels as the on parameter was added in version 0.23.0.
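A minimal sketch of joining a key column in the caller against the index of other, using stock pandas as a stand-in (assuming legate.pandas matches); the frames and labels are illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

left = pd.DataFrame({"key": ["k0", "k1", "k2"], "a": [1, 2, 3]})
right = pd.DataFrame({"b": [10, 20]}, index=["k0", "k1"])

# Default how='left': every row of the caller is kept; "k2" has no match.
joined = left.join(right, on="key")

# how='inner': only keys present on both sides survive.
inner = left.join(right, on="key", how="inner")
```

In the left join the unmatched row receives a missing value in column "b", while the inner join drops it.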

keys()¶

Get the ‘info axis’ (see Indexing for more).

This is index for Series, columns for DataFrame.

Returns

Index: Info axis.

le(other, axis='columns', level=None, fill_value=None)¶

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}, default ‘columns’): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

property loc¶

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'. Warning: contrary to usual Python slices, both the start and the stop are included.
A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).

See more at Selection by Label.

Raises

KeyError: If any items are not found.
IndexingError: If an indexed key is passed and its index is unalignable to the frame index.

See also

DataFrame.at: Access a single value for a row/column label pair.
DataFrame.iloc: Access group of rows and columns by integer position(s).
DataFrame.xs: Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc: Access group of values using labels.
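Several of the allowed inputs can be sketched side by side, including the inclusive label slice. Stock pandas is used as a stand-in (legate.pandas parity is an assumption); the frame is illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

inclusive = df.loc["a":"c"]                   # label slice: "c" is included
positional = df.iloc[0:2]                     # positional slice for contrast: stop excluded
masked = df.loc[[True, False, True, False]]   # boolean array, one flag per row
by_callable = df.loc[lambda d: d["x"] > 2]    # callable returning a boolean Series
```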

lt(other, axis='columns', level=None, fill_value=None)¶

Get Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}, default ‘columns’): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

mask(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶

Replace values where the condition is True.

Parameters

cond (bool Series/DataFrame, array-like, or callable): Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

other (scalar, Series/DataFrame, or callable): Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplace (bool, default False): Whether to perform the operation in place on the data.

axis (int, default None): Alignment axis if needed.

level (int, default None): Alignment level if needed.

errors (str, {‘raise’, ‘ignore’}, default ‘raise’): Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

‘raise’: allow exceptions to be raised.
‘ignore’: suppress exceptions. On error return original object.

try_cast (bool, default False): Try to cast the result back to the input type (if possible).

Returns

Same type as caller or None if inplace=True.

See also

DataFrame.where(): Return an object of same shape as self.

Notes

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.
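The if-then idiom in the notes can be sketched by replacing negative values with 0, once with mask and once with the complementary where call. Stock pandas is used as a stand-in (assuming legate.pandas matches); the frame is illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# mask replaces where the condition is True ...
masked = df.mask(df < 0, 0)

# ... while where replaces where the condition is False.
kept = df.where(df >= 0, 0)
```

Because the two conditions are complements of each other, both calls produce the same frame.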

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
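A minimal sketch of max along each axis, with idxmax for the position of the maximum. It uses stock pandas as a stand-in; whether legate.pandas implements idxmax is an assumption, since only max is documented in this section.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [9.0, 2.0, 5.0]})

col_max = df.max()        # one value per column; NaN is skipped by default
row_max = df.max(axis=1)  # one value per row
where_max = df.idxmax()   # row label of each column's maximum
```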

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the mean of the values over the requested axis.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, **kwargs)¶

Merge DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Parameters

right (DataFrame or named Series): Object to merge with.

how ({‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’): Type of merge to be performed.

left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order of the left keys. New in version 1.2.0.

on (label or list): Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

left_on (label or list, or array-like): Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.

right_on (label or list, or array-like): Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

left_index (bool, default False): Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.

right_index (bool, default False): Use the index from the right DataFrame as the join key. Same caveats as left_index.

sort (bool, default False): Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).

suffixes (list-like, default (“_x”, “_y”)): A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

copy (bool, default True): If False, avoid copy if possible.

indicator (bool or str, default False): If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.

validate (str, optional): If specified, checks if merge is of specified type.

“one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
“one_to_many” or “1:m”: check if merge keys are unique in left dataset.
“many_to_one” or “m:1”: check if merge keys are unique in right dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.

Returns

DataFrame: A DataFrame of the two merged objects.

See also

merge_ordered: Merge with optional filling/interpolation.
merge_asof: Merge on nearest keys.
DataFrame.join: Similar method using indices.

Notes

Support for specifying index levels as the on, left_on, and right_on parameters was added in version 0.23.0. Support for merging named Series objects was added in version 0.24.0.
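A short sketch of an inner and an outer merge on a shared key column, showing how suffixes disambiguates the overlapping column. Stock pandas is used as a stand-in (legate.pandas parity is an assumption); frames and suffixes are illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

left = pd.DataFrame({"key": ["a", "b", "c"], "v": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "v": [20, 30, 40]})

# Default how='inner': only keys found in both frames; "v" gets _x / _y suffixes.
inner = left.merge(right, on="key")

# how='outer': union of keys, with missing values where one side has no match.
outer = left.merge(right, on="key", how="outer", suffixes=("_l", "_r"))
```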

min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

mod(other, axis='columns', level=None, fill_value=None)¶

Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
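The axis parameter and the reverse version rmod can be sketched as follows. Stock pandas is used as a stand-in (assuming legate.pandas matches, including rmod, which is not documented in this section); the values are illustrative.

```python
import pandas as pd  # stand-in; legate.pandas parity is assumed, not verified

df = pd.DataFrame({"a": [5, 7], "b": [8, 10]})

# Match the Series on the row index: row 0 is taken mod 2, row 1 mod 3.
by_row = df.mod(pd.Series([2, 3]), axis="index")

# Reverse version: computes 10 % df instead of df % 10.
reversed_mod = df.rmod(10)
```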

mul(other, axis='columns', level=None, fill_value=None)¶

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

multiply(other, axis='columns', level=None, fill_value=None)¶

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

other (scalar, sequence, Series, or DataFrame): Any single or multiple element data structure, or list-like object.
axis ({0 or ‘index’, 1 or ‘columns’}): Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label): Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None): Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
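
The effect of fill_value is easiest to see on two frames whose columns only partly overlap. The sketch below uses stock pandas, on the assumption that legate.pandas mirrors its behavior for this call; the frames and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `other` lacks column "b", and `df` has one NaN in "a".
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]}, index=["x", "y"])
other = pd.DataFrame({"a": [10.0, 20.0]}, index=["x", "y"])

# Plain operator: a value missing on either side yields NaN.
plain = df * other

# Flexible wrapper: a value missing on one side only is replaced by 1.0 first.
filled = df.multiply(other, fill_value=1.0)

print(plain)
print(filled)
```

Choosing 1.0 as the fill value keeps the surviving operand unchanged, because it is the identity for multiplication.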

property ndim¶

Return an int representing the number of axes / array dimensions.

Return 1 if Series. Otherwise return 2 if DataFrame.

See also

ndarray.ndim: Number of array dimensions.

ne(other, axis='columns', level=None, fill_value=None)¶

Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

DataFrame of bool: Result of the comparison.

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
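
A short illustration of the NaN rule and of the axis parameter, written against stock pandas as a stand-in (it is assumed, not verified here, that legate.pandas gives the same masks). All data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# Comparing a frame with itself: only the NaN cell counts as "not equal".
self_mask = df.ne(df)

# A Series is matched against column labels by default (axis="columns") ...
by_columns = df.ne(pd.Series([1.0, 3.0], index=["a", "b"]))

# ... or against row labels when axis="index".
by_index = df.ne(pd.Series([1.0, 4.0], index=[0, 1]), axis="index")

print(self_mask)
print(by_columns)
print(by_index)
```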

notna()¶

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.
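
The following sketch shows which values count as missing. It runs on stock pandas under the assumption that legate.pandas follows the same rules; the column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num": [1.0, np.nan, np.inf], "text": ["x", "", None]})

# inf and the empty string are regular values; NaN and None are missing.
mask = df.notna()

# A common use of the mask: keep only rows without any missing value.
complete_rows = df[mask.all(axis=1)]

print(mask)
print(complete_rows)
```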

notnull()¶

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.

pow(other, axis='columns', level=None, fill_value=None)¶

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
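
As a rough illustration of the other and axis parameters, the sketch below raises a frame to a scalar power and then to a per-column power. It uses stock pandas as a stand-in for legate.pandas (an assumption), with made-up data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [9.0, 4.0]})

# Scalar exponent: every element is squared.
squared = df.pow(2)

# Series exponent: labels "a" and "b" are matched against the column labels.
exponents = pd.Series([2.0, 0.5], index=["a", "b"])
per_column = df.pow(exponents, axis="columns")

print(squared)
print(per_column)
```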

prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the product of the values over the requested axis.

Parameters

axis{index (0), columns (1)}: Axis for the function to be applied on.
skipnabool, default True: Exclude NA/null values when computing the result.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_countint, default 0: The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
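
The interaction between skipna and min_count can be surprising for all-NA columns, so here is a minimal sketch. It assumes legate.pandas matches stock pandas, which is what the code actually imports; the data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2.0, 3.0], "b": [np.nan, np.nan]})

# Default min_count=0: the product of zero valid values is the empty product.
col_prod = df.prod()

# min_count=1: a column needs at least one non-NA value, otherwise NA.
strict = df.prod(min_count=1)

# axis=1 multiplies across the columns of each row, skipping NA values.
row_prod = df.prod(axis=1)

print(col_prod)
print(strict)
print(row_prod)
```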

product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the product of the values over the requested axis.

Parameters

axis{index (0), columns (1)}: Axis for the function to be applied on.
skipnabool, default True: Exclude NA/null values when computing the result.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_countint, default 0: The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

query(expr, inplace=False, **kwargs)¶

Query the columns of a DataFrame with a boolean expression.

Parameters

exprstr

The query string to evaluate.

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`.) Column names which are Python keywords (like “list”, “for”, “import”, etc.) cannot be used.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

New in version 0.25.0: Backtick quoting introduced.

New in version 1.0.0: Expanding functionality of backtick quoting for more than only spaces.

inplacebool

Whether the query should modify the data in place or return a modified copy.

**kwargs

See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

Returns

DataFrame or None: DataFrame resulting from the provided query expression or None if inplace=True.

See also

eval: Evaluate a string describing operations on DataFrame columns.
DataFrame.eval: Evaluate a string describing operations on DataFrame columns.

Notes

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. The expression is still syntactically valid Python, but its semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.

During parsing, a number of disallowed characters inside the backtick-quoted string are replaced by strings that are allowed in a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace other than the space character, as well as the hash character (as it is used for comments) and the backtick itself (a backtick cannot be escaped).

In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.

See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing.
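
The points above about backticks, the ‘@’ prefix, and operator precedence can be sketched as follows. The code imports stock pandas, assuming legate.pandas accepts the same query strings; the frame, the helper rows_above, and the threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a a": [1, 2, 3], "b": [3, 2, 1], "c": [10, 20, 30]})


def rows_above(frame, threshold):
    # `a a` needs backticks because of the space; @threshold reads the
    # local Python variable of that name.
    return frame.query("`a a` + b == 4 and c > @threshold")


selected = rows_above(df, 15)

# Inside query(), & binds like `and`, so the comparisons need no parentheses.
both = df.query("b > 1 & c > 15")

print(selected)
print(both)
```

In plain Python the second expression would need parentheses around each comparison, which is the semantic difference the notes describe.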

radd(other, axis='columns', level=None, fill_value=None)¶

Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

rdiv(other, axis='columns', level=None, fill_value=None)¶

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
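
To make the operand order concrete, this sketch compares the forward and reverse division on the same frame. It is written against stock pandas, assuming legate.pandas behaves the same way; the numbers are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 8.0]})

forward = df.div(8)   # same as df / 8
reverse = df.rdiv(8)  # same as 8 / df

print(forward)
print(reverse)
```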

rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')¶

Alter axes labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

See the user guide for more.

Parameters

mapperdict-like or function: Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.
indexdict-like or function: Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).
columnsdict-like or function: Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).
axis{0 or ‘index’, 1 or ‘columns’}, default 0: Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
copybool, default True: Also copy underlying data.
inplacebool, default False: Whether to modify the DataFrame in place rather than returning a new one. If True then value of copy is ignored.
levelint or level name, default None: In case of a MultiIndex, only rename labels in the specified level.
errors{‘ignore’, ‘raise’}, default ‘ignore’: If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns

DataFrame or None: DataFrame with the renamed axis labels or None if inplace=True.

Raises

KeyError: If any of the labels is not found in the selected axis and errors='raise'.

See also

DataFrame.rename_axis: Set the name of the axis.
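
A minimal sketch of the three ways to target an axis and of the errors parameter, using stock pandas on the assumption that legate.pandas supports the same arguments. The labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Dict mapper on columns; the unknown label "Z" is silently ignored.
by_dict = df.rename(columns={"A": "a", "Z": "z"})

# Function mapper combined with axis.
by_func = df.rename(str.lower, axis="columns")

# Row labels are renamed through index.
by_index = df.rename(index={0: "first"})

# With errors="raise" the unknown label is reported instead.
try:
    df.rename(columns={"Z": "z"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(by_dict.columns.tolist(), by_func.columns.tolist(), by_index.index.tolist(), raised)
```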

reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters

levelint, str, tuple, or list, default None: Only remove the given levels from the index. Removes all levels by default.
dropbool, default False: Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplacebool, default False: Modify the DataFrame in place (do not create a new object).
col_levelint or str, default 0: If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fillobject, default ‘’: If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

Returns

DataFrame or None: DataFrame with the new index or None if inplace=True.

See also

DataFrame.set_index: Opposite of reset_index.
DataFrame.reindex: Change to new indices or expand indices.
DataFrame.reindex_like: Change to same indices as other DataFrame.
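
The difference between the default and drop=True is sketched below with stock pandas (assumed to match legate.pandas here). The index name "key" and the values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20]}, index=pd.Index(["x", "y"], name="key"))

# Default: the old index becomes a regular column named after the index.
kept = df.reset_index()

# drop=True: the old index is discarded.
dropped = df.reset_index(drop=True)

print(kept)
print(dropped)
```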

rfloordiv(other, axis='columns', level=None, fill_value=None)¶

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

rmod(other, axis='columns', level=None, fill_value=None)¶

Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
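
The reverse integer division and reverse modulo fit together the way // and % do for scalars. The sketch below checks that relationship with stock pandas, assuming the same holds for legate.pandas; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [4, 5]})

quotient = df.rfloordiv(7)  # same as 7 // df
remainder = df.rmod(7)      # same as 7 % df

# quotient * divisor + remainder reconstructs the dividend.
reconstructed = quotient * df + remainder

print(quotient)
print(remainder)
```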

rmul(other, axis='columns', level=None, fill_value=None)¶

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

rpow(other, axis='columns', level=None, fill_value=None)¶

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

rsub(other, axis='columns', level=None, fill_value=None)¶

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
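
With a scalar operand, fill_value replaces the missing entries of the frame before the subtraction. A small sketch using stock pandas, under the assumption that legate.pandas does the same, with hypothetical values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan]})

plain = df.rsub(10)                 # 10 - df, NaN stays NaN
filled = df.rsub(10, fill_value=0)  # NaN is treated as 0 first

print(plain)
print(filled)
```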

rtruediv(other, axis='columns', level=None, fill_value=None)¶

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

set_axis(labels, axis=0, inplace=False)¶

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters

labelslist-like, Index: The values for the new index.
axis{0 or ‘index’, 1 or ‘columns’}, default 0: The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
inplacebool, default False: Whether to modify the DataFrame in place rather than returning a new DataFrame instance.

Returns

renamedDataFrame or None: An object of type DataFrame or None if inplace=True.

See also

DataFrame.rename_axis: Alter the name of the index or columns.
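
A brief sketch of relabeling each axis, leaving inplace at its default. It runs on stock pandas, assuming legate.pandas accepts the same labels and axis arguments; the new labels are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# The list must have one label per row (axis 0) or per column (axis 1).
relabeled_rows = df.set_axis(["x", "y"], axis="index")
relabeled_cols = df.set_axis(["a", "b"], axis="columns")

print(relabeled_rows)
print(relabeled_cols)
```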

set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

Parameters

keyslabel or array-like or list of labels/arrays: This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.
dropbool, default True: Delete columns to be used as the new index.
appendbool, default False: Whether to append columns to existing index.
inplacebool, default False: If True, modifies the DataFrame in place (do not create a new object).
verify_integritybool, default False: Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.

Returns

DataFrame or None: Changed row labels or None if inplace=True.

See also

DataFrame.reset_index: Opposite of set_index.
DataFrame.reindex: Change to new indices or expand indices.
DataFrame.reindex_like: Change to same indices as other DataFrame.
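
The keys and drop parameters can be sketched as follows. Stock pandas is used as a stand-in for legate.pandas (an assumption), and the column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 4], "year": [2012, 2014], "sale": [55, 40]})

# A single column becomes the index and is removed from the columns.
by_month = df.set_index("month")

# drop=False keeps the column as well.
kept = df.set_index("month", drop=False)

# A list of columns builds a MultiIndex.
multi = df.set_index(["year", "month"])

print(by_month)
print(multi)
```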

property shape¶

Return a tuple representing the dimensionality of the DataFrame.

See also

ndarray.shape: Tuple of array dimensions.

property size¶

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

See also

ndarray.size: Number of elements in the array.
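
For a quick check of how ndim, shape, and size relate, consider this sketch on a 3-by-2 frame. It uses stock pandas, assuming the legate.pandas properties return the same values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = df.shape  # (number of rows, number of columns)

print(df.ndim, df.shape, df.size)
```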

sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)¶

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0: The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
levelint or level name or list of ints or list of level names: If not None, sort on values in specified index level(s).
ascendingbool or list-like of bools, default True: Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
inplacebool, default False: If True, perform operation in-place.
kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’: Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position{‘first’, ‘last’}, default ‘last’: Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
sort_remainingbool, default True: If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
ignore_indexbool, default False: If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

keycallable, optional: If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.

New in version 1.1.0.

Returns

DataFrame or None: The original DataFrame sorted by the labels or None if inplace=True.

See also

Series.sort_index: Sort Series by the index.
DataFrame.sort_values: Sort DataFrame by the value.
Series.sort_values: Sort Series by the value.
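
The sketch below sorts a small frame by its row labels in both directions and shows the effect of ignore_index. It assumes legate.pandas matches stock pandas, which is what is imported; the labels and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=[30, 10, 20])

# Ascending by row label; the values travel with their labels.
ascending = df.sort_index()

# Descending, then relabel the result 0..n-1.
descending = df.sort_index(ascending=False, ignore_index=True)

print(ascending)
print(descending)
```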

sort_values(by, axis=0, ascending=True, inplace: bool = False, kind='quicksort', na_position='last', ignore_index: bool = False)¶

Sort by the values along either axis.

Parameters

bystr or list of str

Name or list of names to sort by.

If axis is 0 or ‘index’, then by may contain index levels and/or column labels.
If axis is 1 or ‘columns’, then by may contain column levels and/or index labels.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis to be sorted.

ascendingbool or list of bool, default True

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

inplacebool, default False

If True, perform operation in-place.

kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’

Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

na_position{‘first’, ‘last’}, default ‘last’

Puts NaNs at the beginning if first; last puts NaNs at the end.

ignore_indexbool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

keycallable, optional

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

New in version 1.1.0.

Returns

DataFrame or None: DataFrame with sorted values or None if inplace=True.

See also

DataFrame.sort_index: Sort a DataFrame by the index.
Series.sort_values: Similar method for a Series.
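
A short sketch of sorting by several columns with mixed sort orders. It uses stock pandas as a stand-in and assumes legate.pandas handles by, ascending, na_position, and ignore_index the same way; the column names and data are illustrative only.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame(
    {"group": ["b", "a", "b", "a"], "score": [2.0, np.nan, 1.0, 3.0]}
)

# Sort by group ascending, then by score descending within each group.
# The ascending list must have one entry per name in `by`.
result = df.sort_values(
    by=["group", "score"],
    ascending=[True, False],
    na_position="first",   # NaN scores come first within their group
    ignore_index=True,     # relabel the rows 0, 1, ..., n - 1
)

print(result)
```

With ignore_index=True the original row labels are discarded, which is convenient when the sorted order is the only ordering that matters downstream.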

squeeze(axis=None)¶

Squeeze 1 dimensional axis objects into scalars.

Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

Parameters

axis{0 or ‘index’, 1 or ‘columns’, None}, default None: A specific axis to squeeze. By default, all length-1 axes are squeezed.

Returns

DataFrame, Series, or scalar: The projection after squeezing axis or all the axes.

See also

Series.iloc: Integer-location based indexing for selecting scalars.
DataFrame.iloc: Integer-location based indexing for selecting Series.
Series.to_frame: Inverse of DataFrame.squeeze for a single-column DataFrame.
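
The three outcomes described above (Series, scalar, unchanged) can be illustrated with a small sketch. Stock pandas is used as a stand-in; it is an assumption that legate.pandas returns the equivalent distributed types.

```python
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

# A single-column DataFrame squeezes to a Series.
one_column = pd.DataFrame({"a": [1, 2, 3]})
as_series = one_column.squeeze("columns")

# A 1x1 DataFrame squeezes all the way down to a scalar.
single = pd.DataFrame({"a": [7]}).squeeze()

# With no length-1 axis, the object is returned unchanged.
wide = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
unchanged = wide.squeeze()

print(type(as_series).__name__, single, unchanged.shape)
```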

std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters

axis{index (0), columns (1)}
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
ddofint, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns

Series or DataFrame (if level specified)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)
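
A minimal sketch of the effect of ddof, using stock pandas and NumPy as stand-ins (legate.pandas is assumed to follow the same convention). For the four values below the squared deviations sum to 5, so the divisor alone decides the result.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

sample_std = df.std()            # divisor N - 1 = 3 -> sqrt(5 / 3)
population_std = df.std(ddof=0)  # divisor N = 4     -> sqrt(5 / 4)
numpy_std = np.std(df["x"].to_numpy())  # NumPy defaults to ddof=0

print(sample_std["x"], population_std["x"], numpy_std)
```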

sub(other, axis='columns', level=None, fill_value=None)¶

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.
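
The difference between the - operator and sub with fill_value can be sketched as below. Stock pandas is used as a stand-in under the assumption that legate.pandas aligns and fills the same way; the labels are illustrative.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

left = pd.DataFrame({"a": [10.0, np.nan, 30.0]}, index=["x", "y", "z"])
right = pd.DataFrame({"a": [1.0, 2.0, np.nan]}, index=["x", "y", "w"])

# Plain subtraction: any position missing on either side becomes NaN.
plain = left - right

# With fill_value=0, a value missing on one side only is treated as 0.
# Label "w" is missing on both sides (absent in left, NaN in right), so it stays NaN.
filled = left.sub(right, fill_value=0)

print(filled)
```

The result index is the union of both indexes, so it contains w, x, y, and z even though neither input holds all four labels.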

subtract(other, axis='columns', level=None, fill_value=None)¶

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters

axis{index (0), columns (1)}: Axis for the function to be applied on.
skipnabool, default True: Exclude NA/null values when computing the result.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_countint, default 0: The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
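
A small sketch of how min_count changes the result for an all-NA column. Stock pandas stands in for legate.pandas here, on the assumption that both treat min_count identically.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"full": [1.0, 2.0], "empty": [np.nan, np.nan]})

# Default min_count=0: a column with no valid values sums to 0.0.
default_sum = df.sum()

# min_count=1: at least one non-NA value is required, otherwise the result is NA.
strict_sum = df.sum(min_count=1)

print(default_sum["empty"], strict_sum["empty"])
```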

tail(n=5)¶

Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].

Parameters

nint, default 5: Number of rows to select.

Returns

type of caller: The last n rows of the caller object.

See also

DataFrame.head: The first n rows of the caller object.
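
A brief sketch of positive and negative n, with stock pandas as a stand-in (legate.pandas is assumed to follow the same positional rule).

```python
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"v": range(10)})

last_three = df.tail(3)             # rows at positions 7, 8, 9
all_but_first_three = df.tail(-3)   # skips the first 3 rows, same as df[3:]

print(list(last_three["v"]))
print(len(all_but_first_three))
```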

to_csv(path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator=None, chunksize=None, partition=False)¶

Write object to a comma-separated values (csv) file.

Changed in version 0.24.0: The order of arguments for Series was changed.

Parameters

path_or_bufstr or file handle, default None: File path or object, if None is provided the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.

Changed in version 0.24.0: Was previously named “path” for Series.

Changed in version 1.2.0: Support for binary file objects was introduced.
sepstr, default ‘,’: String of length 1. Field delimiter for the output file.
na_repstr, default ‘’: Missing data representation.
float_formatstr, default None: Format string for floating point numbers.
columnssequence, optional: Columns to write.
headerbool or list of str, default True: Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

Changed in version 0.24.0: Previously defaulted to False for Series.
indexbool, default True: Write row names (index).
index_labelstr or sequence, or False, default None: Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
modestr: Python write mode, default ‘w’.
encodingstr, optional: A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compressionstr or dict, default ‘infer’: If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’ (otherwise no compression). If a dict is given and the mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of these, the other entries are passed as additional compression options.

Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.

Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.

Changed in version 1.2.0: Compression is supported for binary file objects.

Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.
quotingoptional constant from csv module: Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotecharstr, default ‘"’: String of length 1. Character used to quote fields.
line_terminatorstr, optional: The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ for Linux, ‘\r\n’ for Windows).

Changed in version 0.24.0.
chunksizeint or None: Rows to write at a time.
date_formatstr, default None: Format string for datetime objects.
doublequotebool, default True: Control quoting of quotechar inside a field.
escapecharstr, default None: String of length 1. Character used to escape sep and quotechar when appropriate.
decimalstr, default ‘.’: Character recognized as decimal separator. E.g. use ‘,’ for European data.
errorsstr, default ‘strict’: Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

New in version 1.1.0.
storage_optionsdict, optional: Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

Returns

None or str: If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

See also

read_csv: Load a CSV file into a DataFrame.
to_excel: Write DataFrame to an Excel file.
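
When path_or_buf is omitted, the CSV text comes back as a string, which makes the effect of sep, na_rep, and index easy to inspect. The sketch below uses stock pandas as a stand-in and only passes arguments that also appear in the legate.pandas signature above; whether Legate returns an identical string is an assumption.

```python
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, None]})

# No path given: the CSV content is returned as a string instead of written to disk.
text = df.to_csv(sep=";", na_rep="NA", index=False)

# splitlines() avoids depending on the platform's line terminator.
for line in text.splitlines():
    print(line)
```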

to_pandas(schema_only=False)¶

Convert a distributed DataFrame into a Pandas DataFrame.

Parameters

schema_onlybool, default False: If True, the data is not converted.

Returns

outpandas.DataFrame

to_parquet(path, engine='auto', compression='snappy', index=None, partition_cols=None, **kwargs)¶

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

Parameters

pathstr or file-like object, default None

If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function) or io.BytesIO. The engine fastparquet does not accept file-like objects. If path is None, a bytes object is returned.

Changed in version 1.2.0.

Previously this was “fname”

engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

compression{‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

indexbool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

New in version 0.24.0.

partition_colslist, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.

New in version 0.24.0.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

**kwargs

Additional arguments passed to the parquet library. See pandas io for more details.

Returns

bytes if no path argument is provided else None

See also

read_parquet: Read a parquet file.
DataFrame.to_csv: Write a csv file.
DataFrame.to_sql: Write to a sql table.
DataFrame.to_hdf: Write to hdf.

Notes

This function requires either the fastparquet or pyarrow library.

truediv(other, axis='columns', level=None, fill_value=None)¶

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters

otherscalar, sequence, Series, or DataFrame: Any single or multiple element data structure, or list-like object.
axis{0 or ‘index’, 1 or ‘columns’}: Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
levelint or label: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_valuefloat or None, default None: Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns

DataFrame: Result of the arithmetic operation.

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters

axis{index (0), columns (1)}
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
ddofint, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns

Series or DataFrame (if level specified)

Notes

To have the same behaviour as numpy.var, use ddof=0 (instead of the default ddof=1).
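
A minimal sketch comparing the default unbiased variance with the ddof=0 variant and with numpy.var. Stock pandas and NumPy are used as stand-ins; legate.pandas is assumed to apply the same divisor N - ddof.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

sample_var = df.var()            # squared deviations sum to 5; 5 / (4 - 1)
population_var = df.var(ddof=0)  # 5 / 4
numpy_var = np.var(df["x"].to_numpy())  # NumPy defaults to ddof=0

print(sample_var["x"], population_var["x"], numpy_var)
```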

where(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶

Replace values where the condition is False.

Parameters

condbool Series/DataFrame, array-like, or callable

Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

otherscalar, Series/DataFrame, or callable

Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplacebool, default False

Whether to perform the operation in place on the data.

axisint, default None

Alignment axis if needed.

levelint, default None

Alignment level if needed.

errorsstr, {‘raise’, ‘ignore’}, default ‘raise’

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.

try_castbool, default False

Try to cast the result back to the input type (if possible).

Returns

Same type as caller or None if inplace=True.

See also

DataFrame.mask(): Return an object of same shape as self.

Notes

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the where documentation in indexing.
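
The if-then behaviour and the rough equivalence with numpy.where can be sketched as follows. Stock pandas is used as a stand-in, assuming legate.pandas evaluates cond and other the same way; the data is illustrative.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# Keep values where the condition holds; replace the rest with a scalar.
non_negative = df.where(df >= 0, 0)

# `other` may also be a DataFrame: here each negative entry is replaced by its negation.
absolute = df.where(df >= 0, -df)

# Roughly the same selection with NumPy, which returns a plain ndarray.
same_as_numpy = np.where(df >= 0, df, -df)

print(non_negative)
print(absolute)
```

Unlike numpy.where, the method keeps the caller's index and column labels on the result.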

Series¶

class legate.pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, frame=None)¶

One-dimensional distributed array with axis labels.

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as null/NaN).

Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.

Series objects are used as columns of DataFrame.

Parameters

dataarray-like, Iterable, dict, or scalar value: Contains data stored in Series.
indexarray-like or Index (1d): Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
dtypestr, numpy.dtype, or ExtensionDtype, optional: Data type for the output Series. If not specified, this will be inferred from data.
namestr, optional: The name to give to the Series.
nan_as_nullbool, Default True: If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
frameTable: Storage manager object used for internal purposes only
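
The index alignment described above can be sketched with two Series of different lengths. Stock pandas is used as a stand-in; it is an assumption that legate.pandas produces the same sorted union index.

```python
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

s1 = pd.Series([1.0, 2.0, 3.0], index=["c", "a", "b"])
s2 = pd.Series([10.0, 20.0], index=["b", "d"])

# Values are matched by label, not by position.
total = s1 + s2

print(list(total.index))  # ['a', 'b', 'c', 'd']
print(total["b"])         # 13.0
```

Only label b exists in both inputs, so every other position in the result is NaN.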

Attributes

at: Access a single value for a row/column label pair.
axes: Return a list of the row axis labels.
cat: Accessor object for categorical properties of the Series values.
dt: Accessor object for datetimelike properties of the Series values.
dtype: Return the dtype object of the underlying data.
dtypes: Return the dtype object of the underlying data.
empty: Indicator whether the Series is empty.
hasnans: Return True if the Series contains any NaNs; enables various performance speedups.
iat: Access a single value for a row/column pair by integer position.
iloc: Purely integer-location based indexing for selection by position.
index: The index (axis labels) of the Series.
loc: Access a group of rows and columns by label(s) or a boolean array.
name: Return the name of the Series.
ndim: Number of dimensions of the underlying data, by definition 1.
shape: Return a tuple of the shape of the underlying data.
size: Return an int representing the number of elements in this object.
str: Vectorized string functions for Series and Index.
values: Return Series as ndarray or ndarray-like depending on the dtype.

Methods

`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, level, fill_value, axis])	Return Addition of series and other, element-wise (binary operator add).
`all`([axis, bool_only, skipna, level])	Return whether all elements are True, potentially over an axis.
`any`([axis, bool_only, skipna, level])	Return whether any element is True, potentially over an axis.
`append`(other[, ignore_index, …])	Append rows of other to the end of caller, returning a new object.
`astype`(dtype[, copy, errors])	Cast a pandas object to a specified dtype `dtype`.
`bool`()	Return the bool of a single element Series or DataFrame.
`count`([level])	Return number of non-NA/null observations in the Series.
`cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`div`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`divide`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`drop`([labels, axis, index, columns, level, …])	Drop specified labels from rows or columns.
`droplevel`(level[, axis])	Return Series/DataFrame with requested index / column level(s) removed.
`dropna`([axis, how, thresh, subset, inplace])	Remove missing values.
`eq`(other[, level, fill_value, axis])	Return Equal to of series and other, element-wise (binary operator eq).
`equals`(other)	Test whether two objects contain the same elements.
`fillna`([value, method, axis, inplace, …])	Fill NA/NaN values using the specified method.
`floordiv`(other[, level, fill_value, axis])	Return Integer division of series and other, element-wise (binary operator floordiv).
`ge`(other[, level, fill_value, axis])	Return Greater than or equal to of series and other, element-wise (binary operator ge).
`get`(key[, default])	Get item from object for given key (ex: DataFrame column).
`groupby`([by, axis, level, sort])	Group Series using a mapper or by a Series of columns.
`gt`(other[, level, fill_value, axis])	Return Greater than of series and other, element-wise (binary operator gt).
`head`([n])	Return the first n rows.
`isna`()	Detect missing values.
`isnull`()	Detect missing values.
`le`(other[, level, fill_value, axis])	Return Less than or equal to of series and other, element-wise (binary operator le).
`lt`(other[, level, fill_value, axis])	Return Less than of series and other, element-wise (binary operator lt).
`mask`(cond[, other, inplace, axis, level, …])	Replace values where the condition is True.
`max`([axis, skipna, level, numeric_only])	Return the maximum of the values over the requested axis.
`mean`([axis, skipna, level, numeric_only])	Return the mean of the values over the requested axis.
`min`([axis, skipna, level, numeric_only])	Return the minimum of the values over the requested axis.
`mod`(other[, level, fill_value, axis])	Return Modulo of series and other, element-wise (binary operator mod).
`mul`(other[, level, fill_value, axis])	Return Multiplication of series and other, element-wise (binary operator mul).
`multiply`(other[, level, fill_value, axis])	Return Multiplication of series and other, element-wise (binary operator mul).
`ne`(other[, level, fill_value, axis])	Return Not equal to of series and other, element-wise (binary operator ne).
`notna`()	Detect existing (non-missing) values.
`notnull`()	Detect existing (non-missing) values.
`pow`(other[, level, fill_value, axis])	Return Exponential power of series and other, element-wise (binary operator pow).
`prod`([axis, skipna, level, numeric_only, …])	Return the product of the values over the requested axis.
`product`([axis, skipna, level, numeric_only, …])	Return the product of the values over the requested axis.
`radd`(other[, level, fill_value, axis])	Return Addition of series and other, element-wise (binary operator radd).
`rdiv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator rtruediv).
`reset_index`([level, drop, name, inplace])	Generate a new DataFrame or Series with the index reset.
`rfloordiv`(other[, level, fill_value, axis])	Return Integer division of series and other, element-wise (binary operator rfloordiv).
`rmod`(other[, level, fill_value, axis])	Return Modulo of series and other, element-wise (binary operator rmod).
`rmul`(other[, level, fill_value, axis])	Return Multiplication of series and other, element-wise (binary operator rmul).
`rpow`(other[, level, fill_value, axis])	Return Exponential power of series and other, element-wise (binary operator rpow).
`rsub`(other[, level, fill_value, axis])	Return Subtraction of series and other, element-wise (binary operator rsub).
`rtruediv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator rtruediv).
`set_axis`(labels[, axis, inplace])	Assign desired index to given axis.
`sort_index`([axis, level, ascending, …])	Sort object by labels (along an axis).
`sort_values`([axis, ascending, inplace, …])	Sort by the values.
`squeeze`([axis])	Squeeze 1 dimensional axis objects into scalars.
`std`([axis, skipna, level, ddof, numeric_only])	Return sample standard deviation over requested axis.
`sub`(other[, level, fill_value, axis])	Return Subtraction of series and other, element-wise (binary operator sub).
`subtract`(other[, level, fill_value, axis])	Return Subtraction of series and other, element-wise (binary operator sub).
`sum`([axis, skipna, level, numeric_only, …])	Return the sum of the values over the requested axis.
`tail`([n])	Return the last n rows.
`to_csv`([path_or_buf, sep, na_rep, columns, …])	Write object to a comma-separated values (csv) file.
`to_frame`([name])	Convert Series to DataFrame.
`to_numpy`([dtype, copy, na_value])	A NumPy ndarray representing the values in this Series or Index.
`to_pandas`([schema_only])	Convert distributed Series into a Pandas Series
`truediv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`var`([axis, skipna, level, ddof, numeric_only])	Return unbiased variance over requested axis.
`where`(cond[, other, inplace, axis, level, …])	Replace values where the condition is False.

abs()¶

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns

abs: Series/DataFrame containing the absolute value of each element.

See also

numpy.absolute: Calculate the absolute value element-wise.

Notes

For complex inputs of the form a + bj, such as 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).

add(other, level=None, fill_value=None, axis=0)¶

Return Addition of series and other, element-wise (binary operator add).

Equivalent to series + other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.radd: Reverse of the Addition operator, see Python documentation for more details.
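
A short sketch of fill_value for Series.add, using stock pandas as a stand-in (legate.pandas is assumed to fill and align the same way). The labels are illustrative.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

a = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, np.nan, 30.0], index=["x", "y", "w"])

# A value missing on one side only is replaced by fill_value before adding.
# Label "y" is NaN in both inputs, so the result there stays NaN.
result = a.add(b, fill_value=0)

print(result)
```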

all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶

Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

Parameters

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

bool_onlybool, default None

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargsany, default None

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: If level is specified, then, DataFrame is returned; otherwise, Series is returned.

See also

Series.all: Return True if all elements are True.
DataFrame.any: Return True if one (or more) elements are True.
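
The edge cases described above (empty input and NA handling) can be sketched as follows. Stock pandas is used as a stand-in, and it is an assumption that legate.pandas returns the same truth values.

```python
import numpy as np
import pandas as pd  # stand-in; legate.pandas is assumed to behave the same way

has_zero = pd.Series([1, 0, 2]).all()              # zero counts as False
empty = pd.Series([], dtype=bool).all()            # vacuously True
na_skipped = pd.Series([np.nan]).all()             # NaN dropped, then treated as empty
na_kept = pd.Series([np.nan]).all(skipna=False)    # NaN is not zero, so it counts as True

# On a DataFrame the reduction runs per column by default (axis=0).
df = pd.DataFrame({"a": [True, True], "b": [True, False]})
per_column = df.all()
per_row = df.all(axis=1)

print(has_zero, empty, na_skipped, na_kept)
print(per_column.tolist(), per_row.tolist())
```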

any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters

axis{0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

bool_onlybool, default None

Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.

skipnabool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

levelint or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

**kwargsany, default None

Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: If level is specified, a DataFrame is returned; otherwise, a Series is returned.

See also

numpy.any: NumPy version of this method.
Series.any: Return whether any element is True.
Series.all: Return whether all elements are True.
DataFrame.any: Return whether any element is True over requested axis.
DataFrame.all: Return whether all elements are True over requested axis.
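The skipna rules for empty and all-NA input are easiest to see on small Series. The sketch below assumes stock pandas semantics carry over to legate.pandas; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")
only_na = pd.Series([np.nan])

print(bool(empty.any()))                 # False: there is nothing truthy
print(bool(only_na.any()))               # False: the NA is skipped
print(bool(only_na.any(skipna=False)))   # True: NaN is not equal to zero
```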

append(other, ignore_index=False, verify_integrity=False, sort=False)¶

Append rows of other to the end of caller, returning a new object.

Columns in other that are not in the caller are added as new columns.

Parameters

otherDataFrame or Series/dict-like object, or list of these: The data to append.
ignore_indexbool, default False: If True, the resulting axis will be labeled 0, 1, …, n - 1.
verify_integritybool, default False: If True, raise ValueError on creating index with duplicates.
sortbool, default False: Sort columns if the columns of self and other are not aligned.

Changed in version 1.0.0: Changed to not sort by default.

Returns

DataFrame

See also

concat: General function to concatenate DataFrame or Series objects.

Notes

If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
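The recommendation in the note can be sketched with pandas.concat, the function listed under See also. This assumes stock pandas as a stand-in for legate.pandas, and the frames are invented for the example.

```python
import pandas as pd

base = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

# Collect the new rows first instead of growing the frame one row at a time.
new_rows = [pd.DataFrame({"x": [i], "y": [str(i)]}) for i in range(3, 6)]

# One concatenation at the end; ignore_index relabels the rows 0..n-1.
combined = pd.concat([base] + new_rows, ignore_index=True)

print(combined["x"].tolist())     # [1, 2, 3, 4, 5]
print(combined.index.tolist())    # [0, 1, 2, 3, 4]
```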

astype(dtype, copy=True, errors='raise')¶

Cast a pandas object to a specified dtype dtype.

Parameters

dtypedata type, or dict of column name -> data type

Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copybool, default True

Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

errors{‘raise’, ‘ignore’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

raise : allow exceptions to be raised
ignore : suppress exceptions. On error return original object.

Returns

castedsame type as caller

See also

to_datetime: Convert argument to datetime.
to_timedelta: Convert argument to timedelta.
to_numeric: Convert argument to a numeric type.
numpy.ndarray.astype: Cast a numpy array to a specified type.
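Both forms of the dtype argument are shown in the sketch below, which assumes stock pandas in place of legate.pandas; the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, dtype="int64")

as_float = df.astype("float64")      # one dtype applied to every column
mixed = df.astype({"a": "int32"})    # per-column mapping; "b" is left alone

print([str(t) for t in as_float.dtypes])   # ['float64', 'float64']
print([str(t) for t in mixed.dtypes])      # ['int32', 'int64']
```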

property at¶

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

Raises

KeyError: If ‘label’ does not exist in DataFrame.

See also

DataFrame.iat: Access a single value for a row/column pair by integer position.
DataFrame.loc: Access a group of rows and columns by label(s).
Series.at: Access a single value using a label.
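A short sketch of reading, writing, and the KeyError case, assuming stock pandas as a stand-in for legate.pandas. The row and column labels are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r0", "r1"])

value = df.at["r1", "b"]    # read one cell by row and column label
df.at["r0", "a"] = 10       # write one cell

try:
    df.at["missing", "a"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(int(value), df["a"].tolist(), missing_raises)   # 4 [10, 2] True
```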

property axes¶: Return a list of the row axis labels.

bool()¶

Return the bool of a single element Series or DataFrame.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

Returns

bool: The value in the Series or DataFrame.

See also

Series.astype: Change the data type of a Series, including to boolean.
DataFrame.astype: Change the data type of a DataFrame, including to boolean.
numpy.bool_: NumPy boolean data type, used by pandas for boolean values.

property cat¶

Accessor object for categorical properties of the Series values.

Be aware that assigning to categories is an in-place operation, while all methods return new categorical data by default (but can be called with inplace=True).

Parameters

dataSeries or CategoricalIndex

count(level=None)¶

Return number of non-NA/null observations in the Series.

Parameters

levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a smaller Series.

Returns

int or Series (if level specified): Number of non-null values in the Series.

See also

DataFrame.count: Count non-NA cells for each column or row.

cummax(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0: The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative maximum of Series or DataFrame.

See also

core.window.Expanding.max: Similar functionality but ignores NaN values.
DataFrame.max: Return the maximum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.
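The effect of skipna on a running maximum can be sketched as follows, assuming stock pandas as a stand-in for legate.pandas; the values are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 5.0, -1.0, 0.0])

running_max = s.cummax()           # the NaN stays NaN, later values continue
strict = s.cummax(skipna=False)    # everything after the first NaN is NaN

print(running_max.tolist())   # [2.0, nan, 5.0, 5.0, 5.0]
print(strict.tolist())        # [2.0, nan, nan, nan, nan]
```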

cummin(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0: The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative minimum of Series or DataFrame.

See also

core.window.Expanding.min: Similar functionality but ignores NaN values.
DataFrame.min: Return the minimum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

cumprod(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0: The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative product of Series or DataFrame.

See also

core.window.Expanding.prod: Similar functionality but ignores NaN values.
DataFrame.prod: Return the product over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

cumsum(axis=None, skipna=True, *args, **kwargs)¶

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0: The index or the name of the axis. 0 is equivalent to None or ‘index’.
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args, **kwargs: Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns

Series or DataFrame: Return cumulative sum of Series or DataFrame.

See also

core.window.Expanding.sum: Similar functionality but ignores NaN values.
DataFrame.sum: Return the sum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.
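For a DataFrame, the axis argument decides whether the running sum goes down each column or across each row. A minimal sketch, assuming stock pandas in place of legate.pandas and using invented column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

down = df.cumsum()           # axis=0: accumulate down each column
across = df.cumsum(axis=1)   # axis=1: accumulate across each row

print(down["b"].tolist())     # [10, 30, 60]
print(across["b"].tolist())   # [11, 22, 33]
```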

div(other, level=None, fill_value=None, axis=0)¶

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.
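The fill_value rule (fill one-sided gaps, leave two-sided gaps missing) can be sketched like this. The example assumes stock pandas as a stand-in for legate.pandas, and the index labels are arbitrary.

```python
import numpy as np
import pandas as pd

a = pd.Series([4.0, 6.0, np.nan, np.nan], index=["w", "x", "y", "z"])
b = pd.Series([2.0, np.nan, 4.0, np.nan], index=["w", "x", "y", "z"])

plain = a.div(b)                    # NaN wherever either side is missing
filled = a.div(b, fill_value=1.0)   # a one-sided gap is treated as 1.0

print(plain.tolist())    # [2.0, nan, nan, nan]
print(filled.tolist())   # [2.0, 6.0, 0.25, nan]
```

Position "z" stays missing because both inputs are missing there.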

divide(other, level=None, fill_value=None, axis=0)¶

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')¶

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters

labelssingle label or list-like: Index or column labels to drop.
axis{0 or ‘index’, 1 or ‘columns’}, default 0: Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
indexsingle label or list-like: Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columnssingle label or list-like: Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
levelint or level name, optional: For MultiIndex, level from which the labels will be removed.
inplacebool, default False: If False, return a copy. Otherwise, do operation inplace and return None.
errors{‘ignore’, ‘raise’}, default ‘raise’: If ‘ignore’, suppress error and only existing labels are dropped.

Returns

DataFrame or None: DataFrame without the removed index or column labels or None if inplace=True.

Raises

KeyError: If any of the labels is not found in the selected axis.

See also

DataFrame.loc: Label-location based indexer for selection by label.
DataFrame.dropna: Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop: Return Series with specified index labels removed.
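The equivalent spellings of drop, and the effect of errors='ignore', are sketched below. This assumes stock pandas as a stand-in for legate.pandas; the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

by_axis = df.drop("b", axis=1)     # labels plus axis
by_keyword = df.drop(columns="b")  # same result via the columns keyword
fewer_rows = df.drop(index=[0, 2]) # drop rows by index label

# A label that does not exist raises KeyError unless errors="ignore".
tolerant = df.drop(columns=["nope"], errors="ignore")
try:
    df.drop(columns=["nope"])
    raised = False
except KeyError:
    raised = True

print(by_axis.columns.tolist())     # ['a', 'c']
print(fewer_rows.index.tolist())    # [1]
print(raised)                       # True
```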

droplevel(level, axis=0)¶

Return DataFrame with requested index / column level(s) removed.

New in version 0.24.0.

Parameters

levelint, str, or list-like

If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the level(s) is removed:

0 or ‘index’: remove level(s) from the index (row labels).
1 or ‘columns’: remove level(s) from the column labels.

Returns

DataFrame: DataFrame with requested index / column level(s) removed.
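A small sketch of removing one level from a two-level row index, assuming stock pandas as a stand-in for legate.pandas. The level names "outer" and "inner" are invented for the example.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"]
)
df = pd.DataFrame({"v": [10, 20, 30]}, index=idx)

# axis=0 (the default) operates on the row index.
flat = df.droplevel("outer")

print(flat.index.name)       # inner
print(flat.index.tolist())   # [1, 2, 1]
```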

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)¶

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.

Changed in version 1.0.0: Passing a tuple or list to drop on multiple axes is no longer supported. Only a single axis is allowed.

how{‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.

threshint, optional

Require that many non-NA values.

subsetarray-like, optional

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplacebool, default False

If True, do operation inplace and return None.

Returns

DataFrame or None: DataFrame with NA entries dropped from it or None if inplace=True.

See also

DataFrame.isna: Indicate missing values.
DataFrame.notna: Indicate existing (non-missing) values.
DataFrame.fillna: Replace missing values.
Series.dropna: Drop missing values.
Index.dropna: Drop missing indices.
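The how, thresh, and subset parameters select different rows from the same input. The sketch below assumes stock pandas as a stand-in for legate.pandas; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

any_na = df.dropna()                 # drop rows with at least one NA
all_na = df.dropna(how="all")        # drop rows where every value is NA
at_least_2 = df.dropna(thresh=2)     # keep rows with two or more non-NA values
by_subset = df.dropna(subset=["a"])  # only look at column "a"

print(any_na.index.tolist())       # [0]
print(all_na.index.tolist())       # [0, 1]
print(at_least_2.index.tolist())   # [0, 1]
print(by_subset.index.tolist())    # [0]
```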

property dt¶: Accessor object for datetimelike properties of the Series values.

property dtype¶: Return the dtype object of the underlying data.

property dtypes¶: Return the dtype object of the underlying data.

property empty¶

Indicator whether DataFrame is empty.

True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

Returns

bool: If DataFrame is empty, return True, if not return False.

See also

Series.dropna: Return series without null values.
DataFrame.dropna: Return DataFrame with labels on given axis omitted where (all or any) data are missing.

Notes

If DataFrame contains only NaNs, it is still not considered empty.
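The difference between "no rows" and "only NaNs" can be sketched as follows, assuming stock pandas as a stand-in for legate.pandas:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({"a": []})
only_nan = pd.DataFrame({"a": [np.nan]})

print(no_rows.empty)             # True: the row axis has length 0
print(only_nan.empty)            # False: one row exists, even though it is NaN
print(only_nan.dropna().empty)   # True: dropping the NaN row leaves nothing
```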

eq(other, level=None, fill_value=None, axis=0)¶

Return Equal to of series and other, element-wise (binary operator eq).

Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.
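The sketch below shows how fill_value changes an element-wise equality test. It assumes stock pandas as a stand-in for legate.pandas, and the index labels are arbitrary.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, np.nan], index=["x", "y", "z"])
b = pd.Series([1.0, np.nan, np.nan], index=["x", "y", "z"])

plain = a.eq(b)                    # a missing value never compares equal
filled = a.eq(b, fill_value=2.0)   # the gap in b at "y" is treated as 2.0

print(plain.tolist())    # [True, False, False]
print(filled.tolist())   # [True, True, False]
```

At "z" both inputs are missing, so nothing is filled and the comparison is False.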

equals(other)¶

Test whether two objects contain the same elements.

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.

The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.

Parameters

otherSeries or DataFrame: The other Series or DataFrame to be compared with the first.

Returns

bool: True if all elements are the same in both objects, False otherwise.

See also

Series.eq: Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq: Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal: Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal: Like assert_series_equal, but targets DataFrames.
numpy.array_equal: Return True if two arrays have the same shape and elements, False otherwise.
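The rules about label types, column dtypes, and NaN positions can be checked with a few small objects. This sketch assumes stock pandas as a stand-in for legate.pandas; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})
float_labels = pd.DataFrame({1.0: [10], 2.0: [20]})   # same values, float column labels
float_values = pd.DataFrame({1: [10.0], 2: [20.0]})   # same labels, float64 data

print(df.equals(float_labels))   # True: label types may differ
print(df.equals(float_values))   # False: column dtypes differ

with_nan = pd.Series([1.0, np.nan])
print(with_nan.equals(pd.Series([1.0, np.nan])))   # True: NaNs in the same place match
```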

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)¶

Fill NA/NaN values using the specified method.

Parameters

valuescalar, dict, Series, or DataFrame: Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None: Method to use for filling holes in a reindexed Series. pad / ffill: propagate the last valid observation forward to the next valid one. backfill / bfill: use the next valid observation to fill the gap.
axis{0 or ‘index’, 1 or ‘columns’}: Axis along which to fill missing values.
inplacebool, default False: If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
limitint, default None: If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcastdict, default is None: A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Returns

DataFrame or None: Object with missing values filled or None if inplace=True.

See also

interpolate: Fill NaN values using interpolation.
reindex: Conform object to new index.
asfreq: Convert TimeSeries to specified frequency.

floordiv(other, level=None, fill_value=None, axis=0)¶

Return Integer division of series and other, element-wise (binary operator floordiv).

Equivalent to series // other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rfloordiv: Reverse of the Integer division operator, see Python documentation for more details.

ge(other, level=None, fill_value=None, axis=0)¶

Return Greater than or equal to of series and other, element-wise (binary operator ge).

Equivalent to series >= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

get(key, default=None)¶

Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

Parameters

keyobject

Returns

valuesame type as items contained in object
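A minimal sketch of the lookup and the default fallback, assuming stock pandas as a stand-in for legate.pandas. The key "zzz" is a deliberately missing, made-up name.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

found = df.get("a")                          # the column, as a Series
missing = df.get("zzz")                      # None when the key is absent
fallback = df.get("zzz", default="n/a")      # caller-supplied default

print(found.tolist())   # [1, 2]
print(missing)          # None
print(fallback)         # n/a
```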

groupby(by=None, axis=0, level=None, sort=False, **kwargs)¶

Group Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters

bymapping, function, label, or list of labels: Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis{0 or ‘index’, 1 or ‘columns’}, default 0: Split along rows (0) or columns (1).
levelint, level name, or sequence of such, default None: If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_indexbool, default True: For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sortbool, default True: Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
group_keysbool, default True: When calling apply, add group keys to index to identify pieces.
squeezebool, default False: Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

Deprecated since version 1.1.0.
observedbool, default False: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
dropnabool, default True: If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

New in version 1.1.0.

Returns

SeriesGroupBy: Returns a groupby object that contains information about the groups.

See also

resample: Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more.
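The split-apply-combine steps can be sketched with a single grouping column. This assumes stock pandas as a stand-in for legate.pandas; sort=True is passed explicitly so the group order does not depend on the library default, and the column names are invented.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# Split by "key", apply sum to "val", combine into one Series.
totals = df.groupby("key", sort=True)["val"].sum()

print(totals.index.tolist())   # ['a', 'b']
print(totals.tolist())         # [6, 4]
```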

gt(other, level=None, fill_value=None, axis=0)¶

Return Greater than of series and other, element-wise (binary operator gt).

Equivalent to series > other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

property hasnans¶: Return whether the Series contains any NaN values; enables various performance speedups.

head(n=5)¶

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].

Parameters

nint, default 5: Number of rows to select.

Returns

same type as caller: The first n rows of the caller object.

See also

DataFrame.tail: Returns the last n rows.
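Positive and negative n are sketched below, assuming stock pandas as a stand-in for legate.pandas; the ten-row frame is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"v": list(range(10))})

first_three = df.head(3)       # rows at positions 0, 1, 2
all_but_last = df.head(-3)     # everything except the last three rows

print(first_three["v"].tolist())    # [0, 1, 2]
print(all_but_last["v"].tolist())   # [0, 1, 2, 3, 4, 5, 6]
```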

property iat¶

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises

IndexError: When integer position is out of bounds.

See also

DataFrame.at: Access a single value for a row/column label pair.
DataFrame.loc: Access a group of rows and columns by label(s).
DataFrame.iloc: Access a group of rows and columns by integer position(s).
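Reading, writing, and the out-of-bounds case are sketched here, assuming stock pandas as a stand-in for legate.pandas:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

value = df.iat[1, 0]    # row position 1, column position 0
df.iat[0, 1] = 30       # write one cell by position

try:
    df.iat[5, 0]
    out_of_bounds_raises = False
except IndexError:
    out_of_bounds_raises = True

print(int(value), df["b"].tolist(), out_of_bounds_raises)   # 2 [30, 4] True
```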

property iloc¶

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics).

See more at Selection by Position.

See also

DataFrame.iat: Fast integer location scalar accessor.
DataFrame.loc: Purely label-location based indexer for selection by label.
Series.iloc: Purely integer-location based indexing for selection by position.
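Each allowed input type, plus the out-of-bounds rules, can be tried on one small frame. The sketch assumes stock pandas as a stand-in for legate.pandas; the index labels are deliberately non-numeric to show that .iloc ignores them.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]}, index=["w", "x", "y", "z"])

single = df.iloc[0, 0]                            # one integer per axis
picked = df.iloc[[3, 0]]                          # list of integers
sliced = df.iloc[1:3]                             # slice, stop excluded
masked = df.iloc[[True, False, True, False]]      # boolean array
called = df.iloc[lambda d: [0, 2]]                # callable returning positions
clipped = df.iloc[2:100]                          # out-of-bounds slice is allowed

try:
    df.iloc[100]
    raised = False
except IndexError:
    raised = True

print(picked["a"].tolist())    # [40, 10]
print(sliced["a"].tolist())    # [20, 30]
print(clipped["a"].tolist())   # [30, 40]
print(raised)                  # True
```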

property index¶: The index (row labels) of the DataFrame.

isna()¶

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.
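Which values count as NA can be sketched as follows, assuming stock pandas with default options as a stand-in for legate.pandas:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1.0, np.nan, np.inf])
objects = pd.Series(["", None, "x"], dtype=object)

print(numbers.isna().tolist())   # [False, True, False]: inf is not NA
print(objects.isna().tolist())   # [False, True, False]: '' is not NA
```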

isnull()¶

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.

le(other, level=None, fill_value=None, axis=0)¶

Return Less than or equal to of series and other, element-wise (binary operator le).

Equivalent to series <= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

property loc¶

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'. Warning: contrary to usual Python slices, both the start and the stop are included.
A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).

See more at Selection by Label.

Raises

KeyError: If any items are not found.
IndexingError: If an indexed key is passed and its index is unalignable to the frame index.

See also

DataFrame.at: Access a single value for a row/column label pair.
DataFrame.iloc: Access group of rows and columns by integer position(s).
DataFrame.xs: Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc: Access group of values using labels.
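The allowed inputs, including the inclusive label slice, are sketched on one small frame. This assumes stock pandas as a stand-in for legate.pandas; the labels are made up.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

sliced = df.loc["b":"c", "v"]                       # both "b" and "c" are included
picked = df.loc[["d", "a"], "v"]                    # list of labels, in that order
masked = df.loc[[True, False, True, False], "v"]    # boolean array
called = df.loc[lambda d: d["v"] > 2, "v"]          # callable returning a boolean Series

try:
    df.loc["zzz"]
    raised = False
except KeyError:
    raised = True

print(sliced.tolist())   # [2, 3]
print(picked.tolist())   # [4, 1]
print(masked.tolist())   # [1, 3]
print(called.tolist())   # [3, 4]
print(raised)            # True
```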

lt(other, level=None, fill_value=None, axis=0)¶

Return Less than of series and other, element-wise (binary operator lt).

Equivalent to series < other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

mask(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶

Replace values where the condition is True.

Parameters

cond (bool Series/DataFrame, array-like, or callable)

Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

other (scalar, Series/DataFrame, or callable)

Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplace (bool, default False)

Whether to perform the operation in place on the data.

axis (int, default None)

Alignment axis if needed.

level (int, default None)

Alignment level if needed.

errors (str, {‘raise’, ‘ignore’}, default ‘raise’)

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.

try_cast (bool, default False)

Try to cast the result back to the input type (if possible).

Returns

Same type as caller or None if inplace=True.

See also

DataFrame.where(): Return an object of same shape as self.

Notes

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.
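
Examples

The relationship between mask and where described in the notes can be sketched as follows. This uses stock pandas on the assumption that legate.pandas matches it; the data is illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})
negative = df < 0

# mask replaces entries where the condition is True ...
masked = df.mask(negative, 0)

# ... while where keeps entries where the condition is True.
kept = df.where(~negative, 0)

print(masked)
```

Both calls produce the same frame, because masking on a condition is the same as keeping on its negation.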

max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
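
Examples

A short sketch of the axis and skipna parameters, plus idxmax for locating the maximum. It runs on stock pandas, assuming legate.pandas gives the same results; the frame is illustrative.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame(
    {"a": [1.0, 5.0, np.nan], "b": [4.0, 2.0, 3.0]},
    index=["r0", "r1", "r2"],
)

col_max = df.max()              # one value per column, NaN skipped by default
row_max = df.max(axis=1)        # one value per row
strict = df.max(skipna=False)   # NaN propagates into column "a"
where_max = df.idxmax()         # row label of each column's maximum

print(col_max.tolist(), row_max.tolist(), where_max.tolist())
```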

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the mean of the values over the requested axis.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

mod(other, level=None, fill_value=None, axis=0)¶

Return Modulo of series and other, element-wise (binary operator mod).

Equivalent to series % other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rmod: Reverse of the Modulo operator, see Python documentation for more details.

mul(other, level=None, fill_value=None, axis=0)¶

Return Multiplication of series and other, element-wise (binary operator mul).

Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rmul: Reverse of the Multiplication operator, see Python documentation for more details.

multiply(other, level=None, fill_value=None, axis=0)¶

Return Multiplication of series and other, element-wise (binary operator mul).

Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rmul: Reverse of the Multiplication operator, see Python documentation for more details.

property name¶

Return the name of the Series.

The name of a Series becomes its index or column name if it is used to form a DataFrame. It is also used whenever displaying the Series using the interpreter.

Returns

label (hashable object): The name of the Series, also the column name if part of a DataFrame.

See also

Series.rename: Sets the Series name when given a scalar input.
Index.name: Corresponding Index property.

Examples

The Series name can be set initially when calling the constructor.

>>> s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
>>> s
0    1
1    2
2    3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0    1
1    2
2    3
Name: Integers, dtype: int64

The name of a Series within a DataFrame is its column name.

>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
...                   columns=["Odd Numbers", "Even Numbers"])
>>> df
   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6
>>> df["Even Numbers"].name
'Even Numbers'

property ndim¶: Number of dimensions of the underlying data, by definition 1.

ne(other, level=None, fill_value=None, axis=0)¶

Return Not equal to of series and other, element-wise (binary operator ne).

Equivalent to series != other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

notna()¶

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.
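
Examples

A brief sketch of which values count as missing, assuming the default setting (pandas.options.mode.use_inf_as_na not enabled) and that legate.pandas matches stock pandas here. The data is illustrative.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.inf],   # inf is not treated as NA by default
        "b": ["x", "", None],         # the empty string is not NA, None is
    }
)

present = df.notna()

print(present)
```

The result is the element-wise inverse of `df.isna()`.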

notnull()¶

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns

DataFrame: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.

pow(other, level=None, fill_value=None, axis=0)¶

Return Exponential power of series and other, element-wise (binary operator pow).

Equivalent to series ** other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rpow: Reverse of the Exponential power operator, see Python documentation for more details.

prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the product of the values over the requested axis.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count (int, default 0): The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
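
Examples

The effect of min_count on an all-NA input can be sketched as below. This uses stock pandas, assuming legate.pandas follows the same rule; the Series are illustrative.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

s = pd.Series([2.0, np.nan, 4.0])
all_missing = pd.Series([np.nan, np.nan])

product = s.prod()                              # NaN skipped: 2.0 * 4.0
empty_default = all_missing.prod()              # no valid values: empty product
empty_strict = all_missing.prod(min_count=1)    # fewer than 1 valid value: NA

print(product, empty_default, empty_strict)
```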

product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the product of the values over the requested axis.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count (int, default 0): The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

radd(other, level=None, fill_value=None, axis=0)¶

Return Addition of series and other, element-wise (binary operator radd).

Equivalent to other + series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.add: Element-wise Addition, see Python documentation for more details.

rdiv(other, level=None, fill_value=None, axis=0)¶

Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.truediv: Element-wise Floating division, see Python documentation for more details.

reset_index(level=None, drop=False, name=None, inplace=False)¶

Generate a new DataFrame or Series with the index reset.

This is useful when the index needs to be treated as a column, or when the index is meaningless and needs to be reset to the default before another operation.

Parameters

level (int, str, tuple, or list, optional): For a Series with a MultiIndex, only remove the specified levels from the index. Removes all levels by default.
drop (bool, default False): Just reset the index, without inserting it as a column in the new DataFrame.
name (object, optional): The name to use for the column containing the original Series values. Uses self.name by default. This argument is ignored when drop is True.
inplace (bool, default False): Modify the Series in place (do not create a new object).

Returns

Series or DataFrame or None: When drop is False (the default), a DataFrame is returned. The newly created columns will come first in the DataFrame, followed by the original Series values. When drop is True, a Series is returned. In either case, if inplace=True, no value is returned.

See also

DataFrame.reset_index: Analogous function for DataFrame.
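
Examples

A minimal sketch of how drop and name affect the return type, written against stock pandas on the assumption that legate.pandas behaves the same. The Series is illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

s = pd.Series([10, 20], index=["a", "b"], name="value")

# drop=False (default): the old index becomes a column of a new DataFrame.
as_frame = s.reset_index()

# name overrides the label of the column holding the original values.
renamed = s.reset_index(name="amount")

# drop=True: the old index is discarded and a Series is returned.
as_series = s.reset_index(drop=True)

print(as_frame.columns.tolist(), renamed.columns.tolist())
```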

rfloordiv(other, level=None, fill_value=None, axis=0)¶

Return Integer division of series and other, element-wise (binary operator rfloordiv).

Equivalent to other // series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.floordiv: Element-wise Integer division, see Python documentation for more details.

rmod(other, level=None, fill_value=None, axis=0)¶

Return Modulo of series and other, element-wise (binary operator rmod).

Equivalent to other % series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.mod: Element-wise Modulo, see Python documentation for more details.

rmul(other, level=None, fill_value=None, axis=0)¶

Return Multiplication of series and other, element-wise (binary operator rmul).

Equivalent to other * series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.mul: Element-wise Multiplication, see Python documentation for more details.

rpow(other, level=None, fill_value=None, axis=0)¶

Return Exponential power of series and other, element-wise (binary operator rpow).

Equivalent to other ** series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.pow: Element-wise Exponential power, see Python documentation for more details.

rsub(other, level=None, fill_value=None, axis=0)¶

Return Subtraction of series and other, element-wise (binary operator rsub).

Equivalent to other - series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.sub: Element-wise Subtraction, see Python documentation for more details.

rtruediv(other, level=None, fill_value=None, axis=0)¶

Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.truediv: Element-wise Floating division, see Python documentation for more details.
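
Examples

The reversed operators documented above (rsub, rtruediv, rfloordiv, rmod, rpow) swap the operand order, which matters for non-commutative operations. A small sketch on stock pandas, assuming legate.pandas gives the same results; the values are illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

s = pd.Series([1, 2, 3])

forward_sub = s.sub(10)             # s - 10
reverse_sub = s.rsub(10)            # 10 - s
reverse_div = s.rtruediv(6)         # 6 / s
reverse_floordiv = s.rfloordiv(7)   # 7 // s
reverse_mod = s.rmod(7)             # 7 % s
reverse_pow = s.rpow(2)             # 2 ** s

print(forward_sub.tolist(), reverse_sub.tolist())
```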

set_axis(labels, axis=0, inplace=False)¶

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters

labels (list-like, Index): The values for the new index.
axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
inplace (bool, default False): If True, modify the DataFrame in place and return None; otherwise return a new DataFrame instance.

Returns

renamed (DataFrame or None): An object of type DataFrame, or None if inplace=True.

See also

DataFrame.rename_axis: Alter the name of the index or columns.
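
Examples

A minimal sketch of relabelling each axis with the default inplace=False, using stock pandas on the assumption that legate.pandas matches it. The labels are illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# axis=0 relabels the rows; a new DataFrame is returned.
rows = df.set_axis(["x", "y"], axis=0)

# axis="columns" (or 1) relabels the columns.
cols = df.set_axis(["i", "ii"], axis="columns")

print(rows.index.tolist(), cols.columns.tolist())
```

The original frame keeps its labels in both cases.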

property shape¶: Return a tuple of the shape of the underlying data.

property size¶

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

See also

ndarray.size: Number of elements in the array.
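
Examples

A short sketch relating shape, size, and ndim, using stock pandas on the assumption that legate.pandas reports the same values. The frame is illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

frame_shape = df.shape       # (rows, columns)
frame_size = df.size         # rows times columns
series_size = df["a"].size   # number of rows for a Series
series_ndim = df["a"].ndim   # a Series is one-dimensional

print(frame_shape, frame_size, series_size, series_ndim)
```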

sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)¶

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’}, default 0): The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
level (int or level name or list of ints or list of level names): If not None, sort on values in specified index level(s).
ascending (bool or list-like of bools, default True): Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
inplace (bool, default False): If True, perform operation in-place.
kind ({‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’): Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position ({‘first’, ‘last’}, default ‘last’): Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
sort_remaining (bool, default True): If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
ignore_index (bool, default False): If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

key (callable, optional): If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.

New in version 1.1.0.

Returns

DataFrame or None: The original DataFrame sorted by the labels or None if inplace=True.

See also

Series.sort_index: Sort Series by the index.
DataFrame.sort_values: Sort DataFrame by the value.
Series.sort_values: Sort Series by the value.
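
Examples

A small sketch of ascending and ignore_index, run on stock pandas under the assumption that legate.pandas sorts the same way. The frame is illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame({"v": [1, 2, 3]}, index=["b", "c", "a"])

ascending = df.sort_index()                     # labels a, b, c
descending = df.sort_index(ascending=False)     # labels c, b, a
relabelled = df.sort_index(ignore_index=True)   # sorted, then labelled 0..n-1

print(ascending.index.tolist(), descending.index.tolist())
```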

sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index: bool = False)¶

Sort by the values.

Sort a Series in ascending or descending order by some criterion.

Parameters

axis ({0 or ‘index’}, default 0): Axis to direct sorting. The value ‘index’ is accepted for compatibility with DataFrame.sort_values.
ascending (bool or list of bools, default True): If True, sort values in ascending order, otherwise descending.
inplace (bool, default False): If True, perform operation in-place.
kind ({‘quicksort’, ‘mergesort’ or ‘heapsort’}, default ‘quicksort’): Choice of sorting algorithm. See also numpy.sort() for more information. ‘mergesort’ is the only stable algorithm.
na_position ({‘first’ or ‘last’}, default ‘last’): Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
ignore_index (bool, default False): If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

key (callable, optional): If not None, apply the key function to the series values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return an array-like.

New in version 1.1.0.

Returns

Series or None: Series ordered by values or None if inplace=True.

See also

Series.sort_index: Sort by the Series indices.
DataFrame.sort_values: Sort DataFrame by the values along either axis.
DataFrame.sort_index: Sort DataFrame by indices.
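
Examples

A minimal sketch of na_position and ignore_index on a Series with a missing value. It uses stock pandas, assuming legate.pandas orders values the same way; the data is illustrative.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

s = pd.Series([3.0, np.nan, 1.0, 2.0])

default = s.sort_values()                          # NaN placed last
nan_first = s.sort_values(na_position="first")     # NaN placed first
descending = s.sort_values(ascending=False, ignore_index=True)

print(default.index.tolist(), nan_first.index.tolist())
```

The original index labels travel with their values unless ignore_index=True is passed.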

squeeze(axis=None)¶

Squeeze 1 dimensional axis objects into scalars.

Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

Parameters

axis ({0 or ‘index’, 1 or ‘columns’, None}, default None): A specific axis to squeeze. By default, all length-1 axes are squeezed.

Returns

DataFrame, Series, or scalar: The projection after squeezing axis or all the axes.

See also

Series.iloc: Integer-location based indexing for selecting scalars.
DataFrame.iloc: Integer-location based indexing for selecting Series.
Series.to_frame: Inverse of DataFrame.squeeze for a single-column DataFrame.
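
Examples

The squeezing rules described above can be sketched as follows, using stock pandas on the assumption that legate.pandas behaves identically. The frame is illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A single-column frame squeezes to a Series named after the column.
one_column = df[["a"]].squeeze()

# A 1x1 frame squeezes all the way down to a scalar.
one_cell = df.loc[[0], ["a"]].squeeze()

# Restricting the axis squeezes only that axis: the result is a Series.
columns_only = df.loc[[0], ["a"]].squeeze(axis="columns")

# With no length-1 axis the object is returned unchanged.
unchanged = df.squeeze()

print(type(one_column).__name__, one_cell, unchanged.shape)
```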

std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters

axis ({index (0), columns (1)})
skipna (bool, default True): Exclude NA/null values. If an entire row/column is NA, the result will be NA.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
ddof (int, default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns

Series or DataFrame (if level specified)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
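
Examples

The difference between the default ddof=1 and the numpy.std convention can be sketched as below. This runs on stock pandas and NumPy, assuming legate.pandas normalizes the same way; the values are illustrative.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this pandas behaviour

s = pd.Series([1.0, 2.0, 3.0, 4.0])

sample = s.std()                        # divisor N - 1 = 3
population = s.std(ddof=0)              # divisor N = 4
numpy_default = np.std(s.to_numpy())    # numpy.std uses ddof=0 by default

print(sample, population, numpy_default)
```

For these four values the squared deviations sum to 5, so the sample result is sqrt(5/3) and the population result is sqrt(5/4).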

property str¶

Vectorized string functions for Series and Index.

NAs stay NA unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

sub(other, level=None, fill_value=None, axis=0)¶

Return Subtraction of series and other, element-wise (binary operator sub).

Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rsub: Reverse of the Subtraction operator, see Python documentation for more details.

subtract(other, level=None, fill_value=None, axis=0)¶

Return Subtraction of series and other, element-wise (binary operator sub).

Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

other (Series or scalar value)
fill_value (None or float value, default None (NaN)): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
level (int or name): Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rsub: Reverse of the Subtraction operator, see Python documentation for more details.

sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters

axis ({index (0), columns (1)}): Axis for the function to be applied on.
skipna (bool, default True): Exclude NA/null values when computing the result.
level (int or level name, default None): If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only (bool, default None): Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count (int, default 0): The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs: Additional keyword arguments to be passed to the function.

Returns

Series or DataFrame (if level specified)

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.
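
The interaction of skipna and min_count can be sketched as follows. This is an illustration under the assumption that legate.pandas matches pandas here; it uses stock pandas and made-up data.

```python
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

s = pd.Series([1.0, float("nan"), 3.0])

total = s.sum()                 # NaN skipped by default
strict = s.sum(skipna=False)    # NaN propagates
too_few = s.sum(min_count=3)    # only 2 valid values, so the result is NA

empty = pd.Series([], dtype="float64")
empty_default = empty.sum()             # min_count=0: sum of nothing is 0.0
empty_checked = empty.sum(min_count=1)  # at least one valid value required
```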

tail(n=5)¶

Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].

Parameters

nint, default 5: Number of rows to select.

Returns

type of caller: The last n rows of the caller object.

See also

DataFrame.head: The first n rows of the caller object.
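
A short sketch of positive and negative n, assuming legate.pandas behaves like pandas; stock pandas is used so the example runs without Legate, and the column name v is hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

df = pd.DataFrame({"v": range(10)})

last_three = df.tail(3)            # rows at positions 7, 8, 9
all_but_first_three = df.tail(-3)  # rows at positions 3 through 9
```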

to_csv(path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator=None, chunksize=None, partition=False)¶

Write object to a comma-separated values (csv) file.

Changed in version 0.24.0: The order of arguments for Series was changed.

Parameters

path_or_bufstr or file handle, default None: File path or object, if None is provided the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.

Changed in version 0.24.0: Was previously named “path” for Series.

Changed in version 1.2.0: Support for binary file objects was introduced.
sepstr, default ‘,’: String of length 1. Field delimiter for the output file.
na_repstr, default ‘’: Missing data representation.
float_formatstr, default None: Format string for floating point numbers.
columnssequence, optional: Columns to write.
headerbool or list of str, default True: Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

Changed in version 0.24.0: Previously defaulted to False for Series.
indexbool, default True: Write row names (index).
index_labelstr or sequence, or False, default None: Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
modestr: Python write mode, default ‘w’.
encodingstr, optional: A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compressionstr or dict, default ‘infer’: If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.

Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.

Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.

Changed in version 1.2.0: Compression is supported for binary file objects.

Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.
quotingoptional constant from csv module: Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotecharstr, default ‘"’: String of length 1. Character used to quote fields.
line_terminatorstr, optional: The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ for Linux, ‘\r\n’ for Windows).

Changed in version 0.24.0.
chunksizeint or None: Rows to write at a time.
date_formatstr, default None: Format string for datetime objects.
doublequotebool, default True: Control quoting of quotechar inside a field.
escapecharstr, default None: String of length 1. Character used to escape sep and quotechar when appropriate.
decimalstr, default ‘.’: Character recognized as decimal separator. E.g. use ‘,’ for European data.
errorsstr, default ‘strict’: Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

New in version 1.1.0.
storage_optionsdict, optional: Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.0.

Returns

None or str: If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.

See also

read_csv: Load a CSV file into a DataFrame.
to_excel: Write DataFrame to an Excel file.
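
The two return modes described under Returns can be sketched as below. The example assumes legate.pandas accepts the same arguments as pandas for the parameters shown; it uses stock pandas and a made-up frame.

```python
import io
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, float("nan")]})

# path_or_buf=None (the default): the CSV text is returned as a string.
text = df.to_csv(sep=";", na_rep="NULL", index=False)

# With a file-like object, the data is written to it and None is returned.
buf = io.StringIO()
returned = df.to_csv(buf, index=False)
```

Splitting text with str.splitlines() avoids depending on the platform's line terminator when inspecting the result.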

to_frame(name=None)¶

Convert Series to DataFrame.

Parameters

nameobject, default None: The passed name should substitute for the series name (if it has one).

Returns

DataFrame: DataFrame representation of Series.
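
A brief sketch of the name parameter, assuming pandas-compatible behaviour; stock pandas is used and the names count and n are hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

s = pd.Series([1, 2, 3], name="count")

df_default = s.to_frame()          # column takes the Series name
df_renamed = s.to_frame(name="n")  # passed name replaces the Series name
```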

to_numpy(dtype=None, copy=False, na_value=<object object>, **kwargs)¶

A NumPy ndarray representing the values in this Series or Index.

New in version 0.24.0.

Parameters

dtypestr or numpy.dtype, optional: The dtype to pass to numpy.asarray().
copybool, default False: Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.
na_valueAny, optional: The value to use for missing values. The default value depends on dtype and the type of the array.

New in version 1.0.0.
**kwargs: Additional keywords passed through to the to_numpy method of the underlying array (for extension arrays).

New in version 1.0.0.

Returns

numpy.ndarray

See also

Series.array: Get the actual data stored within.
Index.array: Get the actual data stored within.
DataFrame.to_numpy: Similar method for DataFrame.

Notes

The returned array will be the same up to equality (values equal in self will be equal in the returned array; likewise for values that are not equal). When self contains an ExtensionArray, the dtype may be different. For example, for a category-dtype Series, to_numpy() will return a NumPy array and the categorical dtype will be lost.

For NumPy dtypes, this will be a reference to the actual data stored in this Series or Index (assuming copy=False). Modifying the result in place will modify the data stored in the Series or Index (not that we recommend doing that).

For extension types, to_numpy() may require copying data and coercing the result to a NumPy type (possibly object), which may be expensive. When you need a no-copy reference to the underlying data, Series.array should be used instead.

This table lays out the different dtypes and default return types of to_numpy() for various dtypes within pandas.

dtype	array type
category[T]	ndarray[T] (same dtype as input)
period	ndarray[object] (Periods)
interval	ndarray[object] (Intervals)
IntegerNA	ndarray[object]
datetime64[ns]	datetime64[ns]
datetime64[ns, tz]	ndarray[object] (Timestamps)
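
The copy and na_value parameters, and the loss of extension dtypes, can be illustrated with a small sketch. It assumes legate.pandas matches pandas for these cases (a distributed implementation may differ in when data is copied); stock pandas and NumPy are used so that it runs anywhere.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

s = pd.Series([1.0, 2.0, 3.0])

# copy=True guarantees an independent array: editing it leaves s untouched.
copied = s.to_numpy(copy=True)
copied[0] = 99.0

# A category-dtype Series comes back as a plain NumPy array.
cat = pd.Series(["a", "b", "a"], dtype="category")
arr = cat.to_numpy()

# For a nullable integer Series, na_value chooses the missing-value marker.
nullable = pd.Series([1, None], dtype="Int64")
filled = nullable.to_numpy(dtype="float64", na_value=np.nan)
```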

to_pandas(schema_only=False)¶

Convert distributed Series into a Pandas Series.

Parameters

schema_onlybool, default False: Doesn’t convert the data when True.

Returns

outpandas.Series

truediv(other, level=None, fill_value=None, axis=0)¶

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters

otherSeries or scalar value
fill_valueNone or float value, default None (NaN): Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
levelint or name: Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns

Series: The result of the operation.

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.

property values¶

Return Series as ndarray or ndarray-like depending on the dtype.

Returns

outnumpy.ndarray or ndarray-like

var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters

axis{index (0), columns (1)}
skipnabool, default True: Exclude NA/null values. If an entire row/column is NA, the result will be NA.
levelint or level name, default None: If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
ddofint, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_onlybool, default None: Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns

Series or DataFrame (if level specified)

Notes

To have the same behaviour as numpy.var, use ddof=0 (instead of the default ddof=1).
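
The effect of ddof can be checked with a small worked example. This sketch assumes legate.pandas computes variance as pandas does; it uses stock pandas and NumPy with made-up data whose mean is 5 and whose squared deviations sum to 32.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = s.var()            # divisor N - 1 = 7, i.e. 32 / 7
population_var = s.var(ddof=0)  # divisor N = 8, i.e. 32 / 8 = 4.0
numpy_var = np.var(s.to_numpy())  # NumPy's default is ddof=0
```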

where(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶

Replace values where the condition is False.

Parameters

condbool Series/DataFrame, array-like, or callable

Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

otherscalar, Series/DataFrame, or callable

Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

inplacebool, default False

Whether to perform the operation in place on the data.

axisint, default None

Alignment axis if needed.

levelint, default None

Alignment level if needed.

errorsstr, {‘raise’, ‘ignore’}, default ‘raise’

Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.

‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.

try_castbool, default False

Try to cast the result back to the input type (if possible).

Returns

Same type as caller or None if inplace=True.

See also

DataFrame.mask(): Return an object of same shape as self.

Notes

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the where documentation in indexing.
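
The if-then behaviour described in the notes can be sketched as follows, assuming legate.pandas follows pandas for cond and other (including callables); the example runs on stock pandas and NumPy with made-up values.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

s = pd.Series([-2, -1, 0, 1, 2])

kept = s.where(s > 0)          # no `other`: entries failing cond become NaN
replaced = s.where(s > 0, 0)   # entries failing cond become 0

# Both cond and other may be callables evaluated on the Series itself.
mirrored = s.where(lambda x: x > 0, lambda x: -x)

# Rough NumPy equivalent of s.where(s > 0, 0); note the argument order.
np_equiv = np.where(s > 0, s, 0)
```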

IO¶

legate.pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, prefix=None, mangle_dupe_cols=True, dtype=None, true_values=None, false_values=None, skiprows=None, skipfooter=0, nrows=None, na_values=None, skip_blank_lines=True, parse_dates=False, compression='infer', quotechar='"', quoting=0, doublequote=True, verify_header=False, **kwargs)¶

Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.

Parameters

filepath_or_bufferstr, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

sepstr, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

delimiterstr, default None

Alias for sep.

headerint, list of int, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

namesarray-like, optional

List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

index_colint, str, sequence of int / str, or False, default None

Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.

Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

usecolslist-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

squeezebool, default False

If the parsed data only contains one column then return a Series.

prefixstr, optional

Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …

mangle_dupe_colsbool, default True

Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.

dtypeType name or dict of column -> type, optional

Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

engine{‘c’, ‘python’}, optional

Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.

convertersdict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

true_valueslist, optional

Values to consider as True.

false_valueslist, optional

Values to consider as False.

skipinitialspacebool, default False

Skip spaces after delimiter.

skiprowslist-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooterint, default 0

Number of lines at bottom of file to skip (Unsupported with engine=’c’).

nrowsint, optional

Number of rows of file to read. Useful for reading pieces of large files.

na_valuesscalar, str, list-like, or dict, optional

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_nabool, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
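
The four combinations of keep_default_na and na_values listed above can be sketched as below. Note the assumptions: keep_default_na is not in the legate.pandas.read_csv signature shown earlier and would be passed through **kwargs, so this example uses stock pandas; the CSV text and the marker "missing" are made up.

```python
import io
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

raw = "a,b\nNA,1\nmissing,2\nx,3\n"

def read(**kwargs):
    # Hypothetical helper: re-parse the same text with different options.
    return pd.read_csv(io.StringIO(raw), **kwargs)

defaults_only = read()                       # only "NA" is parsed as NaN
extended = read(na_values=["missing"])       # "NA" and "missing" are NaN
custom_only = read(na_values=["missing"], keep_default_na=False)  # "NA" kept as text
nothing = read(keep_default_na=False)        # no string is parsed as NaN
```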

na_filterbool, default True

Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.

verbosebool, default False

Indicate number of NA values placed in non-numeric columns.

skip_blank_linesbool, default True

If True, skip over blank lines rather than interpreting as NaN values.

parse_datesbool or list of int or names or list of lists or dict, default False

The behavior is as follows:

boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See io.csv.mixed_timezones for more.

Note: A fast-path exists for iso8601-formatted dates.

infer_datetime_formatbool, default False

If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

keep_date_colbool, default False

If True and parse_dates specifies combining multiple columns then keep the original columns.

date_parserfunction, optional

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirstbool, default False

DD/MM format dates, international and European format.

cache_datesbool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

New in version 0.25.0.

iteratorbool, default False

Return TextFileReader object for iteration or getting chunks with get_chunk().

Changed in version 1.2: TextFileReader is a context manager.

chunksizeint, optional

Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

Changed in version 1.2: TextFileReader is a context manager.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.

thousandsstr, optional

Thousands separator.

decimalstr, default ‘.’

Character to recognize as decimal point (e.g. use ‘,’ for European data).

lineterminatorstr (length 1), optional

Character to break file into lines. Only valid with C parser.

quotecharstr (length 1), optional

The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

quotingint or csv.QUOTE_* instance, default 0

Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

doublequotebool, default True

When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.

escapecharstr (length 1), optional

One-character string used to escape other characters.

commentstr, optional

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in ‘a,b,c’ being treated as the header.

encodingstr, optional

Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.

Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".

dialectstr or csv.Dialect, optional

If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.

error_bad_linesbool, default True

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.

warn_bad_linesbool, default True

If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output.

delim_whitespacebool, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.

low_memorybool, default True

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).

memory_mapbool, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precisionstr, optional

Specifies which converter the C engine should use for floating-point values. The options are None or ‘high’ for the ordinary converter, ‘legacy’ for the original lower precision pandas converter, and ‘round_trip’ for the round-trip converter.

Changed in version 1.2.

storage_optionsdict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.

Returns

DataFrame or TextParser: A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes.

See also

DataFrame.to_csv: Write DataFrame to a comma-separated values (csv) file.
read_csv: Read a comma-separated values (csv) file into DataFrame.
read_fwf: Read a table of fixed-width formatted lines into DataFrame.
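
A minimal end-to-end sketch that combines several of the parameters above (comment, usecols, index_col, dtype, and the default NA markers). It assumes legate.pandas.read_csv accepts these arguments the way pandas does; comment in particular is not in the signature shown and would go through **kwargs, so the example runs on stock pandas with made-up CSV text.

```python
import io
import pandas as pd  # stand-in for legate.pandas (assumption: same semantics)

raw = (
    "# exported by a hypothetical tool\n"
    "id,name,score\n"
    "1,ann,3.5\n"
    "\n"
    "2,bob,n/a\n"
    "3,cy,4.0\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    comment="#",               # the first line is ignored entirely
    usecols=["id", "score"],   # the "name" column is never materialized
    index_col="id",
    dtype={"score": "float64"},
)
```

The fully commented line and the blank line are both skipped, so the header is taken from the id,name,score line, and the default NA marker n/a is parsed as NaN.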

legate.pandas.read_parquet(path, columns=None, **kwargs)¶

Load a parquet object from the file path, returning a DataFrame.

Parameters

pathstr, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.parquet. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: file://localhost/path/to/tables or s3://bucket/partition_dir

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

columnslist, default=None

If not None, only these columns will be read from the file.

use_nullable_dtypesbool, default False

If True, use dtypes that use pd.NA as missing value indicator for the resulting DataFrame (only applicable for engine="pyarrow"). As new dtypes are added that support pd.NA in the future, the output with this option will change to use those dtypes. Note: this is an experimental option, and behaviour (e.g. additional support dtypes) may change without notice.

New in version 1.2.0.

**kwargs

Any additional kwargs are passed to the engine.

Returns

DataFrame

legate.pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, true_values=None, false_values=None, na_values=None, skip_blank_lines=True, parse_dates=False, compression='infer', quotechar='"', quoting=0, skipfooter=0, skiprows=None, nrows=None, doublequote=True, **kwargs)¶

Read general delimited file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.

Parameters

filepath_or_bufferstr, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

sepstr, default ‘\t’ (tab-stop)

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

delimiterstr, default None

Alias for sep.

headerint, list of int, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

namesarray-like, optional

List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

index_colint, str, sequence of int / str, or False, default None

Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.

Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

usecolslist-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

squeeze : bool, default False

If the parsed data only contains one column then return a Series.

prefix : str, optional

Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …

mangle_dupe_cols : bool, default True

Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.

dtype : Type name or dict of column -> type, optional

Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
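As a hedged illustration of why str is useful as a dtype, the sketch below reads zero-padded codes with and without an explicit dtype. It runs against stock pandas and assumes legate.pandas accepts the same dtype mapping; column names and values are invented.

```python
import io

import pandas as pd

csv_text = "code,amount\n007,1\n042,2\n"

# Default inference turns the zero-padded codes into integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Requesting str for "code" keeps the text as written; "amount" is read
# as float64 instead of the inferred integer type.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"code": str, "amount": "float64"},
)

print(inferred["code"].tolist(), typed["code"].tolist(), typed["amount"].dtype)
```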

engine : {‘c’, ‘python’}, optional

Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

true_values : list, optional

Values to consider as True.

false_values : list, optional

Values to consider as False.

skipinitialspace : bool, default False

Skip spaces after delimiter.

skiprows : list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
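The three accepted forms of skiprows can be compared side by side. This is a sketch against stock pandas, assuming legate.pandas treats the argument the same way; the file contents are hypothetical.

```python
import io

import pandas as pd

# Line 0 is a preamble, line 1 is the header, lines 2-4 are data.
csv_text = "report v1\na,b\n1,2\n3,4\n5,6\n"

# int: skip that many lines at the start of the file.
skip_count = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# list-like: skip exactly these 0-indexed line numbers.
skip_lines = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2])

# callable: skip every line index for which the function returns True.
skip_callable = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i in [0, 2])

print(skip_count.shape, skip_lines["a"].tolist())
```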

skipfooter : int, default 0

Number of lines at bottom of file to skip (Unsupported with engine=’c’).

nrows : int, optional

Number of rows of file to read. Useful for reading pieces of large files.

na_values : scalar, str, list-like, or dict, optional

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_na : bool, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
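The four combinations of keep_default_na and na_values listed above can be checked with a small sketch. It uses stock pandas and assumes legate.pandas applies the same rules; the marker string "missing" is an arbitrary example.

```python
import io

import pandas as pd

csv_text = "name,score\nalice,NA\nbob,missing\ncarol,7\n"


def na_mask(**kwargs):
    """Return which 'score' cells were parsed as NaN for the given options."""
    frame = pd.read_csv(io.StringIO(csv_text), **kwargs)
    return frame["score"].isna().tolist()


# keep_default_na=True, no na_values: only the default markers ("NA") count.
defaults_only = na_mask()

# keep_default_na=True plus na_values: custom markers are added to the defaults.
extended = na_mask(na_values=["missing"])

# keep_default_na=False plus na_values: only the custom markers count.
custom_only = na_mask(keep_default_na=False, na_values=["missing"])

# keep_default_na=False, no na_values: nothing is parsed as NaN.
nothing = na_mask(keep_default_na=False)

print(defaults_only, extended, custom_only, nothing)
```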

na_filter : bool, default True

Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.

verbose : bool, default False

Indicate number of NA values placed in non-numeric columns.

skip_blank_lines : bool, default True

If True, skip over blank lines rather than interpreting as NaN values.

parse_dates : bool or list of int or names or list of lists or dict, default False

The behavior is as follows:

boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See io.csv.mixed_timezones for more.

Note: A fast-path exists for iso8601-formatted dates.
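A short sketch of the two most common parse_dates forms, plus the suggested pd.to_datetime fallback for a non-standard format. It is written against stock pandas on the assumption that legate.pandas matches it; the column names and the `%d.%m.%Y` format are illustrative.

```python
import io

import pandas as pd

csv_text = (
    "when,raw,value\n"
    "2021-01-05,05.01.2021,1\n"
    "2021-02-10,10.02.2021,2\n"
)

# List of names: parse the "when" column as dates while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# Non-standard format: convert after reading with an explicit format string.
df["raw"] = pd.to_datetime(df["raw"], format="%d.%m.%Y")

# Boolean form: parse_dates=True tries to parse the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col="when", parse_dates=True)

print(df.dtypes)
print(type(indexed.index).__name__)
```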

infer_datetime_format : bool, default False

If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

keep_date_col : bool, default False

If True and parse_dates specifies combining multiple columns then keep the original columns.

date_parser : function, optional

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst : bool, default False

DD/MM format dates, international and European format.

cache_dates : bool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

New in version 0.25.0.

iterator : bool, default False

Return TextFileReader object for iteration or getting chunks with get_chunk().

Changed in version 1.2: TextFileReader is a context manager.

chunksize : int, optional

Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

Changed in version 1.2: TextFileReader is a context manager.
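Chunked reading with the reader used as a context manager might look like the sketch below. It relies on stock pandas 1.2 or later and assumes legate.pandas returns an equivalent iterable reader, which this page does not confirm.

```python
import io

import pandas as pd

# Five data rows under a single header line.
csv_text = "n\n" + "\n".join(str(i) for i in range(5)) + "\n"

chunk_lengths = []
total = 0

# chunksize=2 yields DataFrames of at most two rows each; the with-block
# closes the reader when iteration is done.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        chunk_lengths.append(len(chunk))
        total += int(chunk["n"].sum())

print(chunk_lengths, total)
```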

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.

thousands : str, optional

Thousands separator.

decimal : str, default ‘.’

Character to recognize as decimal point (e.g. use ‘,’ for European data).
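For European-style numbers, thousands and decimal are typically set together. The sketch below assumes legate.pandas handles them as stock pandas does; the semicolon separator and the sample prices are made up.

```python
import io

import pandas as pd

# Semicolon-separated data using "." for thousands and "," for decimals.
csv_text = "item;price\nwidget;1.234,50\ngadget;99,50\n"

df = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=".", decimal=",")

print(df["price"].tolist())
```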

lineterminator : str (length 1), optional

Character to break file into lines. Only valid with C parser.

quotechar : str (length 1), optional

The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

quoting : int or csv.QUOTE_* instance, default 0

Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

doublequote : bool, default True

When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.

escapechar : str (length 1), optional

One-character string used to escape other characters.

comment : str, optional

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in ‘a,b,c’ being treated as the header.
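The comment example from the description can be run directly. This sketch extends it with a trailing comment on a data line; it uses stock pandas and assumes legate.pandas parses comments identically.

```python
import io

import pandas as pd

# First line is fully commented; the data line carries a trailing comment.
csv_text = "#empty\na,b,c\n1,2,3# trailing note\n"

df = pd.read_csv(io.StringIO(csv_text), comment="#", header=0)

# The commented line is ignored, so "a,b,c" becomes the header, and the
# text after "#" on the data line is not parsed.
print(list(df.columns), df.iloc[0].tolist())
```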

encoding : str, optional

Encoding to use for UTF when reading/writing (ex. ‘utf-8’). See the list of Python standard encodings.

Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".

dialect : str or csv.Dialect, optional

If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.

error_bad_lines : bool, default True

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.

warn_bad_lines : bool, default True

If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output.

delim_whitespace : bool, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
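Because the description states that delim_whitespace=True is equivalent to sep='\s+', the sketch below uses the sep form, which also runs on stock pandas releases where delim_whitespace is deprecated. Whether legate.pandas supports regular-expression separators is an assumption here.

```python
import io

import pandas as pd

# Fields separated by a mix of spaces and tabs.
csv_text = "a  b\tc\n1 2   3\n"

# Any run of whitespace is treated as a single delimiter.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

print(list(df.columns), df.iloc[0].tolist())
```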

low_memory : bool, default True

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).

memory_map : bool, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precision : str, optional

Specifies which converter the C engine should use for floating-point values. The options are None or ‘high’ for the ordinary converter, ‘legacy’ for the original lower precision pandas converter, and ‘round_trip’ for the round-trip converter.

Changed in version 1.2.

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

New in version 1.2.

Returns

DataFrame or TextParser: A comma-separated values (csv) file is returned as a two-dimensional data structure with labeled axes.

See also

DataFrame.to_csv: Write DataFrame to a comma-separated values (csv) file.
read_csv: Read a comma-separated values (csv) file into DataFrame.
read_fwf: Read a table of fixed-width formatted lines into DataFrame.

Utility functions¶

legate.pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)¶

Concatenate pandas objects along a particular axis with optional set logic along the other axes.

Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.

Parameters

objs : a sequence or mapping of Series or DataFrame objects: If a mapping is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised.
axis : {0/’index’, 1/’columns’}, default 0: The axis to concatenate along.
join : {‘inner’, ‘outer’}, default ‘outer’: How to handle indexes on other axis (or axes).
ignore_index : bool, default False: If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join.
keys : sequence, default None: If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level.
levels : list of sequences, default None: Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
names : list, default None: Names for the levels in the resulting hierarchical index.
verify_integrity : bool, default False: Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.
sort : bool, default False: Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

Changed in version 1.0.0: Changed to not sort by default.
copy : bool, default True: If False, do not copy data unnecessarily.

Returns

object, type of objs: When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.

See also

Series.append: Concatenate Series.
DataFrame.append: Concatenate DataFrames.
DataFrame.join: Join DataFrames using indexes.
DataFrame.merge: Merge DataFrames by indexes or columns.

Notes

The keys, levels, and names arguments are all optional.

A walkthrough of how this method fits in with other tools for combining pandas objects can be found in the pandas user guide section on merging, joining, and concatenating.
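The main concat options can be tried out on small inputs. The sketch uses stock pandas and assumes legate.pandas.concat behaves the same way for these arguments; the Series and DataFrame contents are invented for illustration.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# Default: original index values are kept, so labels repeat (0, 1, 0, 1).
stacked = pd.concat([s1, s2])

# ignore_index=True relabels the concatenation axis 0 .. n - 1.
renumbered = pd.concat([s1, s2], ignore_index=True)

# keys adds an outer index level identifying the source object.
keyed = pd.concat([s1, s2], keys=["s1", "s2"])

# verify_integrity=True raises ValueError when the new axis has duplicates.
try:
    pd.concat([s1, s2], verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

df1 = pd.DataFrame({"letter": ["a", "b"], "number": [1, 2]})
df2 = pd.DataFrame({"letter": ["c"], "animal": ["cat"]})

# join="outer" keeps the union of columns; join="inner" keeps the intersection.
outer = pd.concat([df1, df2], sort=False)
inner = pd.concat([df1, df2], join="inner")

print(list(outer.columns), list(inner.columns))
```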

legate.pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)¶

Convert argument to datetime.

Parameters

arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

The object to convert to a datetime.

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception.
If ‘coerce’, then invalid parsing will be set as NaT.
If ‘ignore’, then invalid parsing will return the input.
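A minimal sketch contrasting 'raise' and 'coerce', using stock pandas under the assumption that legate.pandas.to_datetime reports errors the same way. The 'ignore' mode is left out because newer pandas releases deprecate it.

```python
import pandas as pd

values = ["2021-01-05", "not a date"]

# errors="coerce": the unparsable entry becomes NaT.
coerced = pd.to_datetime(values, errors="coerce")

# errors="raise" (the default): the unparsable entry raises an exception.
try:
    pd.to_datetime(values)
    raised = False
except ValueError:
    raised = True

print(coerced, raised)
```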

dayfirst : bool, default False

Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, e.g. 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

yearfirst : bool, default False

Specify a date parse order if arg is str or its list-likes.

If True, parses dates with the year first, e.g. 10/11/12 is parsed as 2010-11-12.
If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).

Warning: yearfirst=True is not strict, but will prefer to parse with year first (this is a known bug, based on dateutil behavior).
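The ambiguous string 10/11/12 from the descriptions above can be parsed under each ordering. This sketch uses stock pandas and assumes legate.pandas follows the same dateutil-based ordering rules; some pandas versions may emit a format-inference warning for such strings.

```python
import pandas as pd

ambiguous = "10/11/12"

# Default ordering: month first.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first field is read as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# yearfirst=True: the first field is read as the (two-digit) year.
year_first = pd.to_datetime(ambiguous, yearfirst=True)

print(month_first.date(), day_first.date(), year_first.date())
```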

utc : bool, default None

Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).

format : str, default None

The strftime format used to parse time, e.g. “%d/%m/%Y”. Note that “%f” will parse all the way up to nanoseconds. See the strftime documentation for more information on choices: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.

exact : bool, True by default

Behaves as:

If True, require an exact format match.
If False, allow the format to match anywhere in the target string.
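The effect of format and exact can be sketched as follows, assuming legate.pandas matches stock pandas here. The input strings are hypothetical.

```python
import pandas as pd

fmt = "%d/%m/%Y"

# An explicit format removes the day/month ambiguity.
parsed = pd.to_datetime("05/01/2021", format=fmt)

noisy = "05/01/2021 (received)"

# exact=True (the default) rejects input with text beyond the format.
try:
    pd.to_datetime(noisy, format=fmt)
    strict_ok = True
except ValueError:
    strict_ok = False

# exact=False lets the format match a portion of the string.
loose = pd.to_datetime(noisy, format=fmt, exact=False)

print(parsed.date(), strict_ok, loose.date())
```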

unit : str, default ‘ns’

The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. The value is interpreted relative to origin. For example, with unit=’ms’ and origin=’unix’ (the default), arg is treated as the number of milliseconds since the unix epoch start.

infer_datetime_format : bool, default False

If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.

origin : scalar, default ‘unix’

Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.

If ‘unix’ (or POSIX) time; origin is set to 1970-01-01.
If ‘julian’, unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
If Timestamp convertible, origin is set to Timestamp identified by origin.
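How unit and origin combine for numeric input is easiest to see with concrete numbers. The sketch below runs on stock pandas and assumes legate.pandas interprets both arguments the same way.

```python
import pandas as pd

# 1500 milliseconds after the unix epoch (origin="unix" is the default).
from_ms = pd.to_datetime(1500, unit="ms")

# Day offsets counted from a custom, Timestamp-convertible origin.
from_days = pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))

print(from_ms)
print(list(from_days))
```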

cache : bool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.

Changed in version 0.25.0: Changed default value from False to True.

Returns

datetime

If parsing succeeded. Return type depends on input:

list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp

When it is not possible to return the designated types (e.g. when any element of the input is before Timestamp.min or after Timestamp.max), the return value will have datetime.datetime type (or the corresponding array/Series).
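The input-dependent return types listed above can be confirmed with a short check. The sketch uses stock pandas; whether legate.pandas returns its own equivalents of these classes is not stated on this page and is assumed here.

```python
import pandas as pd

dates = ["2021-01-05", "2021-02-10"]

from_list = pd.to_datetime(dates)               # list-like input
from_series = pd.to_datetime(pd.Series(dates))  # Series input
from_scalar = pd.to_datetime(dates[0])          # scalar input

print(type(from_list).__name__)
print(type(from_series).__name__, from_series.dtype)
print(type(from_scalar).__name__)
```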

See also

DataFrame.astype: Cast argument to a specified dtype.
to_timedelta: Convert argument to timedelta.
convert_dtypes: Convert dtypes.