Legate Pandas API Reference
DataFrame
- class legate.pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False, frame=None)
Attributes
at
Access a single value for a row/column label pair.
axes
Return a list representing the axes of the DataFrame.
columns
The column labels of the DataFrame.
dtypes
Return the dtypes in the DataFrame.
empty
Indicator whether DataFrame is empty.
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
The index (row labels) of the DataFrame.
loc
Access a group of rows and columns by label(s) or a boolean array.
ndim
Return an int representing the number of axes / array dimensions.
shape
Return a tuple representing the dimensionality of the DataFrame.
size
Return an int representing the number of elements in this object.
Methods
abs
()Return a Series/DataFrame with absolute numeric value of each element.
add
(other[, axis, level, fill_value])Get Addition of dataframe and other, element-wise (binary operator add).
add_prefix
(prefix)Prefix labels with string prefix.
add_suffix
(suffix)Suffix labels with string suffix.
all
([axis, bool_only, skipna, level])Return whether all elements are True, potentially over an axis.
any
([axis, bool_only, skipna, level])Return whether any element is True, potentially over an axis.
append
(other[, ignore_index, …])Append rows of other to the end of caller, returning a new object.
astype
(dtype[, copy, errors])Cast a pandas object to the specified dtype.
bool
()Return the bool of a single element Series or DataFrame.
count
([axis, level, numeric_only])Count non-NA cells for each column or row.
cummax
([axis, skipna])Return cumulative maximum over a DataFrame or Series axis.
cummin
([axis, skipna])Return cumulative minimum over a DataFrame or Series axis.
cumprod
([axis, skipna])Return cumulative product over a DataFrame or Series axis.
cumsum
([axis, skipna])Return cumulative sum over a DataFrame or Series axis.
div
(other[, axis, level, fill_value])Get Floating division of dataframe and other, element-wise (binary operator truediv).
divide
(other[, axis, level, fill_value])Get Floating division of dataframe and other, element-wise (binary operator truediv).
drop
([labels, axis, index, columns, level, …])Drop specified labels from rows or columns.
droplevel
(level[, axis])Return DataFrame with requested index / column level(s) removed.
dropna
([axis, how, thresh, subset, inplace])Remove missing values.
eq
(other[, axis, level, fill_value])Get Equal to of dataframe and other, element-wise (binary operator eq).
equals
(other)Test whether two objects contain the same elements.
fillna
([value, method, axis, inplace, …])Fill NA/NaN values using the specified method.
floordiv
(other[, axis, level, fill_value])Get Integer division of dataframe and other, element-wise (binary operator floordiv).
ge
(other[, axis, level, fill_value])Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
get
(key[, default])Get item from object for given key (ex: DataFrame column).
groupby
([by, axis, level, as_index, sort])Group DataFrame using a mapper or by a Series of columns.
gt
(other[, axis, level, fill_value])Get Greater than of dataframe and other, element-wise (binary operator gt).
head
([n])Return the first n rows.
insert
(loc, column, value[, allow_duplicates])Insert column into DataFrame at specified location.
isna
()Detect missing values.
isnull
()Detect missing values.
join
(other[, on, how, lsuffix, rsuffix, sort])Join columns of another DataFrame.
keys
()Get the ‘info axis’ (see Indexing for more).
le
(other[, axis, level, fill_value])Get Less than or equal to of dataframe and other, element-wise (binary operator le).
lt
(other[, axis, level, fill_value])Get Less than of dataframe and other, element-wise (binary operator lt).
mask
(cond[, other, inplace, axis, level, …])Replace values where the condition is True.
max
([axis, skipna, level, numeric_only])Return the maximum of the values over the requested axis.
mean
([axis, skipna, level, numeric_only])Return the mean of the values over the requested axis.
merge
(right[, how, on, left_on, right_on, …])Merge DataFrame or named Series objects with a database-style join.
min
([axis, skipna, level, numeric_only])Return the minimum of the values over the requested axis.
mod
(other[, axis, level, fill_value])Get Modulo of dataframe and other, element-wise (binary operator mod).
mul
(other[, axis, level, fill_value])Get Multiplication of dataframe and other, element-wise (binary operator mul).
multiply
(other[, axis, level, fill_value])Get Multiplication of dataframe and other, element-wise (binary operator mul).
ne
(other[, axis, level, fill_value])Get Not equal to of dataframe and other, element-wise (binary operator ne).
notna
()Detect existing (non-missing) values.
notnull
()Detect existing (non-missing) values.
pow
(other[, axis, level, fill_value])Get Exponential power of dataframe and other, element-wise (binary operator pow).
prod
([axis, skipna, level, numeric_only, …])Return the product of the values over the requested axis.
product
([axis, skipna, level, numeric_only, …])Return the product of the values over the requested axis.
query
(expr[, inplace])Query the columns of a DataFrame with a boolean expression.
radd
(other[, axis, level, fill_value])Get Addition of dataframe and other, element-wise (binary operator add).
rdiv
(other[, axis, level, fill_value])Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
rename
([mapper, index, columns, axis, copy, …])Alter axes labels.
reset_index
([level, drop, inplace, …])Reset the index, or a level of it.
rfloordiv
(other[, axis, level, fill_value])Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod
(other[, axis, level, fill_value])Get Modulo of dataframe and other, element-wise (binary operator rmod).
rmul
(other[, axis, level, fill_value])Get Multiplication of dataframe and other, element-wise (binary operator mul).
rpow
(other[, axis, level, fill_value])Get Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub
(other[, axis, level, fill_value])Get Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv
(other[, axis, level, fill_value])Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
set_axis
(labels[, axis, inplace])Assign desired index to given axis.
set_index
(keys[, drop, append, inplace, …])Set the DataFrame index using existing columns.
sort_index
([axis, level, ascending, …])Sort object by labels (along an axis).
sort_values
(by[, axis, ascending, inplace, …])Sort by the values along either axis.
squeeze
([axis])Squeeze 1 dimensional axis objects into scalars.
std
([axis, skipna, level, ddof, numeric_only])Return sample standard deviation over requested axis.
sub
(other[, axis, level, fill_value])Get Subtraction of dataframe and other, element-wise (binary operator sub).
subtract
(other[, axis, level, fill_value])Get Subtraction of dataframe and other, element-wise (binary operator sub).
sum
([axis, skipna, level, numeric_only, …])Return the sum of the values over the requested axis.
tail
([n])Return the last n rows.
to_csv
([path_or_buf, sep, na_rep, columns, …])Write object to a comma-separated values (csv) file.
to_pandas
([schema_only])Convert distributed DataFrame into a Pandas DataFrame
to_parquet
(path[, engine, compression, …])Write a DataFrame to the binary parquet format.
truediv
(other[, axis, level, fill_value])Get Floating division of dataframe and other, element-wise (binary operator truediv).
var
([axis, skipna, level, ddof, numeric_only])Return unbiased variance over requested axis.
where
(cond[, other, inplace, axis, level, …])Replace values where the condition is False.
copy
- abs()
Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
- Returns
- abs
Series/DataFrame containing the absolute value of each element.
See also
numpy.absolute
Calculate the absolute value element-wise.
Notes
For complex inputs a + bj, such as 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).
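As a rough illustration of the note above, the following sketch uses stock pandas, on the assumption that legate.pandas.DataFrame.abs follows the same semantics. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: one real-valued column and one complex-valued column.
df = pd.DataFrame({"real": [-1.5, 2.0], "cplx": [3 + 4j, -6 - 8j]})

result = df.abs()

# Real entries lose their sign; complex entries a + bj become sqrt(a^2 + b^2).
print(result)
```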
- add(other, axis='columns', level=None, fill_value=None)
Get Addition of dataframe and other, element-wise (binary operator add).
Equivalent to dataframe + other, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is radd.
One of the flexible wrappers (add, sub, mul, div, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
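The effect of fill_value and of index union can be seen in a small sketch. It is written against stock pandas, assuming legate.pandas behaves the same way; the frames and labels are illustrative only.

```python
import pandas as pd

# Two frames whose indexes only partly overlap (labels are illustrative).
left = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
right = pd.DataFrame({"x": [10.0, 20.0]}, index=["b", "c"])

plain = left + right                    # rows present on one side only become NaN
filled = left.add(right, fill_value=0)  # the missing side is treated as 0 before adding

print(plain)
print(filled)
```

In both results the index is the union of the two inputs; only the handling of the unmatched rows differs.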
- add_prefix(prefix)
Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
- Parameters
- prefix : str
The string to add before each label.
- Returns
- Series or DataFrame
New Series or DataFrame with updated labels.
See also
Series.add_suffix
Suffix row labels with string suffix.
DataFrame.add_suffix
Suffix column labels with string suffix.
- add_suffix(suffix)
Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
- Parameters
- suffix : str
The string to add after each label.
- Returns
- Series or DataFrame
New Series or DataFrame with updated labels.
See also
Series.add_prefix
Prefix row labels with string prefix.
DataFrame.add_prefix
Prefix column labels with string prefix.
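A minimal sketch of both relabelling methods, using stock pandas under the assumption that legate.pandas matches it here. The prefix and suffix strings are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

prefixed = df.add_prefix("col_")   # column labels become col_a, col_b
suffixed = df.add_suffix("_raw")   # column labels become a_raw, b_raw

print(list(prefixed.columns))
print(list(suffixed.columns))
print(list(df.columns))            # the original frame keeps its labels
```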
- all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- **kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, a DataFrame is returned; otherwise, a Series is returned.
See also
Series.all
Return True if all elements are True.
DataFrame.any
Return True if one (or more) elements are True.
- any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- **kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, a DataFrame is returned; otherwise, a Series is returned.
See also
numpy.any
Numpy version of this method.
Series.any
Return whether any element is True.
Series.all
Return whether all elements are True.
DataFrame.any
Return whether any element is True over requested axis.
DataFrame.all
Return whether all elements are True over requested axis.
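The axis options and the empty-column behaviour described for all and any can be sketched as follows. The example runs on stock pandas and assumes legate.pandas reduces the same way; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True], "b": [True, False]})

all_per_column = df.all()          # axis=0: one result per column
any_per_row = df.any(axis=1)       # axis=1: one result per row
all_overall = df.all(axis=None)    # None: reduce both axes to a scalar

# An empty column: all() is vacuously True, any() is False.
empty = pd.DataFrame({"a": pd.Series([], dtype="float64")})
empty_all = empty.all()["a"]
empty_any = empty.any()["a"]

print(all_per_column, any_per_row, all_overall, empty_all, empty_any, sep="\n")
```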
- append(other, ignore_index=False, verify_integrity=False, sort=False)
Append rows of other to the end of caller, returning a new object.
Columns in other that are not in the caller are added as new columns.
- Parameters
- other : DataFrame or Series/dict-like object, or list of these
The data to append.
- ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
- verify_integrity : bool, default False
If True, raise ValueError on creating index with duplicates.
- sort : bool, default False
Sort columns if the columns of self and other are not aligned.
Changed in version 1.0.0: Changed to not sort by default.
- Returns
- DataFrame
See also
concat
General function to concatenate DataFrame or Series objects.
Notes
If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
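The list-then-concatenate pattern recommended above might look like the sketch below. It uses pd.concat from stock pandas; whether legate.pandas exposes a top-level concat with the same signature is an assumption, and the column names are invented for the example.

```python
import pandas as pd

base = pd.DataFrame({"id": [0], "value": [0.0]})

# Accumulate new rows as small frames in a plain Python list ...
pieces = [pd.DataFrame({"id": [i], "value": [i * 0.5]}) for i in range(1, 4)]

# ... then concatenate once, instead of growing `base` row by row.
combined = pd.concat([base] + pieces, ignore_index=True)

print(combined)
```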
- astype(dtype, copy=True, errors='raise')
Cast a pandas object to the specified dtype.
- Parameters
- dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
- copy : bool, default True
Return a copy when copy=True (be very careful setting copy=False, as changes to values may then propagate to other pandas objects).
- errors : {‘raise’, ‘ignore’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised.
ignore : suppress exceptions. On error return original object.
- Returns
- casted : same type as caller
See also
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
to_numeric
Convert argument to a numeric type.
numpy.ndarray.astype
Cast a numpy array to a specified type.
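The two forms of the dtype argument can be compared in a short sketch, written against stock pandas on the assumption that legate.pandas accepts the same arguments.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

all_float = df.astype("float64")       # one dtype for the whole frame
mixed = df.astype({"a": "float64"})    # per-column mapping; "b" is left as it was

print(all_float.dtypes)
print(mixed.dtypes)
```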
- property at
Access a single value for a row/column label pair.
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.
- Raises
- KeyError
If ‘label’ does not exist in DataFrame.
See also
DataFrame.iat
Access a single value for a row/column pair by integer position.
DataFrame.loc
Access a group of rows and columns by label(s).
Series.at
Access a single value using a label.
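A small sketch of reading, writing, and the KeyError case, using stock pandas and assuming legate.pandas offers the same accessor semantics. Labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

value = df.at["y", "b"]    # read one cell by row label and column label
df.at["x", "a"] = 10       # write one cell in place

try:
    df.at["missing", "a"]  # an unknown row label raises KeyError
except KeyError:
    missing_label_raises = True
else:
    missing_label_raises = False

print(value, missing_label_raises)
```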
- property axes
Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members. They are returned in that order.
- bool()
Return the bool of a single element Series or DataFrame.
This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).
- Returns
- bool
The value in the Series or DataFrame.
See also
Series.astype
Change the data type of a Series, including to boolean.
DataFrame.astype
Change the data type of a DataFrame, including to boolean.
numpy.bool_
NumPy boolean data type, used by pandas for boolean values.
- property columns
The column labels of the DataFrame.
- count(axis=0, level=None, numeric_only=False)
Count non-NA cells for each column or row.
The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
- level : int or str, optional
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.
- numeric_only : bool, default False
Include only float, int or boolean data.
- Returns
- Series or DataFrame
For each column/row the number of non-NA/null entries. If level is specified returns a DataFrame.
See also
Series.count
Number of non-NA elements in a Series.
DataFrame.value_counts
Count unique combinations of columns.
DataFrame.shape
Number of DataFrame rows and columns (including NA elements).
DataFrame.isna
Boolean same-sized DataFrame showing places of NA elements.
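The difference between the two axis settings is easiest to see on a frame with a few missing cells. The sketch uses stock pandas and assumes legate.pandas counts the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

per_column = df.count()          # axis=0: non-NA cells in each column
per_row = df.count(axis=1)       # axis=1: non-NA cells in each row

print(per_column)
print(per_row)
```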
- cummax(axis=None, skipna=True, *args, **kwargs)
Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative maximum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative maximum of Series or DataFrame.
See also
core.window.Expanding.max
Similar functionality but ignores NaN values.
DataFrame.max
Return the maximum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
- cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative minimum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative minimum of Series or DataFrame.
See also
core.window.Expanding.min
Similar functionality but ignores NaN values.
DataFrame.min
Return the minimum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
- cumprod(axis=None, skipna=True, *args, **kwargs)
Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative product.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative product of Series or DataFrame.
See also
core.window.Expanding.prod
Similar functionality but ignores NaN values.
DataFrame.prod
Return the product over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
- cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative sum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative sum of Series or DataFrame.
See also
core.window.Expanding.sum
Similar functionality but ignores NaN values.
DataFrame.sum
Return the sum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
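How skipna affects the cumulative methods can be sketched with a single column that contains one NaN. The example runs on stock pandas and assumes legate.pandas treats missing values the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

skipping = df.cumsum()                 # NaN stays NaN, later rows keep accumulating
propagating = df.cumsum(skipna=False)  # everything after the first NaN is NaN
running_max = df.cummax()              # same skipping rule, with max instead of sum

print(skipping)
print(propagating)
print(running_max)
```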
- div(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rtruediv.
One of the flexible wrappers (add, sub, mul, div, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
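The axis parameter matters most when other is a Series. The sketch below, written against stock pandas under the assumption that legate.pandas aligns the same way, divides each row by its own total.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 3.0], "b": [3.0, 1.0]})
row_totals = df.sum(axis=1)                 # Series indexed like the rows

# axis="index" matches the Series index against the row labels,
# so each row is divided by its own total.
shares = df.div(row_totals, axis="index")

print(shares)
```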
- divide(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support for substituting a fill_value for missing data in one of the inputs. The reverse version is rtruediv.
One of the flexible wrappers (add, sub, mul, div, mod, pow) around the arithmetic operators +, -, *, /, //, %, **.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis : {0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.
- Parameters
- labels : single label or list-like
Index or column labels to drop.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
- index : single label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
- columns : single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- level : int or level name, optional
For MultiIndex, level from which the labels will be removed.
- inplace : bool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
- errors : {‘ignore’, ‘raise’}, default ‘raise’
If ‘ignore’, suppress error and only existing labels are dropped.
- Returns
- DataFrame or None
DataFrame without the removed index or column labels, or None if inplace=True.
- Raises
- KeyError
If any of the labels is not found in the selected axis.
See also
DataFrame.loc
Label-location based indexer for selection by label.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates
Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop
Return Series with specified index labels removed.
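The equivalent spellings listed under Parameters, and the effect of errors='ignore', can be sketched as follows. The example uses stock pandas and assumes legate.pandas accepts the same arguments; all labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])

by_axis = df.drop("b", axis=1)            # labels plus axis
by_keyword = df.drop(columns="b")         # the equivalent keyword form
rows_dropped = df.drop(index="x")         # drop a row by its label
lenient = df.drop(columns=["b", "zzz"], errors="ignore")  # unknown label is skipped

print(by_axis, rows_dropped, lenient, sep="\n")
```

Because inplace defaults to False, df itself is unchanged after all four calls.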
- droplevel(level, axis=0)
Return DataFrame with requested index / column level(s) removed.
New in version 0.24.0.
- Parameters
- level : int, str, or list-like
If a string is given, it must be the name of a level. If list-like, elements must be names or positional indexes of levels.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the level(s) is removed:
0 or ‘index’: remove level(s) in column.
1 or ‘columns’: remove level(s) in row.
- Returns
- DataFrame
DataFrame with requested index / column level(s) removed.
- dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.
0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.
Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.
- how : {‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
- thresh : int, optional
Require that many non-NA values.
- subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
- inplace : bool, default False
If True, do operation inplace and return None.
- Returns
- DataFrame or None
DataFrame with NA entries dropped from it, or None if inplace=True.
See also
DataFrame.isna
Indicate missing values.
DataFrame.notna
Indicate existing (non-missing) values.
DataFrame.fillna
Replace missing values.
Series.dropna
Drop missing values.
Index.dropna
Drop missing indices.
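The how, thresh, and subset options select rows differently, as the following sketch shows. It is written against stock pandas, assuming legate.pandas applies the same rules; the data is made up so that each option gives a distinct result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, np.nan, np.nan],
        "c": [3.0, 6.0, np.nan],
    }
)

any_na = df.dropna()                # how="any": keep rows with no NA at all
all_na = df.dropna(how="all")       # drop only rows where every value is NA
at_least_two = df.dropna(thresh=2)  # keep rows with at least 2 non-NA values
only_c = df.dropna(subset=["c"])    # look for NA in column "c" only

print(any_na, all_na, at_least_two, only_c, sep="\n")
```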
- property dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.
- Returns
- pandas.Series
The data type of each column.
- property empty
Indicator whether DataFrame is empty.
True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
- Returns
- bool
If DataFrame is empty, return True, if not return False.
See also
Series.dropna
Return series without null values.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
Notes
If DataFrame contains only NaNs, it is still not considered empty. See the example below.
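A minimal sketch of the point made in the Notes, using stock pandas and assuming legate.pandas reports emptiness the same way:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({"a": []})
only_nan = pd.DataFrame({"a": [np.nan]})

print(no_rows.empty)            # zero-length row axis
print(only_nan.empty)           # one row exists, even though it holds NaN
print(only_nan.dropna().empty)  # dropping the NaN row leaves nothing
```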
- eq(other, axis='columns', level=None, fill_value=None)
Get Equal to of dataframe and other, element-wise (binary operator eq).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
- equals(other)
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.
- Parameters
- other : Series or DataFrame
The other Series or DataFrame to be compared with the first.
- Returns
- bool
True if all elements are the same in both objects, False otherwise.
See also
Series.eq
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq
Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal
Like assert_series_equal, but targets DataFrames.
numpy.array_equal
Return True if two arrays have the same shape and elements, False otherwise.
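The difference between equals and eq can be sketched as follows. This is a minimal illustration that uses stock pandas as a stand-in, on the assumption that legate.pandas follows the behaviour documented above; the frame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption)

left = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
right = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# equals() returns a single bool and treats NaNs in the same location as equal.
whole_frame_equal = left.equals(right)

# eq() compares element by element, so the NaN pair compares as not equal.
elementwise = left.eq(right)

# Same values, but the column dtypes differ (int64 vs float64).
ints = pd.DataFrame({"a": [1, 2]})
floats = pd.DataFrame({"a": [1.0, 2.0]})
dtype_mismatch_equal = ints.equals(floats)
```

Here whole_frame_equal is expected to be True while the NaN position in elementwise is False, and dtype_mismatch_equal is False because corresponding columns must share a dtype.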
-
fillna
(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method.
- Parameters
- valuescalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
- method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use next valid observation to fill gap.
- axis{0 or ‘index’, 1 or ‘columns’}
Axis along which to fill missing values.
- inplacebool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
- limitint, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- downcastdict, default is None
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
- Returns
- DataFrame or None
Object with missing values filled or None if inplace=True.
See also
interpolate
Fill NaN values using interpolation.
reindex
Conform object to new index.
asfreq
Convert TimeSeries to specified frequency.
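A short sketch of filling with a scalar versus a per-column dict, using stock pandas as a stand-in and assuming legate.pandas matches the documented semantics. The data is made up for illustration.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# A scalar fills every missing cell.
filled_scalar = df.fillna(0)

# A dict fills only the listed columns; column "b" is not in the dict,
# so its missing values are left untouched.
filled_by_column = df.fillna({"a": -1.0})
```

Neither call modifies df, because inplace defaults to False.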
-
floordiv
(other, axis='columns', level=None, fill_value=None)¶ Get Integer division of dataframe and other, element-wise (binary operator floordiv).
Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
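The effect of fill_value can be sketched like this, with stock pandas standing in for legate.pandas (an assumption) and hypothetical data. A value missing on one side is replaced before dividing; a value missing on both sides stays missing.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [7.0, np.nan], "b": [9.0, np.nan]})
other = pd.DataFrame({"a": [2.0, 2.0], "b": [np.nan, np.nan]})

# Missing cells on exactly one side are treated as 1 before the division.
result = df.floordiv(other, fill_value=1)

# Column "a": 7 // 2 and (filled 1) // 2.
# Column "b": 9 // (filled 1), then a cell missing on both sides.
```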
-
ge
(other, axis='columns', level=None, fill_value=None)¶ Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
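As an illustration of the axis parameter, the sketch below compares a frame against a scalar and against a per-row Series. It uses stock pandas as a stand-in (assuming legate.pandas behaves the same way); the labels and thresholds are invented.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame(
    {"cost": [250, 150, 100], "revenue": [100, 250, 300]},
    index=["A", "B", "C"],
)

# Scalar comparison: same result as df >= 150.
at_least_150 = df.ge(150)

# Series comparison matched on the row labels instead of the column labels.
thresholds = pd.Series([100, 200, 300], index=["A", "B", "C"])
by_row = df.ge(thresholds, axis="index")
```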
-
get
(key, default=None)¶ Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
- Parameters
- keyobject
- Returns
- valuesame type as items contained in object
-
groupby
(by=None, axis=0, level=None, as_index=True, sort=False, **kwargs)¶ Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- bymapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see the .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1).
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
- as_indexbool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sortbool, default False
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
- group_keysbool, default True
When calling apply, add group keys to index to identify pieces.
- squeezebool, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
Deprecated since version 1.1.0.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
New in version 1.1.0.
- Returns
- DataFrameGroupBy
Returns a groupby object that contains information about the groups.
See also
resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more.
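The split-apply-combine pattern described above can be sketched as follows. Stock pandas is used as a stand-in, on the assumption that legate.pandas supports the same calls; sort=True is passed explicitly so the group order does not depend on the default. The data is hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame(
    {
        "animal": ["falcon", "parrot", "falcon", "parrot"],
        "speed": [380.0, 24.0, 370.0, 26.0],
    }
)

# Group labels become the index of the aggregated result.
mean_speed = df.groupby("animal", sort=True)["speed"].mean()

# as_index=False keeps the group labels as an ordinary column ("SQL-style").
flat = df.groupby("animal", as_index=False, sort=True).mean()
```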
-
gt
(other, axis='columns', level=None, fill_value=None)¶ Get Greater than of dataframe and other, element-wise (binary operator gt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
-
head
(n=5)¶ Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].
- Parameters
- nint, default 5
Number of rows to select.
- Returns
- same type as caller
The first n rows of the caller object.
See also
DataFrame.tail
Returns the last n rows.
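A brief sketch of positive and negative n, using stock pandas as a stand-in for legate.pandas (an assumption) and a made-up ten-row frame.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"x": list(range(10))})

# Positive n: the first three rows.
first_three = df.head(3)

# Negative n: everything except the last three rows.
all_but_last_three = df.head(-3)
```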
-
property
iat
¶ Access a single value for a row/column pair by integer position.
Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.
- Raises
- IndexError
When integer position is out of bounds.
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.loc
Access a group of rows and columns by label(s).
DataFrame.iloc
Access a group of rows and columns by integer position(s).
-
property
iloc
¶ Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
Allowed inputs are:
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
See more at Selection by Position.
See also
DataFrame.iat
Fast integer location scalar accessor.
DataFrame.loc
Purely label-location based indexer for selection by label.
Series.iloc
Purely integer-location based indexing for selection by position.
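The positional input kinds and the out-of-bounds rule can be sketched as below, together with iat for a single cell. Stock pandas stands in for legate.pandas here (an assumption), and the frame is hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

first_row = df.iloc[0]      # a single integer selects one row as a Series
picked = df.iloc[[3, 0]]    # a list of integers keeps the requested order
window = df.iloc[1:3]       # a slice excludes the stop position
clipped = df.iloc[2:100]    # an out-of-bounds slice is clipped, not an error

# A non-slice indexer that is out of bounds raises IndexError.
try:
    df.iloc[10]
    raised_index_error = False
except IndexError:
    raised_index_error = True

# iat reads exactly one cell by (row position, column position).
single_value = df.iat[1, 1]
```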
-
property
index
¶ The index (row labels) of the DataFrame.
-
insert
(loc, column, value, allow_duplicates=False)¶ Insert column into DataFrame at specified location.
Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.
- Parameters
- locint
Insertion index. Must verify 0 <= loc <= len(columns).
- columnstr, number, or hashable object
Label of the inserted column.
- valueint, Series, or array-like
- allow_duplicatesbool, optional
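A minimal sketch of inserting a column at a position and of the duplicate-label check, using stock pandas as a stand-in for legate.pandas (an assumption). Column names are illustrative.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# insert() modifies the frame in place; position 1 puts "b" between "a" and "c".
df.insert(1, "b", [3, 4])

# Re-using an existing label raises ValueError unless allow_duplicates=True.
try:
    df.insert(0, "a", [0, 0])
    raised_value_error = False
except ValueError:
    raised_value_error = True
```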
-
isna
()¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
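Which values count as NA can be sketched as follows, with stock pandas as a stand-in for legate.pandas (an assumption) and made-up data. Note that the empty string and infinity are not flagged.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame(
    {
        "x": [1.0, np.nan, np.inf],
        "s": ["a", None, ""],
    }
)

# True where the cell is NaN or None; inf and "" are ordinary values.
na_mask = df.isna()

# notna() is the element-wise inverse of isna().
present_mask = df.notna()
```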
-
isnull
()¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
-
join
(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, **kwargs)¶ Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
- Parameters
- otherDataFrame, Series, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
- onstr, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
- how{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling frame’s index.
- lsuffixstr, default ‘’
Suffix to use from left frame’s overlapping columns.
- rsuffixstr, default ‘’
Suffix to use from right frame’s overlapping columns.
- sortbool, default False
Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
- Returns
- DataFrame
A dataframe containing columns from both the caller and other.
See also
DataFrame.merge
For column(s)-on-column(s) operations.
Notes
Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.
Support for specifying index levels as the on parameter was added in version 0.23.0.
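Joining a key column in the caller against the index of other can be sketched like this. Stock pandas is used as a stand-in for legate.pandas (an assumption), and the frames are hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1"], "B": ["B0", "B1"]})

# The caller's "key" column is matched against the index of the other frame.
# With the default how="left", unmatched rows of the caller get a missing B.
joined = left.join(right.set_index("key"), on="key")
```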
-
keys
()¶ Get the ‘info axis’ (see Indexing for more).
This is index for Series, columns for DataFrame.
- Returns
- Index
Info axis.
-
le
(other, axis='columns', level=None, fill_value=None)¶ Get Less than or equal to of dataframe and other, element-wise (binary operator le).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
-
property
loc
¶ Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'.
Warning
Note that contrary to usual python slices, both the start and the stop are included.
A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
See more at Selection by Label.
- Raises
- KeyError
If any items are not found.
- IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.iloc
Access group of rows and columns by integer position(s).
DataFrame.xs
Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc
Access group of values using labels.
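The label-based inputs can be sketched as below, including the inclusive label slice and the KeyError for a missing label. Stock pandas stands in for legate.pandas (an assumption); the labels are invented.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Label slices include both endpoints, unlike positional slices.
inclusive = df.loc["b":"c", "v"]

# A boolean list of the same length as the axis selects the True positions.
masked = df.loc[[True, False, True, False], "v"]

# A label that is not in the index raises KeyError.
try:
    df.loc["z"]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```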
-
lt
(other, axis='columns', level=None, fill_value=None)¶ Get Less than of dataframe and other, element-wise (binary operator lt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
-
mask
(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- otherscalar, Series/DataFrame, or callable
Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- inplacebool, default False
Whether to perform the operation in place on the data.
- axisint, default None
Alignment axis if needed.
- levelint, default None
Alignment level if needed.
- errorsstr, {‘raise’, ‘ignore’}, default ‘raise’
Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.
‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.
- try_castbool, default False
Try to cast the result back to the input type (if possible).
- Returns
- Same type as caller or None if inplace=True.
See also
DataFrame.where()
Return an object of same shape as self.
Notes
The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.
The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).
For further details and examples see the mask documentation in indexing.
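The if-then idiom can be sketched as follows: mask replaces where the condition is True, and where with the inverted condition gives the same result. Stock pandas is used as a stand-in for legate.pandas (an assumption), with hypothetical data.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# Replace every negative entry with 0; non-negative entries are kept.
clipped = df.mask(df < 0, 0)

# where() keeps entries where the condition holds, so the inverted
# condition produces the same frame.
clipped_via_where = df.where(df >= 0, 0)
```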
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the maximum of the values over the requested axis.
If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
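A small sketch contrasting max along each axis with idxmax, using stock pandas as a stand-in for legate.pandas (an assumption). Whether legate.pandas exposes idxmax is not confirmed by this section; it appears here only because the text refers to it.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1, 5, 3], "b": [9, 2, 4]})

# Default axis: one maximum per column.
column_max = df.max()

# axis=1: one maximum per row.
row_max = df.max(axis=1)

# idxmax returns the row label at which each column attains its maximum.
column_argmax = df.idxmax()
```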
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
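The effect of skipna can be sketched as below, with stock pandas as a stand-in for legate.pandas (an assumption) and made-up values.

```python
import numpy as np
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [2.0, 4.0, 6.0]})

# Missing values are skipped by default, so "a" averages 1.0 and 3.0.
mean_skipping_na = df.mean()

# With skipna=False, any missing value makes that column's mean missing.
mean_strict = df.mean(skipna=False)
```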
-
merge
(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, **kwargs)¶ Merge DataFrame or named Series objects with a database-style join.
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
- Parameters
- rightDataFrame or named Series
Object to merge with.
- how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
Type of merge to be performed.
left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
New in version 1.2.0.
- onlabel or list
Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
- left_onlabel or list, or array-like
Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.
- right_onlabel or list, or array-like
Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.
- left_indexbool, default False
Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.
- right_indexbool, default False
Use the index from the right DataFrame as the join key. Same caveats as left_index.
- sortbool, default False
Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
- suffixeslist-like, default is (“_x”, “_y”)
A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.
- copybool, default True
If False, avoid copy if possible.
- indicatorbool or str, default False
If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.
- validatestr, optional
If specified, checks if merge is of specified type.
“one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
“one_to_many” or “1:m”: check if merge keys are unique in left dataset.
“many_to_one” or “m:1”: check if merge keys are unique in right dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
- Returns
- DataFrame
A DataFrame of the two merged objects.
See also
merge_ordered
Merge with optional filling/interpolation.
merge_asof
Merge on nearest keys.
DataFrame.join
Similar method using indices.
Notes
Support for specifying index levels as the on, left_on, and right_on parameters was added in version 0.23.0. Support for merging named Series objects was added in version 0.24.0.
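The difference between inner and outer merges can be sketched as follows. Stock pandas is used as a stand-in, assuming legate.pandas implements the same join semantics; the frames and column names are hypothetical.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: only keys present in both frames, in the order of the left keys.
inner = left.merge(right, on="key", how="inner")

# Outer join: union of keys; cells with no match on one side are missing.
outer = left.merge(right, on="key", how="outer")
```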
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the minimum of the values over the requested axis.
If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
-
mod
(other, axis='columns', level=None, fill_value=None)¶ Get Modulo of dataframe and other, element-wise (binary operator mod).
Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
mul
(other, axis='columns', level=None, fill_value=None)¶ Get Multiplication of dataframe and other, element-wise (binary operator mul).
Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
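How the axis argument decides which labels a Series is matched against can be sketched like this. Stock pandas stands in for legate.pandas (an assumption), and the values are illustrative.

```python
import pandas as pd  # stand-in for legate.pandas (assumption)

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis="index": the Series is matched on row labels, one factor per row.
row_factors = pd.Series([10, 100], index=[0, 1])
scaled_rows = df.mul(row_factors, axis="index")

# axis="columns": the Series is matched on column labels, one factor per column.
column_factors = pd.Series({"a": 10, "b": 100})
scaled_columns = df.mul(column_factors, axis="columns")
```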
-
multiply
(other, axis='columns', level=None, fill_value=None)¶ Get Multiplication of dataframe and other, element-wise (binary operator mul).
Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
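A minimal sketch of how fill_value changes the result of multiply. It runs against stock pandas as a stand-in, on the assumption that legate.pandas.DataFrame follows the same semantics; the frames left and right are made up for illustration.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

# Hypothetical inputs with missing values in different places.
left = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
right = pd.DataFrame({"a": [2.0, 2.0, np.nan], "b": [np.nan, 1.0, np.nan]})

# Plain operator: a missing operand on either side gives a missing result.
plain = left * right

# fill_value=1 substitutes for a value that is missing on one side only.
# Where both sides are missing, the result stays missing.
filled = left.multiply(right, fill_value=1)

print(plain)
print(filled)
```

Choosing 1 as the fill value makes a one-sided gap behave like a neutral factor, which is usually the least surprising choice for multiplication.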
-
property
ndim
¶ Return an int representing the number of axes / array dimensions.
Return 1 if Series. Otherwise return 2 if DataFrame.
See also
ndarray.ndim
Number of array dimensions.
-
ne
(other, axis='columns', level=None, fill_value=None)¶ Get Not equal to of dataframe and other, element-wise (binary operator ne).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
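The comparison rules above can be sketched as follows. Stock pandas stands in for legate.pandas here (an assumption), and the frame df and the Series ref are invented for the example.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [1.0, 2.0, 3.0]})

# Compare every element against a scalar.
mask_scalar = df.ne(1.0)

# Compare each column against a Series aligned on the row index.
ref = pd.Series([1.0, 2.0, 0.0])
mask_rows = df.ne(ref, axis="index")

# NaN is never equal to anything, including itself.
self_diff = df.ne(df)

print(mask_scalar)
print(mask_rows)
print(self_diff)
```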
-
notna
()¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
See also
DataFrame.notnull
Alias of notna.
DataFrame.isna
Boolean inverse of notna.
DataFrame.dropna
Omit axes labels with missing values.
notna
Top-level notna.
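A short sketch of the mapping described above, written against stock pandas on the assumption that legate.pandas behaves the same way. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({
    "name": ["Ada", "", None],         # empty string is a value, None is missing
    "score": [1.5, np.inf, np.nan],    # inf is a value, NaN is missing
})

present = df.notna()

# A common use: keep only rows in which every column holds a value.
complete_rows = df[present.all(axis=1)]

print(present)
print(complete_rows)
```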
-
notnull
()¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
See also
DataFrame.notnull
Alias of notna.
DataFrame.isna
Boolean inverse of notna.
DataFrame.dropna
Omit axes labels with missing values.
notna
Top-level notna.
-
pow
(other, axis='columns', level=None, fill_value=None)¶ Get Exponential power of dataframe and other, element-wise (binary operator pow).
Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
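As an illustration of the axis parameter, the sketch below raises a frame to a scalar power and then to a per-column exponent. It uses stock pandas as a stand-in for legate.pandas (an assumption); df and exps are hypothetical.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# Scalar exponent applied to every element; NaN stays NaN.
squared = df.pow(2)

# A Series aligned on column labels gives each column its own exponent.
exps = pd.Series({"a": 2, "b": 0.5})
per_col = df.pow(exps, axis="columns")

print(squared)
print(per_col)
```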
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the product of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
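The interaction between skipna and min_count is easiest to see on a column that is entirely missing. The sketch below assumes legate.pandas matches stock pandas, which is what it actually runs against; the data is invented.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a": [2.0, 3.0, np.nan], "b": [np.nan, np.nan, np.nan]})

# NA values are skipped by default; an all-NA column yields the empty product.
col_prod = df.prod()

# Requiring at least one valid value turns the all-NA column into NA.
col_prod_strict = df.prod(min_count=1)

# Reduce across columns instead, producing one value per row.
row_prod = df.prod(axis=1)

print(col_prod)
print(col_prod_strict)
print(row_prod)
```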
-
product
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the product of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
-
query
(expr, inplace=False, **kwargs)¶ Query the columns of a DataFrame with a boolean expression.
- Parameters
- exprstr
The query string to evaluate.
You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.
You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuation (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`.) Column names which are Python keywords (like “list”, “for”, “import”, etc.) cannot be used.
For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
New in version 0.25.0: Backtick quoting introduced.
New in version 1.0.0: Expanding functionality of backtick quoting for more than only spaces.
- inplacebool
Whether the query should modify the data in place or return a modified copy.
- **kwargs
See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().
- Returns
- DataFrame or None
DataFrame resulting from the provided query expression or None if inplace=True.
See also
eval
Evaluate a string describing operations on DataFrame columns.
DataFrame.eval
Evaluate a string describing operations on DataFrame columns.
Notes
The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.
For further details and examples see the query documentation in indexing.
Backtick quoted variables
Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace different than the space character, but also the hashtag (as it is used for comments) and the backtick itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.
See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing.
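The three query features described in this entry (backtick quoting, '@' variables, and the adjusted precedence of &) can be sketched as below. The example runs on stock pandas, assuming legate.pandas accepts the same expressions; the frame and the variable threshold are hypothetical.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a a": [1, 2, 3], "b": [3, 2, 1], "c": [10, 20, 30]})
threshold = 15  # ordinary Python variable, referenced with '@' below

# Backticks quote a column name that contains a space.
spaced = df.query("`a a` < b")

# '@' pulls a variable in from the enclosing Python scope.
above = df.query("c > @threshold")

# Inside query(), '&' binds like 'and', so no parentheses are needed.
both = df.query("b >= 2 & c > 10")

print(spaced)
print(above)
print(both)
```

In plain Python the last expression would need parentheses around each comparison, because & normally binds tighter than >= and >.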
-
radd
(other, axis='columns', level=None, fill_value=None)¶ Get Addition of dataframe and other, element-wise (binary operator radd).
Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
rdiv
(other, axis='columns', level=None, fill_value=None)¶ Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
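The difference between the forward and reverse division methods is only the operand order, which the sketch below makes concrete. It uses stock pandas as a stand-in for legate.pandas (an assumption) and a made-up frame.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [8.0, np.nan, 2.0]})

forward = df.div(8)    # df / 8
reverse = df.rdiv(8)   # 8 / df

# With a scalar operand, fill_value replaces missing cells before dividing.
reverse_filled = df.rdiv(8, fill_value=1)

print(forward)
print(reverse)
print(reverse_filled)
```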
-
rename
(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')¶ Alter axes labels.
Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.
See the user guide for more.
- Parameters
- mapperdict-like or function
Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.
- indexdict-like or function
Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).
- columnsdict-like or function
Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
- copybool, default True
Also copy underlying data.
- inplacebool, default False
Whether to modify the DataFrame in place instead of returning a new one. If True, the value of copy is ignored.
- levelint or level name, default None
In case of a MultiIndex, only rename labels in the specified level.
- errors{‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.
- Returns
- DataFrame or None
DataFrame with the renamed axis labels or None if inplace=True.
- Raises
- KeyError
If any of the labels is not found in the selected axis and “errors=’raise’”.
See also
DataFrame.rename_axis
Set the name of the axis.
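The calling conventions for rename (index/columns keywords versus mapper plus axis) and the effect of errors='raise' can be sketched as follows. Stock pandas is used on the assumption that legate.pandas matches it; the labels are illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Keyword form: name the axis through the parameter.
by_keyword = df.rename(columns={"A": "a", "B": "b"})

# Mapper form: a function applied to every label on the chosen axis.
by_mapper = df.rename(str.lower, axis="columns")

# Row labels are renamed the same way through index=.
rows = df.rename(index={0: "x", 1: "y"})

# errors='raise' turns an unknown label into a KeyError instead of a no-op.
try:
    df.rename(columns={"Z": "z"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(by_keyword.columns.tolist(), rows.index.tolist(), raised)
```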
-
reset_index
(level=None, drop=False, inplace=False, col_level=0, col_fill='')¶ Reset the index, or a level of it.
Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.
- Parameters
- levelint, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by default.
- dropbool, default False
Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplacebool, default False
Modify the DataFrame in place (do not create a new object).
- col_levelint or str, default 0
If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fillobject, default ‘’
If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
- Returns
- DataFrame or None
DataFrame with the new index or None if inplace=True.
See also
DataFrame.set_index
Opposite of reset_index.
DataFrame.reindex
Change to new indices or expand indices.
DataFrame.reindex_like
Change to same indices as other DataFrame.
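A small sketch of the drop parameter, assuming legate.pandas follows stock pandas (which the example actually imports). The index name key is made up.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame(
    {"v": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="key"),
)

# Default: the old index becomes a regular column named after the index.
kept = df.reset_index()

# drop=True discards the old index instead of inserting it as a column.
dropped = df.reset_index(drop=True)

print(kept)
print(dropped)
```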
-
rfloordiv
(other, axis='columns', level=None, fill_value=None)¶ Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
rmod
(other, axis='columns', level=None, fill_value=None)¶ Get Modulo of dataframe and other, element-wise (binary operator rmod).
Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
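rmod pairs naturally with rfloordiv: together they decompose a scalar into quotient and remainder for each divisor in the frame. The sketch uses stock pandas in place of legate.pandas (an assumption) and a hypothetical column n of divisors.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"n": [3, 4, 5]})

quotient = df.rfloordiv(17)   # 17 // df
remainder = df.rmod(17)       # 17 % df

# The usual identity holds element-wise: q * n + r == 17.
rebuilt = quotient * df + remainder

print(quotient["n"].tolist(), remainder["n"].tolist(), rebuilt["n"].tolist())
```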
-
rmul
(other, axis='columns', level=None, fill_value=None)¶ Get Multiplication of dataframe and other, element-wise (binary operator rmul).
Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
rpow
(other, axis='columns', level=None, fill_value=None)¶ Get Exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
rsub
(other, axis='columns', level=None, fill_value=None)¶ Get Subtraction of dataframe and other, element-wise (binary operator rsub).
Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
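When other is a Series, the default axis='columns' matches its index against the column labels. The sketch below shows this for rsub, using stock pandas as a stand-in for legate.pandas (an assumption); offsets is a hypothetical Series.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
offsets = pd.Series({"a": 10, "b": 20})

# offsets - df, with the Series index matched to the column labels.
diff = df.rsub(offsets)

# The forward method gives the same magnitudes with the opposite sign.
forward = df.sub(offsets)

print(diff)
print(forward)
```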
-
rtruediv
(other, axis='columns', level=None, fill_value=None)¶ Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
set_axis
(labels, axis=0, inplace=False)¶ Assign desired index to given axis.
Indexes for column or row labels can be changed by assigning a list-like or Index.
- Parameters
- labelslist-like, Index
The values for the new index.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
- inplacebool, default False
If True, modify the DataFrame in place and return None; otherwise return a new DataFrame instance.
- Returns
- renamedDataFrame or None
An object of type DataFrame or None if inplace=True.
See also
DataFrame.rename_axis
Alter the name of the index or columns.
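A minimal sketch of relabeling each axis with set_axis. It runs on stock pandas, assuming legate.pandas behaves the same, and leaves inplace at its default so a new frame is returned; the labels are invented.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Replace the row labels; the new list must match the number of rows.
rows = df.set_axis(["a", "b", "c"], axis="index")

# Replace the column labels; the new list must match the number of columns.
cols = df.set_axis(["I", "II"], axis="columns")

print(rows)
print(cols)
```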
-
set_index
(keys, drop=True, append=False, inplace=False, verify_integrity=False)¶ Set the DataFrame index using existing columns.
Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.
- Parameters
- keyslabel or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.
- dropbool, default True
Delete columns to be used as the new index.
- appendbool, default False
Whether to append columns to existing index.
- inplacebool, default False
If True, modifies the DataFrame in place (do not create a new object).
- verify_integritybool, default False
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.
- Returns
- DataFrame or None
Changed row labels or None if inplace=True.
See also
DataFrame.reset_index
Opposite of set_index.
DataFrame.reindex
Change to new indices or expand indices.
DataFrame.reindex_like
Change to same indices as other DataFrame.
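The keys and drop parameters can be illustrated with a small frame. The sketch assumes legate.pandas follows stock pandas, which is what it imports; the column names are illustrative.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({
    "month": [1, 4, 7],
    "year": [2012, 2014, 2013],
    "sale": [55, 40, 84],
})

# A single column becomes the index and is removed from the columns.
by_month = df.set_index("month")

# A list of columns produces a MultiIndex.
by_year_month = df.set_index(["year", "month"])

# drop=False keeps the column in the frame as well.
kept = df.set_index("month", drop=False)

print(by_month)
print(by_year_month)
```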
-
property
shape
¶ Return a tuple representing the dimensionality of the DataFrame.
See also
ndarray.shape
Tuple of array dimensions.
-
property
size
¶ Return an int representing the number of elements in this object.
Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.
See also
ndarray.size
Number of elements in the array.
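The relationship between shape, size, and ndim can be checked directly. The sketch uses stock pandas as a stand-in for legate.pandas (an assumption) and a made-up 3-by-2 frame.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df.shape   # (rows, columns)
n_elements = df.size        # rows * columns for a DataFrame
n_dims = df.ndim            # always 2 for a DataFrame

print(n_rows, n_cols, n_elements, n_dims)
```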
-
sort_index
(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)¶ Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
- levelint or level name or list of ints or list of level names
If not None, sort on values in specified index level(s).
- ascendingbool or list-like of bools, default True
Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
- inplacebool, default False
If True, perform operation in-place.
- kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
- na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
- sort_remainingbool, default True
If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- keycallable, optional
If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.
New in version 1.1.0.
- Returns
- DataFrame or None
The original DataFrame sorted by the labels or None if inplace=True.
See also
Series.sort_index
Sort Series by the index.
DataFrame.sort_values
Sort DataFrame by the value.
Series.sort_values
Sort Series by the value.
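A short sketch of ascending and ignore_index for sort_index, run against stock pandas on the assumption that legate.pandas matches it. The integer index values are arbitrary.

```python
import pandas as pd  # assumption: legate.pandas mirrors this API

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=[100, 29, 234, 1])

# Sort rows by their labels, smallest first.
asc = df.sort_index()

# Reverse the order.
desc = df.sort_index(ascending=False)

# Sort, then relabel the rows 0..n-1.
relabeled = df.sort_index(ignore_index=True)

print(asc)
print(desc)
print(relabeled)
```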
-
sort_values
(by, axis=0, ascending=True, inplace: bool = False, kind='quicksort', na_position='last', ignore_index: bool = False)¶ Sort by the values along either axis.
- Parameters
- bystr or list of str
Name or list of names to sort by.
If axis is 0 or ‘index’ then by may contain index levels and/or column labels.
If axis is 1 or ‘columns’ then by may contain column levels and/or index labels.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis to be sorted.
- ascendingbool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
- inplacebool, default False
If True, perform operation in-place.
- kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
- na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- keycallable, optional
Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.
New in version 1.1.0.
- Returns
- DataFrame or None
DataFrame with sorted values or None if inplace=True.
See also
DataFrame.sort_index
Sort a DataFrame by the index.
Series.sort_values
Similar method for a Series.
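A minimal sketch of the by, ascending, na_position and ignore_index parameters, written against stock pandas on the assumption that legate.pandas follows the same semantics. Column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "score": [3.0, 2.0, 1.0, 4.0],
})

# One ascending flag per entry in `by`: team ascending, score descending.
# ignore_index=True relabels the result 0..n-1.
out = df.sort_values(by=["team", "score"], ascending=[True, False],
                     ignore_index=True)
print(list(out["team"]))   # ['a', 'a', 'b', 'b']
print(list(out["score"]))  # [4.0, 2.0, 3.0, 1.0]

# na_position="first" moves missing values to the front; the original
# row labels are kept because ignore_index defaults to False.
with_nan = pd.DataFrame({"score": [2.0, np.nan, 1.0]})
nan_first = with_nan.sort_values("score", na_position="first")
print(list(nan_first.index))  # [1, 2, 0]
```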
-
squeeze
(axis=None)¶ Squeeze 1 dimensional axis objects into scalars.
Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.
This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
A specific axis to squeeze. By default, all length-1 axes are squeezed.
- Returns
- DataFrame, Series, or scalar
The projection after squeezing axis or all the axes.
See also
Series.iloc
Integer-location based indexing for selecting scalars.
DataFrame.iloc
Integer-location based indexing for selecting Series.
Series.to_frame
Inverse of DataFrame.squeeze for a single-column DataFrame.
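The three outcomes described above (scalar, Series, unchanged) can be illustrated roughly like this. The sketch runs on stock pandas and assumes legate.pandas squeezes the same way; the frame is hypothetical.

```python
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

one_col = df[["a"]].squeeze()             # single column  -> Series
one_cell = df.loc[[0], ["a"]].squeeze()   # single element -> scalar
unchanged = df.squeeze()                  # no length-1 axis -> still a DataFrame

# Restricting the squeeze to one axis keeps the other dimension:
# a 1x1 frame squeezed along the columns becomes a length-1 Series.
col_only = df.loc[[0], ["a"]].squeeze(axis="columns")

print(type(one_col).__name__)    # Series
print(one_cell)                  # 1
print(type(unchanged).__name__)  # DataFrame
print(len(col_only))             # 1
```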
-
std
(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- axis{index (0), columns (1)}
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- Returns
- Series or DataFrame (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)
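The effect of ddof can be checked with a small sketch. It uses stock pandas and NumPy, assuming legate.pandas applies the same N - ddof divisor; the values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

sample_std = df.std()["x"]            # divisor N - 1 (default ddof=1)
population_std = df.std(ddof=0)["x"]  # divisor N, the numpy.std default

# Sum of squared deviations from the mean (2.5) is 5.0.
assert abs(sample_std - (5.0 / 3.0) ** 0.5) < 1e-12
assert abs(population_std - np.std([1.0, 2.0, 3.0, 4.0])) < 1e-12
```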
-
sub
(other, axis='columns', level=None, fill_value=None)¶ Get Subtraction of dataframe and other, element-wise (binary operator sub).
Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
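How fill_value differs from the plain operator can be sketched as below, using stock pandas as a stand-in and assuming legate.pandas matches it. The frames are made up; note the last row, where both inputs are missing and the result stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
a = pd.DataFrame({"x": [10.0, np.nan, 30.0, np.nan]})
b = pd.DataFrame({"x": [1.0, 2.0, np.nan, np.nan]})

plain = a - b                     # any missing operand gives NaN
filled = a.sub(b, fill_value=0)   # a lone missing operand is treated as 0

print(plain["x"].isna().tolist())   # [False, True, True, True]
print(filled["x"].tolist()[:3])     # [9.0, -2.0, 30.0]
print(filled["x"].isna().tolist())  # [False, False, False, True]
```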
-
subtract
(other, axis='columns', level=None, fill_value=None)¶ Get Subtraction of dataframe and other, element-wise (binary operator sub).
Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
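The min_count parameter matters mostly for all-NA columns. A minimal sketch on stock pandas, assuming legate.pandas treats min_count identically; the frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, np.nan]})

# With min_count=0 (the default) an all-NA column sums to 0.0.
default_sum = df.sum()

# Requiring at least one valid value turns that result into NA.
strict_sum = df.sum(min_count=1)

print(default_sum["a"], default_sum["b"])  # 3.0 0.0
print(strict_sum["a"])                     # 3.0
print(pd.isna(strict_sum["b"]))            # True
```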
-
tail
(n=5)¶ Return the last n rows.
This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].
- Parameters
- nint, default 5
Number of rows to select.
- Returns
- type of caller
The last n rows of the caller object.
See also
DataFrame.head
The first n rows of the caller object.
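Positive and negative n can be compared in a short sketch. It runs on stock pandas, on the assumption that legate.pandas selects rows the same way; the frame is illustrative.

```python
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"v": range(6)})

last_two = df.tail(2)    # the final two rows
skip_two = df.tail(-2)   # everything except the first two rows

print(list(last_two["v"]))  # [4, 5]
print(list(skip_two["v"]))  # [2, 3, 4, 5]
```

For n = -2 the result matches the positional slice df[2:].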
-
to_csv
(path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator=None, chunksize=None, partition=False)¶ Write object to a comma-separated values (csv) file.
Changed in version 0.24.0: The order of arguments for Series was changed.
- Parameters
- path_or_bufstr or file handle, default None
File path or object, if None is provided the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
Changed in version 0.24.0: Was previously named “path” for Series.
Changed in version 1.2.0: Support for binary file objects was introduced.
- sepstr, default ‘,’
String of length 1. Field delimiter for the output file.
- na_repstr, default ‘’
Missing data representation.
- float_formatstr, default None
Format string for floating point numbers.
- columnssequence, optional
Columns to write.
- headerbool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
Changed in version 0.24.0: Previously defaulted to False for Series.
- indexbool, default True
Write row names (index).
- index_labelstr or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
- modestr
Python write mode, default ‘w’.
- encodingstr, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
- compressionstr or dict, default ‘infer’
If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.
Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.
Changed in version 1.2.0: Compression is supported for binary file objects.
Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.
- quotingoptional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
- quotecharstr, default ‘"’
String of length 1. Character used to quote fields.
- line_terminatorstr, optional
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ for Linux, ‘\r\n’ for Windows).
Changed in version 0.24.0.
- chunksizeint or None
Rows to write at a time.
- date_formatstr, default None
Format string for datetime objects.
- doublequotebool, default True
Control quoting of quotechar inside a field.
- escapecharstr, default None
String of length 1. Character used to escape sep and quotechar when appropriate.
- decimalstr, default ‘.’
Character recognized as decimal separator. E.g. use ‘,’ for European data.
- errorsstr, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
New in version 1.1.0.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
New in version 1.2.0.
- Returns
- None or str
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
See also
read_csv
Load a CSV file into a DataFrame.
to_excel
Write DataFrame to an Excel file.
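The two return modes (string when path_or_buf is None, None otherwise) can be sketched as follows. Only arguments that appear in the signature above are used; the code runs on stock pandas and assumes legate.pandas produces the same text. The partition argument is specific to Legate and is not exercised here.

```python
import io
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"a": [1, 2], "b": [None, "x"]})

# No path or buffer: the CSV text is returned as a string.
text = df.to_csv(sep=";", na_rep="NA", index=False)
print(text.splitlines())  # ['a;b', '1;NA', '2;x']

# With a text buffer the method writes into it and returns None.
buf = io.StringIO()
returned = df.to_csv(buf, sep=";", na_rep="NA", index=False)
print(returned)  # None
```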
-
to_pandas
(schema_only=False)¶ Convert distributed DataFrame into a Pandas DataFrame
- Parameters
- schema_onlybool, default False
If True, the data is not converted.
- Returns
- outpandas.DataFrame
-
to_parquet
(path, engine='auto', compression='snappy', index=None, partition_cols=None, **kwargs)¶ Write a DataFrame to the binary parquet format.
This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.
- Parameters
- pathstr or file-like object, default None
If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function) or io.BytesIO. The engine fastparquet does not accept file-like objects. If path is None, a bytes object is returned.
Changed in version 1.2.0.
Previously this was “fname”
- engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
- compression{‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’
Name of the compression to use. Use None for no compression.
- indexbool, default None
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
New in version 0.24.0.
- partition_colslist, optional, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
New in version 0.24.0.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
New in version 1.2.0.
- **kwargs
Additional arguments passed to the parquet library. See pandas io for more details.
- Returns
- bytes if no path argument is provided else None
See also
read_parquet
Read a parquet file.
DataFrame.to_csv
Write a csv file.
DataFrame.to_sql
Write to a sql table.
DataFrame.to_hdf
Write to hdf.
Notes
This function requires either the fastparquet or pyarrow library.
-
truediv
(other, axis='columns', level=None, fill_value=None)¶ Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
-
var
(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- axis{index (0), columns (1)}
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- Returns
- Series or DataFrame (if level specified)
Notes
To have the same behaviour as numpy.var, use ddof=0 (instead of the default ddof=1).
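A short check of how ddof changes the variance, written against stock pandas and NumPy with the assumption that legate.pandas uses the same N - ddof divisor. The values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

unbiased = df.var()["x"]           # 5.0 / (4 - 1)
population = df.var(ddof=0)["x"]   # 5.0 / 4, the NumPy default

assert abs(unbiased - 5.0 / 3.0) < 1e-12
assert abs(population - np.var([1.0, 2.0, 3.0, 4.0])) < 1e-12
```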
-
where
(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶ Replace values where the condition is False.
- Parameters
- condbool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- otherscalar, Series/DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- inplacebool, default False
Whether to perform the operation in place on the data.
- axisint, default None
Alignment axis if needed.
- levelint, default None
Alignment level if needed.
- errorsstr, {‘raise’, ‘ignore’}, default ‘raise’
Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.
‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.
- try_castbool, default False
Try to cast the result back to the input type (if possible).
- Returns
- Same type as caller or None if inplace=True.
See also
DataFrame.mask()
Return an object of same shape as self.
Notes
The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.
The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).
For further details and examples see the where documentation in indexing.
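The rough equivalence with numpy.where can be sketched as below. The code uses stock pandas and NumPy as a stand-in and assumes legate.pandas gives the same results; df1, df2 and m are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
df1 = pd.DataFrame({"x": [1, -2, 3]})
df2 = pd.DataFrame({"x": [10, 20, 30]})
m = df1 > 0

# Keep df1 where the mask is True, take df2 where it is False.
kept = df1.where(m, df2)
print(list(kept["x"]))  # [1, 20, 3]

# np.where takes the condition first and returns a plain ndarray.
same = np.where(m, df1, df2)
print((kept.to_numpy() == same).all())  # True

# Without `other`, entries failing the condition become missing.
masked = df1.where(m)
print(masked["x"].isna().tolist())  # [False, True, False]
```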
Series¶
-
class
legate.pandas.
Series
(data=None, index=None, dtype=None, name=None, copy=False, frame=None)¶ One-dimensional distributed array with axis labels.
Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as null/NaN).
Operations between Series (+, -, /, *, **) align values based on their associated index values; the operands need not be the same length. The result index will be the sorted union of the two indexes.
Series objects are used as columns of DataFrame.
- Parameters
- dataarray-like, Iterable, dict, or scalar value
Contains data stored in Series.
- indexarray-like or Index (1d)
Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
- dtypestr, numpy.dtype, or ExtensionDtype, optional
Data type for the output Series. If not specified, this will be inferred from data.
- namestr, optional
The name to give to the Series.
- nan_as_nullbool, default True
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
- frameTable
Storage manager object used for internal purposes only
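Index alignment and the dict-plus-index rule described above can be sketched like this. The code uses stock pandas as a stand-in and assumes legate.pandas.Series aligns the same way; labels and values are made up.

```python
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "w"])

# Values are matched by label, not position; the result index is the
# sorted union, and labels present on only one side yield NaN.
total = a + b
print(list(total.index))  # ['w', 'x', 'y', 'z']
print(total["y"])         # 12.0

# When both a dict and an index are given, the index decides which keys
# survive; labels missing from the dict are filled with NaN.
from_dict = pd.Series({"a": 1, "b": 2}, index=["b", "c"])
print(from_dict["b"])            # 2.0
print(pd.isna(from_dict["c"]))   # True
```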
- Attributes
at
Access a single value for a row/column label pair.
axes
Return a list of the row axis labels.
cat
Accessor object for categorical properties of the Series values.
dt
Accessor object for datetimelike properties of the Series values.
dtype
Return the dtype object of the underlying data.
dtypes
Return the dtype object of the underlying data.
empty
Indicator whether Series/DataFrame is empty.
hasnans
Return True if the Series contains any NaNs; enables various performance speedups.
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
The index (axis labels) of the Series.
loc
Access a group of rows and columns by label(s) or a boolean array.
name
Return the name of the Series.
ndim
Number of dimensions of the underlying data, by definition 1.
shape
Return a tuple of the shape of the underlying data.
size
Return an int representing the number of elements in this object.
str
Vectorized string functions for Series and Index.
values
Return Series as ndarray or ndarray-like depending on the dtype.
Methods
abs
()Return a Series/DataFrame with absolute numeric value of each element.
add
(other[, level, fill_value, axis])Return Addition of series and other, element-wise (binary operator add).
all
([axis, bool_only, skipna, level])Return whether all elements are True, potentially over an axis.
any
([axis, bool_only, skipna, level])Return whether any element is True, potentially over an axis.
append
(other[, ignore_index, …])Append rows of other to the end of caller, returning a new object.
astype
(dtype[, copy, errors])Cast a pandas object to a specified dtype dtype.
bool
()Return the bool of a single element Series or DataFrame.
count
([level])Return number of non-NA/null observations in the Series.
cummax
([axis, skipna])Return cumulative maximum over a DataFrame or Series axis.
cummin
([axis, skipna])Return cumulative minimum over a DataFrame or Series axis.
cumprod
([axis, skipna])Return cumulative product over a DataFrame or Series axis.
cumsum
([axis, skipna])Return cumulative sum over a DataFrame or Series axis.
div
(other[, level, fill_value, axis])Return Floating division of series and other, element-wise (binary operator truediv).
divide
(other[, level, fill_value, axis])Return Floating division of series and other, element-wise (binary operator truediv).
drop
([labels, axis, index, columns, level, …])Drop specified labels from rows or columns.
droplevel
(level[, axis])Return DataFrame with requested index / column level(s) removed.
dropna
([axis, how, thresh, subset, inplace])Remove missing values.
eq
(other[, level, fill_value, axis])Return Equal to of series and other, element-wise (binary operator eq).
equals
(other)Test whether two objects contain the same elements.
fillna
([value, method, axis, inplace, …])Fill NA/NaN values using the specified method.
floordiv
(other[, level, fill_value, axis])Return Integer division of series and other, element-wise (binary operator floordiv).
ge
(other[, level, fill_value, axis])Return Greater than or equal to of series and other, element-wise (binary operator ge).
get
(key[, default])Get item from object for given key (ex: DataFrame column).
groupby
([by, axis, level, sort])Group Series using a mapper or by a Series of columns.
gt
(other[, level, fill_value, axis])Return Greater than of series and other, element-wise (binary operator gt).
head
([n])Return the first n rows.
isna
()Detect missing values.
isnull
()Detect missing values.
le
(other[, level, fill_value, axis])Return Less than or equal to of series and other, element-wise (binary operator le).
lt
(other[, level, fill_value, axis])Return Less than of series and other, element-wise (binary operator lt).
mask
(cond[, other, inplace, axis, level, …])Replace values where the condition is True.
max
([axis, skipna, level, numeric_only])Return the maximum of the values over the requested axis.
mean
([axis, skipna, level, numeric_only])Return the mean of the values over the requested axis.
min
([axis, skipna, level, numeric_only])Return the minimum of the values over the requested axis.
mod
(other[, level, fill_value, axis])Return Modulo of series and other, element-wise (binary operator mod).
mul
(other[, level, fill_value, axis])Return Multiplication of series and other, element-wise (binary operator mul).
multiply
(other[, level, fill_value, axis])Return Multiplication of series and other, element-wise (binary operator mul).
ne
(other[, level, fill_value, axis])Return Not equal to of series and other, element-wise (binary operator ne).
notna
()Detect existing (non-missing) values.
notnull
()Detect existing (non-missing) values.
pow
(other[, level, fill_value, axis])Return Exponential power of series and other, element-wise (binary operator pow).
prod
([axis, skipna, level, numeric_only, …])Return the product of the values over the requested axis.
product
([axis, skipna, level, numeric_only, …])Return the product of the values over the requested axis.
radd
(other[, level, fill_value, axis])Return Addition of series and other, element-wise (binary operator radd).
rdiv
(other[, level, fill_value, axis])Return Floating division of series and other, element-wise (binary operator rtruediv).
reset_index
([level, drop, name, inplace])Generate a new DataFrame or Series with the index reset.
rfloordiv
(other[, level, fill_value, axis])Return Integer division of series and other, element-wise (binary operator rfloordiv).
rmod
(other[, level, fill_value, axis])Return Modulo of series and other, element-wise (binary operator rmod).
rmul
(other[, level, fill_value, axis])Return Multiplication of series and other, element-wise (binary operator rmul).
rpow
(other[, level, fill_value, axis])Return Exponential power of series and other, element-wise (binary operator rpow).
rsub
(other[, level, fill_value, axis])Return Subtraction of series and other, element-wise (binary operator rsub).
rtruediv
(other[, level, fill_value, axis])Return Floating division of series and other, element-wise (binary operator rtruediv).
set_axis
(labels[, axis, inplace])Assign desired index to given axis.
sort_index
([axis, level, ascending, …])Sort object by labels (along an axis).
sort_values
([axis, ascending, inplace, …])Sort by the values.
squeeze
([axis])Squeeze 1 dimensional axis objects into scalars.
std
([axis, skipna, level, ddof, numeric_only])Return sample standard deviation over requested axis.
sub
(other[, level, fill_value, axis])Return Subtraction of series and other, element-wise (binary operator sub).
subtract
(other[, level, fill_value, axis])Return Subtraction of series and other, element-wise (binary operator sub).
sum
([axis, skipna, level, numeric_only, …])Return the sum of the values over the requested axis.
tail
([n])Return the last n rows.
to_csv
([path_or_buf, sep, na_rep, columns, …])Write object to a comma-separated values (csv) file.
to_frame
([name])Convert Series to DataFrame.
to_numpy
([dtype, copy, na_value])A NumPy ndarray representing the values in this Series or Index.
to_pandas
([schema_only])Convert distributed Series into a Pandas Series
truediv
(other[, level, fill_value, axis])Return Floating division of series and other, element-wise (binary operator truediv).
var
([axis, skipna, level, ddof, numeric_only])Return unbiased variance over requested axis.
where
(cond[, other, inplace, axis, level, …])Replace values where the condition is False.
copy
-
abs
()¶ Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
- Returns
- abs
Series/DataFrame containing the absolute value of each element.
See also
numpy.absolute
Calculate the absolute value element-wise.
Notes
For complex inputs, 1.2 + 1j, the absolute value is \(\sqrt{ a^2 + b^2 }\).
-
add
(other, level=None, fill_value=None, axis=0)¶ Return Addition of series and other, element-wise (binary operator add).
Equivalent to series + other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.radd
Reverse of the Addition operator, see Python documentation for more details.
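A minimal sketch of fill_value with two partially overlapping Series. It runs on stock pandas and assumes legate.pandas fills and aligns in the same way; the labels are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
a = pd.Series([1.0, np.nan], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# The plain operator yields NaN wherever either side is missing.
plain = a + b
print(plain.isna().tolist())  # [True, True, True]

# fill_value=0 replaces both existing NaNs and labels that are absent
# on one side before the addition is carried out.
filled = a.add(b, fill_value=0)
print(list(filled.index))  # ['x', 'y', 'z']
print(filled.tolist())     # [1.0, 10.0, 20.0]
```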
-
all
(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_onlybool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
- skipnabool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- **kwargsany, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.
See also
Series.all
Return True if all elements are True.
DataFrame.any
Return True if one (or more) elements are True.
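The edge cases for all (empty input, all-NA input) can be checked with a short sketch on stock pandas, assuming legate.pandas reduces the same way. The Series are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; stock pandas stands in for legate.pandas.
empty = pd.Series([], dtype="float64")
all_na = pd.Series([np.nan, np.nan])
mixed = pd.Series([1, 0])

print(bool(empty.all()))   # True: nothing contradicts the claim
print(bool(all_na.all()))  # True: NA values are skipped, leaving an empty input
print(bool(mixed.all()))   # False: zero counts as False
```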
-
any
(axis=0, bool_only=None, skipna=True, level=None, **kwargs)¶ Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_onlybool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
- skipnabool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- **kwargsany, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.
See also
numpy.any
Numpy version of this method.
Series.any
Return whether any element is True.
Series.all
Return whether all elements are True.
DataFrame.any
Return whether any element is True over requested axis.
DataFrame.all
Return whether all elements are True over requested axis.
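The axis and skipna rules above can be sketched as follows. This is a minimal illustration written against stock pandas, on the assumption that Legate Pandas mirrors the same semantics; the frame and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0],
    "b": [0, 1, 0],
    "c": [np.nan, np.nan, np.nan],
})

# Default axis=0: one result per column, indexed by the column labels.
per_column = df.any()

# axis=1: one result per row, indexed by the original index.
per_row = df.any(axis=1)

# axis=None: reduce over both axes to a single boolean.
overall = df.any(axis=None)

# skipna=False: NaN counts as True because it is not equal to zero.
per_column_keep_na = df.any(skipna=False)
```

Column c is entirely NA, so it reduces to False with the default skipna=True and to True with skipna=False.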
-
append
(other, ignore_index=False, verify_integrity=False, sort=False)¶ Append rows of other to the end of caller, returning a new object.
Columns in other that are not in the caller are added as new columns.
- Parameters
- otherDataFrame or Series/dict-like object, or list of these
The data to append.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
- verify_integritybool, default False
If True, raise ValueError on creating index with duplicates.
- sortbool, default False
Sort columns if the columns of self and other are not aligned.
Changed in version 1.0.0: Changed to not sort by default.
- Returns
- DataFrame
See also
concat
General function to concatenate DataFrame or Series objects.
Notes
If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
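The advice in the note above (collect rows in a list, then concatenate once) might look like the sketch below. It uses pandas.concat from stock pandas, which the See also entry points to; whether Legate Pandas exposes an identical top-level concat is an assumption, and the data is illustrative only.

```python
import pandas as pd

base = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

# Collect the new rows first instead of appending inside the loop.
pieces = [
    pd.DataFrame({"x": [i], "y": [chr(ord("a") + i - 1)]})
    for i in range(3, 6)
]

# One concatenation at the end; ignore_index=True relabels rows 0, 1, ..., n - 1.
combined = pd.concat([base] + pieces, ignore_index=True)
```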
-
astype
(dtype, copy=True, errors='raise')¶ Cast a pandas object to a specified dtype.
- Parameters
- dtypedata type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
- copybool, default True
Return a copy when copy=True (be very careful setting copy=False, as changes to values may then propagate to other pandas objects).
- errors{‘raise’, ‘ignore’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised.
ignore : suppress exceptions. On error return original object.
- Returns
- castedsame type as caller
See also
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
to_numeric
Convert argument to a numeric type.
numpy.ndarray.astype
Cast a numpy array to a specified type.
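A short sketch of the two forms of the dtype argument described above, a single type for the whole object and a {col: dtype} mapping. It is written against stock pandas and assumes Legate Pandas accepts the same arguments; column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": ["1.5", "2.0", "3.25"]})

# A single type casts every column.
as_text = df.astype(str)

# A mapping casts only the listed columns, each to its own type.
typed = df.astype({"id": "int32", "score": "float64"})
```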
-
property
at
¶ Access a single value for a row/column label pair.
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.
- Raises
- KeyError
If ‘label’ does not exist in DataFrame.
See also
DataFrame.iat
Access a single value for a row/column pair by integer position.
DataFrame.loc
Access a group of rows and columns by label(s).
Series.at
Access a single value using a label.
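For illustration, reading and writing a single cell by label, plus the KeyError case, could look like this. The sketch assumes stock pandas behaviour carries over to Legate Pandas; labels and values are invented.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

value = df.at["r2", "a"]   # read one cell by row label and column label
df.at["r1", "b"] = 30      # write one cell by label

# A label that does not exist raises KeyError.
try:
    df.at["missing", "a"]
    missing_raises = False
except KeyError:
    missing_raises = True
```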
-
property
axes
¶ Return a list of the row axis labels.
-
bool
()¶ Return the bool of a single element Series or DataFrame.
This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).
- Returns
- bool
The value in the Series or DataFrame.
See also
Series.astype
Change the data type of a Series, including to boolean.
DataFrame.astype
Change the data type of a DataFrame, including to boolean.
numpy.bool_
NumPy boolean data type, used by pandas for boolean values.
-
property
cat
¶ Accessor object for categorical properties of the Series values.
Be aware that assigning to categories is an in-place operation, while all methods return new categorical data by default (but can be called with inplace=True).
- Parameters
- dataSeries or CategoricalIndex
-
count
(level=None)¶ Return number of non-NA/null observations in the Series.
- Parameters
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a smaller Series.
- Returns
- int or Series (if level specified)
Number of non-null values in the Series.
See also
DataFrame.count
Count non-NA cells for each column or row.
-
cummax
(axis=None, skipna=True, *args, **kwargs)¶ Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative maximum.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative maximum of Series or DataFrame.
See also
core.window.Expanding.max
Similar functionality but ignores NaN values.
DataFrame.max
Return the maximum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
-
cummin
(axis=None, skipna=True, *args, **kwargs)¶ Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative minimum.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative minimum of Series or DataFrame.
See also
core.window.Expanding.min
Similar functionality but ignores NaN values.
DataFrame.min
Return the minimum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
-
cumprod
(axis=None, skipna=True, *args, **kwargs)¶ Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative product.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative product of Series or DataFrame.
See also
core.window.Expanding.prod
Similar functionality but ignores NaN values.
DataFrame.prod
Return the product over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
-
cumsum
(axis=None, skipna=True, *args, **kwargs)¶ Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative sum.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative sum of Series or DataFrame.
See also
core.window.Expanding.sum
Similar functionality but ignores NaN values.
DataFrame.sum
Return the sum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
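The skipna behaviour shared by cummax, cummin, cumprod and cumsum can be sketched on a small Series. The example runs on stock pandas and assumes Legate Pandas follows the same rules; the values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 5.0, 1.0])

# Default skipna=True: the NaN slot stays NaN, but accumulation continues past it.
running_sum = s.cumsum()
running_max = s.cummax()

# skipna=False: every position from the first NaN onward becomes NaN.
strict_sum = s.cumsum(skipna=False)
```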
-
div
(other, level=None, fill_value=None, axis=0)¶ Return Floating division of series and other, element-wise (binary operator truediv).
Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rtruediv
Reverse of the Floating division operator, see Python documentation for more details.
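The fill_value rule, including the case where both sides are missing, is easier to see on data. The following is a sketch on stock pandas, assuming Legate Pandas aligns and fills the same way; the two Series are invented for the example.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 4.0], index=["x", "y", "z"])
b = pd.Series([2.0, 2.0, np.nan], index=["x", "y", "w"])

# Plain operator: a value missing on either side yields NaN.
plain = a / b

# With fill_value, a value missing on one side is replaced by 1 before dividing.
# Label "w" is missing on both sides (absent from a, NaN in b), so it stays NaN.
filled = a.div(b, fill_value=1)
```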
-
divide
(other, level=None, fill_value=None, axis=0)¶ Return Floating division of series and other, element-wise (binary operator truediv).
Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rtruediv
Reverse of the Floating division operator, see Python documentation for more details.
-
drop
(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')¶ Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.
- Parameters
- labelssingle label or list-like
Index or column labels to drop.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
- indexsingle label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
- columnssingle label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- levelint or level name, optional
For MultiIndex, level from which the labels will be removed.
- inplacebool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
- errors{‘ignore’, ‘raise’}, default ‘raise’
If ‘ignore’, suppress error and only existing labels are dropped.
- Returns
- DataFrame or None
DataFrame without the removed index or column labels or None if inplace=True.
- Raises
- KeyError
If any of the labels is not found in the selected axis.
See also
DataFrame.loc
Label-location based indexer for selection by label.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates
Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop
Return Series with specified index labels removed.
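A brief sketch of the equivalent spellings described above (labels plus axis versus index= or columns=) and of errors='ignore'. It uses stock pandas and assumes Legate Pandas accepts the same keywords; the labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["x", "y", "z"],
)

no_b = df.drop(columns="b")        # same result as df.drop("b", axis=1)
no_y = df.drop(index="y")          # same result as df.drop("y", axis=0)

# errors="ignore" drops the labels that exist and skips the rest.
lenient = df.drop(columns=["c", "nope"], errors="ignore")
```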
-
droplevel
(level, axis=0)¶ Return DataFrame with requested index / column level(s) removed.
New in version 0.24.0.
- Parameters
- levelint, str, or list-like
If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the level(s) is removed:
0 or ‘index’: remove level(s) from the index (row labels).
1 or ‘columns’: remove level(s) from the columns.
- Returns
- DataFrame
DataFrame with requested index / column level(s) removed.
-
dropna
(axis=0, how='any', thresh=None, subset=None, inplace=False)¶ Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.
0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.
Changed in version 1.0.0: Passing a tuple or list to drop on multiple axes is no longer supported. Only a single axis is allowed.
- how{‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
- threshint, optional
Require that many non-NA values.
- subsetarray-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
- inplacebool, default False
If True, do operation inplace and return None.
- Returns
- DataFrame or None
DataFrame with NA entries dropped from it or None if inplace=True.
See also
DataFrame.isna
Indicate missing values.
DataFrame.notna
Indicate existing (non-missing) values.
DataFrame.fillna
Replace missing values.
Series.dropna
Drop missing values.
Index.dropna
Drop missing indices.
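How the how, thresh and subset parameters interact is sketched below on a small frame. The example assumes stock pandas semantics apply to Legate Pandas; the data is made up so that each option keeps a different set of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, np.nan, np.nan],
})

any_na = df.dropna()                # how="any": keep rows with no NA at all
all_na = df.dropna(how="all")       # drop only rows where every value is NA
at_least_two = df.dropna(thresh=2)  # keep rows with at least 2 non-NA values
only_b = df.dropna(subset=["b"])    # consider column "b" only
```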
-
property
dt
¶ Accessor object for datetimelike properties of the Series values.
-
property
dtype
¶ Return the dtype object of the underlying data.
-
property
dtypes
¶ Return the dtype object of the underlying data.
-
property
empty
¶ Indicator whether DataFrame is empty.
True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
- Returns
- bool
If DataFrame is empty, return True, if not return False.
See also
Series.dropna
Return series without null values.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
Notes
If DataFrame contains only NaNs, it is still not considered empty. See the example below.
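A minimal example of the point made in the note, written against stock pandas on the assumption that Legate Pandas behaves the same way:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({"a": []})
only_nan = pd.DataFrame({"a": [np.nan]})

no_rows_empty = no_rows.empty                  # True: the row axis has length 0
only_nan_empty = only_nan.empty                # False: one row exists, even though it holds NaN
after_dropna_empty = only_nan.dropna().empty   # True once the NaN row is dropped
```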
-
eq
(other, level=None, fill_value=None, axis=0)¶ Return Equal to of series and other, element-wise (binary operator eq).
Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
equals
(other)¶ Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
The row and column indexes do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.
- Parameters
- otherSeries or DataFrame
The other Series or DataFrame to be compared with the first.
- Returns
- bool
True if all elements are the same in both objects, False otherwise.
See also
Series.eq
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq
Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal
Like assert_series_equal, but targets DataFrames.
numpy.array_equal
Return True if two arrays have the same shape and elements, False otherwise.
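The difference between equals and the element-wise eq, and the dtype requirement stated above, can be sketched like this. The example uses stock pandas and assumes Legate Pandas matches it; the frames are illustrative.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan]})
same = pd.DataFrame({"a": [1.0, np.nan]})
as_int = pd.DataFrame({"a": [1, 2]})
as_float = pd.DataFrame({"a": [1.0, 2.0]})

nan_match = left.equals(same)              # NaNs in the same location count as equal
dtype_mismatch = as_int.equals(as_float)   # equal values, but int64 vs float64
elementwise = left.eq(same)                # element-wise comparison: NaN == NaN is False
```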
-
fillna
(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)¶ Fill NA/NaN values using the specified method.
- Parameters
- valuescalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
- method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use next valid observation to fill gap.
- axis{0 or ‘index’, 1 or ‘columns’}
Axis along which to fill missing values.
- inplacebool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
- limitint, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- downcastdict, default is None
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
- Returns
- DataFrame or None
Object with missing values filled or None if inplace=True.
See also
interpolate
Fill NaN values using interpolation.
reindex
Conform object to new index.
asfreq
Convert TimeSeries to specified frequency.
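The value parameter accepts either a scalar or a per-column mapping. A small sketch under the assumption that Legate Pandas follows stock pandas here; column names and fill values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 2.0, np.nan],
    "b": [np.nan, np.nan, 6.0],
    "c": [np.nan, 1.0, 1.0],
})

# A scalar fills every hole.
zeros = df.fillna(0)

# A dict fills per column; column "c" is not in the dict, so it is left untouched.
per_column = df.fillna({"a": -1.0, "b": -2.0})
```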
-
floordiv
(other, level=None, fill_value=None, axis=0)¶ Return Integer division of series and other, element-wise (binary operator floordiv).
Equivalent to series // other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rfloordiv
Reverse of the Integer division operator, see Python documentation for more details.
-
ge
(other, level=None, fill_value=None, axis=0)¶ Return Greater than or equal to of series and other, element-wise (binary operator ge).
Equivalent to series >= other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
get
(key, default=None)¶ Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
- Parameters
- keyobject
- Returns
- valuesame type as items contained in object
-
groupby
(by=None, axis=0, level=None, sort=False, **kwargs)¶ Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- bymapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1).
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
- as_indexbool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sortbool, default False
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
- group_keysbool, default True
When calling apply, add group keys to index to identify pieces.
- squeezebool, default False
Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
Deprecated since version 1.1.0.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
New in version 1.1.0.
- Returns
- SeriesGroupBy
Returns a groupby object that contains information about the groups.
See also
resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more.
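A split-apply-combine sketch, grouping once by a column label and once by an index level. It is written against stock pandas and assumes Legate Pandas accepts the same by, level and sort arguments. Because the signature above lists sort=False while stock pandas defaults to True, the example passes sort explicitly.

```python
import pandas as pd

df = pd.DataFrame({"team": ["b", "a", "b", "a"], "points": [1, 2, 3, 4]})

# Group by a column label, then aggregate each group.
totals = df.groupby("team", sort=True)["points"].sum()

# Group a Series by a level of its index instead of by a column.
by_level = df.set_index("team")["points"].groupby(level=0, sort=True).sum()
```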
-
gt
(other, level=None, fill_value=None, axis=0)¶ Return Greater than of series and other, element-wise (binary operator gt).
Equivalent to series > other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
property
hasnans
¶ Return True if there are any NaNs; enables various performance speedups.
-
head
(n=5)¶ Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].
- Parameters
- nint, default 5
Number of rows to select.
- Returns
- same type as caller
The first n rows of the caller object.
See also
DataFrame.tail
Returns the last n rows.
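Positive and negative n can be compared side by side. This sketch uses stock pandas, on the assumption that Legate Pandas treats negative n identically; the frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_three = df.head(3)         # rows at positions 0, 1, 2
all_but_last_two = df.head(-2)   # everything except the last 2 rows
```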
-
property
iat
¶ Access a single value for a row/column pair by integer position.
Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.
- Raises
- IndexError
When integer position is out of bounds.
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.loc
Access a group of rows and columns by label(s).
DataFrame.iloc
Access a group of rows and columns by integer position(s).
-
property
iloc
¶ Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
Allowed inputs are:
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
See more at Selection by Position.
See also
DataFrame.iat
Fast integer location scalar accessor.
DataFrame.loc
Purely label-location based indexer for selection by label.
Series.iloc
Purely integer-location based indexing for selection by position.
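The allowed inputs and the out-of-bounds rule for .iloc can be sketched as below. The example runs on stock pandas and assumes Legate Pandas supports the same indexer kinds; the frame and names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

row = df.iloc[1]                                # a single integer position
picked = df.iloc[[3, 0]]                        # a list of positions, in the given order
sliced = df.iloc[1:3]                           # a slice; the stop position is excluded
masked = df.iloc[[True, False, True, False]]    # a boolean array

# Slices may run past the end without error ...
clipped = df.iloc[2:100]

# ... but a single out-of-bounds position raises IndexError.
try:
    df.iloc[100]
    out_of_bounds_raises = False
except IndexError:
    out_of_bounds_raises = True
```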
-
property
index
¶ The index (row labels) of the DataFrame.
-
isna
()¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
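Which values count as NA can be checked directly. This is a sketch on stock pandas with default options, assuming Legate Pandas classifies values the same way; the columns are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num": [1.0, np.nan, np.inf],
    "text": ["", None, "x"],
})

# NaN and None are NA; infinity and the empty string are not.
na_mask = df.isna()

# notna is the element-wise inverse.
present = df.notna()
```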
-
isnull
()¶ Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
-
le
(other, level=None, fill_value=None, axis=0)¶ Return Less than or equal to of series and other, element-wise (binary operator le).
Equivalent to series <= other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
property
loc
¶ Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'.
Warning
Note that contrary to usual python slices, both the start and the stop are included.
A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
See more at Selection by Label.
- Raises
- KeyError
If any items are not found.
- IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.iloc
Access group of rows and columns by integer position(s).
DataFrame.xs
Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc
Access group of values using labels.
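Several of the allowed .loc inputs, including the inclusive label slice called out in the warning, are sketched here. The example uses stock pandas and assumes Legate Pandas supports the same indexer kinds; labels and data are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

one = df.loc["x"]                               # a single label returns that row
some = df.loc[["z", "w"], ["a"]]                # lists of row and column labels
inclusive = df.loc["x":"z"]                     # label slice: both endpoints are included
masked = df.loc[df["b"] > 2]                    # an alignable boolean Series
via_callable = df.loc[lambda d: d["a"] < 30]    # a callable that returns a valid indexer
```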
-
lt
(other, level=None, fill_value=None, axis=0)¶ Return Less than of series and other, element-wise (binary operator lt).
Equivalent to series < other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
mask
(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶ Replace values where the condition is True.
- Parameters
- condbool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- otherscalar, Series/DataFrame, or callable
Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- inplacebool, default False
Whether to perform the operation in place on the data.
- axisint, default None
Alignment axis if needed.
- levelint, default None
Alignment level if needed.
- errorsstr, {‘raise’, ‘ignore’}, default ‘raise’
Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.
‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.
- try_castbool, default False
Try to cast the result back to the input type (if possible).
- Returns
- Same type as caller or None if inplace=True.
See also
DataFrame.where()
Return an object of same shape as self.
Notes
The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used.
The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).
For further details and examples see the mask documentation in indexing.
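The if-then idiom in the notes can be sketched as below, assuming legate.pandas follows the pandas semantics; stock pandas is used so the snippet runs without Legate, and the sample frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# mask replaces entries where the condition is True.
masked = df.mask(df < 0, 0)

# where keeps entries where the condition is True, so the inverted
# condition gives the same result.
same = df.where(df >= 0, 0)
```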
-
max
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the maximum of the values over the requested axis.
If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
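A short sketch of max next to idxmax, and of the skipna flag, assuming legate.pandas mirrors pandas here. The snippet uses stock pandas, passes axis explicitly, and the data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 2.0, 0.5]},
    index=["r0", "r1", "r2"],
)

col_max = df.max(axis=0)                # maximum of each column, NaN skipped
col_idx = df.idxmax(axis=0)             # row label where each maximum occurs
row_max = df.max(axis=1)                # maximum of each row
strict = df.max(axis=0, skipna=False)   # NaN propagates into column "a"
```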
-
mean
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the mean of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
-
min
(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)¶ Return the minimum of the values over the requested axis.
If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
-
mod
(other, level=None, fill_value=None, axis=0)¶ Return Modulo of series and other, element-wise (binary operator mod).
Equivalent to series % other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rmod
Reverse of the Modulo operator, see Python documentation for more details.
-
mul
(other, level=None, fill_value=None, axis=0)¶ Return Multiplication of series and other, element-wise (binary operator mul).
Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rmul
Reverse of the Multiplication operator, see Python documentation for more details.
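How fill_value interacts with index alignment can be sketched as below. This assumes legate.pandas aligns and fills the way pandas does; stock pandas is used, and the labels are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Plain operator: labels present on only one side produce NaN.
plain = a * b

# Flexible method: the missing side is treated as 1 before multiplying.
filled = a.mul(b, fill_value=1)
```

Only "y" exists in both inputs, so the plain product is NaN at "x" and "z", while the filled product keeps the value from whichever side is present.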
-
multiply
(other, level=None, fill_value=None, axis=0)¶ Return Multiplication of series and other, element-wise (binary operator mul).
Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rmul
Reverse of the Multiplication operator, see Python documentation for more details.
-
property
name
¶ Return the name of the Series.
The name of a Series becomes its index or column name if it is used to form a DataFrame. It is also used whenever displaying the Series using the interpreter.
- Returns
- label (hashable object)
The name of the Series, also the column name if part of a DataFrame.
See also
Series.rename
Sets the Series name when given a scalar input.
Index.name
Corresponding Index property.
Examples
The Series name can be set initially when calling the constructor.
>>> s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
>>> s
0    1
1    2
2    3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0    1
1    2
2    3
Name: Integers, dtype: int64
The name of a Series within a DataFrame is its column name.
>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
...                   columns=["Odd Numbers", "Even Numbers"])
>>> df
   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6
>>> df["Even Numbers"].name
'Even Numbers'
-
property
ndim
¶ Number of dimensions of the underlying data, by definition 1.
-
ne
(other, level=None, fill_value=None, axis=0)¶ Return Not equal to of series and other, element-wise (binary operator ne).
Equivalent to series != other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
-
notna
()¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
See also
DataFrame.notnull
Alias of notna.
DataFrame.isna
Boolean inverse of notna.
DataFrame.dropna
Omit axes labels with missing values.
notna
Top-level notna.
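The mapping described above can be sketched as below, assuming legate.pandas treats missing values as pandas does with default options. Stock pandas is used, and the sample columns are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "", None],       # empty string is not NA, None is
        "score": [1.0, np.nan, np.inf],  # inf is not NA, NaN is
    }
)

present = df.notna()

# Summing the boolean mask counts non-missing cells per column.
counts = present.sum()
```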
-
notnull
()¶ Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
See also
DataFrame.notnull
Alias of notna.
DataFrame.isna
Boolean inverse of notna.
DataFrame.dropna
Omit axes labels with missing values.
notna
Top-level notna.
-
pow
(other, level=None, fill_value=None, axis=0)¶ Return Exponential power of series and other, element-wise (binary operator pow).
Equivalent to series ** other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rpow
Reverse of the Exponential power operator, see Python documentation for more details.
-
prod
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the product of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
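The min_count rule can be sketched on a Series as below, assuming legate.pandas follows pandas here. Stock pandas is used, and the values are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, np.nan, 3.0])
all_missing = pd.Series([np.nan, np.nan])

product = values.prod()                # NaN is skipped: 2 * 3
too_few = values.prod(min_count=3)     # only 2 valid values, so NA
empty_default = all_missing.prod()     # empty product is 1.0 when min_count=0
empty_strict = all_missing.prod(min_count=1)
```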
-
product
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the product of the values over the requested axis.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
-
radd
(other, level=None, fill_value=None, axis=0)¶ Return Addition of series and other, element-wise (binary operator radd).
Equivalent to other + series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.add
Element-wise Addition, see Python documentation for more details.
-
rdiv
(other, level=None, fill_value=None, axis=0)¶ Return Floating division of series and other, element-wise (binary operator rtruediv).
Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.truediv
Element-wise Floating division, see Python documentation for more details.
-
reset_index
(level=None, drop=False, name=None, inplace=False)¶ Generate a new DataFrame or Series with the index reset.
This is useful when the index needs to be treated as a column, or when the index is meaningless and needs to be reset to the default before another operation.
- Parameters
- levelint, str, tuple, or list, default optional
For a Series with a MultiIndex, only remove the specified levels from the index. Removes all levels by default.
- dropbool, default False
Just reset the index, without inserting it as a column in the new DataFrame.
- nameobject, optional
The name to use for the column containing the original Series values. Uses self.name by default. This argument is ignored when drop is True.
- inplacebool, default False
Modify the Series in place (do not create a new object).
- Returns
- Series or DataFrame or None
When drop is False (the default), a DataFrame is returned. The newly created columns will come first in the DataFrame, followed by the original Series values. When drop is True, a Series is returned. In either case, if inplace=True, no value is returned.
See also
DataFrame.reset_index
Analogous function for DataFrame.
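The two return shapes can be sketched as below, assuming legate.pandas follows the pandas behaviour described here. Stock pandas is used, and the index name "key" and Series name "val" are illustrative.

```python
import pandas as pd

s = pd.Series(
    [10, 20],
    index=pd.Index(["a", "b"], name="key"),
    name="val",
)

# drop=False (default): the old index becomes a column, result is a DataFrame.
as_frame = s.reset_index()

# drop=True: the old index is discarded, result stays a Series.
as_series = s.reset_index(drop=True)
```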
-
rfloordiv
(other, level=None, fill_value=None, axis=0)¶ Return Integer division of series and other, element-wise (binary operator rfloordiv).
Equivalent to other // series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.floordiv
Element-wise Integer division, see Python documentation for more details.
-
rmod
(other, level=None, fill_value=None, axis=0)¶ Return Modulo of series and other, element-wise (binary operator rmod).
Equivalent to other % series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.mod
Element-wise Modulo, see Python documentation for more details.
-
rmul
(other, level=None, fill_value=None, axis=0)¶ Return Multiplication of series and other, element-wise (binary operator rmul).
Equivalent to other * series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.mul
Element-wise Multiplication, see Python documentation for more details.
-
rpow
(other, level=None, fill_value=None, axis=0)¶ Return Exponential power of series and other, element-wise (binary operator rpow).
Equivalent to other ** series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.pow
Element-wise Exponential power, see Python documentation for more details.
-
rsub
(other, level=None, fill_value=None, axis=0)¶ Return Subtraction of series and other, element-wise (binary operator rsub).
Equivalent to other - series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.sub
Element-wise Subtraction, see Python documentation for more details.
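The operand order of a reversed operator can be sketched as below, assuming legate.pandas matches pandas. Stock pandas is used, and the values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])

forward = s.sub(10)                 # s - 10
reverse = s.rsub(10)                # 10 - s
filled = s.rsub(10, fill_value=0)   # NaN in s is treated as 0 first
```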
-
rtruediv
(other, level=None, fill_value=None, axis=0)¶ Return Floating division of series and other, element-wise (binary operator rtruediv).
Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.truediv
Element-wise Floating division, see Python documentation for more details.
-
set_axis
(labels, axis=0, inplace=False)¶ Assign desired index to given axis.
Indexes for column or row labels can be changed by assigning a list-like or Index.
- Parameters
- labelslist-like, Index
The values for the new index.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
- inplacebool, default False
If True, modify the DataFrame in place instead of returning a new instance.
- Returns
- renamedDataFrame or None
An object of type DataFrame or None if inplace=True.
See also
DataFrame.rename_axis
Alter the name of the index or columns.
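Relabelling either axis can be sketched as below, assuming legate.pandas follows pandas. Stock pandas is used, the inplace argument is left at its default, and the labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=1 replaces the column labels; the original frame is left untouched.
new_columns = df.set_axis(["x", "y"], axis=1)

# axis=0 replaces the row labels.
new_rows = df.set_axis(["r0", "r1"], axis=0)
```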
-
property
shape
¶ Return a tuple of the shape of the underlying data.
-
property
size
¶ Return an int representing the number of elements in this object.
Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.
See also
ndarray.size
Number of elements in the array.
-
sort_index
(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)¶ Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
- levelint or level name or list of ints or list of level names
If not None, sort on values in specified index level(s).
- ascendingbool or list-like of bools, default True
Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
- inplacebool, default False
If True, perform operation in-place.
- kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
- na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
- sort_remainingbool, default True
If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- keycallable, optional
If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.
New in version 1.1.0.
- Returns
- DataFrame or None
The original DataFrame sorted by the labels or None if inplace=True.
See also
Series.sort_index
Sort Series by the index.
DataFrame.sort_values
Sort DataFrame by the value.
Series.sort_values
Sort Series by the value.
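Sorting by label can be sketched as below, assuming legate.pandas supports the same arguments as pandas 1.0 or later. Stock pandas is used, and the frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["b", "c", "a"])

# Default: ascending by row label, labels are kept.
ascending = df.sort_index()

# Descending, with the labels replaced by 0..n-1 afterwards.
descending = df.sort_index(ascending=False, ignore_index=True)
```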
-
sort_values
(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index: bool = False)¶ Sort by the values.
Sort a Series in ascending or descending order by some criterion.
- Parameters
- axis{0 or ‘index’}, default 0
Axis to direct sorting. The value ‘index’ is accepted for compatibility with DataFrame.sort_values.
- ascendingbool or list of bools, default True
If True, sort values in ascending order, otherwise descending.
- inplacebool, default False
If True, perform operation in-place.
- kind{‘quicksort’, ‘mergesort’ or ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort() for more information. ‘mergesort’ is the only stable algorithm.
- na_position{‘first’ or ‘last’}, default ‘last’
Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- keycallable, optional
If not None, apply the key function to the series values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return an array-like.
New in version 1.1.0.
- Returns
- Series or None
Series ordered by values or None if inplace=True.
See also
Series.sort_index
Sort by the Series indices.
DataFrame.sort_values
Sort DataFrame by the values along either axis.
DataFrame.sort_index
Sort DataFrame by indices.
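The ascending and na_position arguments can be sketched as below, assuming legate.pandas follows pandas. Stock pandas is used; the original positions are kept as the index so the reordering is visible.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0])

asc = s.sort_values()                       # NaN placed last by default
nan_first = s.sort_values(na_position="first")
desc = s.sort_values(ascending=False)       # NaN still placed last
```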
-
squeeze
(axis=None)¶ Squeeze 1 dimensional axis objects into scalars.
Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.
This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
A specific axis to squeeze. By default, all length-1 axes are squeezed.
- Returns
- DataFrame, Series, or scalar
The projection after squeezing axis or all the axes.
See also
Series.iloc
Integer-location based indexing for selecting scalars.
DataFrame.iloc
Integer-location based indexing for selecting Series.
Series.to_frame
Inverse of DataFrame.squeeze for a single-column DataFrame.
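The three outcomes described above can be sketched as below, assuming legate.pandas follows pandas. Stock pandas is used, and the frames are illustrative.

```python
import pandas as pd

one_column = pd.DataFrame({"a": [1, 2, 3]})
one_cell = pd.DataFrame({"a": [7]})
two_columns = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

as_series = one_column.squeeze("columns")  # single column -> Series
as_scalar = one_cell.squeeze()             # single element -> scalar
unchanged = two_columns.squeeze()          # no length-1 axis -> DataFrame
```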
-
std
(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters
- axis{index (0), columns (1)}
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- Returns
- Series or DataFrame (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
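The difference between the two divisors can be sketched as below, assuming legate.pandas follows pandas. Stock pandas and NumPy are used. The eight sample values have mean 5 and a sum of squared deviations of 32.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = s.std()            # divisor N - 1 = 7
population_std = s.std(ddof=0)  # divisor N = 8
numpy_std = np.std(s.to_numpy())  # NumPy defaults to ddof=0
```

Here the population value is sqrt(32 / 8) = 2.0 and the sample value is sqrt(32 / 7).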
-
property
str
¶ Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
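How missing values pass through the accessor can be sketched as below. This assumes legate.pandas exposes the same str methods as pandas, which this section does not enumerate; stock pandas is used, and the strings are illustrative.

```python
import pandas as pd

s = pd.Series(["a", None, "bc"])

upper = s.str.upper()   # missing entry stays missing
lengths = s.str.len()   # missing entry has no length
```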
-
sub
(other, level=None, fill_value=None, axis=0)¶ Return Subtraction of series and other, element-wise (binary operator sub).
Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rsub
Reverse of the Subtraction operator, see Python documentation for more details.
-
subtract
(other, level=None, fill_value=None, axis=0)¶ Return Subtraction of series and other, element-wise (binary operator sub).
Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rsub
Reverse of the Subtraction operator, see Python documentation for more details.
-
sum
(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)¶ Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
See also
Series.sum
Return the sum.
Series.min
Return the minimum.
Series.max
Return the maximum.
Series.idxmin
Return the index of the minimum.
Series.idxmax
Return the index of the maximum.
DataFrame.sum
Return the sum over the requested axis.
DataFrame.min
Return the minimum over the requested axis.
DataFrame.max
Return the maximum over the requested axis.
DataFrame.idxmin
Return the index of the minimum over the requested axis.
DataFrame.idxmax
Return the index of the maximum over the requested axis.
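Column-wise sums with and without min_count can be sketched as below, assuming legate.pandas follows pandas. Stock pandas is used, axis is passed explicitly, and the frame is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, np.nan]})

# Default min_count=0: an all-NaN column sums to 0.0.
totals = df.sum(axis=0)

# min_count=1: a column with no valid values yields NaN instead.
strict_totals = df.sum(axis=0, min_count=1)

# axis=1 sums across columns for each row.
row_totals = df.sum(axis=1)
```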
-
tail
(n=5)¶ Return the last n rows.
This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].
- Parameters
- nint, default 5
Number of rows to select.
- Returns
- type of caller
The last n rows of the caller object.
See also
DataFrame.head
The first n rows of the caller object.
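A short sketch of positive and negative n, using plain pandas as a stand-in (the assumption being that legate.pandas follows the same semantics; the column name x is made up for illustration):

```python
import pandas as pd  # assumption: legate.pandas behaves the same way here

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

# Positive n: the last n rows.
last_two = df.tail(2)

# Negative n: everything except the first |n| rows.
all_but_first_two = df.tail(-2)
```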
-
to_csv
(path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, line_terminator=None, chunksize=None, partition=False)¶ Write object to a comma-separated values (csv) file.
Changed in version 0.24.0: The order of arguments for Series was changed.
- Parameters
- path_or_bufstr or file handle, default None
File path or object, if None is provided the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
Changed in version 0.24.0: Was previously named “path” for Series.
Changed in version 1.2.0: Support for binary file objects was introduced.
- sepstr, default ‘,’
String of length 1. Field delimiter for the output file.
- na_repstr, default ‘’
Missing data representation.
- float_formatstr, default None
Format string for floating point numbers.
- columnssequence, optional
Columns to write.
- headerbool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
Changed in version 0.24.0: Previously defaulted to False for Series.
- indexbool, default True
Write row names (index).
- index_labelstr or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
- modestr
Python write mode, default ‘w’.
- encodingstr, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
- compressionstr or dict, default ‘infer’
If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.
Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.
Changed in version 1.2.0: Compression is supported for binary file objects.
Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime.
- quotingoptional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
- quotecharstr, default ‘"’
String of length 1. Character used to quote fields.
- line_terminatorstr, optional
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. ‘\n’ for Linux, ‘\r\n’ for Windows).
Changed in version 0.24.0.
- chunksizeint or None
Rows to write at a time.
- date_formatstr, default None
Format string for datetime objects.
- doublequotebool, default True
Control quoting of quotechar inside a field.
- escapecharstr, default None
String of length 1. Character used to escape sep and quotechar when appropriate.
- decimalstr, default ‘.’
Character recognized as decimal separator. E.g. use ‘,’ for European data.
- errorsstr, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
New in version 1.1.0.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
New in version 1.2.0.
- Returns
- None or str
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
See also
read_csv
Load a CSV file into a DataFrame.
to_excel
Write DataFrame to an Excel file.
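As a hedged illustration of the parameters above, the sketch below calls to_csv without a path so that the CSV text is returned as a string. It uses plain pandas and only parameters that appear in the signature documented here (sep, na_rep, columns, header, index); whether legate.pandas returns an identical string is an assumption.

```python
import pandas as pd  # assumption: legate.pandas behaves the same way here

df = pd.DataFrame({"a": [1, 2], "b": ["x", None]})

# path_or_buf=None (the default) returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False, na_rep="NA")
lines = csv_text.splitlines()

# Select and reorder columns, alias the header, and use a different delimiter.
aliased = df.to_csv(
    sep=";",
    columns=["b", "a"],
    header=["label", "id"],
    index=False,
    na_rep="-",
)
aliased_lines = aliased.splitlines()
```

splitlines() is used so the comparison does not depend on the platform's line terminator.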
-
to_frame
(name=None)¶ Convert Series to DataFrame.
- Parameters
- nameobject, default None
The passed name should substitute for the series name (if it has one).
- Returns
- DataFrame
DataFrame representation of Series.
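A minimal sketch of how name affects the resulting column label, using plain pandas under the assumption that legate.pandas follows the same rule (the labels "old" and "new" are illustrative):

```python
import pandas as pd  # assumption: legate.pandas behaves the same way here

s = pd.Series([1, 2], name="old")

# Without name, the Series name becomes the column label.
kept = s.to_frame()

# A passed name substitutes for the Series name.
renamed = s.to_frame(name="new")
```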
-
to_numpy
(dtype=None, copy=False, na_value=<object object>, **kwargs)¶ A NumPy ndarray representing the values in this Series or Index.
New in version 0.24.0.
- Parameters
- dtypestr or numpy.dtype, optional
The dtype to pass to numpy.asarray().
- copybool, default False
Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.
- na_valueAny, optional
The value to use for missing values. The default value depends on dtype and the type of the array.
New in version 1.0.0.
- **kwargs
Additional keywords passed through to the to_numpy method of the underlying array (for extension arrays).
New in version 1.0.0.
- Returns
- numpy.ndarray
See also
Series.array
Get the actual data stored within.
Index.array
Get the actual data stored within.
DataFrame.to_numpy
Similar method for DataFrame.
Notes
The returned array will be the same up to equality (values equal in self will be equal in the returned array; likewise for values that are not equal). When self contains an ExtensionArray, the dtype may be different. For example, for a category-dtype Series, to_numpy() will return a NumPy array and the categorical dtype will be lost.
For NumPy dtypes, this will be a reference to the actual data stored in this Series or Index (assuming copy=False). Modifying the result in place will modify the data stored in the Series or Index (not that we recommend doing that).
For extension types, to_numpy() may require copying data and coercing the result to a NumPy type (possibly object), which may be expensive. When you need a no-copy reference to the underlying data, Series.array should be used instead.
This table lays out the different dtypes and default return types of to_numpy() for various dtypes within pandas.
dtype                 array type
category[T]           ndarray[T] (same dtype as input)
period                ndarray[object] (Periods)
interval              ndarray[object] (Intervals)
IntegerNA             ndarray[object]
datetime64[ns]        datetime64[ns]
datetime64[ns, tz]    ndarray[object] (Timestamps)
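The na_value and copy parameters, and the loss of the categorical dtype, can be sketched as below. The example uses plain pandas; that a distributed legate.pandas Series gives the same results is an assumption.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas behaves the same way here

s = pd.Series([1.0, np.nan, 3.0])

# na_value replaces missing entries in the returned array only.
filled = s.to_numpy(na_value=0.0)

# copy=True guarantees the result is independent of the Series.
independent = s.to_numpy(copy=True)
independent[0] = 99.0  # does not touch s

# A category-dtype Series comes back as a plain NumPy array.
cat = pd.Series(["a", "b", "a"], dtype="category")
cat_arr = cat.to_numpy()
```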
-
to_pandas
(schema_only=False)¶ Convert distributed Series into a Pandas Series
- Parameters
- schema_onlybool, default False
Doesn’t convert the data when True.
- Returns
- outpandas.Series
-
truediv
(other, level=None, fill_value=None, axis=0)¶ Return Floating division of series and other, element-wise (binary operator truediv).
Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.
- Parameters
- otherSeries or scalar value
- fill_valueNone or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
- levelint or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- Series
The result of the operation.
See also
Series.rtruediv
Reverse of the Floating division operator, see Python documentation for more details.
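The fill_value rule (fill when exactly one side is missing, stay missing when both sides are) is easier to see on a small case. The sketch below uses plain pandas with made-up index labels and assumes legate.pandas aligns and fills the same way.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas behaves the same way here

a = pd.Series([1.0, np.nan, 4.0, np.nan], index=["a", "b", "c", "d"])
b = pd.Series([2.0, 2.0, np.nan], index=["a", "b", "d"])

# Plain division: any missing operand yields NaN.
plain = a / b

# With fill_value, a value missing on one side only is replaced by 1.0:
#   "b": left is NaN   -> 1.0 / 2.0
#   "c": right absent  -> 4.0 / 1.0
#   "d": both missing  -> stays NaN
filled = a.truediv(b, fill_value=1.0)
```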
-
property
values
¶ Return Series as ndarray or ndarray-like depending on the dtype.
- Returns
- outnumpy.ndarray or ndarray-like
-
var
(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)¶ Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
- Parameters
- axis{index (0), columns (1)}
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- Returns
- Series or DataFrame (if level specified)
Notes
To have the same behaviour as numpy.var, use ddof=0 (instead of the default ddof=1)
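The effect of ddof can be checked on a small sample. This sketch uses plain pandas and NumPy and assumes legate.pandas computes the same quantities.

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas behaves the same way here

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default ddof=1: divide the sum of squared deviations by N - 1.
sample_var = s.var()

# ddof=0: divide by N, which matches NumPy's default.
population_var = s.var(ddof=0)
numpy_var = np.var(s.to_numpy())
```

Here the squared deviations from the mean 2.5 sum to 5.0, so the two divisors give 5/3 and 5/4 respectively.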
-
where
(cond, other=None, inplace=False, axis=None, level=None, errors='raise', try_cast=False)¶ Replace values where the condition is False.
- Parameters
- condbool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- otherscalar, Series/DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
- inplacebool, default False
Whether to perform the operation in place on the data.
- axisint, default None
Alignment axis if needed.
- levelint, default None
Alignment level if needed.
- errorsstr, {‘raise’, ‘ignore’}, default ‘raise’
Note that currently this parameter won’t affect the results and will always coerce to a suitable dtype.
‘raise’ : allow exceptions to be raised.
‘ignore’ : suppress exceptions. On error return original object.
- try_castbool, default False
Try to cast the result back to the input type (if possible).
- Returns
- Same type as caller or None if inplace=True.
See also
DataFrame.mask()
Return an object of same shape as self.
Notes
The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.
The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).
For further details and examples see the where documentation in indexing.
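A small sketch of the if-then behaviour and its relation to numpy.where, using plain pandas (assuming legate.pandas follows the same semantics for cond and other):

```python
import numpy as np
import pandas as pd  # assumption: legate.pandas behaves the same way here

s = pd.Series([1, 2, 3, 4])
mask = s > 2

# Keep values where the condition holds, otherwise take `other`.
kept = s.where(mask, -1)

# The rough NumPy equivalent swaps the argument order.
np_equiv = np.where(mask, s, -1)

# Without `other`, replaced entries become missing.
default_other = s.where(mask)

# cond may also be a callable evaluated on the Series.
evens = s.where(lambda x: x % 2 == 0, 0)
```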
IO¶
-
legate.pandas.
read_csv
(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, prefix=None, mangle_dupe_cols=True, dtype=None, true_values=None, false_values=None, skiprows=None, skipfooter=0, nrows=None, na_values=None, skip_blank_lines=True, parse_dates=False, compression='infer', quotechar='"', quoting=0, doublequote=True, verify_header=False, **kwargs)¶ Read a comma-separated values (csv) file into DataFrame.
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for IO Tools.
- Parameters
- filepath_or_bufferstr, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.
- sepstr, default ‘,’
Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
- delimiterstr, default None
Alias for sep.
- headerint, list of int, default ‘infer’
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
- namesarray-like, optional
List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
- index_colint, str, sequence of int / str, or False, default None
Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.
Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
- usecolslist-like or callable, optional
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
- squeezebool, default False
If the parsed data only contains one column then return a Series.
- prefixstr, optional
Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
- mangle_dupe_colsbool, default True
Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
- dtypeType name or dict of column -> type, optional
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
- engine{‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
- convertersdict, optional
Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
- true_valueslist, optional
Values to consider as True.
- false_valueslist, optional
Values to consider as False.
- skipinitialspacebool, default False
Skip spaces after delimiter.
- skiprowslist-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
- skipfooterint, default 0
Number of lines at bottom of file to skip (Unsupported with engine=’c’).
- nrowsint, optional
Number of rows of file to read. Useful for reading pieces of large files.
- na_valuesscalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
- keep_default_nabool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
- na_filterbool, default True
Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
- verbosebool, default False
Indicate number of NA values placed in non-numeric columns.
- skip_blank_linesbool, default True
If True, skip over blank lines rather than interpreting as NaN values.
- parse_datesbool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See io.csv.mixed_timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
- infer_datetime_formatbool, default False
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
- keep_date_colbool, default False
If True and parse_dates specifies combining multiple columns then keep the original columns.
- date_parserfunction, optional
Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
- dayfirstbool, default False
DD/MM format dates, international and European format.
- cache_datesbool, default True
If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.
New in version 0.25.0.
- iteratorbool, default False
Return TextFileReader object for iteration or getting chunks with get_chunk().
Changed in version 1.2: TextFileReader is a context manager.
- chunksizeint, optional
Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.
Changed in version 1.2: TextFileReader is a context manager.
- compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’
For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
- thousandsstr, optional
Thousands separator.
- decimalstr, default ‘.’
Character to recognize as decimal point (e.g. use ‘,’ for European data).
- lineterminatorstr (length 1), optional
Character to break file into lines. Only valid with C parser.
- quotecharstr (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
- quotingint or csv.QUOTE_* instance, default 0
Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
- doublequotebool, default True
When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.
- escapecharstr (length 1), optional
One-character string used to escape other characters.
- commentstr, optional
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in ‘a,b,c’ being treated as the header.
- encodingstr, optional
Encoding to use for UTF when reading/writing (ex. ‘utf-8’). See the list of Python standard encodings.
Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".
- dialectstr or csv.Dialect, optional
If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.
- error_bad_linesbool, default True
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.
- warn_bad_linesbool, default True
If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output.
- delim_whitespacebool, default False
Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
- low_memorybool, default True
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless, use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).
- memory_mapbool, default False
If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
- float_precisionstr, optional
Specifies which converter the C engine should use for floating-point values. The options are None or ‘high’ for the ordinary converter, ‘legacy’ for the original lower precision pandas converter, and ‘round_trip’ for the round-trip converter.
Changed in version 1.2.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
New in version 1.2.
- Returns
- DataFrame or TextParser
A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes.
See also
DataFrame.to_csv
Write DataFrame to a comma-separated values (csv) file.
read_csv
Read a comma-separated values (csv) file into DataFrame.
read_fwf
Read a table of fixed-width formatted lines into DataFrame.
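Several of the read_csv parameters above interact (comment, header, names, usecols, na_values, dtype, nrows). The sketch below exercises them on an in-memory buffer. It uses plain pandas and sticks to parameters that appear in the legate.pandas.read_csv signature documented here; that legate.pandas accepts a StringIO buffer and produces the same frame is an assumption, and the sample data is invented.

```python
import io

import pandas as pd  # assumption: legate.pandas behaves the same way here

raw = (
    "# exported table\n"   # fully commented line, ignored when comment='#'
    "id,label,score\n"
    "1,x,3.5\n"
    "2,missing,4.0\n"
    "3,z,\n"
)

# header is inferred from the first non-comment line; 'missing' and the
# empty field are both read as NaN; usecols order does not reorder columns.
df = pd.read_csv(
    io.StringIO(raw),
    comment="#",
    usecols=["score", "id", "label"],
    na_values=["missing"],
    dtype={"id": "int64"},
)

# header=0 together with names replaces the column names found in the file.
renamed = pd.read_csv(
    io.StringIO(raw),
    comment="#",
    header=0,
    names=["k", "name", "value"],
)

# nrows limits how many data rows are read.
head_only = pd.read_csv(io.StringIO(raw), comment="#", nrows=2)
```

To get columns in a specific order, index the result afterwards, as in the usecols description.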
-
legate.pandas.
read_parquet
(path, columns=None, **kwargs)¶ Load a parquet object from the file path, returning a DataFrame.
- Parameters
- pathstr, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.parquet. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: file://localhost/path/to/tables or s3://bucket/partition_dir.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.
- engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
- columnslist, default=None
If not None, only these columns will be read from the file.
- use_nullable_dtypesbool, default False
If True, use dtypes that use pd.NA as missing value indicator for the resulting DataFrame (only applicable for engine="pyarrow"). As new dtypes are added that support pd.NA in the future, the output with this option will change to use those dtypes. Note: this is an experimental option, and behaviour (e.g. additional support dtypes) may change without notice.
New in version 1.2.0.
- **kwargs
Any additional kwargs are passed to the engine.
- Returns
- DataFrame
-
legate.pandas.
read_table
(filepath_or_buffer, sep='\t', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, true_values=None, false_values=None, na_values=None, skip_blank_lines=True, parse_dates=False, compression='infer', quotechar='"', quoting=0, skipfooter=0, skiprows=None, nrows=None, doublequote=True, **kwargs)¶ Read general delimited file into DataFrame.
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for IO Tools.
- Parameters
- filepath_or_bufferstr, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.
- sepstr, default ‘\t’ (tab-stop)
Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
- delimiterstr, default None
Alias for sep.
- headerint, list of int, default ‘infer’
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
- namesarray-like, optional
List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
- index_colint, str, sequence of int / str, or False, default None
Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.
Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
- usecolslist-like or callable, optional
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
- squeezebool, default False
If the parsed data only contains one column then return a Series.
- prefixstr, optional
Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
- mangle_dupe_colsbool, default True
Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
- dtypeType name or dict of column -> type, optional
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
- engine{‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
- convertersdict, optional
Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
- true_valueslist, optional
Values to consider as True.
- false_valueslist, optional
Values to consider as False.
- skipinitialspacebool, default False
Skip spaces after delimiter.
- skiprowslist-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
- skipfooterint, default 0
Number of lines at bottom of file to skip (Unsupported with engine=’c’).
- nrowsint, optional
Number of rows of file to read. Useful for reading pieces of large files.
- na_valuesscalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
- keep_default_nabool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
- na_filterbool, default True
Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
- verbosebool, default False
Indicate number of NA values placed in non-numeric columns.
- skip_blank_linesbool, default True
If True, skip over blank lines rather than interpreting as NaN values.
- parse_datesbool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See io.csv.mixed_timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
- infer_datetime_formatbool, default False
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
- keep_date_colbool, default False
If True and parse_dates specifies combining multiple columns then keep the original columns.
- date_parserfunction, optional
Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
- dayfirstbool, default False
DD/MM format dates, international and European format.
- cache_datesbool, default True
If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.
New in version 0.25.0.
- iteratorbool, default False
Return TextFileReader object for iteration or getting chunks with get_chunk().
Changed in version 1.2: TextFileReader is a context manager.
- chunksizeint, optional
Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.
Changed in version 1.2: TextFileReader is a context manager.
- compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’
For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
- thousandsstr, optional
Thousands separator.
- decimalstr, default ‘.’
Character to recognize as decimal point (e.g. use ‘,’ for European data).
- lineterminatorstr (length 1), optional
Character to break file into lines. Only valid with C parser.
- quotecharstr (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
- quotingint or csv.QUOTE_* instance, default 0
Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
- doublequotebool, default True
When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.
- escapecharstr (length 1), optional
One-character string used to escape other characters.
- commentstr, optional
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in ‘a,b,c’ being treated as the header.
- encodingstr, optional
Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".
- dialectstr or csv.Dialect, optional
If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.
- error_bad_linesbool, default True
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.
- warn_bad_linesbool, default True
If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output.
- delim_whitespacebool, default False
Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
- low_memorybool, default True
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set this to False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).
- memory_mapbool, default False
If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
- float_precisionstr, optional
Specifies which converter the C engine should use for floating-point values. The options are None or ‘high’ for the ordinary converter, ‘legacy’ for the original lower precision pandas converter, and ‘round_trip’ for the round-trip converter.
Changed in version 1.2.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
New in version 1.2.
- Returns
- DataFrame or TextParser
A comma-separated values (csv) file is returned as a two-dimensional data structure with labeled axes.
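The difference between the two return types can be sketched as follows. The example uses stock pandas (import pandas as pd) because this reference mirrors the pandas docstrings; that legate.pandas accepts the same call unchanged is an assumption, and the in-memory csv_text is made up for illustration.

```python
import io

import pandas as pd  # assumption: legate.pandas would be swapped in here

# Hypothetical five-row CSV held in memory instead of a file on disk.
csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# With chunksize set, read_csv returns a TextFileReader rather than a DataFrame.
# Iterating over it yields DataFrames of at most `chunksize` rows each.
# The with-statement relies on TextFileReader being a context manager (version 1.2+).
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    chunk_lengths = [len(chunk) for chunk in reader]

# Without chunksize or iterator, the whole file comes back as one DataFrame.
whole = pd.read_csv(io.StringIO(csv_text))

print(chunk_lengths)
print(whole.shape)
```

Processing chunk by chunk keeps only one piece of the file in memory at a time, which is the main reason to prefer it over nrows for large inputs.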
See also
DataFrame.to_csv
Write DataFrame to a comma-separated values (csv) file.
read_csv
Read a comma-separated values (csv) file into DataFrame.
read_fwf
Read a table of fixed-width formatted lines into DataFrame.
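As a small illustration of two of the read_csv parameters described above, the sketch below shows how na_values interacts with keep_default_na and how usecols ignores element order. It is written against stock pandas; using legate.pandas in its place is an assumption, and the column names and the 'missing' marker are invented for the example.

```python
import io

import pandas as pd  # assumption: legate.pandas would be swapped in here

# Hypothetical data: 'NA' is a default NaN marker, 'missing' is a custom one.
csv_text = "foo,bar,baz\n1,NA,x\n2,missing,y\n3,7,NA\n"

# keep_default_na=True (the default): 'missing' is added to the default markers,
# so both 'NA' and 'missing' are parsed as NaN.
both = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

# keep_default_na=False: only 'missing' is parsed as NaN, 'NA' stays a string.
custom_only = pd.read_csv(
    io.StringIO(csv_text), na_values=["missing"], keep_default_na=False
)

# usecols ignores element order, so the columns come back in file order...
file_order = pd.read_csv(io.StringIO(csv_text), usecols=["baz", "foo"])
# ...and an explicit selection afterwards fixes the order.
chosen_order = file_order[["baz", "foo"]]

print(both["bar"].isna().tolist())
print(custom_only["bar"].isna().tolist())
print(list(file_order.columns), list(chosen_order.columns))
```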
Utility functions¶
legate.pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Concatenate pandas objects along a particular axis with optional set logic along the other axes.
Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.
- Parameters
- objsa sequence or mapping of Series or DataFrame objects
If a mapping is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised.
- axis{0/’index’, 1/’columns’}, default 0
The axis to concatenate along.
- join{‘inner’, ‘outer’}, default ‘outer’
How to handle indexes on other axis (or axes).
- ignore_indexbool, default False
If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join.
- keyssequence, default None
If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level.
- levelslist of sequences, default None
Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
- nameslist, default None
Names for the levels in the resulting hierarchical index.
- verify_integritybool, default False
Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.
- sortbool, default False
Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
Changed in version 1.0.0: Changed to not sort by default.
- copybool, default True
If False, do not copy data unnecessarily.
- Returns
- object, type of objs
When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.
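The return-type rules can be checked with a short sketch. It uses stock pandas; that legate.pandas.concat follows the same rules is an assumption based on this reference, and the Series and DataFrame contents are made up.

```python
import pandas as pd  # assumption: legate.pandas would be swapped in here

s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="y")
df = pd.DataFrame({"x": [5, 6]})

# All inputs are Series and axis=0: the result is a Series.
stacked = pd.concat([s1, s2])

# Concatenating along the columns (axis=1): the result is a DataFrame,
# with the Series names used as column labels.
side_by_side = pd.concat([s1, s2], axis=1)

# At least one DataFrame among the inputs: the result is a DataFrame.
mixed = pd.concat([df, s1])

print(type(stacked).__name__, type(side_by_side).__name__, type(mixed).__name__)
```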
See also
Series.append
Concatenate Series.
DataFrame.append
Concatenate DataFrames.
DataFrame.join
Join DataFrames using indexes.
DataFrame.merge
Merge DataFrames by indexes or columns.
Notes
The keys, levels, and names arguments are all optional.
A walkthrough of how this method fits in with other tools for combining pandas objects can be found here.
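A minimal sketch of how join, ignore_index, keys, and names shape the result is given below. It is written against stock pandas under the assumption that legate.pandas.concat takes the same arguments; the frames and the level names 'source' and 'row' are illustrative only.

```python
import pandas as pd  # assumption: legate.pandas would be swapped in here

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join='outer' (default): union of the columns, missing cells become NaN.
# ignore_index=True relabels the concatenation axis 0, ..., n - 1.
outer = pd.concat([left, right], ignore_index=True)

# join='inner': only the columns present in every input survive.
inner = pd.concat([left, right], join="inner", ignore_index=True)

# keys adds an outermost index level; names labels the resulting levels.
keyed = pd.concat([left, right], keys=["first", "second"], names=["source", "row"])

print(list(outer.columns), list(outer.index))
print(list(inner.columns))
print(keyed.loc["second"])
```

Selecting with keyed.loc["second"] recovers the rows that came from the second input, which is the usual reason to pass keys.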
legate.pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
Convert argument to datetime.
- Parameters
- argint, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
The object to convert to a datetime.
- errors{‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception.
If ‘coerce’, then invalid parsing will be set as NaT.
If ‘ignore’, then invalid parsing will return the input.
- dayfirstbool, default False
Specify a date parse order if arg is str or a list-like of str. If True, parses dates with the day first, e.g. 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
- yearfirstbool, default False
Specify a date parse order if arg is str or its list-likes.
If True, parses dates with the year first, e.g. 10/11/12 is parsed as 2010-11-12.
If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).
Warning: yearfirst=True is not strict, but will prefer to parse with year first (this is a known bug, based on dateutil behavior).
- utcbool, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
- formatstr, default None
The strftime format string used to parse time, e.g. “%d/%m/%Y”. Note that “%f” will parse all the way up to nanoseconds. See the strftime documentation for more information on choices: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
- exactbool, True by default
Behaves as:
If True, require an exact format match.
If False, allow the format to match anywhere in the target string.
- unitstr, default ‘ns’
The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. The value is counted from origin. For example, with unit=’ms’ and origin=’unix’ (the default), arg is interpreted as the number of milliseconds since the start of the unix epoch.
- infer_datetime_formatbool, default False
If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
- originscalar, default ‘unix’
Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.
If ‘unix’ (or POSIX) time; origin is set to 1970-01-01.
If ‘julian’, unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
If Timestamp convertible, origin is set to Timestamp identified by origin.
- cachebool, default True
If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.
Changed in version 0.25.0: changed default value from False to True.
- Returns
- datetime
If parsing succeeded. Return type depends on input:
list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp
When it is not possible to return the designated type (e.g. when any element of the input is before Timestamp.min or after Timestamp.max), the result will have datetime.datetime type (or the corresponding array/Series).
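The mapping from input type to return type can be sketched as below. The example uses stock pandas; that legate.pandas.to_datetime returns the same types is an assumption, and the date strings are arbitrary.

```python
import pandas as pd  # assumption: legate.pandas would be swapped in here

# scalar -> Timestamp
scalar = pd.to_datetime("2021-03-04")

# list-like -> DatetimeIndex
listlike = pd.to_datetime(["2021-03-04", "2021-03-05"])

# Series -> Series of datetime64 dtype
series = pd.to_datetime(pd.Series(["2021-03-04", "2021-03-05"]))

print(type(scalar).__name__)
print(type(listlike).__name__)
print(series.dtype)
```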
See also
DataFrame.astype
Cast argument to a specified dtype.
to_timedelta
Convert argument to timedelta.
convert_dtypes
Convert dtypes.
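The unit, origin, format, and errors parameters described above can be illustrated with a short sketch. It is written against stock pandas, assuming legate.pandas.to_datetime accepts the same arguments; all input values are made up.

```python
import pandas as pd  # assumption: legate.pandas would be swapped in here

# unit with the default origin='unix': 1500 ms after 1970-01-01.
from_epoch = pd.to_datetime(1500, unit="ms")

# A Timestamp-convertible origin: whole days counted from 2000-01-01.
from_custom = pd.to_datetime([0, 1, 2], unit="D", origin=pd.Timestamp("2000-01-01"))

# An explicit strftime-style format removes the day/month ambiguity.
parsed = pd.to_datetime("04/03/2021", format="%d/%m/%Y")

# errors='coerce': values that cannot be parsed become NaT instead of raising.
coerced = pd.to_datetime(["2021-03-04", "not a date"], errors="coerce")

print(from_epoch)
print(from_custom)
print(parsed)
print(coerced)
```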