
In this session we will look at how to use Python to explore data.

In designing this version of the material I used a combination of the results from the survey that hopefully most of you took a few weeks back, plus the questions, issues, and requests of people who have come into the CRDDS Consulting Hours over the past few years.

I have designed this for beginning to intermediate users, but it should also be 100% followable for people with absolutely zero Python, or any coding, experience. Just ignore the fact that we are programming and all the subtleties that coding requires, pay attention to the flow of the process, and you all 'should' be golden.

The material can be broken up into five main chapters.

I'm assuming everyone has Python up and running; if not, please let me know now so we can get you going.

A quick tour of Jupyter Lab.

For a MUCH deeper look into the wonderful world of Jupyter Lab, I will be running a workshop on it on Sept 22 from 10:00 - 11:30.
Right now it is via Zoom, but it may end up hybrid live/remote (yet to be determined).

For more information please visit our CRDDS Events page (https://www.colorado.edu/crdds/events).

Jupyter command line interfacing.

Sometimes you need to do something outside the normal realm of Jupyter Notebooks.
One major example, which we will take advantage of right now, is the ability to install packages from Jupyter.
For instance we can use pip or conda, but for speed please consider always using pip from Jupyter, or else jump back out to the command line.
There's a delimiter to tell Jupyter that you want to invoke command line commands, and that's a bang/exclamation mark, '!'. Example:
!pip install pandas

We will be using numerous packages in this session which probably did not come with your installation of Python, so we need to install them.
The next cell contains numerous commented out lines (lines starting with '#').
If you need to install these packages just delete the '#' from the beginning of the line, and when you have uncommented all the desired lines, execute the cell.

This can take some time so just run the cell if needed and by the time we are ready to start running code it should be done.

An Introduction to Important Python Concepts.

Main data structures - Lists, Tuples, and Dictionaries

Actually, let's step back a little bit first and just look at simple variables.

A variable is just a user defined 'name' which references some sort of data.
For example:

foo = 37 # The true answer to life the universe and just everything

Here we have created a variable named foo and stored the value of 37 into it. It is followed by a comment on what the value represents.

The true power, beauty, and sadly part of its dark side, is that EVERYTHING in Python is a non-typed object. A type can be defined as a bunch of text, an integer value, a float value, a long list or matrix of values, or well, anything you can dream up. In Python, at least until just now, you never worry about defining the type of data you are creating and/or using like you do in most every other language. Python 'interprets' (spelled: best guesses, but almost always correctly) the type for you behind the scenes. Hence why Python is known as an interpreted language as compared to a compiled language.

In the past few releases and even more so in the upcoming 3.10 version of Python the concept of type-hinting has been made available to the hardcore coders that seek performance and data type safety. However due to how it works even deeper under the covers (the GIL if you know a bit about Python) even these type-hints are just that 'hints' and Python could conceivably interpret something different at run time. It won't but it could!
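As a tiny, hedged taste of what a type-hint looks like (a sketch only; nothing we do today depends on it):

def add_numbers(a: int, b: int) -> int:
    # the ': int' and '-> int' parts are only hints; Python will still accept other types at run time
    return a + b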

Beyond storing a single piece of data into a variable we have specialized containers to help us out. There are many but the vast majority of time the base containers you will use are;
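For example (toy values of my own, just to show the shapes):

my_list = [1, 2, 3]                 # a list: ordered and changeable
my_tuple = (1, 2, 3)                # a tuple: ordered but unchangeable
my_dict = {'one': 1, 'two': 2}      # a dictionary: key/value pairs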

Now the real fun comes in when you understand that you can compose each of these containers into super containers which hold all kinds of other containers.
For example;
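A couple of made-up illustrations:

list_of_lists = [[1, 2], [3, 4]]                          # a list holding other lists (a simple matrix)
dict_of_lists = {'evens': [2, 4, 6], 'odds': [1, 3, 5]}   # a dictionary whose values are lists
list_of_dicts = [{'name': 'Ada'}, {'name': 'Alan'}]       # a list of dictionaries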

and well anything you can dream up!

For data science the two you will use almost all the time are lists and dictionaries.
Well till we dive into Pandas and dataframes which are different yet under the covers are just composed of standard containers!

There are several other main Python features we need to discuss, such as for-loops, functions, magics, and others. But we will explore those when we run into them.

Obtaining Data

We will start off easy and look at downloading data from a Python package.
The good news is for many major sources of data there are already Python packages written to obtain and often work with that data.
For Census data there are numerous different packages to help you out.
Often you may resort to using multiple different packages to get all the data you are after and then combine them into one grand dataset that fits your needs.

The censusdata Package for Downloading Census Data.

For this example we will use a package called 'censusdata'.

We start off, as you almost always do, with importing the packages we will need for your project.

Notice I received a rather common error. "ModuleNotFoundError: No module named 'censusdata'"
I do not have censusdata installed on my system (mainly because I removed it before we started today).
If you are curious you can do it this way;

!pip uninstall package-name --yes
The '--yes' tells pip to automatically say yes when it asks you 'Proceed (y/n)?'

With Jupyter Lab all I need to do is what you hopefully have already done, or are doing right now: install the package.

There are lots of ways of importing packages; the main way is;
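In our case that is simply:

import censusdata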

Its primary function is 'search()', which is designed to help you find and obtain the subset of census data you may want.

Using censusdata to 'search' for census data

search(src, year, field, criterion, tabletype='detail')

Arguments:

Returns:

So the $64,000 question is how did I know this?
Google? Great guess and often the one you will use but no!

Let's do our first search.
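Something along these lines (this uses the search() signature above; the criterion string 'unemploy' is just an example of a label to hunt for):

sample = censusdata.search('acs5', 2015, 'label', 'unemploy')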

Notice that 'sample' is saved as a list of tuples!
Remember that means that everything is 'indexed' (starting from 0) so we can 'slice' the data to look at it.

First off we will just look at the first full tuple in the list.
NOTE: to access an item of a list, or tuple, by its index you use '[]' with the index number inside it.

Since the tuple inside of it is also composed of several items we can access them with slicing too with a second set of '[]'.
This time we will look at the second item in the first tuple.

What if we want to look at say the first five items in a list.
In this case slicing allows us to give a range of index values as such;
[start index : end index]
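Putting those three ideas into code (assuming our search results are sitting in 'sample'):

sample[0]        # the first tuple in the list
sample[0][1]     # the second item inside that first tuple
sample[0:5]      # the first five tuples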

What if we want to look at things at the bottom of a long list and we have no idea how long that list is?
In this case we add '-' to the index value, and you might assume the last entry is index -0 (it's not, but let's essentially tell Python to 'make it happen!').

In case you have not guessed it, you can use a range with the bottom values too.
Let's look at the 3rd-7th items from the bottom.

Buggers, I guess even Python is not perfect!
You might expect the last item to be '-0' when counting from the bottom, but '-0' is just 0, so negative indexing actually starts at '-1' for the last item.
Just as importantly, a negative slice still works like a normal one: the [start : end] values are both indices counted from the end, and the end value is still exclusive.
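Concretely (again assuming 'sample'):

sample[-1]       # the last item in the list
sample[-7:-2]    # the 3rd through 7th items from the bottom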

Back to the census data: the censusdata package provides a cool pretty print function that displays the data in a nice table.
The function is called 'printtable'.
You can use it like this in conjunction with searching for data.
Note I am not assigning the results to a variable as I'm just hunting for data I may want at this point.

censusdata has functions that provide state and county code information.
Let's get the state codes for each state.
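A sketch of that call using the package's geographies() function (the survey and year here are my guesses):

states = censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', 2015)
print(states)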

For loops

for loops are a way of iterating through objects such as lists, tuples, dictionaries, and pretty much any container.
for loops are funky, spelled vastly richer, than in pretty much any other language, mainly because we have no type casting on our variables/objects; Python figures it out for us.
Thus we do not need to do the normal indexing and counting we do in other languages.

If I wanted to loop through a set of numbers and print them out I can do this, using a 'range' of numbers for my set.
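For example, printing the numbers 0 through 4:

for i in range(5):
    print(i)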

We can iterate through a list via python background juju like this;
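For example, with a toy list:

my_colors = ['red', 'green', 'blue']
for color in my_colors:
    print(color)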

There are a billion other ways to perform for loops, and they are not the only form of iterating.
We will see and explore other methods throughout this session.

Printing conventions I will be using

Remember I welcomed you into the Python metaverse with a million ways to get 1 thing done?
Well printing is this way for sure. There are a million ways to format printing, maybe two million. And it's always evolving!
Throughout this session I will use the most modern take on print formatting which looks like this.
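That is, f-strings, where variables and expressions go right inside the curly braces:

name = 'Python'
version = 3.9
print(f'Welcome to {name} {version}!')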

Iterating through a dictionary

Recall a dictionary consists of a matching key and value pairing.
To iterate through a dictionary getting both the keys and values we use a special dictionary function called '.items()', which returns the key and value for each iterated item.
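For example, with a toy dictionary:

ages = {'Ada': 36, 'Alan': 41}
for key, val in ages.items():
    print(f'{key} is {val} years old')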

Looking at the 'keys' of a dictionary

Let's say we want a 'list' of all the counties in Colorado derived from the census data.
Yes there are other ways to get counties but we have this data so we should use it especially because in a real use case we will want to do something with the counties in conjunction with other bits of data.

To do this we just call the dictionary's 'keys()' function.

Likewise we could look at just all the values using the dictionary function '.values()'
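Assuming our Colorado county information came back as a dictionary (I'll call it 'counties' here), that looks like:

counties.keys()      # all the county names (the keys)
counties.values()    # all the matching census geography objects (the values)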

How to get a single specific key

To look at a single key, or value, in a dictionary you can use the index of the item you want. However you first need to 'cast' all the keys to a 'list'.

Then you specify the index of the item you want.
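For example (the index 5 is arbitrary):

county_list = list(counties.keys())
county_list[5]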

This is wonderful, but only if you know the index value, and who the heck actually ever knows that without looking and counting.
We will explore much better ways to 'query' for values shortly. But there are other cool things you can do with indices.

But what we want is just the county and not the state.
Happily this is easy to do since they are all separated by a comma, so we can use a special string function, 'split()'.

split() will create a new list from the string we wish to split, and we can specify the character or characters we wish to split the string on.

Since we want just the county, which is the left, or first, part of the string, we can specify index [0] to get the county from the split list.
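For example, with a made-up county string:

'Boulder County, Colorado'.split(',')       # ['Boulder County', ' Colorado']
'Boulder County, Colorado'.split(',')[0]    # 'Boulder County'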

Now let's put it all together to create a list of Colorado counties.
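A sketch of that, still assuming the 'counties' dictionary from above:

co_counties = [name.split(',')[0] for name in counties.keys()]
print(co_counties)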

BeautifulSoup and Web Scraping

So packages can be cool, but what if you want data that does not have a package, or data from a website?
Again there's a ton of different packages out there, but the de facto standard is BeautifulSoup, which helps you scrape web information.

To start off we need to import 2 packages.
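In my case those two are (assuming you fetch the page with requests):

import requests
from bs4 import BeautifulSoup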

If we look at this page: https://en.wikipedia.org/wiki/2020_United_States_census
We find there's a table on this page, 'Population and population change in the United States by state', which is of interest to us (for demo purposes really).

We can use BeautifulSoup to get just this table and move it into a dataframe, which will lead us into the wonderful world of Pandas!
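A minimal sketch of the scrape itself:

url = 'https://en.wikipedia.org/wiki/2020_United_States_census'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')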

We can now take a look at what the scraped page looks like - pretty much the raw html code!

What we want is just the table but we need to find some info to grab it. Namely we need the table class name.

Now that we have the class name we can extract just the table html code.
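Something like this, assuming the table uses Wikipedia's usual 'wikitable' class (check the page source for the exact class name):

table = soup.find('table', class_='wikitable sortable')
print(table)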

Looking at our results we find just the code for that table and nothing else from the page.

What we want to do now is 'clean' all this data and build a nice dataframe (a special table) which we can easily work with.

But first!

A Quick Introduction to Python Functions

One of the main goals of all coding/programming is to build reusable bits of code. This takes the form of functions, classes, templates, etc...
Functions are the simplest and most commonly used form of reusable code and this is what you will work with a lot.
In fact you have already been using them in most everything you have done in Python, R, or whatever language you have worked with, even JavaScript, CSS, etc...

In Python the basic form for a function looks like this;

def function_name(optional arguments):
    -- your code --
    return value (optional)

Note: This is for non-class oriented functions. For classes there's this concept of 'self' which needs to be dealt with, but we do not need to worry about that today.

We want to create a special function which will parse the raw html code in such a way as to build a list of lists which will represent our table of data (a matrix for math minded folks).
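A rough sketch of such a function (the real table's quirks may need extra handling, so treat this as a template):

def parse_table(html_table):
    # build a list of lists, one inner list per table row
    rows = []
    for tr in html_table.find_all('tr'):
        cells = tr.find_all(['th', 'td'])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows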

Notice that when we try to execute the cell with the function in it NOTHING happens.
The function is just a container for code that just sits there till we 'call' it into action.

Now we can store this list of lists as an all important Pandas dataframe.
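Roughly (assuming the first row of our list of lists holds the column headers):

import pandas as pd

rows = parse_table(table)
df_states = pd.DataFrame(rows[1:], columns=rows[0])
df_states.head()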

An Introduction to Pandas

First, let it be known there are week-long, all-day workshops which just scratch the surface of how to use Pandas for Data Science.
Thus this will be quick and very dirty, but some of the most important essentials will be laid out.

As mentioned before Pandas uses a specialized container called a dataframe.
It also provides you with a gazillion functions to help you work with the dataframe.

That's it in a nutshell. Questions?

Seriously though we as always start with the package import.
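Which is just:

import pandas as pd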

There are over 308 million rows of data representing every US resident from the 2010 census; sorry, getting the same level of information for 2020 was not possible, yet!
The data consists of location values in an 'easting' and 'northing' coordinate system for longitude and latitude respectively.
Along with the location we have 5 columns of data: the sex of the individual, education level attained, annual income, class of worker, and age.

The data is stored in a Parquet format which is extremely excellent for larger data. By comparison, the census_data.parq file I created is >2.8GB. The same file saved as a .csv is more than 3x larger!
The good news is Pandas knows how to read and write Parquet files.
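Reading it in looks something like this (the file name/path is a placeholder for wherever your copy lives):

df = pd.read_parquet('census_data.parq')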

Paths to auxiliary data

It's important to know the path to where your data is when you try to read it in.
There are a number of ways of adding the complete or relative path to your data into your code, but this becomes a lengthy conversation, so let's assume if you are working natively on your own machine you should use the following line.

IF however, you are using the EC2 shared instance please uncomment the following line and use it.

We can look at the full dataset, well sort of with just;

If we just wanted to look at the top few rows of the data we can use Pandas head() function.

Likewise we can just look at the bottom few rows with the tail() function.
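For example:

df.head()    # the first 5 rows by default
df.tail()    # the last 5 rows by default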

She's pretty big, but just how long is it? As in, how many rows of data?
We can check that by using the length function len().

That just gave us the number of rows.
What if you wanted the number of both rows and columns?
In this case we can use the Pandas shape attribute like this;
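That is:

len(df)      # number of rows
df.shape     # (number of rows, number of columns)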

Either way this is way too large for the purposes of this workshop, so let's cut the dataframe down to just 1 million rows.
NOTE: reducing the dataframe size will affect any actual results we investigate since we will only be working with about 1/308th of the full dataset!!!
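The simplest way is just to slice off the first million rows, something like:

df = df[:1_000_000]    # tacking .copy() on the end would make this a deep copy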

Important Note: This is nice and could be sufficient but it will create warnings because this makes a shallow copy as opposed to a deep copy.
Don't ask or worry about what this means.

For most other file formats we could just load the first 'nrows', or even skip n rows and then access the next nrows.
But because of the columnar nature of parquet files this option is not available in Pandas's read_parquet().

Let's look at the data

The first two columns are the easting and northing location columns; we need this format for later visualization with Datashader.
The rest of the data is coded as numeric codes. Why numeric codes and not actual values? Remember how long it took to deal with just 2.8GB of data?
But we can now easily change the data into a more 'readable' format, which we will do now.

First thing we want to do is look at all the unique values and maybe sort them so we can see what we have in the data.

Column 2 - Sex

We can also look at the count for each of the unique values.
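For the 'sex' column (assuming that is the column name) those look like:

sorted(df['sex'].unique())    # the unique codes, sorted
df['sex'].value_counts()      # how many rows hold each code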

Code Table

Knowing the code table we can replace all the int values with text to make the column more readable.
We will use a special parameter, 'inplace=True', to make the replacement in the current dataframe. Without this parameter we would need to do this;

df['sex'] = df['sex'].replace({0: 'Female', 1: 'Male'})

This is also a slower process than doing it in place.
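With the inplace parameter it becomes:

df['sex'].replace({0: 'Female', 1: 'Male'}, inplace=True)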

Column 3 - Education

This is the level of education obtained by each individual and is kind of a wonky bit of data, as there is crossover in some of the potential values.
But this 'may' be helpful depending on what you are trying to explore.

This time we will also sort the data noting that we can do multiple processing steps all at the same time, well in the same line anyways.

Note we 'could' sort the 'value_counts', but then we would lose context of which count applied to which unique value.
So to do this we need to iterate through all the unique values and their counts.
The problem is we also want the index value, and 'normal' Python for-loops do not rely on count values like, say, C/C++ does.
But we can get the same effect with the 'enumerate' function as part of the for-loop.
This will give us two returns: the first is the 'index' count and the second the actual value.
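For example, over the education column's unique values (assuming the column is named 'education'):

for i, val in enumerate(df['education'].unique()):
    print(f'{i}: {val}')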

We can use the same mechanism to replace the code-value with the categorical text equivalent and print it all out so we can make more sense of the data.
To do this we create a list of the correct labels.
Then when we iterate through data we replace the code with the text.

IMPORTANT NOTE: this change is only in the printed output and not in the data itself!
We will have to run the 'replace' code to make this change to the data itself!
We already have a list of the values so you'd think we could just loop through all the data and replace them as we go. And we can!
BUT it is insanely slow.
Below is an example of how this can be done. DO NOT RUN IT NOW - hence why it's commented out!

Instead we will use our edu_list and convert it into a dictionary and then run the Pandas 'replace' function.
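Roughly like this, assuming edu_list is ordered so its position matches the numeric code:

edu_map = dict(enumerate(edu_list))    # {0: first label, 1: second label, ...}
df['education'] = df['education'].replace(edu_map)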

Notice: In the above, commented out, cell I use a Notebook 'magic' to time the function.
Jupyter has a number of special 'magic' functions which provide extra, behind the cell code functionality.
In this case I'm using the '%%time' magic to have Jupyter time how long it takes to execute the cell the magic resides in.
It then prints the time out for us.

So comparing in our for-loop attempt at replacing the category values versus just letting Pandas do it we find the following results:

For-loop replace time: 280 seconds
Pure Pandas replace time: 18.2 seconds

That's a 1439.46% speedup!!!

Moral of the story? 1st law of coding, don't write something that has already been written. You won't do it better!

Column 4 - Income

This is the net income reported for each individual age 25 and over.

Looking at the data with the category.

Column 5 - Class of Worker

This is a gross, in more ways than one, categorization of how each person 'may' be employed.
We will go through our normal steps as above.

Column 6 - Age

Age is just that: the reported age of the individual.
I'll leave it as an exercise for you to do the value-counts if you desire. We will look at it differently shortly.

Columns 0 and 1 Location information

This is the location information for each individual's residence in easting and northing coordinates.
Easting and northing is basically a coordinate system measuring the physical distance east and north, respectively, from a base point.

We can use Pandas min and max functions to find the minimum/maximum easting and northing values across all individuals.
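For example (assuming the columns are named 'easting' and 'northing'):

df['easting'].min(), df['easting'].max()
df['northing'].min(), df['northing'].max()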


A Quick Introduction to Plotting

There are a HUGE number of different packages that you can use for plotting data including, but far from limited to:

For the longest time matplotlib ruled the roost and is probably by far the most often used package, mainly out of familiarity.
But the new king of the hill is Plotly. Till a couple of years ago people, such as me, laughed at the concept of Plotly being the 'One' package.
Then the folks at Plotly made some massive changes, expansions and buy-ins, making one massively powerful tool.
One of the advantages of Plotly is it works in many different languages, almost exactly the same in each. This includes Python, R, Julia, its native JavaScript, and others.
Did I mention that in the background it's pretty much all JavaScript? That makes it valuable for web presentations too!
Another advantage is if you can dream a plot you can create that plot, albeit at the cost of a bit (spelled - lot) of work on your part, but very doable.
How big and powerful is Plotly now? Matplotlib is still the default plotting package built into Pandas, but you can now tell Pandas to hand its plotting off to Plotly instead, which we will do below.

Plotly + Dash + other packages

Dash is, we shall just say, an extension of Plotly (not really but they are now married together!) which allows you to create rich interactive dashboards using Plotly graphics.
Not only can you use Dash with Plotly but Plotly allows you to use other plotting packages inside it as well, such as matplotlib (but why?), Seaborn, Datashader and others.
We will look at native Datashader in just a bit.

Interested? I have provided a multi-session set of trainings on using Plotly and Dash and am in the process of creating new video trainings covering all that material and much more.
If you are interested, these trainings will be made available later this year.

Basic Plotly Figure Architecture

Compared to Matplotlib, Plotly has a simpler, more refined 'layout' based approach, as shown below. Note the same architecture you see in, say, matplotlib still exists here; it just hides in the background.

Plotly_Architecture.png

The real power of customizing your plots to look exactly how you want/need them lies in the expansive usage of 'update_traces' and 'update_layout'. More on this later, but just know that this makes life easier, once you're used to it, than working in other visualization packages.

Major Plotly Modules

Note: Figure factories appear to be slowly going away with some of their features already moved into express and/or graph_objects.

Plotly Examples, Documentation, and Example Data Packages

It's always important to know where the documentation and examples are. You will find yourself referring to the two following reference documentation pages a lot, and happily they are really well written!

https://plotly.com/python/

https://plotly.com/python-api-reference/generated/plotly.data.html

Basic Plotly Plots

To begin with we will take advantage of the fact that Pandas can use Plotly, as well as Matplotlib to plot data and we will immediately see the advantage of using Plotly over Matplotlib.

First we will just create a new pandas dataframe and fill it with 25 random numbers in 4 different columns which we will name A, B, C, and D

Now we will ask pandas to plot the data out using plotly.
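A sketch of both steps; switching the Pandas plotting backend over to Plotly is one way to do the second:

import numpy as np
import pandas as pd

df_rand = pd.DataFrame(np.random.rand(25, 4), columns=['A', 'B', 'C', 'D'])

df_rand.plot()                            # the default Matplotlib version

pd.options.plotting.backend = 'plotly'    # hand plotting off to Plotly
fig = df_rand.plot()
fig.show()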

You should immediately notice the difference in quality between the two plots. Also of major importance, notice in the top right of the Plotly version you have a funky toolbar called a 'modebar'. It allows you a healthy degree of interaction to start exploring your data.

Note: This is FAR from the end all be all of your ability to create interactive data visualization with Plotly. We will look at a wee bit more towards the end but the real power will be presented in the next workshop!

Plotly Express(px) versus Plotly Graph_Objects(go)

Plotly Express

Plotly express allows you to create plots as quickly and as easily as possible. Think of it as an automated plug and play for your data. This comes at the price of some customization and cross package functionality. This is a great solution when you first start working with plotly or need to quickly explore and/or share your data.

We will work with both express and graph_objects to show they work basically the same.
HOWEVER - Please note that the parameterizations for one may be, and often are, slightly different than the other. In fact, in some cases some cool functionality for quick interactivity only exists in express. You can indeed do the same thing and more with graph_objects, but it's more complex than just a simple key-value pairing.

Plotly Graph_Objects

Plotly graph objects are the real meat of Plotly. They allow you to create what you need the way you want it, albeit at the cost of a bit more complexity.
From here on out we will concentrate on graph objects for our plots.

Let's move back to our census data and explore it better

We can change the 'orientation' parameter and swap the x and y axis to create a horizontal bar chart.
We will also change it so we are coloring by the different categories. This will give us a unique color for each category.
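A sketch with Plotly Express, using the education counts as the example data:

import plotly.express as px

edu_counts = df['education'].value_counts()
fig = px.bar(x=edu_counts.values, y=edu_counts.index,
             orientation='h', color=edu_counts.index)
fig.show()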

Histograms

We can just as easily create a histogram of say 'Age'. In this data case there will be little difference between this and a bar plot.
Well except for the fact we can control the number of bins in the histogram!
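For example, an Age histogram with a chosen number of bins (column name assumed to be 'age'):

import plotly.express as px

fig = px.histogram(df, x='age', nbins=20)
fig.show()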

Pandas Query to help compare data.

While all of this has been 'maybe' interesting, the real value in the data comes when we start expanding the number of dimensions of data we explore.
In this next case we will look at the comparison of educational attainment based on gender.

To make this happen we need to find all the data which corresponds to each gender and then go through the same process we did above to see the count for each category.
Pandas has a wonderful function called 'query' which makes this happen. To make it work we call 'query' on the dataframe and specify the column we want to query on and what value in that column we are interested in.

To plot the data we use a graph_object Bar chart with multiple traces.
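A sketch of the whole comparison (column names assumed as before):

import plotly.graph_objects as go

male = df.query("sex == 'Male'")
female = df.query("sex == 'Female'")

fig = go.Figure()
fig.add_trace(go.Bar(x=male['education'].value_counts().index,
                     y=male['education'].value_counts().values,
                     name='Male'))
fig.add_trace(go.Bar(x=female['education'].value_counts().index,
                     y=female['education'].value_counts().values,
                     name='Female'))
fig.show()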

Next up - Pie Charts

Pie charts take both a 'labels' and a 'values' parameter, which are just as they sound.
In our next example we will create a pie chart of 'Worker Class'.
We will also 'explode' out the slice which represents 'Self-employed' workers.
Lastly I'll finally show you how to modify the interactive text-over display.
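A sketch, assuming the worker class column is named 'worker_class' and already holds text labels:

import plotly.graph_objects as go

wc_counts = df['worker_class'].value_counts()
pull = [0.2 if label == 'Self-employed' else 0 for label in wc_counts.index]

fig = go.Figure(go.Pie(labels=wc_counts.index, values=wc_counts.values, pull=pull,
                       hovertemplate='%{label}: %{value} people<extra></extra>'))
fig.show()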

Datashader

Datashader is a specialized visualization tool that lets you work with EXTREMELY large datasets insanely quickly, in parallel, and, if available, on GPUs.
By large, I suggest that 308 million rows of data is puny! I have run 10+ billion point datasets in less than a minute!
It does this by aggregating and rasterizing the data into regular grids. There is still healthy room for improvement, as interactivity with the data is still fairly lacking.

While you can render Datashader directly in Plotly and/or Dash, for this quick demo we will just look at pure Datashader, as it's already a different enough beast compared to what you have seen so far.

For full information please visit: https://datashader.org/

We start with a slew of imports we need for Datashader and rendering in Jupyter.
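A typical set looks something like this (yours may vary a bit):

import datashader as ds
import datashader.transfer_functions as tf
from datashader.utils import export_image
from datashader.colors import Greys9, viridis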

For Datashader we can easily play with all 308 million rows of data, so for data safety's sake we will begin by reloading the full dataset.

First thing we will do is define a geographic region we want to limit the render to, in this case the CONUS region.
For fun we will also look at a region that roughly outlines the 'almost' rectangular shape of Colorado.

Notice that while our dataset is already in easting/northing coordinates, which we need for Datashader, Datashader provides a handy utility function to convert conventional lat/lon data for us.

We'll do a little definition setup

Now we need to create a datashader 'canvas' which is the equivalent of a Plotly or Matplotlib figure.
Then we set up the aggregation of the point into a grid on the canvas.
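In code that looks roughly like this (the plot size, region bounds, and column names are my placeholder assumptions):

x_range, y_range = (-13_900_000, -7_450_000), (2_700_000, 6_450_000)   # very rough CONUS bounds in easting/northing meters
cvs = ds.Canvas(plot_width=900, plot_height=525, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'easting', 'northing')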

Next we define how we want to export the data into a final, saved image.
The 'export_path' is the location of where we want the outputted image to be saved and will create that path if it does not exist.

Then we specify the colormap (cm) we want to use. In this first case the opposite of our background color.

Now for the magic we actually create the plot itself.
In this case it will be an aggregated density plot of the residence location of each of the 308 million people in the census data.
We will start with a Gray color map.
Think of the 0.2 in the cmap as the intensity of the density aggregation transfer function. (OK it's really closer to the point size but ignore that for now) We will use a 'log' scale of the transfer function.
Finally we specify the file name which will be saved to the export_path we defined earlier.
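A minimal version of that cell, using the canvas aggregation from above (I'm skipping the colormap_select helper that provides the 0.2 intensity knob and just using a plain Greys colormap):

img = tf.shade(agg, cmap=Greys9[::-1], how='log')    # reversed greys so dense areas show bright on black
export_image(img, 'census_gray_log', export_path='export', background='black')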

We can easily change the colormap to say Viridis and just re-render the output.

We can provide more contrast by changing the 'how' interpolation to something like 'eq_hist'.

Putting it all together, let's look at a closeup region of just Colorado(ish).
Ideally you would actually create a function for most of the below code and then just call it to create numerous different exploratory plots.

For the curious, let's time how long it takes to render all 308+ million datapoints.

Sankey Diagrams (if time permits)

Sankey diagrams show the 'flow' of the data. They consist of a series of 'nodes' which represent data categories.
The width of the connecting 'edges' is directly proportional to the value associated with each node.
The value of Sankey diagrams is they help show the relationship between many different categories of data and their associated value.
They are often used to show relationships in Census data, manufacturing, business decision models, and political models.
In science they are used to look for new connections in data (e.g. in climatology they help show teleconnection relationships).

NOTE: Sankey diagrams are sometimes referred to as Alluvial Plots and a few other names.

We will explore Sankey diagrams with totally manufactured (spelled: fake) data, as this will make it easier to see what's happening.

Sankey diagrams require 3 main pieces of information derived from the data: the 'sources', the 'targets', and the 'values' of the links between them.

To create the Sankey in Plotly we need to use a graph object conveniently called 'Sankey'.
The process requires describing what the nodes look like as well as the links.
The links contain our 'source', 'target', and 'value' data.
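A tiny made-up example just to show the shape of the call:

import plotly.graph_objects as go

fig = go.Figure(go.Sankey(
    node=dict(label=['Website', 'Page A', 'Page B'], pad=15, thickness=20),
    link=dict(source=[0, 0],     # each link starts at node 0 ('Website')
              target=[1, 2],     # ...and ends at 'Page A' or 'Page B'
              value=[8, 4])))    # the link widths
fig.show()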

Now lets look at something more advanced albeit still very much contrived.
We will create a list of lists which represent websites as the 'sources' and pages on the website as targets.
We will create random numbers to represent the number of user visits to each page. This will be our 'values'.

Create the nodes, sources and targets

Now we can create our Sankey diagram as we did above.

We can pretty this up a bit by adding in some unique colors to each node.
Change the background to black and then adjust the font so it shows better against the background.
We will also change the size of the plot so it's easier to see what's happening.

For the colors we will use a Plotly Express color map called 'D3' (if you are wondering: yes, named after JavaScript's insanely powerful D3 visualization package).
From this color map we will extract a random color for each node and for each edge.
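Grabbing random colors from that colormap looks something like this (the node_labels and sources lists here stand in for the ones built above):

import random
import plotly.express as px

node_labels = ['Website', 'Page A', 'Page B']    # stand-in for the real node list built above
sources = [0, 0]                                 # stand-in for the real sources list

palette = px.colors.qualitative.D3
node_colors = [random.choice(palette) for _ in node_labels]
link_colors = [random.choice(palette) for _ in sources]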

And once again we create our Sankey diagram.

Is Pandas the end all be all for data science?

That's a loaded question. The answer is yes, and no, but really yes. Let's qualify that.
Pandas is by far the de facto standard for all things data (in the Python metaverse). But with our dataset we are pushing the upper limits of in-core memory computation.
Pandas 'can' get the job done, but we already saw the processing speed with just 1 million of the 308 million rows of data. So what do we do if we are playing with even larger datasets? I, personally, routinely work with datasets that start at 35+ GB and often with hundreds of similar files all at once!

The answer lies in other similar packages such as Dask, Dask_XArray, CuDF (NVIDIA's CUDA dataframe package which LOVES GPUs) and a few others. These packages are designed to work with extremely large datasets with speed and ease.
So there's the 'no' part of my original answer. Now for the 'yes' part. All of these other packages mimic Pandas as closely as their core architectural concepts will allow. The one drawback (well, there are a few for each package) is that Pandas has been around for many, many years, has a huge support community, and is constantly being improved. These mimic packages are only a few years old and just don't have the same long development history as Pandas. But each is working hard to provide as much of the Pandas-like functionality, usually with the same function names and parameterizations, as possible.

Be warned, if any ATOC students are here, that you often have to jump through many hoops and loops to get from one file format (namely netCDF) into these other packages, get the data processed the way you want, and then back into netCDF. If you are one of these future individuals, come seek me out, as I have developed a rather funky solution to processing hundreds of netCDF files, generating new composite netCDF file(s), and saving them back out to netCDF. Not as simple as one would expect!!!

Is Plotly the end all be all for data visualization?

Yes, but no, nothing is. Yes, it's the current king of the hill (Pythonically speaking), especially when mixed with Dash. But when you get down to it there are a near infinite number of ways to visualize data and no one package can do them all. So for the vast majority of what a normal person would want to do, Plotly is for you; unless you already have Matplotlib, Seaborn, etc... code to work with, then use it till you need more! Else look into all the other wonderful packages out there to expand your horizons.