Jupyter Notebook How to Upload a File
In this tutorial we will learn to write a simple Python script for reading data files.
For this tutorial I am going to assume that you have some idea about using either Jupyter notebook or Python in general. I also assume that you have Anaconda installed, or know how to install packages into Python. If you do not, then I would first suggest putting a few minutes aside for installing Anaconda and taking a crash course in Jupyter.
We are reading this data file
Here is a snapshot of what my data file looks like:
It is a two-column comma separated value (CSV) file. The wavelength (x-axis) is in the first column and the light intensity (y-axis) measured at that wavelength is in the second column. It is intensity vs. wavelength here; this may be something else for you: population vs. year, temperature vs. month, or sepal width vs. sepal length etc. You can try following this tutorial with the iris dataset (exact solution at the end).
The Excel snapshot of the data shown here is for representation purposes. In reality, the data columns are thousands of rows long. The point is, you may choose to plot such data as a scatter plot or a line plot.
Importing the relevant libraries
Reading a data file into a Python Jupyter notebook is simple. When you install Anaconda, it comes with a version of Python that has the Pandas library pre-installed in it.
Start your Jupyter notebook and type in the following in your cell.
import pandas as pd
This imports the module pandas, and all of the useful functions inside of it can be used with the "pd." prefix. Similarly, the other scientific computing favorite, "numpy", is usually imported as "np", and you do it exactly the way you did with pandas. Add the following to the next line of your code:
import numpy as np
Pandas (pd) is a great library for handling big datasets, but in this lesson we will use only a small part of it to read our data file. Numpy (np) is a library that makes life easier when handling arrays, hence we import that also. We will see in the following code how these will be used.
Running your code at this point will appear like it did nothing. But a lot of stuff happens behind the scenes. The libraries are imported into your code. You will certainly run into errors if the libraries you try to import were not installed properly.
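If you would like some visible confirmation that the imports worked, one small check (my own addition, not a required step) is to print the version strings that both libraries expose:

```python
import numpy as np
import pandas as pd

# Printing the version strings confirms that both libraries were found and loaded.
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```

If either import fails, Python raises an error on that import line instead of printing anything.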
Filename string
To read a file, your code needs to be told its location. The location, a string type variable, can be stored in a variable such as "filename", and the location will be its path on your computer. A string in Python is surrounded by either single quotation marks or double quotation marks.
Say your file is in the same folder as your Jupyter notebook and is called "P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv". In this case you will first start by storing your file name in an arbitrary variable, let's say "filename", like this:
filename = 'P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
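Before reading anything, it can help to confirm that Python actually sees the file. Here is a minimal sketch using the standard library's pathlib module; the file name 'my_spectrum.csv' is a hypothetical stand-in for your own file.

```python
from pathlib import Path

# Hypothetical file name, used only for illustration.
filename = 'my_spectrum.csv'

path = Path(filename)

# A bare file name is looked up relative to the current working directory,
# which for a notebook is normally the folder the notebook was started in.
print("Looking in:", Path.cwd())
print("File found:", path.exists())
```

If "File found" prints False, the name is misspelled or the file lives in a different folder.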
Specifying a path
But since files are not always conveniently inside the same folder as your code, you can also have the full path of the file stored in this string. This is usually better, because the code then finds the file no matter which folder the notebook runs from. So now, say your file is at 'D:/data/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv', then you store your full path+filename inside the filename variable as follows:
filename = 'D:/data/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
Note that the slash in the path is a forward slash, unlike the backslash Windows often uses. Single backslashes are unreliable here, because Python may read a backslash together with the character after it as a special character. If you still want to use backslashes, you will have to replace each single backslash with a double one as follows:
filename = 'D:\\data\\P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
This is because the backslash is the escape character in Python strings. It is used to write special characters such as a new line or a tab, using '\n' or '\t'. So when you use a single backslash, Python doesn't read it as a literal character. You need to use the escape sequence "\\" for Python to read it as a single backslash. Other useful escape sequences are listed in the Python documentation.
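To see the escaping rules in action, here is a short sketch with a hypothetical folder and file name. The raw string (the r prefix) is an extra option I am adding here; it is standard Python, but it is not used elsewhere in this lesson.

```python
# Three ways to write the same Windows-style path (hypothetical folder and file).
forward = 'D:/data/example.csv'
doubled = 'D:\\data\\example.csv'   # each \\ is stored as one backslash
raw = r'D:\data\example.csv'        # raw string: backslashes are kept as typed

print(doubled)           # D:\data\example.csv
print(doubled == raw)    # True

# An escape sequence is stored as a single character, not two.
print(len('\n'), len('\\n'))   # 1 2
```

The forward-slash version is the one I would reach for first, since it needs no escaping at all.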
We will learn smarter ways to get the path more easily by browsing files in the file explorer window, in the following lessons.
Reading the file
To read the file using our imported pandas library now all you have to do is use the "read_csv" function from pandas as follows:
pd.read_csv(filename)
As long as you have a file with column-like data (shown previously) in it, you will immediately get a table as the output, which for the type of data I showed above would look like this:
If this shows alright it means that your data file reading was successful. There are certain issues with the header; we will fix that later. To store this pandas object in a variable called "data", for example, we will modify our line of code to assign the pandas object to 'data'.
data = pd.read_csv(filename)
Another great way of checking if your data was read with the columns separated appropriately is to use the head function to display just a quick preview of the data. Since you have assigned the object to the 'data' variable already, you could display the head part of the data using the following:
data.head()
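If you do not have a data file at hand, you can still try read_csv and head on text held in memory. The sketch below builds a made-up table; the column names and numbers are invented for illustration and are not from my data file.

```python
import io
import pandas as pd

# Made-up table: one header row followed by 20 rows of invented numbers.
csv_text = "wavelength,intensity\n" + "".join(
    f"{540 + i},{10 + i}\n" for i in range(20)
)

# read_csv accepts a file path or a file-like object such as StringIO.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)     # (20, 2): rows, columns
print(data.head())    # the first 5 rows by default
print(data.head(3))   # or ask for a specific number of rows
```

Checking the shape next to the preview is a quick way to confirm that the columns were split the way you expected.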
Trick to reading data in ASCII files
In case you want your program to be robust enough to read most ASCII files with arbitrary delimiters using a single line of code, I've found using the following to be very useful. This line of code should be good enough to read most tab separated, space separated, or other delimiter separated files.
data = pd.read_csv(filename, sep=None, engine='python')
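As a rough check of this trick, the sketch below feeds the same made-up rows to read_csv with three different delimiters. With sep=None, the python engine tries to detect the delimiter on its own (it relies on Python's built-in csv.Sniffer). I also pass header=None here, only because these invented samples have no header row.

```python
import io
import pandas as pd

# The same made-up rows written with three different delimiters.
samples = {
    "tab": "540.0\t12.5\n541.0\t13.1\n542.0\t12.9\n",
    "space": "540.0 12.5\n541.0 13.1\n542.0 12.9\n",
    "semicolon": "540.0;12.5\n541.0;13.1\n542.0;12.9\n",
}

shapes = {}
for name, text in samples.items():
    # sep=None asks the python engine to work out the delimiter itself.
    table = pd.read_csv(io.StringIO(text), sep=None, engine='python', header=None)
    shapes[name] = table.shape
    print(name, table.shape)
```

Each sample should come back as 3 rows by 2 columns. Detection is a guess, so for unusual files it is still worth looking at the preview before trusting the result.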
Adjusting the header
Since our data has no header, we can pass an argument to the read_csv function to tell it that there's no header in the file. To do this we use the 'header' argument with the value None. Let's modify our line of code to look as follows:
data = pd.read_csv(filename, sep=None, engine='python', header=None)
data.head()
This will show you the data where the header names of the columns are just 0 and 1. This is what we want.
If your file does have text headers (iris dataset), then you probably wouldn't need to do this.
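Here is a small sketch of what the header setting changes, using made-up values with no header row. The names argument at the end is an optional extra from pandas, and the column names I give it are hypothetical.

```python
import io
import pandas as pd

# Made-up values with no header row.
csv_text = "540.0,12.5\n541.0,13.1\n542.0,12.9\n"

# Default behaviour: the first row is used up as column names.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is data and the columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Optionally supply your own column names (hypothetical names).
named = pd.read_csv(io.StringIO(csv_text), header=None,
                    names=['wavelength', 'intensity'])

print(list(with_default.columns), len(with_default))
print(list(no_header.columns), len(no_header))
print(list(named.columns), len(named))
```

Without header=None the table silently loses its first data row, which is easy to miss in a file that is thousands of rows long.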
Obtaining raw data array
To obtain the raw 2D array out of this pandas data object, you can use the values attribute on the object like this:
rawdata = data.values
print(rawdata)
and the output will show the raw 2D array in numbers.
Pandas does a great job of reading large files fast. I use it mostly just for that. From here onward, I find it easier to do all of the data processing on the raw 2D arrays.
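To make it concrete what the raw array is, this sketch reads a few made-up rows and inspects the result. The to_numpy() call is an equivalent spelling that pandas also provides; I show it only for comparison.

```python
import io
import numpy as np
import pandas as pd

# Made-up values with no header row.
csv_text = "540.0,12.5\n541.0,13.1\n542.0,12.9\n"
data = pd.read_csv(io.StringIO(csv_text), header=None)

rawdata = data.values
print(type(rawdata))    # a numpy ndarray
print(rawdata.shape)    # (3, 2): rows, columns
print(rawdata.dtype)    # float64 for purely numeric columns

# to_numpy() returns the same array contents.
same = data.to_numpy()
print(np.array_equal(rawdata, same))   # True
```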
Obtaining each column array
Say your x-axis data is in the first column. To get all of the rows from the first column into a variable 'x', we attach [:,0] to the rawdata variable. This tells Python to fetch all rows (:) from the 0th column. To get the second column you would use [:,1]. It is important to remember that indexing always starts from 0 in Python.
So now, our code written all together looks as follows:
import pandas as pd
import numpy as np
filename = 'data/L1/P2_maxint_0500x005ms_532nm_00095uW_540_900nm_temp_297.75K_t001.csv'
# read the file, detecting the delimiter and treating every row as data
data = pd.read_csv(filename, sep=None, engine='python', header=None)
# raw 2D array of numbers
rawdata = data.values
# all rows of the first and second columns
x = rawdata[:,0]
y = rawdata[:,1]
print("All rows of column 0: ", x)
print("All rows of column 1: ", y)
The output for this gives me the individual arrays to be plotted or analyzed. The output should now look as follows:
Specific rows, say the part of the data from row 10 up to (but not including) row 100 of column 1, can be extracted as follows:
y_cut = rawdata[10:100,1]
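The slicing rules are easier to trust once you try them on an array whose contents you already know. This sketch uses a made-up array where column 0 counts from 0 to 199 and column 1 holds the square of that number.

```python
import numpy as np

# Made-up 2D array: column 0 counts 0..199, column 1 is its square.
col0 = np.arange(200)
rawdata = np.column_stack((col0, col0 ** 2))

x = rawdata[:, 0]            # all rows, column 0
y = rawdata[:, 1]            # all rows, column 1
y_cut = rawdata[10:100, 1]   # rows 10 to 99 of column 1

print(x.shape, y.shape)      # (200,) (200,)
print(y_cut.shape)           # (90,)
print(y_cut[0], y_cut[-1])   # 100 9801
```

The slice 10:100 returns 90 rows because the end index is excluded, so the last value comes from row 99.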
Plotting our data
To have something to show for the work we just put in, let's plot our data. One of the easiest libraries to work with while plotting data in Python is matplotlib. It is highly customizable and can do wonders within a few lines of code.
We can plot the x and y variables by importing a part of the library (matplotlib.pyplot), using the plot function and showing it. Make sure the number of rows for both your columns is the same. So if you slice the rows (as in the previous step) for x, do the same slicing for y also.
We will add the following three lines of code to your script to plot the data:
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()
The first line here imports the matplotlib.pyplot library as 'plt'. This 'plt' is arbitrary. You can use anything instead of those three letters, as long as you use the same letters when you call functions from that library. Many people like to use 'plt' for this, so I'm using that.
Although I am importing the library here, I want to say that you should try doing all of these imports together in the first section of your script.
The second line draws the line plot with the x array on the x-axis and the y array on the y-axis. All of this is done by Python and stored in memory. It won't show / print till you type in the third line. Your output should show the following plot if you used the same data file I did.
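Separately from my own plot, here is a self-contained sketch you can run without any data file. It uses made-up numbers (a smooth bump between 540 and 900) in place of the wavelength and intensity columns, and it applies the same row slice to both arrays so that their lengths still match. The axis labels are an optional extra.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data standing in for the wavelength and intensity columns.
x = np.linspace(540, 900, 361)
y = np.exp(-((x - 700) / 40) ** 2)

# Apply the same row slice to both arrays so their lengths still match.
rows = slice(10, 100)
x_cut = x[rows]
y_cut = y[rows]

plt.plot(x_cut, y_cut)
plt.xlabel("wavelength")
plt.ylabel("intensity")
plt.show()
```

If the two arrays had different lengths, plt.plot would raise an error instead of drawing anything.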
Reading data in the Iris dataset
Since the iris dataset has headers, I simply removed the header=None argument from read_csv. And I replaced the plot function with the scatter function to draw a scatter plot instead of a line plot (which was a mess in this case). With these two modifications the script worked like a charm. Here's the code I used to plot the sepal_width (y-axis, column number 1) vs. the sepal_length (x-axis, column number 0) from the iris dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filename = 'data/L1/iris_dataset.csv'
# the file has a header row, so header=None is left out
data = pd.read_csv(filename, sep=None, engine='python')
rawdata = data.values
x = rawdata[:,0]
y = rawdata[:,1]
print("All rows of column 0: ", x)
print("All rows of column 1: ", y)
# scatter plot instead of a line plot
plt.scatter(x,y)
plt.show()
The output looked as follows:
We will practice much more reading data and simple plotting. In the next few lessons we will learn to modify our plots and make our file reading method more efficient. But for now, I hope the plot gives you a sense of reward.
Let me know in the comments or on the contact page if I did not explain a certain part of the code clearly enough. I will be happy to add more clarity to the lesson.
Source: https://edusecrets.com/lesson-01-reading-data-files-in-python-jupyter-notebook/