Over the past few months I’ve been diving head first into the world of coding to get a better understanding of how it works and how to use it for machine learning on geological data. Along the way I’ve asked Google a lot of questions and got some very complex answers back, so in this blog I’ve tried to break down the complexities and sum up coding in an easy-to-understand way. Unfortunately, there aren’t many geology-related themes here; this post mostly sets things up for future blogs on machine learning in geoscience.
What is coding?
Coding is the act of writing lines of code, following a specific set of rules, to tell your computer what to do. Whenever you use technology, whether it’s a mouse click, a tap on your phone screen or a Mortal Kombat button-smash combo on Xbox, lines of code are running in the background to carry out your commands. For us (humans) to command them (computers) to do something, we need to use what is called an interface. There are two main ones: a Graphical User Interface (GUI) and a Command Line Interface (CLI). These interfaces are provided by the operating system (such as Windows, macOS, Android or Linux) and differ between systems and even between versions of the same system. This is why someone going from Windows XP to Mac OS X might not know how to use the desktop or applications.
To get your computer to do something, let’s say open the Spotify app (my examples are on a MacBook), you can either use a GUI (the most common way) or a CLI (the 'coding' way). With a GUI, you would navigate to the Applications folder on your screen and click on Spotify with your cursor to open it. To use a CLI, you open what is called 'Terminal' on Mac or 'Command Prompt' on Windows and type in your command (or code). In this case, I open Terminal, type open -a spotify and hit enter, which opens Spotify.
So the two ways of getting shit done on a computer are a Graphical User Interface (GUI) and a Command Line Interface (CLI). One is effectively telling your computer what to do with your mouse and the other is telling it what to do with your keyboard. This article has a great explanation of the differences.
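Jumping ahead a little: once you have Python installed, it can send that same Terminal command for you. This is only a sketch under a few assumptions: it uses Python’s built-in subprocess module, it assumes you’re on macOS (where the open command exists) with an app called Spotify, and the function name open_app_command is made up for this example.

```python
import subprocess
import sys


def open_app_command(app_name):
    """Build the macOS Terminal command 'open -a <app>' as a list of arguments."""
    return ["open", "-a", app_name]


if __name__ == "__main__":
    command = open_app_command("Spotify")
    if sys.platform == "darwin":
        # On a Mac, this does the same as typing the command into Terminal
        subprocess.run(command, check=True)
    else:
        # 'open -a' is macOS-only, so just show what would have been run
        print("Not on macOS; the command would be:", " ".join(command))
```

Passing the command as a list (rather than one long string) means Python hands each piece straight to the operating system, so app names with spaces don’t need extra quoting.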
What is Python?
It is a programming language (with its own syntax) that coders/programmers use to create programs. Different languages get used for different things, like Java (used for Gmail), C++ (used for Microsoft Windows), JavaScript (which runs web pages in browsers like Google Chrome), PHP (used for Facebook) and Python (used for Spotify). They all use the same letters, numbers and symbols, just in different ways. For example, if I fed the line “Do you reckon it’s warm enough to wear thongs today boys?” to an Australian English (Python) interpreter, I would probably get a simple yes/no back. However, if I fed the same line to an American English (C++) interpreter, I would likely get a very confusing error. It simply means the two languages interpret code differently.
Why choose Python?
It’s easy to learn (comparatively), has many useful applications and generally needs far fewer lines of code than other languages. This makes it neater and easier to read for beginners. It’s also very popular, so lots of people use it or at least understand it, which makes it even easier to troubleshoot or find answers to errors online. Find out more about Python here.
What can geoscientists use it for?
Unfortunately I won’t delve too far into the geoscience side of coding yet, but personally, I learnt Python so I could apply machine learning algorithms to geological datasets. The possibilities for Python in geoscience are huge: you can graph, model, analyse and plot all kinds of geophysical, geological or geochemical data; you just have to learn it first. This great article by Bruno Ruas de Pinho explains in more detail what you can do with Python from a geology perspective.
Where to start with Python?
There are hundreds and hundreds of beginner guides and tutorials for Python online. Most of them run through how to download it and how to write the coding basics like data types, loops, functions, libraries etc., but I couldn’t find a good explanation of where to actually write the code. When I first downloaded Python from the website I had no idea what I was downloading. I went for Python 3.7 (you can choose between different versions) and got a bunch of files that I’m still not entirely sure about.
Where do I write the code?
You can write code anywhere, in any text editor like Notepad or Microsoft Word. Below I’ve written an equation using this blog’s text editor:
x = (4 * 3) / 2
print(x)
Nothing happened because it’s just text. For the code to be ‘executed’, I need to enter it into a compiler or an interpreter, such as the Python interpreter, which I can start from ‘Terminal’ (used in the Spotify example) by typing python3. A compiler/interpreter converts the code into something my computer can understand (ultimately 1s and 0s, like 11010100110111) and then returns what I’m asking for. So, when I type this into the interpreter, in the first line I’m basically saying “x is (4 × 3) ÷ 2”, and in the second line I’m saying “Show me what x is”.
I still typed the exact same thing as I did in the blog text editor, hitting return when I was finished with each line, but hitting return in the interpreter 'ran' or 'executed' my line of code. This was a pretty easy and quick formula with not much room for error, but when you’re writing large amounts of code (or copying and pasting) it’s easy to make mistakes.
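One detail worth knowing when you try this yourself: in Python 3, the / operator always gives a decimal (float) result, so the console shows 6.0 rather than 6. Here’s a small sketch comparing it with floor division (//), which keeps whole numbers as integers:

```python
# The same 'solve for x' example, run as a Python 3 script
x = (4 * 3) / 2
print(x)  # 6.0, because / always returns a float in Python 3

# Floor division (//) rounds down and keeps whole numbers as integers
y = (4 * 3) // 2
print(y)  # 6

print(type(x).__name__, type(y).__name__)  # float int
```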
A better place to write code is an Integrated Development Environment (IDE). There are many different IDEs, but the one I’m using as an example is Spyder, which comes with Anaconda (a Python distribution that includes a bunch of extras). The idea of an IDE is to have a text editor (where you write your code) and a console (where your results come back) side by side. Some benefits of using an IDE instead of coding straight into a console are that it checks your code for errors before you run it, includes a debugger for tracking down problems once the code does run, and predicts what you might want to type, like the predictive text on your phone. As an example of the error checking, I’ve used the same ‘solve for x’ formula as above, but this time in the text editor I’ve replaced the lower-case x in the print line with an upper-case X.
The IDE has warned me that I’m about to print an undefined variable before I’ve even run the code, so I can fix it quickly and then run the code properly. You may also notice the code in the text editor is colour-coded, making it easier to keep track of parentheses, what your data type is etc. To run the code in the IDE I just click the little ‘play’ button in the toolbar and away she goes, spitting back my answers in the console. Each IDE is different, but essentially an IDE is like a carpenter’s workbench: it’s easier to work at and you have all the tools within reach. Find out more about IDEs here.
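Without an IDE, Python only notices this kind of mistake when the line actually runs, at which point it stops with a NameError. A minimal sketch of what that looks like (the try/except is only there so the script prints the error instead of crashing):

```python
x = (4 * 3) / 2

try:
    print(X)  # upper-case X was never assigned; Python names are case-sensitive
except NameError as err:
    message = str(err)
    print("Python says:", message)  # Python says: name 'X' is not defined
```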
Some of the things I still don’t quite understand are the differences between shells, consoles, kernels and terminals, but I feel that’s more computer science territory. Maybe you can make more sense of them with the help of this thread. Learning more just means understanding less, it seems.
What can I do in Python without knowing much code?
Obviously, the above formula is pretty basic and I probably (keyword) could have worked it out in my head. Let’s say you want to do something more complex, like create a website; that means writing a lot more code. However, coders/programmers don’t like rewriting code if they don’t have to, which is where modules (contained in libraries/packages) come into play. A module is effectively code (or a program) that has already been written and saved, so it can be reloaded and used again. To use it, you ‘import’ it into your own code, and it runs without you having to see all the code behind it. For example, let’s say I wanted to write a program to calculate the area of a few different shapes; the code would look something like this:
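The original script isn’t reproduced in this post, so here is a hypothetical sketch of what such an area calculator could look like. The shape list, the function names and the run() helper are all assumptions made for illustration. Each shape gets its own small function, and run() asks for a shape and its measurements:

```python
"""area.py: hypothetical sketch of a small shape-area calculator."""
import math


def square(side):
    return side * side


def rectangle(length, width):
    return length * width


def triangle(base, height):
    return 0.5 * base * height


def circle(radius):
    return math.pi * radius ** 2


def parallelogram(base, height):
    # Base times perpendicular height, same as a rectangle
    return base * height


# Each menu option maps to (area function, measurements it needs)
SHAPES = {
    "square": (square, ["side"]),
    "rectangle": (rectangle, ["length", "width"]),
    "triangle": (triangle, ["base", "height"]),
    "circle": (circle, ["radius"]),
    "parallelogram": (parallelogram, ["base", "height"]),
}


def run(ask=input):
    """Prompt for a shape and its measurements, print the area and return it."""
    choice = ask(f"Choose a shape ({', '.join(SHAPES)}): ").strip().lower()
    if choice not in SHAPES:
        print("Sorry, I don't know that shape.")
        return None
    func, measurements = SHAPES[choice]
    values = [float(ask(f"{name}: ")) for name in measurements]
    result = func(*values)
    print(f"Area of the {choice}: {result:.2f}")
    return result
```

Calling run() starts the prompts. Adding a run() call at the bottom of area.py would make the prompts appear as soon as the file is run or imported. The ask parameter just lets you swap input() for something else, for example when testing.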
When I run this code, prompts appear in my console asking which shape I want, but right now I don’t want to know the area of any shapes. Since I might want to use that code again, I can save it as a ‘module’: a .py file (area.py) in a folder called ‘geometry’ on my desktop. That folder is called a package. Say I’m having a beer with some mates and suddenly we want to know the area of a 4 m × 20 m parallelogram. I can import the module in a brand new file with a single import line, where ‘area’ is the module and ‘geometry’ is the package, and answer the prompts in my console.
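To see how the folder-and-file setup works without touching your desktop, here’s a self-contained sketch that builds a throwaway ‘geometry’ package in a temporary folder and imports ‘area’ from it. The one-function area.py it writes is a hypothetical stand-in for the real module, and the empty __init__.py is the conventional marker for a package folder (Python 3.3 and later can also import from a plain folder without it):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for area.py, trimmed to a single function
AREA_SOURCE = '''
def parallelogram(base, height):
    """Area of a parallelogram: base x perpendicular height."""
    return base * height
'''


def build_geometry_package(root):
    """Create root/geometry/__init__.py and root/geometry/area.py."""
    package_dir = Path(root) / "geometry"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "area.py").write_text(AREA_SOURCE)


with tempfile.TemporaryDirectory() as tmp:
    build_geometry_package(tmp)
    sys.path.insert(0, tmp)        # let Python search the temporary folder
    importlib.invalidate_caches()  # make sure the newly written files are noticed

    from geometry import area      # package 'geometry', module 'area'

    result = area.parallelogram(4, 20)
    print(f"Area: {result} square metres")  # Area: 80 square metres
```

On your own machine you’d skip the temporary-folder steps: if the file you’re working in sits next to the geometry folder, the import line on its own is enough.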
Now there is only one line of code in my text editor, compared with 56 in the module’s script. This is the basic idea of modules being stored within packages and how to import them. Thousands of different packages exist to make coding quicker and easier. For example, if you want to run a machine learning algorithm, you can just import the relevant module and run it on your data instead of writing the whole algorithm yourself. Some of the most common packages used in science are:
NumPy – core library for numerical computing (arrays and maths functions)
Pandas – analyse data tables
Matplotlib – visualising data through graphs or plots
Scikit-Learn – machine learning algorithms
Python itself comes with a large ‘standard library’ of modules, and distributions like Anaconda bundle many more (including all four packages above), so it can be hard to know exactly what you’ve installed on your computer. If you type help("modules") into the Python interpreter, you’ll see what you’ve got.
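help("modules") prints a very long list. If you just want to know whether the four packages above are installed, here’s a minimal sketch using the built-in importlib module, which checks without actually importing them. Note that Scikit-Learn is imported under the name sklearn:

```python
import importlib.util

# Display name -> the name you'd actually type after 'import'
PACKAGES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Scikit-Learn": "sklearn",
}


def installed(import_name):
    """True if Python can find the package, without importing it."""
    return importlib.util.find_spec(import_name) is not None


status = {name: installed(module) for name, module in PACKAGES.items()}
for name, ok in status.items():
    print(f"{name:<13} {'installed' if ok else 'not installed'}")
```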
You can find packages you haven’t installed online, in places called ‘repositories’. This is where you can find and download packages (and the modules inside them) that other people have already written. Some of the best-known ones are:
GitHub
PyPI
BitBucket
SourceForge
A lot of what is written in this blog isn’t very geoscience-focused, but hopefully it builds enough of a foundation to understand the components of coding and the early steps needed to begin using Python. I didn’t touch much on the ‘language’ or ‘syntax’ of Python because there are thousands of tutorials online that explain it. From here I’m going to start looking at using Python to create machine learning models from geological datasets and show their applications in the mining world.
Some question threads and websites that might make my explanations clearer: