Journey to Becoming a Hacker: From Zero to Cybersecurity Ninja
Yoshuam A. Alicea Casillas
Master in Computer Science
Advisor: Dr. Jeffrey Duffany
Electrical and Computer Engineering and Computer Science Department
Polytechnic University of Puerto Rico

Abstract
Technology keeps changing, and the need for cyber security experts keeps growing. Many universities prepare students for this career. Even so, students often struggle to acquire the knowledge they need to perform well once they land a job in the cyber security field. They also seem to need to separate theory from practice. They must understand why things happen in this field, and also how they happen and what causes them. For this reason, we walk you through a training journey that takes you from zero knowledge of the topic to skillful hacker in a short period of time. It relies on well-known tools to find the knowledge. It also exposes you to competitive environments that put into practice what you learned in your degree and what you taught yourself.

Key Terms: Capture the Flag, CTF, Cyber Security, Cyber Training, Hacking.

INTRODUCTION
When people decide to start a cyber security career, they often expect that a degree will prepare them for the professional world. They expect it to give them the skills a cyber security expert needs to overcome any challenge on the job. The problem these newcomers face is that hackers have been doing their bad deeds for a long time. Worse, they have been doing it in real-world scenarios. Some of them are trained by illegal hacktivist organizations. They have all the time in the world to achieve their objectives and to gather, day by day, the knowledge to become even better. In summary, these bad actors have more experience than any newcomer to the cyber security field. No university and no certification will put you under the kind of fire that your future employer will face. This does not mean that the university path is wrong. It means that it is unrealistic to train someone to the point of being competitive against a real threat in the work environment. Your university program will help you grasp the cyber security concepts and the theory behind them. Still, you need to go further and find experiences that simulate the cyber war battlefield you will face once you start your career. No curriculum can cover all the attacks, vulnerabilities and exploitation techniques that exist, so we must do this part by ourselves.

The internet is full of information, and we can use it to learn about everything related to cyber security. The problem is that the subject is cyber security. As a result, the information that would let you actually learn how to exploit something may be censored or hard to find. You are more likely to learn how to defend and harden your system than how to penetrate it. There is nothing wrong with knowing how to defend and harden a system. The problem is understanding what you are safeguarding your system from. Knowing how a system can be exploited helps you safeguard it. For this reason, we will take you through how we became skillful in the cyber security field while completing our cyber security degree. This journey goes from zero knowledge of cyber security to super-skilled cyber security ninja.

TRAINING
When you first start in this career, you probably already have some computer knowledge. This may include programming languages, database management, web application development or network administration. Maybe you know how to use encryption to protect your data and why you would use a virtual private network (VPN). You may also know other practices that keep your information safe. Over time you learned how to protect your information, even though you are not sure how that protection works. You are also not sure whether it really protects your system and information the right way. You simply trust that it is done well and that your system is protected, out of faith in the antivirus and firewall software that company X sold you.

There are many paths to start cyber security training, and we will show you the one we took to become skilled in a short amount of time. Our first step was to look for a cyber security competition that would force us to go through the topics such competitions cover. At first, we did not know what a cyber security competition was or how to play one. After some brief research, we learned that these competitions are called Capture the Flag (CTF) and that they come in two different styles [1]:

• Jeopardy style: In this kind of CTF you are given tasks by category. Think of them as problems to solve, where you need to exploit, find a vulnerability, reverse engineer or gather intelligence on the given task. The categories include web application exploitation, password cracking, enumeration and exploitation, reverse engineering and computer forensics. They also include open source intelligence, cryptography, steganography, wireless exploitation, network analysis and log analysis. This type of CTF teaches the competitor to identify vulnerabilities, look up information, learn about software mechanics and identify network traffic. It also teaches how to secure web applications, understand password weaknesses, use cryptography correctly and, finally, apply reversing techniques. All of these are essential for learning to think "out of the box", as an attacker would to achieve a goal.
• Attack and Defense style: In this type of CTF, each team is given a server and has to protect it from the other teams. At the same time, it tries to hack into the other teams' servers. This style is the closest to a real-world experience. In this style of CTF you play a role, as you would in an IT department.

Once we learned what CTFs were and their types, we researched the skills needed for them. Going through our findings, we discovered a common factor in all the topics covered in both styles of CTFs: system administration skills.
System Administration
We started by training ourselves in system administration. This meant understanding our operating systems (Windows, Linux, macOS) and how to use the tools that come with them. We learned how to navigate the command line, how to run powerful searches and parse their output, and how to use basic tools. These included cat, grep, find, strings, chmod, chown, netcat, ssh, openssl, cron, ps, cd, ls, man and vim. We started with system administration because you need to understand how to administer a system. You also need to know where its important files are located and how its directory structure is organized. With this knowledge, you can look for password dumps, find where server software is located, add and remove privileges, and open and close ports. You can also connect to another machine through a secure tunnel, and so on. You can then start looking into a system and navigating through it without any problem. Once you get into a system remotely, you will know what to look for and where backdoors and exploits can be hidden. At this point you can start understanding how hackers may get into your system, because you know how to do it as well.

This first step in our training was important because in CTF competitions you may have to connect to a server remotely and exploit it. You may also have to run programs that you need to analyze with system administration tools. Without these skills, you will struggle, not only in CTFs but also in real-life scenarios. We also went straight for the command line rather than graphical software. Most servers, if not all, lack a Graphical User Interface (GUI) for moving around as you are used to. After mastering the command line, we did test GUI tools that serve the same purpose as the command line tools, even though we barely used them. After training ourselves in system administration, we were ready to move on to the next steps.
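Of the tools listed above, `strings` is often the first one to try on an unknown CTF file, because it prints the runs of readable characters buried inside a binary. The following is only a minimal Python approximation to show the idea. The function name `printable_runs` is our own. The four-character minimum mirrors the GNU `strings` default, but the matching rule here is simplified to printable ASCII and tabs.

```python
import re
import sys


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strings_sketch.py ./challenge_binary
    with open(sys.argv[1], "rb") as fh:
        for run in printable_runs(fh.read()):
            print(run)
```

Filtering the result for a keyword such as "flag" reproduces the common command line habit of piping `strings` into `grep`.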
Web Application Exploitation
Our second step was understanding how web applications and browsers work. During this training we came across a website called OverTheWire.org [2], which walks you step by step through how to start exploiting web applications. It also raises awareness of the best practices that you, as a developer, should follow to avoid being vulnerable. It taught us how to use the browser's developer tools to debug, tamper with data, access and change cookies, and inspect the source code of a web page.

We learned how to make requests to a web application using jQuery and the JavaScript console. We sent values that the application accepted as valid and that let us change the behavior of the server's responses. We also learned how to use well-known Google hacks [3] to navigate a server's directories when they were not well protected by the server's rules (e.g., the .htaccess file). We learned how to inject code that changes the queries the web application sends to the database, using SQL injection attacks. Furthermore, we learned how to exploit a poorly protected file upload feature on the server. We uploaded a PHP backdoor script that the server treated as a picture.

After beating each of the OverTheWire.org challenges, we felt confident about how to approach website exploitation. These skills were still not at the level of a hacker. Even so, with this knowledge, coming up with new attacks and thinking "out of the box" became easier, and our skills grew every day.
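The minimal sketch below makes the SQL injection idea concrete, using Python's built-in sqlite3 module and an in-memory database. The table, column names and credentials are hypothetical. The point is the contrast between two ways of building the query. String formatting is injectable, while placeholders follow the developer best practice mentioned above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical toy user table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # BAD: input is pasted into the SQL text, so a quote can change the query.
    query = f"SELECT name FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchone() is not None


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # GOOD: "?" placeholders send the values as data, never as SQL code.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None


if __name__ == "__main__":
    db = make_db()
    payload = "' OR '1'='1"
    print(login_vulnerable(db, "admin", payload))  # True: password check bypassed
    print(login_safe(db, "admin", payload))        # False
```

With the payload, the vulnerable query becomes `... AND password = '' OR '1'='1'`. Because AND binds tighter than OR, the WHERE clause is true for every row.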
Cryptography
Our third training was based on cryptography. We started with very basic, very old ciphers and studied what they were for and why they existed. We learned basic ciphers such as the Caesar, Vigenère, Playfair and Rail Fence ciphers [4]. This taught us why their simplicity and limits made them so vulnerable. We also learned how to brute force them to recover the message. In addition, we learned dictionary [5], known-plaintext [6] and frequency analysis [7] attacks to decipher the encrypted message.

Once we understood the simple ciphers, we moved on to more complex ones such as RC4, RSA, DES, AES and many others. For DES, we learned how to use differential cryptanalysis [8] to break variants of DES with up to 15 rounds. We also learned how RSA works and how its security rests on the integer factorization problem. We learned how to exploit it when it is misused. Wiener's attack [9] breaks keys whose private exponent is too small. factorDB.com [10] finds the factors of the modulus when its primes are too small. We developed tools to run crib-drag attacks on one-time pads when the same key is used more than once. We also developed brute force tools. In addition, we built an RSA exploit tool that breaks any key vulnerable to Wiener's attack or having a small modulus. After this training, we became very good at cryptography. We learned how to make good use of the different encryption algorithms that exist and when to use each one.
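The crib-drag idea behind the one-time pad tools mentioned above fits in a few lines. When two messages are encrypted with the same pad, XORing the two ciphertexts cancels the key and leaves the XOR of the two plaintexts. Sliding a guessed word (the crib) along that result exposes readable fragments of the other message wherever the guess is correct. This is a minimal sketch with our own function names, not a reproduction of the tools described above. The key and messages in the demo are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def crib_drag(c1: bytes, c2: bytes, crib: bytes) -> list[tuple[int, bytes]]:
    """Slide `crib` across c1 XOR c2 and return (offset, fragment) pairs.

    If c1 and c2 were encrypted with the same pad, c1 ^ c2 == p1 ^ p2, so at
    any offset where `crib` really occurs in one plaintext, the fragment is
    the corresponding slice of the other plaintext.
    """
    combined = xor_bytes(c1, c2)
    results = []
    for offset in range(len(combined) - len(crib) + 1):
        window = combined[offset:offset + len(crib)]
        results.append((offset, xor_bytes(window, crib)))
    return results


def looks_printable(fragment: bytes) -> bool:
    """Cheap filter: keep only fragments made of printable ASCII."""
    return all(32 <= b < 127 for b in fragment)


if __name__ == "__main__":
    # Demo with a hypothetical reused key; in a challenge only c1 and c2 are known.
    key = bytes((i * 37 + 11) % 256 for i in range(32))
    c1 = xor_bytes(b"meet at the dock", key)
    c2 = xor_bytes(b"send the password", key)
    for offset, fragment in crib_drag(c1, c2, b" the "):
        if looks_printable(fragment):
            print(offset, fragment)
```

In this demo the crib " the " occurs in both messages. The printable candidates therefore include offset 4 (b' at t') and offset 7 (b'e pas'), each a correct slice of the other plaintext. In practice you extend such fragments by guessing the surrounding words.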
Reverse Engineering
Next was the reverse engineering training. This was by far the most difficult training to overcome. We had to understand the basics of assembly language, at least well enough to read it and know what is going on. We also learned how to ease the process of disassembling a binary with objdump, Binary Ninja [11] and IDA Pro [12]. We learned how to identify a buffer overflow, how to trigger one, and how to prevent it with good programming practices. We found two websites that challenged us to get more skilled in this field: ringzer0ctf.com [13] and pwnable.kr [14]. They test your reverse engineering techniques in a hardcore manner. We also found a Python library called pwntools [15]. It helped us write shellcode to exploit binaries once a vulnerability was identified. We never became experts in this field of cyber security, but we understood its basics. Reverse engineering skills let us, for example, take a piece of malware and analyze it to find out how it works. Once you know how it works, you will understand what vulnerability the malware exploits. The analysis also shows you how to protect your system.
Network Analysis
The final part of the training, and we would say the most important one, was network analysis. It is the most important because the network is the real war zone where everything happens. We need to understand the network to detect where an attack is coming from and how it is being performed. Network analysis skills let you trace the packets sent over a network, either in real time or by capturing traffic over a specific period. For this training we learned how to use a tool called Wireshark [16]. This tool is so powerful that we could follow streams and analyze each packet sent over the network in fine detail. We learned how the DNS, TCP, ICMP and HTTP protocols work. Using Wireshark, we could trace these kinds of packets and identify the source and destination of a communication between two IP addresses. We could also capture content sent over HTTP, and much more. We could inspect wireless traffic coming from a router. With another tool called aircrack-ng [17], we could break WEP [18] and WPA [19] passwords through dictionary attacks. In short, this training allowed us to perform network forensics.
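Wireshark decodes packets for us. Still, it helps to see how the source and destination of a communication are read from the first 20 bytes of an IPv4 header, whose layout is defined in RFC 791. The minimal sketch below works on raw header bytes, for example bytes carved out of a capture, and ignores IP options. The function name and the returned field names are our own.

```python
import socket
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(packet) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    # Hand-built example header: 192.168.1.10 -> 10.0.0.1, TCP, TTL 64.
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                         socket.inet_aton("192.168.1.10"), socket.inet_aton("10.0.0.1"))
    print(parse_ipv4_header(header))
```

The "!" prefix tells struct to use network (big-endian) byte order, which is how every multi-byte field travels on the wire.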
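All of this training took us over six months before we felt comfortable with each topic. We then started competing in the National Cyber League (NCL) [20], a very competitive Capture the Flag competition that puts your brain to work in every challenge. We also competed on CTFTime.org [21], a website that lists many CTF competitions throughout the year. These competitions are international and very complex.

NATIONAL CYBER LEAGUE
The National Cyber League is a defensive and offensive, puzzle-based, capture-the-flag-style cybersecurity competition. Its virtual training grounds help high school and college students prepare for cybersecurity challenges they will likely face in the workforce. Students can also test themselves against those challenges. We competed in these competitions and made it to the gold bracket each time. To measure the effectiveness of our training, we conducted a study [22] with several students who were starting in the cyber security field, and the results were outstanding. With each participation we got closer to the top 10, both individually and as a team.

From this competition we learned about tools like nmap [23] for port scanning and DirBuster [24] for directory scanning. For password cracking, we learned how to use hashcat [25] and ophcrack to find the plaintext of a hash. We went further and wrote our own scripts to perform these kinds of password attacks. We also got better at writing scripts that act as bots to solve problems with ease. The competition also encourages teamwork once it reaches the postseason.

One good thing about NCL is that it measures the accuracy of your performance. This makes competitors think twice and analyze each problem deeply to reach the right solution. Otherwise, their accuracy drops and their position on the leaderboard suffers.

WORK DONE
Here we share some highlights of the work we did in NCL after our intensive training, along with a high-level explanation of how each challenge was solved.

Password Cracking
In this NCL exercise we were given a list of usernames along with their password hashes, see Table 1 below.

Table 1
User and Password Hash Table
User      Password Hash
Justen    c6ffca47b477506eb331930cc6ae6292
Tom       9594e0f07b4e6e280c6131ce48dbf80d
Rachel    8609c7cc715dea6500e08db180b16f51
Eve       315d1cc9faafa74129769751fdd92ea3
Elliot    247e8adf7ede165ad0bd6032e4c0dfc6

The hint was that each password was the name of an episode of the series Law and Order: SVU followed by two digits. An example would be 'payback77'. To solve it, we needed a few steps:

1. Reconnaissance: find the names of all the episodes of Law and Order: SVU.
2. Pre-attack: build a dictionary from the episode names, with two digits appended to each name.
3. Attack: attack the password dump file and hope to recover the plaintext of each hash in the list.

In the first step we went to the biggest encyclopedia on the internet, Wikipedia [26], to see whether it had a list of all the episodes of Law and Order: SVU. After searching, we found the list shown in Figure 1.

Figure 1
Law and Order: SVU Episodes Table

Wikipedia had a list of all the episodes tabulated by season. That was very helpful. The challenge now was to extract all those episodes, since the series had about 23 seasons with 22 to 23 episodes each.

We decided to put our programming skills into practice and wrote a Python script, see Figure 2, that queries the Wikipedia page by title. After fetching the page content as HTML, the script scrapes the episode names as text from each table. Some episode names contained links to more information, which we had to remove because we only cared about the name. For this, we used a regular expression that sanitizes the episode names. It removes special characters such as brackets, single quotes, double quotes, spaces and numbers. The script then appends every possible two-digit combination to each episode name and stores every value in a file.

Figure 2
episodes.py: Script for Scraping All the Episode Names from the Law and Order: SVU Series

After running the Python script, we had a dictionary saved in a file named episodes.txt. It contained every Law and Order: SVU episode name with two digits at the end, see Figure 3 below. We used it to crack each of the given hashes.

Figure 3
episodes.txt: Dictionary Created by the Script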
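The sanitize-and-append part of episodes.py can be sketched as follows. This is not the original script from Figure 2. The Wikipedia scraping step is left out, so the functions take a list of titles that have already been extracted. Lowercasing the result is our assumption, based on the hint's example 'payback77'. The function names and the example titles are hypothetical. The `[^A-Za-z]` pattern also drops accented letters, which may matter for some titles.

```python
import re


def sanitize(title: str, lowercase: bool = True) -> str:
    """Keep only ASCII letters, dropping brackets, quotes, spaces and digits."""
    cleaned = re.sub(r"[^A-Za-z]", "", title)
    # Assumption: passwords are lowercase, as in the hint's example 'payback77'.
    return cleaned.lower() if lowercase else cleaned


def build_wordlist(titles, lowercase: bool = True):
    """Yield each sanitized title followed by every two-digit suffix 00-99."""
    seen = set()
    for title in titles:
        base = sanitize(title, lowercase)
        if not base or base in seen:
            continue  # skip empty names and duplicate episodes
        seen.add(base)
        for n in range(100):
            yield f"{base}{n:02d}"


def write_wordlist(titles, path: str = "episodes.txt") -> int:
    """Write the candidates one per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for word in build_wordlist(titles):
            fh.write(word + "\n")
            count += 1
    return count
```

For example, `build_wordlist(['"Payback"', "Example Title [1]"])` yields 200 candidates, from payback00 to exampletitle99. Roughly 500 episodes times 100 suffixes gives a dictionary of about 50,000 words, which is tiny for a tool like hashcat.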
Now we had a dictionary of every possible password that followed the hint's rules, so we moved to the attack phase using hashcat. For this we ran the following command: `hashcat -a 0 -m 0 laworder.hash episodes.txt -O -o lawordercracked.txt`. The command does the following:

1. -a 0: tells hashcat to run in straight attack mode. Each word in the dictionary is hashed and compared against the given hashes.
2. -m 0: tells hashcat that the hash type is MD5.
3. laworder.hash: the file containing the given hashes to crack.
4. episodes.txt: the dictionary (wordlist) used to find the hashes.
5. -O: tells hashcat to use its optimized kernels for better performance, at the cost of a shorter maximum password length.
6. -o: tells hashcat to write the cracked passwords to a file named lawordercracked.txt.

After running this command, hashcat gave us the output shown in Figure 4 below.

Figure 4
lawordercracked.txt: File Containing the Cracked Passwords

Finally, running hashcat cracked all the given password hashes and solved the challenge.
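Conceptually, straight mode (-a 0) with MD5 (-m 0) means hashing each dictionary word once and checking whether the digest is one of the targets. The minimal Python sketch below shows only that idea. It is not how hashcat is implemented, since hashcat runs the comparison massively in parallel on the GPU. The file handling assumes one hex hash per line in laworder.hash, which is our assumption about its format.

```python
import hashlib
import os


def crack_md5(hashes, wordlist):
    """Dictionary (straight) attack: hash each candidate and match it."""
    targets = {h.lower() for h in hashes}
    found = {}
    for word in wordlist:
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        if digest in targets and digest not in found:
            found[digest] = word
            if len(found) == len(targets):
                break  # every target cracked, stop early
    return found


if __name__ == "__main__" and os.path.exists("laworder.hash") and os.path.exists("episodes.txt"):
    with open("laworder.hash", encoding="utf-8") as fh:
        hashes = [line.strip() for line in fh if line.strip()]
    with open("episodes.txt", encoding="utf-8") as fh:
        words = (line.rstrip("\n") for line in fh)
        for digest, password in crack_md5(hashes, words).items():
            print(f"{digest}:{password}")
```

Storing the targets in a set makes each lookup constant time, so the cost is dominated by one MD5 computation per dictionary word.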
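CTF TIME
This website is what we call the real test of our acquired skills. Its challenges are very complex, and you compete against the best teams in the world. If you perform well on this website, you will be ready for almost any situation in the work environment. This website will take your skills to another level. After each competition ends, participants may publish write-ups of selected challenges. These show how other people solved the problems, with detailed explanations of the whys and hows. This makes the scene even more competitive, since less experienced players learn how to approach similar problems in later competitions.

Participating on this website also keeps you sharp on every skill you need to overcome threats. New challenges keep appearing around recent technologies, which keeps competitors up to date with new vulnerabilities. To compete as a strong player, you will need to read cyber security papers and build your own solutions that test what they describe. Most of the time you will need to formulate your own attacks, so you need a deep understanding of each problem. We recommend participating in every competition held during the year. Even when you cannot solve a single challenge, you will learn a lot from the write-ups. That will strengthen your skill set and make you a better cyber security professional.

WORK DONE
Here we share some highlights of the work we did on CTFTime.org after our intensive training, along with a high-level explanation of how each challenge was solved.

Cryptography
This challenge was given to us in the ALEX CTF competition listed on CTFTime.org. In this challenge, a person named "Fady" sent another person a file whose contents made no sense at first sight, see Figure 5 below.

Figure 5
Fadymsg.txt: File Containing Weird Information

At first, we converted those hexadecimal numbers into decimal to see if they held any information, but they were just huge numbers. Then we recalled from our crypto training a cipher that uses variables named p, q, e and c: RSA. We also recalled that RSA encrypts with a simple mathematical formula built on huge prime numbers. Before going into details, we first explain how the RSA algorithm works. This gives you a better understanding of the math behind it and of how we manipulated it to get our solution.

• First, RSA needs two very large, distinct prime numbers, called $p$ and $q$.
• Second, RSA calculates the modulus $N$ by multiplying $p$ by $q$: $N = p \cdot q$.
• After calculating the modulus $N$, RSA calculates Euler's totient function (phi): $\varphi(N) = (p-1)(q-1)$.
• Once we have $\varphi(N)$, RSA chooses a public exponent $e$ that meets two conditions: $1 < e < \varphi(N)$ and $\gcd(e, \varphi(N)) = 1$, that is, $e$ is coprime with $\varphi(N)$.
• Finally, we calculate a number $d$ such that $d \cdot e \equiv 1 \pmod{\varphi(N)}$.

Now that we have explained how RSA keys are built, we need to understand how encryption and decryption work. To encrypt a message $m$ with RSA, we encode $m$ as an integer with $0 \le m < N$ and compute $c = m^{e} \bmod N$. Here $c$ is the ciphertext, or encrypted message. To decrypt a ciphertext $c$, we compute $m = c^{d} \bmod N$. Decoded back into text, $m$ is the original message.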
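A tiny worked example runs through each step using the classic textbook values $p = 61$ and $q = 53$. These values are far too small to be secure and are unrelated to the challenge file:

$$N = 61 \cdot 53 = 3233, \qquad \varphi(N) = 60 \cdot 52 = 3120$$
$$e = 17, \qquad d = 2753 \quad \text{since } 17 \cdot 2753 = 46801 = 15 \cdot 3120 + 1$$
$$m = 65: \qquad c = 65^{17} \bmod 3233 = 2790, \qquad 2790^{2753} \bmod 3233 = 65$$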
Now that we understand how to encrypt and decrypt, we only need to look at the file. It gives us the prime numbers $p$ and $q$, the encryption exponent $e$ and the ciphertext $c$. The ciphertext contains the message, so we need a way to decrypt it. The problem is that decryption requires the exponent $d$, which must satisfy $d \cdot e \equiv 1 \pmod{\varphi(N)}$. We could try to brute force $d$. With real key sizes, however, the search space is as large as $\varphi(N)$, which is far too big. Fortunately, our training taught us a shortcut. Knowing the primes $p$ and $q$, we can compute $\varphi(N)$ and then obtain $d$ directly as a modular inverse. $d$ is the unique value between $0$ and $\varphi(N) - 1$ such that $e \cdot d \bmod \varphi(N) = 1$. Testing candidates one at a time would be very slow. The extended Euclidean algorithm [27] computes the modular inverse efficiently.
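The modular inverse step can be sketched with the iterative extended Euclidean algorithm. The function names are our own. The paper's script relied on gmpy for this step, and Python 3.8 and later can also compute the same value with the built-in `pow(e, -1, phi)`.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(e: int, phi: int) -> int:
    """Return d with (e * d) % phi == 1, or raise if e and phi share a factor."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e has no inverse modulo phi because gcd(e, phi) != 1")
    return x % phi


if __name__ == "__main__":
    print(mod_inverse(17, 3120))  # 2753
```

The loop runs a number of iterations proportional to the number of digits of the inputs, which is why it stays fast even for 2048-bit RSA values.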
To solve this, we used a Python library called gmpy [28], which computes the modular inverse of a number. With this library, we wrote the script that solved the challenge, see Figure 6 below.

Figure 6
rsa.py: Finding Exponent d and Decrypting the Message

After running the script, we got the output shown in Figure 7 below.

Figure 7
rsa.py Output with Secret Message

As the output shows, we decrypted the message and obtained the secret: "ALEXCTF{RS4_I5_E55ENT1AL_T0_D0_BY_H4ND}".
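The script in Figure 6 is not reproduced in this text, so the following only sketches the same steps under our own assumptions. It uses Python's built-in `pow` (Python 3.8+) instead of gmpy. It assumes the plaintext was encoded as a big-endian integer, a common CTF convention. The demo values are toy parameters, not the ones in Fadymsg.txt. With the real file, you would parse p, q, e and c from hexadecimal using the hypothetical `from_hex` helper and pass them in.

```python
def rsa_decrypt(p: int, q: int, e: int, c: int) -> bytes:
    """Textbook RSA decryption when the primes p and q are known."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)   # modular inverse of e mod phi (Python 3.8+)
    m = pow(c, d, n)      # m = c^d mod N
    # Assumption: the message was encoded as a big-endian integer.
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


def from_hex(value: str) -> int:
    """Parse a hexadecimal number such as '0x1f' or '1f' into an integer."""
    return int(value, 16)


if __name__ == "__main__":
    # Toy parameters (two Mersenne primes), NOT the ALEX CTF values.
    p, q, e = 2**89 - 1, 2**107 - 1, 65537
    c = pow(int.from_bytes(b"flag{toy}", "big"), e, p * q)
    print(rsa_decrypt(p, q, e, c))  # b'flag{toy}'
```

Knowing p and q is what makes this trivial. For anyone who only has N, computing φ(N) requires factoring N, which is exactly the hard problem RSA relies on.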
CONCLUSION
We are confident that by following this journey you will become skilled in the cyber security field. Combine the theory and practice taught in your curriculum with a series of CTF events. You will then find yourself thinking like a hacker when deciding what to defend in the systems and networks of your professional environment. We know it is a long journey, and with time the experience will turn you into a full-fledged cyber security ninja.

Always take the time to learn about tools that make it easier to test your networks and systems. This way you will find out whether your defenses are strong enough to withstand the common tools. Then try to do it your own way, without the common tools, and see whether your defenses still hold. We must always remember that in the field of cyber security the training never ends.

APPENDIX A
List of Scripts, Challenges Solved and Tools
The package contains the master's project report and an extended file called appendix.pdf. That file contains more challenges and proof of work from the competitions we participated in, with a detailed explanation of how each was solved. The package has three folders. The references folder lists the tools we used. It also lists websites where anyone can train their cyber security knowledge in a competition-like environment. The scripts and tools folder contains scripts and tools we made to solve certain challenges. It also contains a tool called incognito tool, developed by our team, made up of several modules. They include an RSA exploit, a base converter, decoders, a directory brute force tool, and mathematical and string manipulation tools. There is also a Morse code decoder, a one-time pad module, and tools to encrypt and decrypt simple ciphers such as Vigenère and Caesar. Finally, the folder named "utils" contains some of the files provided with the challenges. These include pcap files, images containing secrets, and logs for log analysis.
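REFERENCES
[1] M. Hess. (2018, July 30). How to prepare for capture the flag hacking competition [Online]. Available: https://www.ctbnuggets.com/blog/2018/07/how-to-prepare-for-a-capture-the-flag-hacking-comeptition/.
[2] S. Van Acker. (2018, Oct 18). OverTheWire [Online]. Available: http://overthewire.org/wargames.
[3] J. Jolly. (2007, July 6). What is Google Hacking (Google Scanning or Engine hacking?) [Online]. Available: https://searchsecurity.techtarget.com/definition/Google-hacking.
[4] T. Akins. (n. d.). Cipher Tools [Online]. Available: http://rumkin.com/tools/cipher/.
[5] J. Ostrowick. (2005, Oct 10). What is Dictionary Attack? [Online]. Available: https://searchsecurity.techtarget.com/definition/dictionary-attack.
[6] C. Kowalczyk. (2013, Nov 1). Known Plaintext Attack [Online]. Available: http://www.crypto-it.net/eng/attacks/known-plaintext.html.
[7] C. Kowalczyk. (2013, Nov 1). Frequency Analysis [Online]. Available: http://www.crypto-it.net/eng/attacks/frequency-analysis.html.
[8] E. Biham and A. Shamir, "Differential Cryptanalysis of DES-like Cryptosystems," in Advances in Cryptology: CRYPTO '90. Springer-Verlag, 1990, pp. 2-21.
[9] A. Dujella, "A variant of Wiener's Attack on RSA," Computing, vol. 85, 2009, pp. 77-83.
[10] M. Tervooren. (n. d.). Factorize [Online]. Available: https://factordb.com/.
[11] Vector 35. (n. d.). Binary Ninja: A New Kind of Reversing Platform [Online]. Available: https://binary.ninja/.
[12] Hex Rays. (2015, May 27). IDA PRO [Online]. Available: https://www.hex-rays.com/products/ida/.
[13] D. Lebron. (2014). RingZero CTF [Online]. Available: https://ringzer0ctf.com/challenges.
[14] D. Daehee. (n. d.). Pwnable.kr [Online]. Available: http://pwnable.kr/play.php.
[15] R. Larsen. (2013, Apr 28). Pwntools [Online]. Available: http://docs.pwntools.com/en/stable/.
[16] L. Chappell, Wireshark Certified Network Analyst Exam Prep Guide, 2nd ed., Saratoga: PODBOOKS.COM, LLC, 2012, pp. 29-153.
[17] Aircrack-ng.org. (2006). Aircrack-ng [Online]. Available: https://www.aircrack-ng.org/.
[18] B. Mitchell. (2018). Why WEP Keys Used to be Cool but Aren't Very Useful Anymore [Online]. Available: https://www.lifewire.com/what-is-a-wep-key-818305.
[19] Z. Jiang, "Study of Wi-Fi Security Basing on Wireless Security Standards (WEP, WPA and WPA2)," in Advanced Materials Research, 2014, pp. 1049-1050, pp. 1993-1996.
[20] NCL | National Cyber League | Ethical Hacking and Cyber Security. (2018). NCL | National Cyber League | Ethical Hacking and Cyber Security [Online]. Available: https://www.nationalcyberleague.org/.
[21] team, c. (2018). CTFtime.org / All about CTF (Capture the Flag) [Online]. Ctftime.org. Available: https://ctftime.org/.
[22] Y. Alicea. (2017, April 25). Cybersecurity Competitions as Effective Cybersecurity Teaching Tools [Online]. Available: http://029e2c6.netsolhost.com/II-Proceedings/2017/IIVC2017_ALICEA.pdf.
[23] Nmap.org. (n. d.). Nmap: The Network Mapper - Free Security Scanner [Online]. Available: https://nmap.org/.
[24] Owasp.org. (n. d.). Category:OWASP DirBuster Project - OWASP [Online]. Available: https://www.owasp.org/index.php/Category:OWASP_DirBuster_.
[25] Hashcat.net. (n. d.). hashcat - advanced password recovery [Online]. Available: https://hashcat.net/hashcat/.
[26] Wikipedia.org. (n. d.). Wikipedia [Online]. Available: https://www.wikipedia.org/.
[27] En.wikipedia.org. (n. d.). Extended Euclidean algorithm [Online]. Available: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm.
[28] PyPI. (n. d.). GMPY [Online]. Available: https://pypi.org/project/gmpy/.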