Free Data!
Here is a collection of data sets that I have made (clean and dirty), just for you!
I hope these make the start of your data journey smoother than mine was!
Clean Data Created by Me
This is an operations dataset that I generated using Python and ChatGPT. I put a lot of thought into making it realistic yet random, so there are real (fake) trends you can pull out of it.
The product names are a little rough around the edges, so feel free to change them!
You can access the data and the Python script that generates it here.
Dirty Data Created by Me
I took my operations dataset and made it dirty. I have worked with almost 20 clients at this point, so I tried to reproduce some of the common "dirty" issues I see in their data.
I will not spoil it for you in case you want to explore the data yourself, but if you need some clues as to what "dirtying" I did, there is a file in the folder that lists the changes.
If you're looking for a challenge, I recommend cleaning this data in SQL or Tableau rather than manually in Excel!
This time I used someone else's data to make it dirty: the fan-favorite pizza sales dataset from Maven Analytics. You can find the original dataset here. Be sure to check out the Maven pizza sales challenge on LinkedIn (just search for it) if you need inspiration.
Here is my GDrive folder with the files.
A good end-to-end project would be to take the dirty data I created, clean it using SQL (or Python/R!), then bring it into Tableau to visualize it. Or you can do your data cleaning in Tableau!
It's up to you how you do it, but please tag me in a LinkedIn post if you use it in your portfolio and find my dirty data helpful. I'd love to hear about it!
As a reminder (or maybe this is your first time hearing it), if you include a link in your post, LinkedIn won't show it to as many people.
My suggestion is to export an image of your project and attach it as a picture to a post where you talk about the project. Then just put any links in the comments. Be sure to use only a few relevant hashtags!
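If you go the Python route, here is a minimal sketch of the kind of cleaning pass you might start with. It is not based on the actual dirty files: the column names (`order_id`, `product`, `order_date`), the date formats, and the helper names `parse_date` and `clean_rows` are all hypothetical placeholders, so swap in whatever you actually find in the data. It uses only the Python standard library.

```python
import csv
import io
from datetime import datetime

# Hypothetical date formats; inspect the real data to see which ones appear.
# Note: "%m/%d/%Y" assumes US-style month-first dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def parse_date(value):
    """Try each known format; return an ISO date string, or None if nothing matches."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Trim whitespace, standardize header/product casing and dates, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize header names and strip stray spaces from every value.
        record = {key.strip().lower(): (val or "").strip() for key, val in row.items()}
        record["product"] = record["product"].title()  # hypothetical column
        record["order_date"] = parse_date(record["order_date"])  # hypothetical column
        # Compare rows only after cleaning, so near-duplicates collapse.
        signature = tuple(sorted(record.items()))
        if signature in seen:
            continue
        seen.add(signature)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    # Made-up sample rows standing in for a dirty CSV file.
    sample = io.StringIO(
        "Order_ID ,Product,Order_Date\n"
        "1, widget a ,2023-01-05\n"
        "1,Widget A,01/05/2023\n"
        "2,GADGET B,05-Jan-2023\n"
    )
    for row in clean_rows(csv.DictReader(sample)):
        print(row)
```

Running it prints two rows: the two order 1 lines collapse into one once whitespace, casing, and date formats are standardized. Deduplicating after cleaning (rather than before) is the design choice that makes that work; in SQL the equivalent idea is applying `TRIM`/`UPPER`-style functions before a `DISTINCT` or `GROUP BY`.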
Other places to find free data
Maven Analytics has a free data playground that I send everyone to. Unfortunately, I did not find it until after I got my job. I was so stressed about finding free data that it held me back from doing projects.
Maven not only has a great collection of free, beginner-friendly data, but many of its datasets also come with challenges that include guided prompts to get you thinking about how the heck to analyze the data.
While you're there, be sure to check out their free portfolio hosting platform and amazing (paid) courses.
Mark Bradbourne runs a project similar to my operations dataset called Real World Fake Data (#rwfd).
It's a great source of very realistic portfolio project data that I only found recently.
If you look up the #rwfd hashtag on Tableau Public, you will find lots of amazing projects using this data.
You might even find one of mine in there!
Need a data analyst?
Where to find me
If you are interested in hiring me as a consultant, please shoot me an email!
Otherwise, please reach out to me on socials and check out the wealth of free content I have published. I do not offer 1:1 coaching or advice, but I can assure you that whatever questions you have about breaking into data, I have answered them in my content!