Having a data mindset
In this brief blog post, I’d like to share a recent experience that shows why thinking with a data mindset matters.
Note: There will be a code demonstration, but don’t worry—it’s neither complicated nor intimidating.
I was working with a large dataset built from the annual reports (Form 10-K filings) of US-listed companies. My goal was to perform a standard textual analysis: counting how often certain keywords appear in each report.
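To make the counting step concrete, here is a minimal sketch that assumes pandas and uses made-up filing text, placeholder CIKs, and a hypothetical term list (none of these come from my actual dataset). It counts whole-word, case-insensitive matches of each term and produces one row per company and one column per term.

```python
import re

import pandas as pd

# Hypothetical toy data: in practice, the text would come from each company's 10-K filing.
filings = pd.DataFrame({
    "cik": ["0000000001", "0000000002"],  # placeholder CIKs, not real companies
    "text": [
        "Climate risk and supply chain risk may affect our climate targets.",
        "Cybersecurity risk is a key risk. Artificial intelligence drives growth.",
    ],
})

# Hypothetical term list; swap in your own keywords.
terms = ["risk", "climate", "cybersecurity", "supply chain", "artificial intelligence"]

def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in the text."""
    text = text.lower()
    return {
        term: len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        for term in terms
    }

# One row per company (indexed by CIK), one column per term.
counts = pd.DataFrame(
    [count_terms(t, terms) for t in filings["text"]],
    index=filings["cik"],
)
print(counts)
```

Using word boundaries ('\b') avoids counting 'risk' inside words such as 'risky'. Whether that is the right choice depends on your term list.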
However, when I reached the final step of 'Displaying the Top 3 Terms for Each Company,' I hit a roadblock.
Here is how I got there. After counting the keywords, I moved on to a more detailed analysis.
This was great! I had successfully filtered and identified companies that met my search criteria. I also managed to display the top three terms (from my term list) for each company.
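A rough sketch of what that filtering and top-three step might look like in pandas follows. The counts are invented, and the search criterion (keep companies that mention at least one term) is an assumption for illustration. The column names 'Top 1', 'Top 2', and 'Top 3' match the ones I describe below. The result is a wide dataframe: one row per CIK, with the term names spread across three columns.

```python
import pandas as pd

# Hypothetical keyword counts: one row per company (CIK), one column per term.
counts = pd.DataFrame(
    {
        "risk": [12, 3, 0],
        "climate": [5, 0, 0],
        "cybersecurity": [2, 9, 0],
        "supply chain": [7, 1, 0],
    },
    index=pd.Index(["0000000001", "0000000002", "0000000003"], name="cik"),
)

# Example search criterion (an assumption): keep companies that mention any term at all.
matched = counts[counts.sum(axis=1) > 0]

# For each company, take the names of its three most frequent terms.
top3 = matched.apply(
    lambda row: pd.Series(row.nlargest(3).index.tolist(), index=["Top 1", "Top 2", "Top 3"]),
    axis=1,
)
print(top3)
```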
Next, I wanted to see what those terms actually were.
Of course, I could simply view the dataframe to see them, but I wanted to automate the process. Imagine if I had 1,000+ companies to check—I certainly wouldn’t want to do anything manually.
This is where the data mindset comes into play: when you find yourself repeating the same action, automate it.
I tried various approaches, but nothing seemed to work.
After an hour, I was ready to give up. I thought, "This is it; I’m done with this. I might as well do it manually" (my non-data mindset taking over).
But then I decided to take a quick break—just two minutes (well, actually, only one minute; I couldn’t wait any longer).
During that break, I had a realisation.
Looking back at the previous dataframe, I noticed it wasn’t in the ideal format for my next task.
Why not?
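Simply put, in a well-structured (or 'tidy') dataframe, each variable has its own column and each observation has its own row. Here, 'Top 1,' 'Top 2,' and 'Top 3' looked like three separate variables for each CIK (the SEC's Central Index Key, a unique company identifier). In reality, they were three values of a single variable, the term, at different ranks.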
I finally understood! I needed to reshape the dataframe from wide to long format, with one row per CIK and rank.
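Here is a minimal sketch of that reshape using pandas' melt. It assumes a hypothetical wide dataframe with a 'cik' column plus the 'Top 1' to 'Top 3' columns; the names 'rank' and 'term' are my own choices. Once each row holds a single term, questions such as "which terms show up most often?" or "what are each company's top terms?" become one-liners instead of manual checks.

```python
import pandas as pd

# Hypothetical wide dataframe: one row per CIK, top three terms spread over three columns.
top3 = pd.DataFrame({
    "cik": ["0000000001", "0000000002"],
    "Top 1": ["risk", "cybersecurity"],
    "Top 2": ["supply chain", "risk"],
    "Top 3": ["climate", "supply chain"],
})

# Reshape from wide to long: one row per (CIK, rank) pair, with a single "term" column.
long_df = top3.melt(id_vars="cik", var_name="rank", value_name="term")
long_df = long_df.sort_values(["cik", "rank"]).reset_index(drop=True)
print(long_df)

# How often each term appears among the companies' top three.
print(long_df["term"].value_counts())

# Each company's top terms as a list, in rank order (thanks to the sort above).
print(long_df.groupby("cik")["term"].apply(list))
```

The same code works unchanged whether there are two companies or 1,000+, which is exactly the kind of automation I was after.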