Having a data mindset

In this brief blog post, I’d like to share a recent experience that highlights the importance of thinking with a data mindset.

Note: There will be a code demonstration, but don’t worry—it’s neither complicated nor intimidating.

I was working with a large dataset based on the annual reports (10-K reports) of US-listed companies. My goal was to perform a standard textual analysis—counting the frequency of certain keywords.
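To give a feel for that kind of keyword counting, here is a minimal sketch using only the Python standard library. The keyword list, the sample sentence, and the `count_keywords` helper are all made up for illustration; they are not the terms or the code I actually used on the 10-K filings.

```python
import re
from collections import Counter

# Hypothetical keyword list, for illustration only.
KEYWORDS = ["risk", "climate", "innovation"]


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for keywords that never appear.
    return {kw: counts[kw] for kw in keywords}


# A made-up sentence standing in for the text of a filing.
sample_text = "Climate risk is a key risk. Innovation helps us manage risk."
keyword_counts = count_keywords(sample_text, KEYWORDS)
print(keyword_counts)
```

On a real dataset the same function would be applied to the text of every filing, which gives one row of counts per company.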

However, when I reached the final step of 'Displaying the Top 3 Terms for Each Company,' I hit a roadblock.

After that, I began a more detailed analysis.

This was great! I had successfully filtered and identified companies that met my search criteria. I also managed to display the top three terms (from my term list) for each company.
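One way to build such a table, sketched with pandas, is shown below. I am assuming a dataframe of counts with one row per company and one column per term; the CIK values, terms, and numbers are invented, and `top_terms` is a hypothetical helper rather than the exact code from my analysis.

```python
import pandas as pd

# Hypothetical keyword counts: one row per company (CIK), one column per term.
counts = pd.DataFrame(
    {"risk": [12, 3], "climate": [5, 9], "innovation": [7, 1], "growth": [2, 4]},
    index=pd.Index([1001, 1002], name="CIK"),
)


def top_terms(row, n=3):
    """Return the n most frequent terms in a row, labelled 'Top 1' to 'Top n'."""
    best = row.nlargest(n).index
    return pd.Series(list(best), index=[f"Top {i}" for i in range(1, n + 1)])


# Apply the helper to every company, then turn CIK back into a regular column.
top3 = counts.apply(top_terms, axis=1).reset_index()
print(top3)
```

The result has the columns 'CIK', 'Top 1', 'Top 2', and 'Top 3', with one row per company.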

Next, I wanted to see what those terms actually were.

Of course, I could simply view the dataframe to see them, but I wanted to automate the process. Imagine if I had 1,000+ companies to check—I certainly wouldn’t want to do anything manually.

Here’s where the data mindset comes into play—you need to automate tasks when you find yourself stuck with repetitive actions.

I tried various approaches, but nothing seemed to work.

After an hour, I was ready to give up. I thought, "This is it; I’m done with this. I might as well do it manually" (my non-data mindset taking over).

But then I decided to take a quick break—just two minutes (well, actually, only one minute; I couldn’t wait any longer).

During that break, I had a realisation.

Looking back at the previous dataframe, I noticed it wasn’t in the ideal format for my next task.

Why not?

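Simply put, in a well-structured (tidy) dataframe, each row is one observation and each column is one variable. In this case, 'Top 1,' 'Top 2,' and 'Top 3' were stored as three separate columns for each CIK (the SEC's company identifier), even though they are really three values of a single variable: the rank of a term.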

I finally understood! I needed to reshape the dataframe—to make it longer.

And I did it. 😀
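For anyone curious what "making it longer" can look like, here is a minimal pandas sketch. The wide table is made-up example data, and the column names 'rank' and 'term' are my own choices for illustration; `melt` is one way to do this kind of reshape, not necessarily the only one.

```python
import pandas as pd

# Hypothetical wide table: one row per company, one column per rank.
wide = pd.DataFrame(
    {
        "CIK": [1001, 1002],
        "Top 1": ["risk", "climate"],
        "Top 2": ["innovation", "growth"],
        "Top 3": ["climate", "risk"],
    }
)

# Reshape to long format: one row per (company, rank) pair.
long_df = wide.melt(id_vars="CIK", var_name="rank", value_name="term")
long_df = long_df.sort_values(["CIK", "rank"]).reset_index(drop=True)
print(long_df)

# With a single 'term' column, listing the distinct terms needs no manual work.
unique_terms = sorted(long_df["term"].unique())
print(unique_terms)
```

Once every term sits in one column, the usual tools (`unique`, `value_counts`, `groupby`) work on it directly, whether there are two companies or a thousand.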