Web Analytics Company, RudderStack, Accidentally Collecting Passwords
RudderStack is an open-source data collection and routing tool. It collects data from your website with a bit of JavaScript and lets you send that data to other tools, such as a database or a third-party analytics tool. Three months ago, I reported a serious issue: under certain circumstances, RudderStack will collect passwords.
The specific issue is that their autotrack feature collects every DOM attribute of any element a user clicks on:
{% c-block language="js" %}
const attrLength = elem.attributes.length;
for (let i = 0; i < attrLength; i++) {
  const { name } = elem.attributes[i];
  const { value } = elem.attributes[i];
  if (value) {
    props[`attr__${name}`] = value;
  }
  ...
}
{% c-block-end %}
DOM attributes can sometimes contain passwords and other sensitive information. This is the exact same issue that affected Mixpanel two years ago. In the email Mixpanel sent out, they described specific scenarios in which sensitive information can wind up in DOM attributes:
We immediately began investigating further and learned that the behavior the customer was observing was due to a change to the React JavaScript library made in March 2017. This change placed copies of the values of hidden and password fields into the input elements’ attributes, which Autotrack then inadvertently received. Upon investigating further, we realized that, because of the way we had implemented Autotrack when it launched in August 2016, this could happen in other scenarios where browser plugins (such as the 1Password password manager) and website frameworks place sensitive data into form element attributes.
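To make the scenario concrete, here is a minimal sketch of how the loop above picks up a password once something has copied it into an attribute. The element is a plain object standing in for a real DOM node, and `collectAttributes`, the attribute names, and the password value are all my own illustrations rather than RudderStack's code.

```javascript
// Stand-in for a DOM element: only the `attributes` list the loop reads.
// In a browser this would be a real <input> whose framework or password
// manager copied the typed value into a `value` attribute.
const passwordInput = {
  attributes: [
    { name: 'type', value: 'password' },
    { name: 'name', value: 'password' },
    { name: 'value', value: 'hunter2' }, // hypothetical password
  ],
};

// Hypothetical helper that mirrors the autotrack loop quoted earlier.
function collectAttributes(elem) {
  const props = {};
  for (let i = 0; i < elem.attributes.length; i++) {
    const { name, value } = elem.attributes[i];
    if (value) {
      props[`attr__${name}`] = value;
    }
  }
  return props;
}

const leaked = collectAttributes(passwordInput);
console.log(leaked.attr__value); // prints "hunter2"
```

Nothing in the loop distinguishes a harmless attribute from a sensitive one, so whatever lands in an attribute gets sent along with the click event.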
In fact, it looks like RudderStack's implementation of autotrack is based on Mixpanel's buggy autotrack implementation.
When Mixpanel discovered the issue, they put safeguards in place to prevent capturing attributes from input fields. Specifically, they added a function, shouldTrackElement, that prevents the collection of attributes from input fields:
{% c-block language="js" %}
// don't send data from inputs or similar elements since there will always be
// a risk of clientside javascript placing sensitive data in attributes
if (
  isTag(el, 'input') ||
  isTag(el, 'select') ||
  isTag(el, 'textarea') ||
  el.getAttribute('contenteditable') === 'true'
) {
  return false;
}
{% c-block-end %}
When RudderStack based their code on Mixpanel's library, they decided not to include this code, even though its comment explicitly warns about what can happen. After I notified RudderStack of the issue, they put back the safeguards Mixpanel had put in place.
Personally, I don't think the safeguards are enough. The safeguards only prevent the capture of attributes from input elements. Attributes on all other elements will still be captured. Given that attributes can hold any internal program state, there's no reason sensitive information would be limited to only input elements. After writing their fix, RudderStack asked me to review it. I told them I did not think their fix was enough, but I have yet to hear back from them.
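Here is a rough sketch of the gap I mean. It wraps the check from the snippet above in a function and runs it against two stand-in elements. The `fakeElement` helper, my version of `isTag`, the `data-reset-token` attribute, and its value are assumptions made up for this illustration; they are not taken from either library.

```javascript
// Hypothetical stand-in for a DOM element (tag name plus attributes).
function fakeElement(tagName, attrs) {
  return {
    tagName: tagName.toUpperCase(),
    attributes: Object.entries(attrs).map(([name, value]) => ({ name, value })),
    getAttribute(name) {
      return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
    },
  };
}

// Assumed behavior of the isTag helper: case-insensitive tag comparison.
function isTag(el, tag) {
  return el.tagName.toLowerCase() === tag.toLowerCase();
}

// The input-only safeguard, wrapped in a function for this sketch.
function shouldTrackElement(el) {
  if (
    isTag(el, 'input') ||
    isTag(el, 'select') ||
    isTag(el, 'textarea') ||
    el.getAttribute('contenteditable') === 'true'
  ) {
    return false;
  }
  return true;
}

const input = fakeElement('input', { type: 'password', value: 'hunter2' });
// A non-input element whose attribute holds internal program state.
const wrapper = fakeElement('div', { 'data-reset-token': 'secret-token-123' });

console.log(shouldTrackElement(input));   // prints false
console.log(shouldTrackElement(wrapper)); // prints true
```

The input is skipped, but the div passes the check, so every one of its attributes would still be collected.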
Other companies do provide autotrack functionality, but are more careful about not collecting data attributes. Notably:
- Fullstory allows only a whitelist of attributes.
- Pendo requires you to whitelist attributes you want to collect.
- Freshpaint does not collect data attributes.
Because these companies only collect a specific set of data attributes, they don't have to worry about accidentally collecting passwords.
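As a rough idea of what an allowlist looks like in code, here is a minimal sketch. It is not how any of the companies above actually implement it; `ALLOWED_ATTRIBUTES`, `collectAllowedAttributes`, and the example attribute names are assumptions of mine.

```javascript
// Hypothetical allowlist; a real deployment would choose its own names.
const ALLOWED_ATTRIBUTES = new Set(['id', 'class', 'href']);

// Collect only attributes whose names appear on the allowlist.
function collectAllowedAttributes(elem, allowed = ALLOWED_ATTRIBUTES) {
  const props = {};
  // Array.from accepts a plain array as well as a browser NamedNodeMap.
  for (const { name, value } of Array.from(elem.attributes)) {
    if (value && allowed.has(name.toLowerCase())) {
      props[`attr__${name}`] = value;
    }
  }
  return props;
}

// Stand-in for a clicked element that carries one sensitive attribute.
const button = {
  attributes: [
    { name: 'id', value: 'login-button' },
    { name: 'class', value: 'btn primary' },
    { name: 'data-session', value: 'abc123' }, // hypothetical sensitive value
  ],
};

const collected = collectAllowedAttributes(button);
console.log(Object.keys(collected)); // prints [ 'attr__id', 'attr__class' ]
```

The difference from the input-only safeguard is the default: anything not explicitly listed is dropped, so a new attribute added by a framework or browser plugin is ignored unless someone opts in to collecting it.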