Security, privacy experts weigh in on the ICE doxxing

In what appears to be the latest salvo in a new, wired form of protest, developer Sam Lavigne posted code that scrapes LinkedIn to find Immigration and Customs Enforcement employee accounts. His code, which basically a Python-based tool that scans LinkedIn for keywords, is gone from Github and Gitlab and Medium took down his original post. The CSV of the data is still available here and here and WikiLeaks has posted a mirror.

“I find it helpful to remember that as much as internet companies use data to spy on and exploit their users, we can at times reverse the story, and leverage those very same online platforms as a means to investigate or even undermine entrenched power structures. It’s a strange side effect of our reliance on private companies and semi-public platforms to mediate nearly all aspects of our lives. We don’t necessarily need to wait for the next Snowden-style revelation to scrutinize the powerful — so much is already hiding in plain sight,” said Lavigne.

Doxxing is the process of using publicly available information to target someone online for abuse. Because we can now find out anything on anyone for a few dollars – a search for “background check” brings up dozens of paid services that can get you names and addresses in a second – scraping public data on LinkedIn seems far easier and innocuous. That doesn’t make it legal.

“Recent efforts to outlaw doxxing at the national level (like the Online Safety Modernization Act of 2017) have stalled in committee, so it’s not strictly illegal,” said James Slaby, Security Expert at Acronis. “But LinkedIn and other social networks usually consider it a violation of their terms of service to scrape their data for personal use. The question of fairness is trickier: doxxing is often justified as a rare tool that the powerless can use against the powerful to call attention to perceived injustices.”

“The problem is that doxxing is a crude tool. The torrent of online ridicule, abuse and threats that can be heaped on doxxed targets by their political or ideological opponents can also rain down on unintended and undeserving targets: family members, friends, people with similar names or appearances,” he said.

The tool itself isn’t to blame. No one would fault a job seeker or salesperson who scraped LinkedIn for targeted employees of a specific company. That said, scraping and publicly shaming employees walks a thin line.

“In my opinion, the professor who developed this scraper tool isn’t breaking the law, as it’s perfectly legal to search the web for publicly available information,” said David Kennedy, CEO of TrustedSec. “This is known in the security space as ‘open source intelligence’ collection, and scrapers are just one way to do it. That said, it is concerning to see ICE agents doxxed in this way. I understand emotions are running high on both sides of this debate, but we don’t want to increase the physical security risks to our law enforcement officers.”

“The decision by Twitter, Github and Medium to block the dissemination of this information and tracking tool makes sense – in fact, law enforcement agents’ personal information is often protected. This isn’t going to go away anytime soon, it’s only going to become more aggressive, particularly as more people grow comfortable with using the darknet and the many available hacking tools for sale in these underground forums. Law enforcement agents need to take note of this, and be much more careful about what (and how often) they post online.”

Ultimately, doxxing is problematic. Because we place our information on public forums there should be nothing to stop anyone from finding and posting it. However, the expectation that people will use our information for good and not evil is swiftly eroding. Today, wrote one security researcher, David Kavanaugh, doxxing is becoming dangerous.

“Going after the people on the ground is like shooting the messenger. Decisions are made by leadership and those are the people we should be going after. Doxxing is akin to a personal attack. Change policy, don’t ruin more lives,” he said.


Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more

elp IT! A talk with 3 Yelp MySQL DBAs heading to Percona Live 2015Founded in 2004 to help people find great local businesses, Yelp has some 135 million monthly unique visitors. With those traffic volumes Yelp’s 300+ engineers are constantly working to keep things moving smoothly – and when you move that fast you learn many things.

Fortunately for the global MySQL community, three Yelp DBAs will be sharing what they’ve learned at the annual Percona Live MySQL Conference and Expo this April 13-16 in Santa Clara, California.

Say “hello” to Susanne Lehmann, Jenni Snyder and Josh Snyder! I chatted with them over email about their presentations, on how MySQL is used at Yelp, and about the shortage of women in MySQL.


Tom: Jenni, you and Josh will be co-presenting “Next generation monitoring: moving beyond Nagios ” on April 14.

You mentioned that Yelp’s databases scale dynamically, and so does your monitoring of those databases. And to minimize human intervention, you’ve created a Puppet and Sensu monitoring ensemble… because “if it’s not monitored, it’s not in production.” Talk to me more about Yelp’s philosophy of “opt-out monitoring.” What does that entail? How does that help Yelp?

Jenni: Before we moved to Sensu, our Nagios dashboards were a sea of red, muted, acknowledged, or disabled service checks. In fact, we even had a cluster check to make sure that we never accidentally put a host into use that was muted or marked for downtime. It was possible for a well-meaning operator to acknowledge checks on a host and forget about it, and I certainly perpetrated a couple of instances of disks filling up after acknowledging a 3am “warning” page that I’d rather forget about. With Sensu, hosts and services come out of the downtime/acknowledgement state automatically after a number of days, ensuring that we’re kept honest and stay on top of issues that need to be addressed.

Also, monitoring is deployed with a node, not separate monitoring configuration. Outside of a grace period we employ when a host is first provisioned or rebooted, if a host is up, it’s being monitored and alerting. Also, alerting doesn’t always mean paging. We also use IRC and file tickets directly into our tracking system when we don’t need eyes on a problem right away.

Tom: Susanne, in your presentation, titled “insert cassandra into prod where use_case=?;” you’ll discuss the situations you’ve encountered where MySQL just wasn’t the right tool for the job.

What led up to that discovery and how did you come up with finding the right tools (and what were they) to run alongside and support MySQL?

Susanne: Our main force behind exploring other datastores alongside MySQL was that Yelp is growing outside the US market a lot. Therefore we wanted the data to be nearer to the customer and needed multi-master writes.

Also, we saw use cases where our application data was organized very key-value like and not relational, which made them a better fit for a NoSQL solution.

We decided to use Cassandra as a datastore and I plan to go more into detail why during my talk. Now we offer developers more choices on how to store our application data, but we also believe in the “right tool for the job” philosophy and might add more solutions to the mix in the future.

Tom: Jenni, you’ll also be presenting “Schema changes multiple times a day? OK!” I know that you and your fellow MySQL DBAs are always improving and also finding better ways of supporting new and existing features for Yelp users like me. Delivering on such a scale must entail some unique processes and tools. Does this involve a particular mindset among your fellow DBAs? Also, what are some of those key tools – and processes and how are they used?

Jenni: Yelp prizes the productivity of our developers and our ability to iterate and develop new features quickly. In order to do that, we need to be able to not only create new database tables, but also modify existing ones, many of which are larger than MySQL can alter without causing considerable replication delay. The first step is to foster a culture of automated testing, monitoring, code reviews, and partnership between developers and DBAs to ensure that we can quickly & safely roll out schema changes. In my talk, I’ll be describing tools that we’ve talked about before, like our Gross Query Checker, as well as the way the DBA team works with developers while still getting the rest of our work done. The second, easy part is using a tool like pt-online-schema-change to run schema changes online without causing replication delay or degrading performance :)

Tom: Josh, you’ll also be speaking on “Bootstrapping databases in a single command: elastic provisioning for the win.” What is “elastic provisioning” and how are you using it for Yelp’s tooling?

Josh: When I say that we use elastic provisioning, I mean that we can reliably and consistently build a database server from scratch, with minimal human involvement. The goal is to encompass every aspect of the provisioning task, including configuration, monitoring, and even load balancing, in a single thoroughly automated process. With this process in place, we’ve found ourselves able to quickly allocate and reallocate resources, both in our datacenters and in the cloud. Our tools for implementing the above goals give us greater confidence in our infrastructure, while avoiding single-points of failure and achieving the maximum possible level of performance. We had a lot of fun building this system, and we think that many of the components involved are relevant to others in the field.

Tom: Susanne and Jenni, last year at Percona Live there was a BoF session titled “MySQL and Women (or where are all the women?).” The idea was to discuss why there are “just not enough women working on the technology side of tech.” In a nutshell, the conversation focused on why there are not more women in MySQL and why so relatively few attend MySQL conferences like Percona Live.

The relative scarcity of women in technical roles was also the subject of an article published in the August 2014 issue of Forbes, citing a recent industry report.

Why, in your (respective) views, do you (or don’t) think that there are so few women in MySQL? And how can this trend be reversed?

Susanne: I think there are few women in MySQL and the reasons are manifold. Of course there is the pipeline problem. Then there is the problem, widely discussed right now, that women who are entering STEM jobs are less likely staying in there. These are reasons not specific for MySQL jobs, but rather for STEM in general. What is more specific for database/MySQL jobs is, in my opinion, that often times DBAs need to be on call, they need to stay in the office if things go sideways. Database problems tend often to be problems that can’t wait till the next morning. That makes it more demanding when you have a family for example (which is true for men as well of course, but seems still to be more of a problem for women).

As for how to reverse the trend, I liked this Guardian article because it covers a lot of important points. There is no easy solution.

I like that more industry leaders and technology companies are discussing what they can do to improve diversity these days. In general, it really helps to have a great professional (female) support system. At Yelp, we have AWE, the Awesome Women in Engineering group, in which Jenni and I are both active. We participate in welcoming women to Yelp engineering, speaking at external events and workshops to help other women present their work, mentoring, and a book club.

Jenni: I’m sorry that I missed Percona Live and this BoF last year; I was out on maternity leave. I believe that tech/startup culture is a huge reason that fewer women are entering and staying these days, but a quick web search will lead you to any number of articles debating the subject. I run into quite a few women working with MySQL; it’s large, open community and generally collaborative and supportive nature is very welcoming. As the article you linked to suggests, MySQL has a broad audience. It’s easy to get started with and pull into any project, and as a result, most software professionals have worked with it at some time or another.

On another note, I’m happy to see that Percona Live has a Code of Conduct. I hope that Percona and/or MySQL will consider adopting a Community Code of Conduct like Python, Puppet, and Ubuntu. Doing so raises the bar for all participants, without hampering collaboration and creativity!

* * *

Thanks very much, Susanne, Jenni and Josh! I look forward to seeing you next month at the conference. And readers, if you’d like to attend Percona Live, use the promo code Yelp15 for 15% off! Just enter that during registration. If you’re already attending, be sure to tweet about your favorite sessions using the hashtag #PerconaLive. And if you need to find a great place to eat while attending Percona Live, click here for excellent Yelp recommendations. :)

The post Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more appeared first on MySQL Performance Blog.


Django with time zone support and MySQL

Django This is yet another story of Django web-framework with time zone support and pain dealing with python datetimes and MySQL on the backend. In other words, offset-naive vs offset-aware datetimes.

Shortly, more about the problem. After reading the official documentation about the time zones, it makes clear that in order to reflect python datetime in the necessary time zone you need to make it tz-aware first and than show in that time zone.

Here is the first issue: tz-aware in what time zone? MySQL stores timestamps in UTC and converts for storage/retrieval from/to the current time zone. By default, the current time zone is the server’s time, can be changed on MySQL globally, per connection etc. So it becomes not obvious what was tz of the value initially before stored in UTC. If you change server or session tz further, it will lead to more mess. Unlike MySQL, PostgreSQL has timestamp with time zone data type, so Django can auto-detect tz and make datetimes tz-aware automatically featuring tzinfo attribute.

There are many solutions on the web… for example an extension to detect tz on UI by Javascript and pass back to the backend allowing you to work with tz-aware data, however, I need something simpler but mainly with less changes to the existing code base.

Here is my case. The server and MySQL are on UTC. That’s cool and it cuts off the first barrier. I store python datetimes in MySQL timespamp columns also in UTC per database time. Anyway, it is a best practice to store time in UTC. I have some read-only pages on the web app and want to show datetimes according to user’s tz. Looks to be a simple task but dealing with MySQL on backend, all my data coming from models have a naive datetime type assigned. So I need to find a way to easily make all my DateTimeField fields UTC-aware (add tzinfo attribute) and some convenient method of showing datetimes in user’s tz still having an access to UTC or naive datetimes for calculation on the backned. Therefore, I will be still doing all the calculation in UTC and showing TZ-aware values to users only on UI side.

This is an example of middleware that gets user’s tz from the database and sets in the session, so it can be retrieved anywhere using get_current_timezone():

from django.utils.timezone import activate
from myapp.models import UserInfo
class TimezoneMiddleware(object):
    """Middleware that is run on each request before the view is executed.
    Activate user's timezone for further retrieval by get_current_timezone() or
    creating tz-aware datetime objects beforehand.
    def process_request(self, request):
        session_tz = request.session.get('timezone')
        # If tz has been already set in session, let's activate it
        # and avoid SQL query to retrieve it on each request.
        if session_tz:
                # Get user's tz from the database.
                uinfo = UserInfo.objects.get(user_id=request.user.id, user_id__isnull=False)
                if uinfo.timezone:
                    # If tz is configured by user, let's set it for the session.
                    request.session['timezone'] = uinfo.timezone
            except UserInfo.DoesNotExist:

This is an excerpt from models.py:

from django.db import models
from django.utils.timezone import get_current_timezone, make_aware, utc
def localize_datetime(dtime):
    """Makes DateTimeField value UTC-aware and returns datetime string localized
    in user's timezone in ISO format.
    tz_aware = make_aware(dtime, utc).astimezone(get_current_timezone())
    return datetime.datetime.strftime(tz_aware, '%Y-%m-%d %H:%M:%S')
class Messages(models.Model):
    id = models.AutoField(primary_key=True)
    body = models.CharField(max_length=160L)
    created = models.DateTimeField(auto_now_add=True)
    def created_tz(self):
        return localize_datetime(self.created)

“Messages” model has “created” field (timestamp in MySQL) and a property “created_tz”. That property reflects “created” in user’s tz using the function localize_datetime() which makes naive datetimes tz(UTC)-aware, converts into user’s tz set on the session level and returns a string in ISO format. In my case, I don’t prefer the default RFC format that includes +00:00 tz portion of datetime with tzinfo attribute or even need tz-aware datetimes to operate with. Same way I can have similar properties in all needed models knowing they can be accessed by the same name with “_tz” suffix.

Taking into account the above, I reference “created” for calculations in views or controllers and “created_tz” in templaetes or for JSON-output.  This way I don’t need to change all references of “created” to something like “make_aware(created, utc)” or datetime.datetime.utcnow() to datetime.datetime.utcnow().replace(tzinfo=pytz.utc) across the code. The code changes in my app will be minimal by introducing a custom property in the model and continue operating with UTC on the raw level:

# views.py
# Operating UTC
msgs = Messages.objects.filter(created__gt=datetime.datetime.now() -

<! -- HTML template -- >
{% for m in msgs %}
    {{ m.id }}. {{ m.body }} (added on {{ m.created_tz }})
{% endfor %}
* All times in user's tz.

I hope this article may help in your findings.
Happy New Year across all time zones!

The post Django with time zone support and MySQL appeared first on MySQL Performance Blog.


Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

Github user Adrianlzt provided a python-twisted alternative version of pyclustercheck per discussion on issue 7.

Due to sporadic performance issues noted with the original implementation in SimpleHTTPserver, the benchmarks which I’ve included as part of the project on github use mutli-mechanize library,

  • cache time 1 sec
  • 2 x 100 thread pools
  • 60s ramp up time
  • 600s total duration
  • testing simulated node fail (always returns 503, rechecks mysql node on cache expiry)
  • AMD FX(tm)-8350 Eight-Core Processor
  • Intel 330 SSD
  • local loop back test (

The SimpleHTTPServer instance faired as follows:

All_Transactions_throughput All_Transactions_response_times_intervals All_Transactions_response_times

Right away we can see around 500TPS throughput, however as can be seen in both response time graphs there are “outlying” transactions, something is causing the response time to spike dramatically  SimpleHTTPServer, how does the twisted alternative compare? (note the benchmarks are from the current HEAD with features re-added to avoid regression, caching and ipv6 support)

All_Transactions_throughput All_Transactions_response_times_intervals All_Transactions_response_times


Ouch! We appear to have taken a performance hit, at least in terms of TPS -19% rough estimates however compare the response time graphs to find a much more consistent plot, we had outliers hitting near  70s for SimpleHTTP server, we’re always under 1s within twisted.

Great! So why isn’t this merged into the master branch as the main project and therfor bundled with Percona XtraDB Cluster (PXC)? The issue here is the required version of python-twisted; ipv6 support was introduced in issue 8 by user Nibler999 and to avoid regression I re-added support for ipv6 in this commit for twisted

ipv6 support for python-twisted is not in the version distributed to main server OS’s such as

  • EL6 python-twisted 8.x
  • Ubuntu 10.04 LTS python-twisted 11.x

What’s the issue here? Attempting to bind / listen to an ipv6 interface yields the following error: twisted.internet.error.CannotListenError: Couldn't listen on :::8000: [Errno -9] Address family for hostname not supported.

Due to this regression (breaking of ipv6 support) the twisted version can not at this time be merged into master, the twisted version however as can be seen from above is much more consistent and if you have the “cycles” to implement it (e.g. install twisted from pypy via pip / easy_install to get >= 12.x) and test it’s a promising alternative.

To illustrate this further the benchmark was made more gruling:

  • 5 x 100 thread pools
  • 60s ramp up
  • 600s total duration

First the twisted results, note the initial spike is due to a local python issue where it locked up creating a new thread in multi-mechanize:

All_Transactions_response_times All_Transactions_response_times_intervals All_Transactions_throughput

Now the SimpleHTTPServer results:

All_Transactions_response_times All_Transactions_response_times_intervals All_Transactions_throughput

Oh dear, as the load increases clearly we get some stability issues inside SimpleHTTP server…

Also worth noting is the timeouts

  • twisted: grep 'timed out' results.csv | wc -l == 0
  • SimpleHTTPServer: grep 'timed out' results.csv | wc -l == 470


… in the case of increased load the twisted model performs far more consistently under the same test conditions when compared against SimpleHTTPServer. I include the multi-mechanize scripts as part of the project on GitHub – as such you can recreate these tests yourself and gauge the performance to see if twisted or SimpleHTTP suits your needs.

The post Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation) appeared first on MySQL Performance Blog.


Poll: What programming languages and platforms do you use?

What Programming Languages and Platforms do you use?What programming languages and platforms do you use for large-scale projects in your organization?

If something is missing from the list please leave a comment and share your story. Thanks!

Note: There is a poll embedded within this post, please visit the site to participate in this post’s poll.

The post Poll: What programming languages and platforms do you use? appeared first on MySQL Performance Blog.


Balance Your Cloud

Seems like eons ago (just under 6 months..) when I joined Canonical, and hopped on a plane headed for Brussels and UDS-Maverick.

What a whirlwind, attending sessions, meeting the real rock stars of the Ubuntu world, and getting to know my super distributed team.

One of the sessions was based on a blueprint for load balancing in the cloud. The idea was that rather than rely on amazon’s Elastic Load Balancer, you could build your own solution that you could possibly even move around between UEC, EC2, or even Rackspace clouds.

Well it got a lower priority to some other stuff, so unfortunately, many parts got dropped (like ELB compatible cli tools).

But, I managed to find the time to create a proof of concept for managing haproxy’s config file (perhaps my first real python project), and write up a HOWTO for using it.

Honestly, its not the best HOWTO I’ve ever written. Its got a lot of stuff left out. But, it should be enough to get most admins past the “tinker for a few hours” phase and into the “tinker for 40 minutes right before getting it working then passing out on your keyboard at 4:00am” phase. I know thats how far it got me..

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com