In this line of work, you don’t just get to play with shiny toys with plenty of blinking lights. There are plenty of choices to be made nearly every day. Choices, or rather bets: some of the technologies, software stacks, products or services provided internally will eventually turn out to be flops. Decisions made over the years are based mostly on gut feeling and on opinions heard in conference talks, podcasts, blogs, mailing lists and forums. Below is an incomplete list of the worse and better decisions made at my work, focusing on the architecture and sysadmin side.
Bad ones:
- Deciding to go with the 10GBase-T network standard and its familiar RJ45 connector rather than fiber or SFP+ direct-attach cables for links between servers and switches. It looks like the days of RJ45 in the datacenter are numbered: power usage and latency are higher than with the alternatives, and not many new devices support 10GBase-T.
- Not starting from the very beginning with a ‘drop by default’ firewall policy for both incoming and outgoing traffic on most of the servers (a minimal sketch of such a baseline follows this list). Radical firewall policy changes made after a server is put into production take much more time and are error-prone.
- Running too many services on a single server. This makes upgrades and backup restores much more cumbersome.
- Trying hardware from TP-Link, D-Link, Netgear and other cheap brands one more time to see if it got any better. No, it did not, and stable firmware updates are unlikely to come.
- Having Active Directory but not making more use of it. I’m left with trauma from a company where AD was a crucial part of the setup yet was not dependable: too important to remove, too broken to repair. I’m still torn whether we would be better off having no AD at all or using it across all desktops and file servers.
- Turn-key solutions bundling open source tools to solve a specific problem. We used the ESVA project for spam filtering; it was eventually orphaned by its sole maintainer. Having been burned by that, we decided to set up an Asterisk phone system on a plain Debian server rather than use a product like FreePBX. The same goes for using Samba under Debian instead of FreeNAS, and many more.
- Backup drives connected via eSATA docking stations, and the countless disk disconnects that came with them. After we moved to less efficient USB 3.0-attached StarTech docks, all the problems were gone.
- In the early years – buying underpowered desktops and laptops.
- Having too little documentation of what we’ve set up and why.
- Having pet-like servers, each configured uniquely, rather than a herd of identically set up machines. DevOps is only easy if server roles are heavily standardized.
- Assuming that /24 networks – with 254 usable host addresses – are large enough for us.
- Not being generous enough in giving separate domain names to different internal services, even when they reside on the same host.
- Hardly a technical one: saying ‘yes’ too often, which left us running too many bespoke solutions.
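To make the ‘drop by default’ point above concrete, here is a minimal sketch of such a baseline as a small Python wrapper around iptables. The allowed ports are hypothetical examples, not our actual ruleset, and in practice you’d want this idempotent and persisted.

```python
#!/usr/bin/env python3
"""Minimal sketch of a 'drop by default' firewall baseline (illustration only)."""
import subprocess

def iptables(*args):
    # Apply a single iptables rule; raise if it fails.
    subprocess.run(["iptables", *args], check=True)

# Default policies: drop everything, incoming and outgoing.
iptables("-P", "INPUT", "DROP")
iptables("-P", "OUTPUT", "DROP")
iptables("-P", "FORWARD", "DROP")

# Always allow loopback and established/related traffic.
iptables("-A", "INPUT", "-i", "lo", "-j", "ACCEPT")
iptables("-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT")
iptables("-A", "INPUT", "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT")
iptables("-A", "OUTPUT", "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT")

# Then explicitly allow only what the server actually needs (example ports).
iptables("-A", "INPUT", "-p", "tcp", "--dport", "22", "-j", "ACCEPT")    # SSH in
iptables("-A", "OUTPUT", "-p", "udp", "--dport", "53", "-j", "ACCEPT")   # DNS out
iptables("-A", "OUTPUT", "-p", "tcp", "--dport", "443", "-j", "ACCEPT")  # HTTPS out
```

Starting from this and punching holes as needed is far less painful than retrofitting a restrictive policy onto a production box.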
Where I have mixed feelings:
- Office 365 – I’m happy with Exchange being in the cloud; on the other hand, Skype for Business is far from problem-free and MS Teams did not help. OneDrive accounts and group SharePoint sites are next to impossible to police or back up. Yammer is great, and online collaboration on Word and Excel files is good enough.
All doom and gloom, but hey – it got us to where we are today :-]
Good ones:
- Using Linux and open source solutions where possible. It leaves us much more of the budget for hardware and service purchases, and gives us the ability to test and scale with fewer constraints.
- Choosing Debian as the preferred distribution. We don’t get the latest versions of the software stack, but stability has been great over the years.
- Getting the core network switches from reputable and sensibly priced HP and Dell instead of much more expensive Cisco or Juniper gear.
- Using Linux as a router, firewall, VPN endpoint and load balancer instead of buying dedicated appliances. This might change over the years if we want Intrusion Prevention/Detection or Unified Threat Management systems. Having Linux boxes at the edge of the network at each office gives us tremendous flexibility for introducing new services, monitoring, setting up failover and so on.
- MySQL as a primary data store; relying on the built-in replication.
- [controversial] Java for the backend implementation, PHP for plenty of internal glue code and web apps. .NET is tempting, but it’s nowhere near usable under Linux yet.
- [controversial] Not using the cloud, as in AWS, auto-scaling and so on. The monthly bills for what are effectively glorified VPS servers are huge compared with the price of renting or owning physical servers.
- Not letting servers stagnate; upgrading to the most recent version of Debian once it is available and reasonably tested.
- Moving away from Asus/D-Link/Zyxel WiFi devices.
- Going with OpenVPN tunnels for both site-to-site and dial-up VPNs rather than IPsec, which turned out to be a mess of incompatible implementations.
- Sticking with commodity x86 hardware and not going for gold-plated solutions like Storage Area Networks, PBXes or WiFi controllers. Using Dell servers rather than HP or other brands that block the use of third-party disks, memory modules or expansion cards.
- Keeping more spare hardware rather than paying extra for premium support contracts.
- Encrypted, swappable disks for offline backups instead of tape drives or online-only backups.
- Labeling things, writing down what we have on the shelves, taking photos of each of the offices. Those bits of information, scattered across wikis, IP address databases and hardware inventories, are priceless when we provide remote help.
- Moving quite early to Virtualization and Linux Containers.
- Buying more bandwidth rather than investing time in QoS and traffic shaping.
- Using VoIP rather than traditional PSTN/ISDN telephony across offices scattered around the globe.
- Setting time-outs for tasks, then giving up and moving on; there are certain failure scenarios that we’ll never reproduce, let alone repair. A VPN or database replication link with a watchdog script that restarts it occasionally is good enough; restarts every few weeks go unnoticed. A sketch of such a watchdog follows this list.
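As an illustration of that last point, here is a minimal watchdog sketch in Python. The systemd unit name, the remote address and the idea of running it from cron are assumptions for the example, not a description of our actual script.

```python
#!/usr/bin/env python3
"""Sketch of a 'give up and restart' watchdog for a flaky VPN tunnel.

Assumptions: the tunnel runs as a systemd service (here 'openvpn@office',
a hypothetical unit name) and health is checked by pinging a host on the
far side. Intended to be run from cron every few minutes.
"""
import subprocess

REMOTE_HOST = "10.1.0.1"      # hypothetical address reachable only via the tunnel
SERVICE = "openvpn@office"    # hypothetical systemd unit name

def tunnel_is_healthy() -> bool:
    # One ping with a short timeout is enough for a coarse health check.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "5", REMOTE_HOST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if not tunnel_is_healthy():
    # Don't try to diagnose the rare failure mode; just restart and move on.
    subprocess.run(["systemctl", "restart", SERVICE], check=False)
```

The same pattern works for nudging a stalled database replication link: a cheap health check, an unconditional restart, and no heroics.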
Someday I should make a similar brain-dump about implementation details – bad code, bad data structures, and why having an audit trail and verbose logs is always a good idea.