
We're Back! So What Happened?

Discussion in 'Wynncraft' started by Jumla, Aug 19, 2020.

Thread Status:
Not open for further replies.
  1. Jumla

    Jumla Head Developer/Founder of Wynncraft Staff Member Admin Developer HERO

    Messages:
    141
    Likes Received:
    2,873
    Trophy Points:
    84
    Minecraft:
    Hi everyone,

    You may have noticed that Wynncraft took a nap last night:

    [Image: upload_2020-8-19_11-31-21.png]

    So what happened?

    Several things went wrong at once, which is why Wynncraft stayed down for so long.

    1. Our storage servers failed, and that failure cascaded

    Wynncraft has a lot of data that we need to store. We have over 20 dedicated servers that all work together to provide over 50 terabytes (50,000 GB) of storage that we can use to store player data, logs, etc. These servers use special software to provide storage in a redundant way. This means that even if a server catches on fire, other servers will automatically recover any lost data.

    Last night at around midnight EST, one of our services crashed and began to use far more storage than usual. This service usually uses around 60 GB of storage, but as of midnight last night, it was using over 1,000 GB. Because it was using so much storage, certain servers began to run out of disk space. Our special storage software recognized that those servers were running out of space, so it automatically decided to move the large files to different servers, which quickly also ran out of disk space since the files were so large (and growing). This cascaded until all of our storage servers were out of space, which caused our network to crash.

    2. Our monitoring systems crashed, so they did not pick up those failures
    We have a sophisticated monitoring system which should prevent issues like this. Once the storage servers began to fail, our monitoring system should have been notified, and it would then have automatically alerted (slash woken up) admins so we could resolve the issue.

    Unfortunately, because this issue was related to our storage servers, our monitoring servers couldn't store any telemetry, so our monitoring systems crashed as well. This meant that we had no systems telling us that anything was wrong.

    3. Our servers did not fail gracefully
    While all of this was going on, our game servers tried to stay up even though the storage servers were down. For around 2-3 hours, a small number of players were still able to log in to our network, but the game servers had nowhere to store data. Regrettably, this caused a small number of players to lose 2-3 hours of progress.

    How we are preventing future issues

    1. Upgrading our storage servers and adding fail-safes
    In the next few days, we'll be upgrading our storage servers and adding new guards to ensure that no single service can use so much data. This will hopefully prevent issues from cascading and taking down our entire storage cluster.
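    A minimal sketch of what a per-service storage guard could look like. This is not Wynncraft's actual code: the `QuotaGuard` class, the `"snapshots"` service name, and the 120 GB limit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a write would push a service past its storage quota."""


@dataclass
class QuotaGuard:
    # Hypothetical per-service limits in GB; the numbers are illustrative only.
    limits_gb: dict[str, float]
    used_gb: dict[str, float] = field(default_factory=dict)

    def reserve(self, service: str, size_gb: float) -> None:
        """Record a write, or refuse it if the service would exceed its quota."""
        limit = self.limits_gb[service]
        current = self.used_gb.get(service, 0.0)
        if current + size_gb > limit:
            raise QuotaExceeded(
                f"{service}: {current + size_gb:.0f} GB would exceed {limit:.0f} GB quota"
            )
        self.used_gb[service] = current + size_gb


guard = QuotaGuard(limits_gb={"snapshots": 120.0})
guard.reserve("snapshots", 60.0)  # normal usage fits within the quota

try:
    guard.reserve("snapshots", 940.0)  # runaway growth is refused
    refused = False
except QuotaExceeded:
    refused = True
```

    The idea is that a single misbehaving service hits its own limit and fails on its own, instead of filling every disk in the cluster.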

    2. Adding a watchdog to our monitoring systems

    When our monitoring systems crashed, there was nothing out there that could alert admins that something was wrong. In fact, according to our monitoring system, everything was handy dandy all night.

    Starting today, we'll be adding something called a "Watchdog" alert which will always be firing (i.e. always telling us that an issue is ongoing). This alert will serve as a dead man's switch. If it ever stops firing, we will treat it as if the network went down and automatically wake up the appropriate people.
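    A rough sketch of the dead man's switch idea, assuming a receiver that runs somewhere outside the monitored network. The `DeadMansSwitch` class, its method names, and the 120-second timeout are hypothetical, not the real alerting setup. The clock is injectable so the example can be run without waiting.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Signals that someone should be paged when an always-firing alert goes quiet."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        # Start counting silence from the moment the switch is created.
        self.last_seen = clock()

    def heartbeat(self) -> None:
        """Called every time the always-firing watchdog alert arrives."""
        self.last_seen = self.clock()

    def should_page(self) -> bool:
        """True once the watchdog alert has been silent for longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout_s


# Simulated clock so the example does not need to sleep.
now = [0.0]
switch = DeadMansSwitch(timeout_s=120.0, clock=lambda: now[0])

switch.heartbeat()
now[0] = 60.0
healthy = switch.should_page()  # alert was seen recently

now[0] = 300.0
silent = switch.should_page()  # no heartbeat for 300 seconds
```

    The logic is inverted compared to a normal alert: silence is the failure signal, so a crashed monitoring system can no longer look like a quiet night.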

    3. We're working on making our game servers fail more gracefully
    In the next few weeks, we'll be working on systems to make our game servers fail more gracefully. When something is wrong with the network that would affect player experience (i.e. data not saving), game servers will automatically turn off instead of trying to stay up in a broken state, which will hopefully prevent any future rollbacks.
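    One way to picture "failing gracefully" is a gate that counts consecutive failed saves and shuts the server down once a threshold is reached. This is only a sketch under assumptions: `SaveHealthGate`, the threshold of 3, and the use of `OSError` to signal a failed write are all hypothetical.

```python
from typing import Callable, List


class SaveHealthGate:
    """Shuts a server down after too many consecutive failed saves."""

    def __init__(self, max_failures: int, shutdown: Callable[[], None]):
        self.max_failures = max_failures
        self.shutdown = shutdown
        self.failures = 0
        self.stopped = False

    def save(self, write: Callable[[], None]) -> bool:
        """Attempt a save; return True on success, False otherwise."""
        if self.stopped:
            return False  # already shut down, accept no more work
        try:
            write()
        except OSError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.stopped = True
                self.shutdown()  # called exactly once
            return False
        self.failures = 0  # any successful save resets the counter
        return True


events: List[str] = []


def failing_write() -> None:
    # Stand-in for a storage backend that has run out of space.
    raise OSError("no space left on device")


gate = SaveHealthGate(max_failures=3, shutdown=lambda: events.append("shutdown"))
results = [gate.save(failing_write) for _ in range(5)]
```

    With a gate like this, players get kicked after a few failed saves rather than playing for hours on a server that cannot store their progress.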

    We're very, very sorry!

    We strive to have 100% uptime and provide a stable and reliable experience for y'all. We know that you are investing a lot into Wynncraft, and we are absolutely devastated that this issue had such a large impact. Over the next few days, we will be looking for ways to make up for any time lost on the network.
     
  2. one_ood

    one_ood c lown VIP

    Messages:
    3,620
    Likes Received:
    6,309
    Trophy Points:
    217
    Guild:
    Minecraft:
    welcome back jumla nice to meet you
     
    wxhlf and Iboju like this.
  3. Yraw

    Yraw Water Fountain

    Messages:
    594
    Likes Received:
    1,668
    Trophy Points:
    91
    Guild:
    Minecraft:
    Final Destination - Wynncraft Edition
     
  4. victorpotato2

    victorpotato2 Broke af HERO

    Messages:
    671
    Likes Received:
    414
    Trophy Points:
    97
    Minecraft:
    Gotta say u guys r fkin epic
     
    FAZu, Sar and Aiyria like this.
  5. Castti

    Castti Kookie HERO

    Messages:
    2,247
    Likes Received:
    19,070
    Trophy Points:
    209
    Guild:
    Minecraft:
    Thanks for the explanations, was curious!

    And thanks for all the hard work y'all put into keeping this server running :)
     
    That_Chudley likes this.
  6. Yugito

    Yugito Well-Known Adventurer CHAMPION

    Messages:
    621
    Likes Received:
    312
    Trophy Points:
    97
    Guild:
    Minecraft:
    thank you jumla very cool
     
    Vholtz_ likes this.
  7. DragonEngineer

    DragonEngineer Famous Adventurer HERO

    Messages:
    1,837
    Likes Received:
    2,721
    Trophy Points:
    164
    Minecraft:
    Are our data safe though? Like the bank, inventory, player levels etc
     
  8. victorpotato2

    victorpotato2 Broke af HERO

    Messages:
    671
    Likes Received:
    414
    Trophy Points:
    97
    Minecraft:
    Meaning, yes our data should be fine
     
    DragonEngineer likes this.
  9. fishcute

    fishcute fish CHAMPION Builder

    Messages:
    719
    Likes Received:
    760
    Trophy Points:
    125
    Creator Karma:
    Minecraft:
    When i saw that only 30 people were on this morning I knew something was up
     
    FAZu likes this.
  10. Bubbles

    Bubbles Yep, that one HERO

    Messages:
    1,525
    Likes Received:
    3,574
    Trophy Points:
    164
    Minecraft:
    A free server. Let me reiterate. A free service, apologising for half a day of inconvenience to its players. This is why I love Wynncraft. Props to you Jumla and the admin team!
     
    CoolVictor2002, FAZu, Dream and 18 others like this.
  11. AmbassadorDazz

    AmbassadorDazz Discord Killjoy Staff Member Moderator HERO

    Messages:
    974
    Likes Received:
    2,073
    Trophy Points:
    148
    Guild:
    Minecraft:
    Worth noting that I literally had to DM a mod on the Wynncraft Discord that it was happening, alongside someone breaking the rules, because the Discord server was on fire while this was going on.
    I'm guessing the alarm for the watchdog alert will play Yahya's theme, but bass-boosted and deep fried to unrecognizability?

    It goes without saying that we are definitely grateful for having a great admin team. Kudos to the entire team!
     
    Druser likes this.
  12. TrapinchO

    TrapinchO retired observer of the wiki VIP+ Featured Wynncraftian

    Messages:
    4,662
    Likes Received:
    6,601
    Trophy Points:
    217
    Minecraft:
    (if you can tell)
    What service crashed?
    Do you know why it crashed?

    I am so happy everything is now running as it should, hopefully nothing will malfunction again!
     
  13. Saya

    Saya you win at uwynn HERO

    Messages:
    2,930
    Likes Received:
    6,871
    Trophy Points:
    209
    Guild:
    Minecraft:
    hi back

    i'm so sorry
     
    J_Lo777 and Violet Knight like this.
  14. Jumla

    Jumla Head Developer/Founder of Wynncraft Staff Member Admin Developer HERO

    Messages:
    141
    Likes Received:
    2,873
    Trophy Points:
    84
    Minecraft:
    It was a service we use to save local data snapshots that allow for rapid data rollbacks when necessary. Still investigating why it crashed.
     
    starx280, FAZu, Dr Zed and 4 others like this.
  15. Shovel

    Shovel Follower of Tolvanism HERO

    Messages:
    401
    Likes Received:
    522
    Trophy Points:
    85
    Guild:
    Minecraft:
    Very much appreciate the transparency!
     
    TrapinchO likes this.
  16. jaiy

    jaiy Profesionl speler | scarred | Here 2 help 4 quests HERO

    Messages:
    11
    Likes Received:
    54
    Trophy Points:
    48
    Minecraft:
    I feel like it was the glow bulb party on world 21...
     
    FAZu and ReneCZ like this.
  17. SkabbMenSnabb

    SkabbMenSnabb Well-Known Adventurer VIP+

    Messages:
    123
    Likes Received:
    16
    Trophy Points:
    53
    Minecraft:
    Cool, double xp weekend when? :)
    jkjk...
    Unless?
     
    NamesAreHard and MrBartusek like this.
  18. chryssie

    chryssie ultimate cur hater CHAMPION

    Messages:
    237
    Likes Received:
    889
    Trophy Points:
    75
    Guild:
    Minecraft:
    i dont think the people who got rollbacked actually lost anything, if they had left during the storage server failure they wouldnt have been able to progress anyway

    wack
     
  19. Nynnf

    Nynnf Well-Known Adventurer CHAMPION

    Messages:
    377
    Likes Received:
    402
    Trophy Points:
    85
    Minecraft:
    ye i play like 18 hrs a day on wynn so like i am pretty dedicated yes
     
  20. CountBurn

    CountBurn Hackysack? HERO

    Messages:
    3,613
    Likes Received:
    2,360
    Trophy Points:
    125
    Guild:
    Minecraft:
    stop
     
    starx280 likes this.